Giving up on multiple backups to same B2 repo, using one repo per backup


#1

I was backing up 16 small databases to the same B2 repo. Things started off well, with each backup taking a minute or less. I was careful to retain a cache.

Over a month, though, as the number of snapshots increased, backups slowed down. Then, when two or more backups were running at the same time, they seemed to interfere and slow each other down. This effect cascaded until the previously 1-minute backups started taking 10-40 minutes.

Apart from running slowly, backups started logging lots of lock errors like:
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 27.204164437s: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3

Here are the times I started getting:

processed 1 files, 0 B in 10:14
processed 1 files, 0 B in 10:43
processed 1 files, 0 B in 12:15
processed 1 files, 0 B in 12:35
processed 1 files, 0 B in 13:49
processed 1 files, 0 B in 14:33
processed 1 files, 0 B in 15:55
processed 1 files, 0 B in 16:20
processed 1 files, 0 B in 16:33
processed 1 files, 0 B in 17:03
processed 1 files, 0 B in 17:41
processed 1 files, 0 B in 19:48
processed 1 files, 0 B in 20:08
processed 1 files, 0 B in 21:26
processed 1 files, 0 B in 21:27
processed 1 files, 0 B in 22:23
processed 1 files, 0 B in 23:46
processed 1 files, 0 B in 24:00
processed 1 files, 0 B in 26:41
processed 1 files, 0 B in 27:12
processed 1 files, 0 B in 27:17
processed 1 files, 0 B in 27:50
processed 1 files, 0 B in 28:08
processed 1 files, 0 B in 32:36
processed 1 files, 0 B in 34:31
processed 1 files, 0 B in 34:35
processed 1 files, 0 B in 35:11
processed 1 files, 0 B in 38:55
processed 1 files, 0 B in 39:11
processed 1 files, 0 B in 39:17
processed 1 files, 0 B in 39:55

Instead I have switched to creating one repo per database, greatly reducing the parallel activity. Now my backup times are back under 1 minute:

processed 1 files, 0 B in 0:15
processed 1 files, 0 B in 0:16
processed 1 files, 0 B in 0:17
processed 1 files, 0 B in 0:18
processed 1 files, 0 B in 0:19
processed 1 files, 0 B in 0:20
processed 1 files, 0 B in 0:20
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:22
processed 1 files, 0 B in 0:22
processed 1 files, 0 B in 0:22
processed 1 files, 0 B in 0:25
processed 1 files, 0 B in 0:26
processed 1 files, 0 B in 0:27
processed 1 files, 0 B in 0:27
processed 1 files, 0 B in 0:28
processed 1 files, 0 B in 0:30

I’ll aim to forget and prune these often to keep the number of snapshots down and avoid the slowdowns.
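Something like the following is what I have in mind for each repo; the repository name and retention values here are only illustrative, not my real ones:

```shell
# Sketch: periodic maintenance for one per-database repo.
# Repo name and --keep-* policy are illustrative placeholders.
restic -r b2:foo-backup:restic/foo forget \
  --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
  --prune
```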

Q: When more than one backup was running, they were sharing the same repo cache. Is that OK, or does each process need its own private local cache?


#2

No, that should be mostly fine. I’m not aware of any issues here.

Do you still have the repo? I’d be very interested in a debug log of one of the backup runs which took a lot of time. Were there maybe many stale lock files left somehow?


#3

Yes, I still have the repository. I can run some tests against it if you like. The log I have from a run that took 40 minutes instead of 1 minute is at the bottom. Initially backups were so quick that two backups seldom if ever ran at once, but as they took longer to complete, up to around six would be running at once.

It did have one stale lock: I had to run restic unlock in order to run restic prune on it, and that removed one stale lock (which did look like it had been there for a while).

I was going to test forget on it, but it failed because all the paths were different, so nothing was forgotten. I realize now I needed --group-by tags. So this prune actually had nothing to remove. Though it looks like it consolidated 17854 indexes into about 14 new indexes? Is that right, and does consolidation significantly help the performance of other tasks?
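For reference, the invocation I should have used groups snapshots by tags instead of the default host-and-paths grouping; something like this (the keep count is just an example):

```shell
# Group snapshots by tags so snapshots of the same database are
# retired together, even though every snapshot has a different path.
restic forget --group-by tags --keep-last 100
```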

The prune took almost 8 hours, but the majority of that time was spent removing old index files. That stage is something else that I think ought to be multi-threaded, since I assume you already know all the names that need deleting, and the order you delete them in probably doesn’t matter?

I often try setting b2.connections, but it is not clear which operations that actually impacts. Would it help with prunes?
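For reference, I set it via restic’s extended options; the value 20 is just a guess at what might help, and the repo name is a placeholder:

```shell
# Raise the number of concurrent connections the B2 backend may use.
restic -o b2.connections=20 -r b2:foo-backup:restic/foo prune
```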

$ time restic prune
repository 88b2b1b2 opened successfully, password is correct
counting files in repo
building new index for repo
[1:45:29] 100.00%  39377 / 39377 packs
repository contains 39377 packs (67853 blobs) with 54.278 GiB
processed 67853 blobs: 0 duplicate blobs, 0B duplicate
load all snapshots
find data that is still in use for 17854 snapshots
[4:03] 100.00%  17854 / 17854 snapshots
found 67853 of 67853 data blobs still in use, removing 0 blobs
will remove 0 invalid files
will delete 0 packs and rewrite 0 packs, this frees 0B
counting files in repo
[52:29] 100.00%  39377 / 39377 packs
finding old index files
saved new indexes as [a6fcda4b 785b9556 82079593 0476dd31 23124853 cd0a2e43 513b77f1 7ce9435c e9618549 dd7a34d0 d2ba118f f90a4bb8 fd2c591a 97705eb2]
remove 17854 old index files
done

real	461m19.283s
user	4m34.314s
sys	0m44.614s
2018-08-30 18:56:29+12:00: Backing up MySQL database 'foo'...
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 282.124449ms: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 921.950869ms: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 1.103703713s: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 987.812036ms: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 1.368557177s: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 5.506539331s: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 5.404895106s: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 4.799484004s: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 13.589122457s: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<lock/f7eeb40b91>, 0, 0) returned error, retrying after 9.765553474s: b2_download_file_by_name: 404: bucket foo-backup does not have file: restic/databases/locks/f7eeb40b91fe8b164d9a39abfa5e35018cc5b1bbb06d3ed1fc2d56889693c8b3
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 453.301987ms: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 601.704321ms: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 992.190357ms: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 2.514316474s: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 2.744409146s: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 4.187550861s: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 6.196049677s: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 10.743109693s: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 11.605431782s: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists
Load(<index/54c8ede2cd>, 0, 0) returned error, retrying after 18.87147923s: Create: open /var/cache/restic/restic/88b2b1b29d8ee557e09ce3af66910944e8956ffe6163a745465cb156b503d00a/index/54/54c8ede2cd9d1b38c2238582f95f1276a9c40a2cb41207f933d575c435d9d978: file exists

Files:           1 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Added:      11.599 MiB

processed 1 files, 0 B in 39:17
snapshot 75e9b71b saved

#4

So far, using a separate repository per backup to avoid multiple backups running at once is paying off. Backup times for these databases to B2 are consistently under 30 seconds, versus the 10-40 minutes I was getting before when multiple backups could be writing to the same repository.
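The setup is roughly the following sketch; the bucket, database names, and dump command are placeholders for my real ones:

```shell
#!/bin/sh
# Sketch: one restic repository per database in the same B2 bucket.
# derive_repo maps a database name to its own repository path.
derive_repo() {
  printf 'b2:foo-backup:restic/%s\n' "$1"
}

for db in foo bar baz; do
  repo=$(derive_repo "$db")
  echo "backing up ${db} -> ${repo}"
  # Real invocation (commented out; needs restic and B2 credentials):
  # mysqldump "$db" | restic -r "$repo" backup --stdin --stdin-filename "${db}.sql"
done
```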

I’ll keep an eye on it as the repositories accumulate more snapshots.

processed 1 files, 0 B in 0:27
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:23
processed 1 files, 0 B in 0:32
processed 1 files, 0 B in 0:32
processed 1 files, 0 B in 0:27
processed 1 files, 0 B in 0:20
processed 1 files, 0 B in 0:24
processed 1 files, 0 B in 0:24
processed 1 files, 0 B in 0:27
processed 1 files, 0 B in 0:28
processed 1 files, 0 B in 0:32
processed 1 files, 0 B in 0:23
processed 1 files, 0 B in 0:19
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:16
processed 1 files, 0 B in 0:14
processed 1 files, 0 B in 0:22
processed 1 files, 0 B in 0:22
processed 1 files, 0 B in 0:22
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:20
processed 1 files, 0 B in 0:20
processed 1 files, 0 B in 0:20
processed 1 files, 0 B in 0:23
processed 1 files, 0 B in 0:28
processed 1 files, 0 B in 0:32
processed 1 files, 0 B in 0:24
processed 1 files, 0 B in 0:23
processed 1 files, 0 B in 0:24
processed 1 files, 0 B in 0:28
processed 1 files, 0 B in 0:26
processed 1 files, 0 B in 0:28
processed 1 files, 0 B in 0:26
processed 1 files, 0 B in 0:28
processed 1 files, 0 B in 0:22
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:24
processed 1 files, 0 B in 0:20
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:23
processed 1 files, 0 B in 0:26
processed 1 files, 0 B in 0:19
processed 1 files, 0 B in 0:23
processed 1 files, 0 B in 0:21
processed 1 files, 0 B in 0:22
processed 1 files, 0 B in 0:23
processed 1 files, 0 B in 0:28

#5

Hey,

I am in the process of setting up my backup system and am hesitating between going for one repo with multiple folders versus multiple repos.

I have the following data to backup:

  • frequently updated files needing fast access (stored on an SSD), ~300 GB
  • infrequently updated files (mostly photos/videos) that don’t need fast access (stored on an HDD), ~400 GB
    I plan to back up the infrequently updated files less often in order to put less stress on my small home server.

@fd0 is there really a problem with the single-repo solution, or does the slowdown only come from not using --group-by tags?

@whereisaaron is everything still working smoothly for you so far?


#6

Hi @trompx, after six months I can confirm that, for me, having lots of repositories for different frequent backups works well.

I did think the problem was the number of snapshots, but now these many repositories each have thousands of snapshots with little slowdown. Maybe an extra 30s per backup for repos with thousands (rather than hundreds) of snapshots. I think the problem was probably multiple overlapping backups going to the same repository, perhaps made worse on high-latency storage like B2 or S3.

But it’s not proven either way, as my original problem repo had >15k snapshots with multiple frequent backups, while the new frequent backups with their own repos have only >3k snapshots each. So I have not really replicated the same conditions.

processed 1 files, 0 B in 0:20
processed 1 files, 0 B in 0:27
processed 1 files, 0 B in 0:18
processed 1 files, 0 B in 0:14
processed 1 files, 0 B in 0:14
processed 1 files, 0 B in 1:13
processed 1 files, 0 B in 1:12
processed 1 files, 0 B in 1:21
processed 1 files, 0 B in 0:34
processed 1 files, 0 B in 0:30
processed 1 files, 0 B in 0:31
processed 1 files, 0 B in 1:14
processed 1 files, 0 B in 1:03
processed 1 files, 0 B in 1:04
processed 1 files, 0 B in 0:35
processed 1 files, 0 B in 0:42
processed 1 files, 0 B in 0:40
processed 1 files, 0 B in 1:17
processed 1 files, 0 B in 1:03
processed 1 files, 0 B in 1:03
processed 1 files, 0 B in 0:40
processed 1 files, 0 B in 0:30
processed 1 files, 0 B in 1:12
processed 1 files, 0 B in 1:07
processed 1 files, 0 B in 1:05
processed 1 files, 0 B in 1:04
processed 1 files, 0 B in 0:20
processed 1 files, 0 B in 0:24
processed 1 files, 0 B in 0:25
processed 1 files, 0 B in 1:01
processed 1 files, 0 B in 1:07
processed 1 files, 0 B in 1:01
processed 1 files, 0 B in 0:23
processed 1 files, 0 B in 0:29
processed 1 files, 0 B in 0:19
processed 1 files, 0 B in 1:11
processed 1 files, 0 B in 1:01
processed 1 files, 0 B in 1:00
processed 1 files, 0 B in 0:30
processed 1 files, 0 B in 0:32
processed 1 files, 0 B in 0:33
processed 1 files, 0 B in 1:04
processed 1 files, 0 B in 1:01
processed 1 files, 0 B in 1:04
processed 1 files, 0 B in 0:24
processed 1 files, 0 B in 0:24
processed 1 files, 0 B in 0:38
processed 1 files, 0 B in 1:10
processed 1 files, 0 B in 1:08
processed 1 files, 0 B in 1:30
processed 1 files, 0 B in 1:13
processed 1 files, 0 B in 1:10
processed 1 files, 0 B in 1:11
processed 1 files, 0 B in 1:07
processed 1 files, 0 B in 1:04
processed 1 files, 0 B in 1:15
processed 1 files, 0 B in 1:11
processed 1 files, 0 B in 1:09
processed 1 files, 0 B in 1:04
processed 1 files, 0 B in 1:04
processed 1 files, 0 B in 0:31
processed 1 files, 0 B in 0:25
processed 1 files, 0 B in 0:24