Prune time has grown 28x overnight


#1

Hi all, I’ve been using Restic (version 0.9.2) pretty successfully for about a month now, backing up 40 different Linux servers to the same repository. Performance is decent: it takes about 4.5 hours to back up all the servers (they run sequentially, one at a time, coordinated by Ansible), 2 hours to do a forget and prune, and about 5-10 minutes to synchronize the local repo to a bucket in Backblaze B2 storage… roughly 7 hours total.
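For reference, a nightly sequence like the one described can be sketched as a small wrapper script. The repository path, password file, retention policy, and the use of rclone for the B2 sync are all illustrative assumptions; the thread doesn’t specify them:

```shell
#!/bin/sh
# Sketch of the nightly sequence described above. The repository path,
# password file, retention policy, and rclone remote are assumptions,
# not values from this thread.

export RESTIC_REPOSITORY=/srv/restic-repo
export RESTIC_PASSWORD_FILE=/etc/restic/password

# Illustrative retention policy for the forget step:
FORGET_FLAGS="--keep-daily 7 --keep-weekly 4 --keep-monthly 6"

if command -v restic >/dev/null 2>&1; then
    # Step 1 (run once per server, sequentially, e.g. via Ansible):
    # restic backup /etc /home /var

    # Step 2: drop snapshots outside the retention policy, then prune
    # the data they referenced:
    restic forget $FORGET_FLAGS --prune
fi

# Step 3: mirror the local repository to a B2 bucket (remote and
# bucket names are made up):
# rclone sync "$RESTIC_REPOSITORY" b2:my-backup-bucket
```

Combining forget and prune into one `restic forget … --prune` invocation avoids scanning the repository twice.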

The repository is about 1.6 TB, and contains around 450 snapshots on average.

The problem is… I came in this morning and found the backups hadn’t completed; it was still working on the prune. I cancelled it and found that the “find data that is still in use for 443 snapshots” step had taken 5 hours and 45 minutes! That step usually takes only about 12 minutes.

As far as I can tell, nothing is all that different. I ran a check on the repo and everything was fine (it took about 6 hours). I then manually ran a prune, which is currently at:
find data that is still in use for 443 snapshots
[1:40:31] 4.29% 19 / 443 snapshots

So something is wrong here, but I’m not sure what. Here’s a previous day’s successful prune:

62 snapshots have been removed, running prune
counting files in repo
building new index for repo
[44:51] 100.00%  340372 / 340372 packs

repository contains 340372 packs (5944829 blobs) with 1.602 TiB
processed 5944829 blobs: 0 duplicate blobs, 0B duplicate
load all snapshots
find data that is still in use for 419 snapshots
[11:46] 100.00%  419 / 419 snapshots

found 5926940 of 5944829 data blobs still in use, removing 17889 blobs
will remove 0 invalid files
will delete 1359 packs and rewrite 731 packs, this frees 8.306 GiB
[2:33] 100.00%  731 / 731 packs rewritten

counting files in repo
[45:27] 100.00%  338643 / 338643 packs

finding old index files
saved new indexes as [eee00c90 b278f277 25933f37 e7f30cf7 174f2bfc 4f8d0c08 b25da2ac a935d930 30d3ce85 9f985afe 53f918ed 732acf28 924c0b47 ae20efaa 7cf8aeb8 931702f5 131b46ef 0e61a9be 19953183 95a3b330 f4eb4122 a8d82119 fe796f54 bc267c54 aa7037cb 133af857 34a5715a c28139c1 0aeddfa3 93fd715a 7f60d0c2 3a7dda5b 87b126ff d906e6db e8a701de e5fd2081 bd9ae11f 779a0086 2168924a 98ac8bb3 92035389 931a2511 6974a49a 66109474 4297a364 929ca3e5 79db7375 68102137 8ef9afc2 7e4665ae 690d6724 dcf291c0 2203f887 9f34f576 5bb50341 73713479 b9362b5b 7afaa23b 8d651c05 a12db283 fbb501ea 20d9ac80 ee38dad1 03fbfcd3 90c8d098 7992ea14 60bf0ac9 9fec90e5 a91e653a 367a75f5 e9656f1b c81c1703 3a66f70a a580c032 0bdbc213 5168a93e 4d3f22d7 9b67faa0 9bfd1d2d e076f0c1 f78dede9 0b3634d4 a2f63c61 4e32dfec 4aea65e2 96f41cce 65cb962b 274a2b6d b92642b1 fd1707b1 02dfcd00 6073df32 736c6fa6 984ce0ac 588595a5 7b1ee288 907adfb2 c428157e 43ca05a6 375773be bd589978 2a2de899 9e2ec505 b5ab01db 15f2b654 242004f9 d301a8b2 ad9bbba8 9bd14314 fe6b94c8 3d7110f8 bc9c5be2 001b7c2c]
remove 158 old index files
[0:03] 100.00%  2090 / 2090 packs deleted

done

And then here’s the prune operation from last night that I cancelled:

19 snapshots have been removed, running prune
counting files in repo
building new index for repo
[53:58] 100.00%  339707 / 339707 packs

repository contains 339707 packs (5936757 blobs) with 1.599 TiB
processed 5936757 blobs: 0 duplicate blobs, 0B duplicate
load all snapshots
find data that is still in use for 443 snapshots
[5:45:49] 100.00%  443 / 443 snapshots

found 5934903 of 5936757 data blobs still in use, removing 1854 blobs
will remove 0 invalid files
will delete 205 packs and rewrite 111 packs, this frees 1.128 GiB

The number of blobs, packs, GBs of data, and so on seem very similar between the two nights, so I’m not sure what has caused this huge increase in the prune time. Any ideas?


#2

Huh, thanks for the report! That stage should be pretty fast, so I don’t really know what’s going on, but you may have run into a bug with the local cache: https://github.com/restic/restic/pull/2019
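If the cache is the culprit, a quick way to test that theory is to bypass it for one run, or to clear it and let restic rebuild it. This is only a diagnostic sketch; `--no-cache` is a standard restic global flag, and the cache path shown is restic’s default location on Linux:

```shell
#!/bin/sh
# Diagnostic sketch: rule the local cache in or out.

if command -v restic >/dev/null 2>&1; then
    # One-off prune that ignores the on-disk cache entirely:
    restic prune --no-cache
fi

# Alternatively, clear the cache and let restic rebuild it on the
# next run (uncomment the rm line to actually delete it):
CACHE_DIR="${XDG_CACHE_HOME:-$HOME/.cache}/restic"
# rm -rf "$CACHE_DIR"
echo "cache directory would be: $CACHE_DIR"
```

If a `--no-cache` run is fast while a normal run is slow, that points strongly at a stale or corrupted cache rather than the repository itself.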

That issue is resolved in restic 0.9.3. Can you please retry and see if it still happens?


#3

Hey there, I updated to 0.9.3, and it did indeed finish in a normal amount of time; in fact, it was even a bit quicker than usual. I’ll keep an eye on it for the next couple of nights and see if the problem crops back up. Thanks!


#4

Nice, please let me know if it happens again!