Prune: incredible delays

Hi,

I think I have an extreme case of prune slowness on restic 0.9.2.
Here is what I’m getting after a few forgets…

restic -r s3:https://us-east-1@s3.wasabisys.com/myrepo prune --cleanup-cache
repository xxxxxxxx opened successfully, password is correct
counting files in repo
building new index for repo
[2:42:50] 100.00%  181230 / 181230 packs
repository contains 181230 packs (4316033 blobs) with 791.943 GiB
processed 4316033 blobs: 70413 duplicate blobs, 15.426 GiB duplicate
load all snapshots
find data that is still in use for 3155 snapshots
[28:52] 100.00%  3155 / 3155 snapshots
found 4100014 of 4316033 data blobs still in use, removing 216019 blobs
will remove 0 invalid files
will delete 8524 packs and rewrite 14864 packs, this frees 58.097 GiB
[29:48:33] 17.81%  2648 / 14864 packs rewritten

and still running…

If I check what is in the cache, I get:

$ du -d 1 -h ~/Library/Caches/restic/xxxxx/
7,0G	/Users/vartkat/Library/Caches/restic/xxxxx//data
513M	/Users/vartkat/Library/Caches/restic/xxxxx//index
 12M	/Users/vartkat/Library/Caches/restic/xxxxx//snapshots
7,5G	/Users/vartkat/Library/Caches/restic/xxxxx/

Did I do something wrong, or is it normal to see such delays?

According to these numbers there are still about 137 hours to go, which means 5 days and 17 hours.

What is restic doing? Is it downloading everything and re-uploading it?
Why such a delay? Can you explain the operation in detail?

The prune process still needs to be optimized; there's a PR in the making (#1994) which starts this process. For now, every pack file in the repo that contains even a tiny amount of unreferenced data is accessed, downloaded, and the still-used data in it is re-uploaded. This is done sequentially, and for remote backends with a (relatively) high latency compared to e.g. local storage, it will take some time. We'll get there, eventually.
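
To make that a bit more concrete, here is a rough sketch of the rewrite loop in Go. This is not restic's actual code; the Backend interface, the Pack type and the in-memory backend are simplified stand-ins for illustration only:

```go
package main

// A rough sketch of the sequential prune rewrite described above.
// This is NOT restic's actual code: Backend, Pack and the in-memory
// backend are simplified stand-ins for illustration only.

import "fmt"

type BlobID string

type Pack struct {
	ID    string
	Blobs map[BlobID][]byte // blob ID -> blob contents
}

// Backend models a remote storage backend; each call is at least one
// network round trip when the repository is remote.
type Backend interface {
	LoadPack(id string) (Pack, error)
	SavePack(p Pack) error
	DeletePack(id string) error
}

// rewritePacks handles one pack at a time: download it, keep only the
// blobs that are still referenced, upload the repacked data, then
// delete the old pack. The strictly sequential loop is why remote,
// high-latency backends take so long.
func rewritePacks(be Backend, packIDs []string, used map[BlobID]bool) error {
	for _, id := range packIDs {
		pack, err := be.LoadPack(id) // download the whole pack
		if err != nil {
			return err
		}

		kept := Pack{ID: id + "-rewritten", Blobs: map[BlobID][]byte{}}
		for blobID, data := range pack.Blobs {
			if used[blobID] { // drop unreferenced blobs
				kept.Blobs[blobID] = data
			}
		}

		if len(kept.Blobs) > 0 {
			if err := be.SavePack(kept); err != nil { // re-upload the used data
				return err
			}
		}
		if err := be.DeletePack(id); err != nil {
			return err
		}
		fmt.Printf("pack %s: kept %d of %d blobs\n", id, len(kept.Blobs), len(pack.Blobs))
	}
	return nil
}

// memBackend is a trivial in-memory stand-in so the sketch runs.
type memBackend struct{ packs map[string]Pack }

func (m *memBackend) LoadPack(id string) (Pack, error) { return m.packs[id], nil }
func (m *memBackend) SavePack(p Pack) error            { m.packs[p.ID] = p; return nil }
func (m *memBackend) DeletePack(id string) error       { delete(m.packs, id); return nil }

func main() {
	be := &memBackend{packs: map[string]Pack{
		"p1": {ID: "p1", Blobs: map[BlobID][]byte{"a": []byte("x"), "b": []byte("y")}},
	}}
	used := map[BlobID]bool{"a": true} // blob "b" is no longer referenced
	if err := rewritePacks(be, []string{"p1"}, used); err != nil {
		fmt.Println("error:", err)
	}
}
```

The point to take away is the single sequential loop: every affected pack costs at least one download and one upload, so per-request latency and transfer time add up pack by pack.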

Thanks for your answer but… but…

"building new index for repo" means you compare data in the repo to the index.

"find data that is still in use for 3155 snapshots" means you compare data in the repo to data on the client, no?

If so, since my repo contains backups from 3 hosts and prune is run from only one of them, how do you know the data is still in use, given that you can't compare against the two other hosts?

The prune code only operates on data in the repository; it does not need any local data. For each snapshot, restic builds a list of all referenced (= still needed) data blobs by loading all the metadata from the repo. This covers all hosts that save their backups into that particular repository.
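
If it helps, here is a simplified Go sketch of that "find data that is still in use" step. Again, this is not restic's real implementation; the Repo interface and the toy in-memory repository are made up for illustration. The important part is that it only reads snapshot and tree metadata from the repository, which is why snapshots from all hosts are covered and nothing from the client machines is needed:

```go
package main

// A simplified sketch of the "find data that is still in use" step:
// load every snapshot (from every host that backs up into the
// repository), walk its tree metadata and collect the IDs of all
// referenced blobs. The types below are made up for illustration;
// this is not restic's code.

import "fmt"

type ID string

type Node struct {
	DataBlobs []ID // content blobs of a file
	Subtree   ID   // non-empty if this node is a directory
}

type Tree struct{ Nodes []Node }

// Repo exposes just enough repository metadata for the sketch.
type Repo interface {
	Snapshots() []ID    // all snapshots, regardless of host
	SnapshotTree(ID) ID // root tree of a snapshot
	LoadTree(ID) Tree   // directory metadata stored in the repo
}

// usedBlobs returns the set of blobs referenced by any snapshot.
// Everything not in this set is a candidate for removal by prune.
func usedBlobs(r Repo) map[ID]bool {
	used := map[ID]bool{}

	var walk func(tree ID)
	walk = func(tree ID) {
		if used[tree] {
			return // trees are shared between snapshots; visit each once
		}
		used[tree] = true
		for _, node := range r.LoadTree(tree).Nodes {
			for _, b := range node.DataBlobs {
				used[b] = true
			}
			if node.Subtree != "" {
				walk(node.Subtree)
			}
		}
	}

	for _, sn := range r.Snapshots() {
		walk(r.SnapshotTree(sn))
	}
	return used
}

// memRepo is a toy in-memory repository so the sketch runs.
type memRepo struct {
	snaps map[ID]ID // snapshot -> root tree
	trees map[ID]Tree
}

func (m memRepo) Snapshots() []ID {
	var ids []ID
	for id := range m.snaps {
		ids = append(ids, id)
	}
	return ids
}
func (m memRepo) SnapshotTree(sn ID) ID { return m.snaps[sn] }
func (m memRepo) LoadTree(id ID) Tree   { return m.trees[id] }

func main() {
	repo := memRepo{
		snaps: map[ID]ID{"snap-host1": "t1", "snap-host2": "t1"},
		trees: map[ID]Tree{
			"t1": {Nodes: []Node{{DataBlobs: []ID{"blob-a", "blob-b"}}}},
		},
	}
	fmt.Println("blobs still in use:", len(usedBlobs(repo))) // 3: t1, blob-a, blob-b
}
```

Any blob that never ends up in that set is unreferenced, and those are exactly the blobs that prune removes or rewrites out of their packs.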

I hope this helps you to understand what’s going on :slight_smile:

Am I understanding correctly? Does this mean my prunes will always take this much time, whatever I forget?

The runtime for prune mostly depends on your latency to the server, how much data is stored in the repository overall, and how much unreferenced data there is.
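
As a back-of-envelope check, you can extrapolate directly from the progress line in your output, assuming the per-pack rewrite rate stays roughly constant; a quick sketch:

```go
package main

// Back-of-envelope extrapolation from the progress line in the output
// above, assuming the per-pack rewrite rate stays roughly constant.

import (
	"fmt"
	"time"
)

func main() {
	elapsed := 29*time.Hour + 48*time.Minute + 33*time.Second // [29:48:33]
	done, total := 2648, 14864                                 // packs rewritten so far / to rewrite

	perPack := elapsed / time.Duration(done)
	remaining := time.Duration(total-done) * perPack

	fmt.Printf("~%v per pack, ~%v remaining (~%.1f days)\n",
		perPack.Round(time.Second), remaining.Round(time.Minute), remaining.Hours()/24)
}
```

This lands on roughly the same ~137 hours (5 days and 17 hours) you estimated above.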

Unless a large number of files in your dataset change often, I don't think pruning is really needed that frequently; in that case, you may be able to prune manually from a VM closer to your backup site.

For example, if you are backing up to Wasabi (us-east-1 is Ashburn) from Europe, you likely have an RTT exceeding 100 ms, but when you prune from a DigitalOcean droplet in New York, your latency is likely below 10 ms. And spawning that droplet for a few hours probably costs just a few cents.

Of course this is inconvenient and not ideal, since it requires manual work and additional costs, but until prune is optimized there won't be much else you can do for such high-latency destinations.