Very slow restic prune

Hello.
We are using restic to back up a very large amount of data to a remote minio cluster using the S3 protocol.
We currently have 134 TB of data spread over 26,150 buckets.

The problem comes when we try to purge our repositories, as it takes a huge amount of time to perform this operation.
As an example, here is a screenshot of a running prune operation on one bucket:

[screenshot: restic_purge]

As you can see, this prune operation is going to take about 8 hours.

We are already using the latest restic version:

# restic version
restic 0.12.1 compiled with go1.16.12 on linux/amd64
# rpm -qa | grep restic
restic-0.12.1-3.el8.x86_64

From our analysis, the bottleneck is not the network.

Can you help us understand what could be the reason for this behavior? Or is it somehow expected given how restic works?

Thanks a lot
Riccardo

Hi :wave:

That's more than what I am dealing with; I guess I have ~100 TB with ~200 buckets at most. But I just wanted to suggest tuning the prune command if you're not doing it already. E.g. I am using --max-unused 20% --max-repack-size 50G to gain some speed (of course by wasting a bit of disk space).
IMHO default values are a bit too strict for this scale, but totally understandable :sweat_smile:
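For reference, a sketch of the full invocation I mean; the repository URL and password file below are placeholders, not your actual setup:

# prune with relaxed limits: tolerate up to 20% unused data in the repo
# and repack at most 50G per run (repository URL and password file are placeholders)
restic -r s3:https://minio.example.com/my-bucket \
       --password-file /root/.restic-password \
       prune --max-unused 20% --max-repack-size 50G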

In your example, restic takes ~7 hours to remove a lot of unreferenced pack files.
Usually no unreferenced pack files should be present.

You should ask yourself where these pack files come from. Did you abort some prune runs? Or do you have lots of aborted backup runs?

If this is (for whatever reason) a typical state, the next step would be to speed up deletion in your backend. Currently the file removal rate is about 0.2 s per file to remove, and this is already parallelized in restic. Maybe you can improve the deletion rate in minio?
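As a rough way to measure the raw per-object delete latency of the minio cluster itself (outside of restic), you could time a single delete with the MinIO client; the alias, bucket and object names below are just placeholders:

# upload a tiny throw-away object, then time how long a single removal takes
# ("myminio", "test-bucket" and the object name are placeholders)
mc cp /etc/hostname myminio/test-bucket/delete-latency-probe
time mc rm myminio/test-bucket/delete-latency-probe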

Can you also give the full output of this prune run? I’m quite curious how long the repacking and the index creation was…

There’s something seriously wrong with the backend performance. Taking nearly an hour to simply list 23671 files is far too slow (or actually 23671+106388, but that’s still far too slow). For that step there’s nearly no computation required by the prune command, so it’s only limited by the backend. How long does restic list packs take?
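To measure just that step on its own, you could time the listing directly; the repository URL and password file are again placeholders for your setup:

# time only the listing of pack files (nothing is read or modified)
time restic -r s3:https://minio.example.com/my-bucket \
            --password-file /root/.restic-password \
            list packs | wc -l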

How high are the ping times between the host running restic and the minio cluster?

[Edit] Adding -o s3.connections=8 to the restic command line should provide a bit of a speed-up (around 60% faster for the delete step). Besides that, this seems to be related to Uncap delete worker concurrency for dramatic prune speed-ups [0.12.1] · Issue #3632 · restic/restic · GitHub [/Edit]
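Concretely, that would look something like this (repository URL and password file are placeholders):

# raise the number of parallel S3 connections from the default (5) to 8
restic -r s3:https://minio.example.com/my-bucket \
       --password-file /root/.restic-password \
       -o s3.connections=8 prune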

Thank you guys for your replies.
I will try to have a look at the state of our backend storage, since I understand that the issue is probably there.
@MichaelEischer: I didn't understand whether that option is already built into the latest restic version or whether I should specify it as a command option.

Thanks
Riccardo

My suggestion was to call restic prune -o s3.connections=8 instead of restic prune. That should speed things up slightly.

Hello.
With the suggested option, the prune operation seems to be slightly faster.
Unfortunately, in a number of repos I'm now getting the following error:

The index references 1931 needed pack files which are missing from the repository:
....... a long list of hashes
.......
Fatal: packs from index missing in repo

What does this error mean?
Thanks
Riccardo