Hello.
We are using restic to back up a very large amount of data to a remote MinIO cluster using the S3 protocol.
We currently have:
134 TB of data spread over 26,150 buckets.
The problem comes when we try to purge our repositories, as it takes a huge amount of time to perform this operation.
As an example, I attach a screenshot of a running prune operation on a bucket:
As you can see, this prune operation is going to take about 8 hours.
We are already using the latest restic version:
# restic version
restic 0.12.1 compiled with go1.16.12 on linux/amd64
# rpm -qa | grep restic
restic-0.12.1-3.el8.x86_64
From our analysis, the bottleneck is not the network.
Can you help us understand what the reason for this behavior could be? Or is it somehow expected, given how restic works?
That’s more than what I am dealing with; I have ~100 TB with ~200 buckets at most. But I just wanted to suggest tuning the prune command, if you’re not doing that already. For example, I use --max-unused 20% --max-repack-size 50G to gain some speed (at the cost of wasting a bit of disk space, of course).
IMHO the default values are a bit too strict for this scale, but that’s totally understandable.
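For reference, a prune run with those relaxed thresholds might look like the sketch below. The repository URL and password file are placeholders; substitute your own:

```shell
# Placeholder repository and credentials -- adjust for your setup.
export RESTIC_REPOSITORY="s3:https://minio.example.com/my-bucket"
export RESTIC_PASSWORD_FILE=/etc/restic/password

# Tolerate up to 20% unused data in the repo and repack at most
# 50 GiB per run, trading some disk space for a shorter prune.
restic prune --max-unused 20% --max-repack-size 50G
```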
In your example, restic takes ~7 hours to remove a lot of unreferenced pack files.
Usually no unreferenced pack files should be present.
You should ask yourself where these pack files come from. Did you abort some prune runs? Or do you have lots of aborted backup runs?
If this is (for whatever reason) a typical state, a next step would be to speed up deleting in your backend. Currently the file removal rate is about 0.2s per file-to-remove, and this is already parallelized in restic. Maybe you can improve the deletion rate in minio?
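If the per-file deletion latency on the backend is the limit, one thing worth trying is raising the number of parallel backend connections via restic's extended S3 option (the default is 5; the value 32 below is just an illustration, and the benefit depends on what the MinIO cluster can sustain):

```shell
# More parallel S3 connections can speed up the deletion phase,
# provided the MinIO cluster can handle the extra load.
restic -o s3.connections=32 prune
```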
Can you also give the full output of this prune run? I’m quite curious how long the repacking and the index creation was…
There’s something seriously wrong with the backend performance. Taking nearly an hour to simply list 23671 files is far too slow (or actually 23671+106388, but that’s still far too slow). For that step there’s nearly no computation required by the prune command, so it’s only limited by the backend. How long does restic list packs take?
How high are the ping times between the host running restic and the minio cluster?
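A quick way to measure both numbers (the hostname is a placeholder for your MinIO endpoint):

```shell
# Time a plain listing of all pack files -- this is almost pure
# backend I/O, with essentially no computation on the restic side.
time restic list packs > /dev/null

# Round-trip latency between the restic host and the MinIO cluster.
ping -c 10 minio.example.com
```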
Thank you guys for your replies.
I will try to have a look at the state of our back-end storage, since I understand that the issue is probably there. @MichaelEischer: I didn’t understand whether that option is already built into the latest restic version, or whether I should specify it as a command option.
Hello.
With the suggested option, the prune operation seems to be slightly faster.
Unfortunately, I’m now getting the following error in a number of repos:
The index references 1931 needed pack files which are missing from the repository:
....... a long list of hashes
.......
Fatal: packs from index missing in repo