Questions for large-ish scale deployment

Chris1 · November 8, 2024, 12:11pm

Hello everyone!

I’m considering using restic (& resticprofile) on a ~60TB share, backing up to S3, and I was wondering if anyone could shed some light on some good tuning parameters, and a couple of other questions. Even just to point me in the right direction, as testing on a volume this large isn’t that easy (it’ll take around 6 days to upload the volume in the first place, assuming almost perfect speed).

The share is made up of lots of individual folders (a media asset manager, so unideal structure). As it’s all media clips, I know that compression won’t do anything and is therefore a waste of time.

We can provision pretty much whatever resources are necessary (obviously within limits), I’ve so far just stuck 8 cores, 32GB of RAM, and a 32GB disk in the VM, but that could be either overkill or underpowered for this size, I’ve no idea.

Questions:

What sort of pack size should I use? They’re all somewhat chunky media clips, very very few small files. As I said, resources can be increased, but at a point we’re limited by the dedicated (sort of) 1Gb link to S3 for this.
The storage itself is fast, but it’s a gluster volume where each node has a 20Gb link. What sort of values should I try for read-concurrency?
Considering this is on S3, how often should I run the ‘check’ operation?
When I run the check command, how will this affect our S3 bill? Does it download & read files? Does it just check files exist (just API calls)? Am I going to get a $20,000 AWS bill next month, considering we have so many individual files?
Also pertaining to S3, how often should I run prune? I’ve read you need to do it fairly often, otherwise it’s just a bigger task next time, but equally… S3.
Are there any further S3 optimisations I can make? I suspect it’ll all boil down to pushing as fast as possible, since we aren’t compressing anything.
Also, whilst I’m here, what encryption does it use?

To be clear, I don’t need exact values from anyone, just some ballpark figures would be good. Like, packsize defaults to 16MiB, the guide says you could, for example, make it 64MiB, but what’s outlandish? Is 256MiB useless, or nothing for this workload?

Any help, answers, or pointers for those questions would be greatly appreciated.

Thanks everyone,
Chris

kapitainsky · November 8, 2024, 3:02pm

Have a read:

Also check some other posts from original author.

It should give you some good food for thought.

Chris1 · November 8, 2024, 3:34pm

Looks like a great resource, thanks

Any further responses re S3 in particular are very welcomed.

MichaelEischer · November 9, 2024, 11:33am

Just use the current maximum of 128MB

With the latest restic versions the rule of thumb is 1GB per 7 million files + 1GB per 7TB data.
That should be enough to run backup, check and prune. There’s currently one major exception: if you have millions of files in a single folder, then the memory usage will be higher.

That could be too small for the repository metadata cache, although without knowing more about the data set that’s hard to tell.

Check by default only downloads the repository metadata. That is the S3 operations by the check command are roughly a list operation for each folder in the repository and downloads of all metadata files (basically everything that also gets stored in the cache folder used by restic). As all blobs (except snapshots) are merged into packs files, this will only download a few thousand files even for a 60TB repo.

The actual file contents are only checked if you pass one of the --read-data* options to the check command. As the upload to S3 is integrity checked and S3 ensures that files don’t get corrupted during storage, the only thing that check --read-data-subset ... can find is when the pack files gets corrupted during the backup but before the upload. Restic versions >= 0.16.4 are pretty good at detecting such cases already before the upload. So checking a small fraction of the data set should be enough.

By now, restic is also fairly battle tested so that it’s sufficient to run check only every few weeks or once a month.

That entirely depends on how fast your dataset changes. If there’s barely any change to existing files over weeks or moths, then a prune every few months is also sufficient. prune is optimized to only download the pack file parts that are relevant for repacking. You can also set --max-unused unlimited to a higher value than the default of 5% to decrease downloads at the cost of additional storage overhead.

MichaelEischer · November 9, 2024, 11:34am

see References — restic 0.17.3 documentation