Strategy for 15TB NAS and Win 10 desktop

pau · November 30, 2019, 7:19pm

Hi,
The old adage of “Backup and restore are two different projects. Don’t confuse them.” applies here, I think. Also: “No one cares about backups, people care about being able to restore the data.”

Currently restic has some issues with restore speed (see the thread "`restic restore` speeds asymptotically dropping towards zero!" and https://github.com/restic/restic/pull/2195). Your data set is bigger than that of the poster in that thread, so I expect you will hit the same issue with data restore speed, or lack thereof.
Edit 2: This restore speed degradation seems to affect large files only (200 MiB files definitely affected), so this may or may not apply to your case.
Another issue with large remote repositories is the time it takes to prune the repository: https://github.com/restic/restic/issues/2162 Note that prune takes an exclusive lock on the repository, so you can’t run new backups while prune is running.
Unless you are OK with compiling a patched restic yourself and running it in production, I think the driving criterion for the repository setup should be the restore time.

Assuming you want to run the official binaries, my recommendation would be to set up several repositories for specific data subsets/NAS paths, aiming for 3-5 TiB of data per repository (the number picked by the highly scientific method of staring at the ceiling; however, I didn’t see people with repository sizes of a few TiB complaining about restore speeds). That way if you need to restore the data, you should be able to do it in a reasonable time.
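
As a rough illustration of the split described above, here is a minimal sketch that groups NAS paths into repositories with a first-fit-decreasing heuristic. The paths, their sizes, and the 5 TiB cap are hypothetical placeholders, and the heuristic is only one possible way to do the grouping; in practice you would probably also want to keep related data together by hand.

```python
from typing import Dict, List


def plan_repositories(path_sizes_tib: Dict[str, float],
                      cap_tib: float = 5.0) -> List[List[str]]:
    """Group paths into repositories of at most cap_tib each.

    First-fit decreasing: take paths from largest to smallest and put
    each one into the first repository that still has room for it.
    """
    repos: List[List[str]] = []   # paths assigned to each repository
    totals: List[float] = []      # running size of each repository, in TiB
    ordered = sorted(path_sizes_tib.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in ordered:
        for i, total in enumerate(totals):
            if total + size <= cap_tib:
                repos[i].append(path)
                totals[i] += size
                break
        else:
            # No existing repository has room (or the path alone exceeds
            # the cap), so it gets a repository of its own.
            repos.append([path])
            totals.append(size)
    return repos


# Hypothetical NAS layout adding up to 15 TiB (sizes in TiB).
sizes = {
    "/nas/photos": 4.0,
    "/nas/video": 4.5,
    "/nas/docs": 0.5,
    "/nas/vm-images": 3.0,
    "/nas/music": 1.0,
    "/nas/projects": 2.0,
}
plan = plan_repositories(sizes)
for number, paths in enumerate(plan, start=1):
    print(f"repo {number}: {', '.join(paths)}")
```

Each resulting group would then get its own restic repository and its own backup job.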

The downsides are:
o) Maintenance of multiple repositories.
o) Manually picking which data sets should be put in which repository.
o) Less benefit from deduplication.
o) No guarantee that there will be a way to merge the repos once the scalability issues are resolved.

The upsides are:
o) You should be able to actually restore the data in a reasonable time.
o) You should be able to prune the data in a reasonable time (Edit 2).
o) restic will need less memory to run the restore, reducing the chance of you running OOM during the restore process.
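
To put "a reasonable time" into numbers, here is a back-of-the-envelope sketch. The 50 MiB/s throughput is a made-up figure, not a measurement, and the calculation assumes the speed stays constant, which is exactly what the restore issue above breaks. Treat the result as a best case.

```python
def restore_hours(size_tib: float, throughput_mib_s: float) -> float:
    """Best-case restore time in hours at a constant throughput."""
    size_mib = size_tib * 1024 * 1024  # 1 TiB = 1024 * 1024 MiB
    return size_mib / throughput_mib_s / 3600


# Compare one 4 TiB repository against the whole 15 TiB data set,
# at an assumed (hypothetical) sustained 50 MiB/s.
for size_tib in (4, 15):
    hours = restore_hours(size_tib, 50)
    print(f"{size_tib} TiB at 50 MiB/s: {hours:.1f} h")
```

Plug in the download speed you actually get from your storage provider to see how the per-repository size translates into restore time.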

Edit: If you are OK with running a patched restic in a production environment, my recommendation would be to set up a single repository. You get the least maintenance overhead and the maximum deduplication benefit. restic + PR2195 does fast restores and uses less memory, so it is a usable setup that costs you the least (maximum deduplication => lowest storage cost) and is the simplest to use, because you don’t have to figure out which repository holds the file you need to restore.

Edit 2:
Personally I run 2x ~1 TiB repositories, but they are on a local server running rest-server and I only replicate the repositories to B2.
You may take a look here: https://kdecherf.com/blog/2018/12/28/restic-and-backblaze-b2-short-cost-analysis-of-prune-operations/#fn-1
Depending on your backup schedule, snapshot sizes, prune policy, and how long you plan to run the setup, it may make sense to set up a small server dedicated to restic and only replicate the data to B2. That way you should be able to side-step the repository prune time and avoid being billed for B2 data downloads and class B transactions.