Organization of backups for a large number of servers

Hi there,

At my work at the Dutch natural history institute (Naturalis) we’re looking for a replacement for our current backup solution. We want something that is open source and works well with both private and public cloud storage backends. Restic checks most of our boxes and is our primary candidate.

At the moment we’re determining the best way to organize our backups with restic. We have quite a large number of servers (300+) that host data we need to back up. Some of those servers are file servers with file shares containing large numbers of files.

We’re looking for an organization of our backups that:

  • Is optimized for overall deduplication.
  • Reduces the risk of data corruption (a huge number of backup processes in one repo might increase the risk of locking issues).
  • Makes it easy to move a file share from one file server to another, while keeping the relation to the existing backup set.

Roughly we see two scenarios:

  • We use a small number of repositories, each holding a large number of backups (distinguished by appropriate tags); a sketch of this scenario follows the list.
  • We use a large number of repositories, each holding a relatively small number of backups.
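
For concreteness, the first scenario could look roughly like the sketch below: every host backs up into one shared repository and marks its snapshots with tags. This is just an illustration; the repository URL, path, password file, and tag names are placeholders, not a proposal.

```python
# Hedged sketch of scenario one: many hosts, one shared repository,
# snapshots distinguished by tags. All names below are placeholders.
import os
import socket
import subprocess

REPO = "s3:s3.example.com/naturalis-backups"  # hypothetical storage backend

# restic reads the repository and password from the environment
env = dict(os.environ,
           RESTIC_REPOSITORY=REPO,
           RESTIC_PASSWORD_FILE="/etc/restic/password")

subprocess.run(
    ["restic", "backup", "/srv/shares",
     "--tag", socket.gethostname(),  # which server made the snapshot
     "--tag", "fileshare"],          # which backup set it belongs to
    env=env, check=True,
)
```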

I can imagine there aren’t any strict rules, but I’m interested in any of your experiences and tips.

Thanks!


Use a single, large repository then: a repository is the boundary for deduplication, so data is only deduplicated within one repository.

Multiple backups can run at the same time, but you will want to run restic prune once in a while to remove unneeded data. The prune operation requires an exclusive lock, though it doesn’t need to happen as often.
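
To illustrate (the repository and retention policy here are placeholders, not a recommendation), the periodic maintenance could be a scheduled job along these lines, run while no backups are active:

```python
# Illustration only: apply a retention policy and prune unreferenced data.
# restic forget --prune takes an exclusive lock on the repository, so this
# should run when no backups are in progress. All values are placeholders.
import os
import subprocess

env = dict(os.environ,
           RESTIC_REPOSITORY="s3:s3.example.com/naturalis-backups",
           RESTIC_PASSWORD_FILE="/etc/restic/password")

subprocess.run(
    ["restic", "forget",
     "--keep-daily", "7", "--keep-weekly", "4", "--keep-monthly", "12",
     "--prune"],  # remove data no remaining snapshot references
    env=env, check=True,
)
```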

The third point (moving a share between servers while keeping its backup history) I’m not 100% sure about; someone else would have to confirm.

Hope that helps a little bit though!

Thanks, that was indeed my impression based on the docs.

Based on the docs I knew it was possible to run multiple backups at the same time. But do you, or others, have experience with really large numbers (say 300+) of concurrent backups? Do people run into integrity issues at this scale, or is the general experience that restic is rock solid here too?

For sure, thanks a lot!

Hey David, thanks a lot for considering restic! It feels great to know that large institutions are considering restic, which started as my little side project! :slight_smile:

I don’t know of anybody who has tried this, but Matt is right: it should not cause problems during backup, just maybe when running prune or check.

I’m confident that you will not run into any integrity issues. You may end up with duplicate data in the repo, but that will be cleaned up when restic prune is run.

Please report back how it goes! I’m very interested in your results, even if you decide to use some other backup solution out there! :slight_smile:

Nice, and we’re happy to spread the word.

Ok, that’s good to hear. I think we will start with a minimal set of repositories (for example one for home folders, one for group folders and one for web servers). We will use config management tools to automate the backups (looking at this Ansible role and this Kubernetes operator) and will make sure those properly handle and report possible errors.
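
In pseudo-config terms, the split we have in mind looks something like the sketch below; the repository names and URLs are made up, and the real thing would live in our config management rather than in a script.

```python
# Sketch of our intended split: one repository per backup set, and each
# server picks its repository by the kind of data it hosts. All names
# and URLs are placeholders.
import os
import subprocess

REPOS = {
    "home":  "s3:s3.example.com/backups-home",   # home folders
    "group": "s3:s3.example.com/backups-group",  # group folders
    "web":   "s3:s3.example.com/backups-web",    # web servers
}

def backup(backup_set: str, path: str) -> None:
    """Back up one path into the repository for its backup set."""
    env = dict(os.environ,
               RESTIC_REPOSITORY=REPOS[backup_set],
               RESTIC_PASSWORD_FILE="/etc/restic/password")
    subprocess.run(["restic", "backup", path], env=env, check=True)

backup("home", "/srv/home")
```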

We definitely will! I’ll post about our experiences here at the forum.


It’s been quite some time, but I was wondering whether you actually got to implement restic as a solution within your organization. I bet there are a couple of people here who would love to hear how it went :slight_smile: