Backing up many servers into one repository

Hi there, I’m a sysadmin at a non-profit organization, and came across Restic while looking for replacements for our current Bacula system. My initial testing of Restic went well: deduplication was great, and backups were speedy. After that initial testing (with promising results), I had to wipe out our current Bacula system and dive into Restic (lack of funding means lack of space to test things out on separate servers).

So, I’m adding more and more things to the repository (I’m now backing up about 37 servers, amounting to a 635GB repo, although the servers themselves amount to much more than that), but things are starting to choke. Some of our smaller servers with 4GB of RAM are doing an incredible amount of swapping, and because of it, previously speedy backups (in the realm of a couple minutes) can take an hour or more to complete.

What worries me is that I still need to start backing up a couple more servers, plus our file server, which amounts to about 1 TB of data. I’m worried that adding that extra data to the repository will start causing more and more problems. Any ideas on what I should do to remedy the situation? Do I need to just start from scratch and create separate repositories for different clusters of servers to keep the repository size down? The deduplication is working wonders in our environment, so I would hate to reduce its effectiveness, but this doesn’t seem to scale well, and I’m entirely unsure what will happen once I get the file server data into the repo.



My main Restic repo machine is an Ubuntu 16.04 server with around 20 different accounts (for data isolation), each chrooted away from the others. SSH is disabled after initial setup for those accounts, and only SFTP is permitted via RSA key. The total data is around 5.2 TB on two 8TB 7200 rpm drives on BTRFS. This machine is synced to a clone server via rsync each morning. The whole thing is also snapshotted via BTRFS. This has been very effective for fast pruning, fast restores, and fail-safe data storage.
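A minimal sketch of that morning sync-and-snapshot routine as crontab entries (paths, hostname, and times are hypothetical, not from the post; note the `\%` escaping that cron requires):

```
# m h dom mon dow  command
0  6 * * *  rsync -aH --delete /srv/restic/ clone-host:/srv/restic/
30 6 * * *  btrfs subvolume snapshot -r /srv/restic /srv/snapshots/restic-$(date +\%F)
```

Read-only snapshots (`-r`) keep each day’s state immutable, which is what makes the setup fail-safe against accidental repo corruption propagating to the clone.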

I back up Linux servers, Windows servers, and some Windows 10 machines too. Backups take around 2 to 15 minutes per endpoint, and one repo is 1.3 TB (3 Windows 2016 servers + databases).

The dedup is great, but remember that it is a security/confidentiality issue: one machine being backed up has access to all data in the repo. You would not want someone in shipping figuring out how to access the CEO’s document folders via the Restic repo. That’s why I had to do some chrooting. Losing dedup is OK if confidential data is kept isolated.

Sadly I don’t have as many hosts in a repo as you do, but I do have some very large repos. They work blazing fast with this kind of local/physical setup. In my testing, I did not find remote repos like Digital Ocean Spaces very efficient: too many timeouts and errors. I would do a final rclone of the local repo each day just to sync the data offsite, but no remote restic check or restic prune.
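That daily offsite push could look something like this cron entry (the remote name, bucket, and schedule are assumptions for illustration):

```
# m h dom mon dow  command
30 5 * * *  rclone sync /srv/restic/repo remote:restic-offsite
```

`rclone sync` makes the destination match the source one-way, so it mirrors prunes done locally without ever running `check` or `prune` against the slow remote.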


Hello, I had the same need as you. From what I’ve seen in other posts and in the documentation, several servers can back up to the same repository simultaneously. However, if you can’t make one server wait for another to finish before starting its own backup — that is, if they really must back up to the same repository at the same time — then to get the most out of deduplication you should schedule `check` and `prune` to run from time to time (weekly, I believe). During deduplication restic compares new data against what is already stored in the repository, and when servers are copying simultaneously there is a chance the same data ends up stored twice; that is where prune comes into play.
Do not forget that prune is time consuming and takes an exclusive lock on the repository; while prune is running, no server will be able to back up to it.
For access isolation I recommend using the `key` feature to create a separate access key (to the same repository) for each server: if a server is compromised, you can block its access by removing its key.
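The weekly maintenance mentioned above could be scheduled on the repo host with something like this (repo path, password file, and times are assumptions, not from the thread):

```
# m h dom mon dow  command
0 3 * * 0  restic -r /srv/restic/repo --password-file /root/.restic-pw check
0 4 * * 0  restic -r /srv/restic/repo --password-file /root/.restic-pw prune
```

Sunday early morning is just a placeholder for whenever no clients are expected to be backing up, since prune will lock them all out while it runs.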


Check out these other topics to see if you can find some answers there.

Also, for total isolation I don’t know if one repository for everything is the best approach: you can create multiple keys, but every key still has access to all files in the repo unless you remove it.


That’s correct, at least for now: having access to a repo (a password and the means to access the files) allows accessing all data stored in the repo, even data backed up by other hosts. That’s important to keep in mind.

This is a common problem with Digital Ocean; restic works much better with S3 or B2.


That’s sadly a known limitation restic has at the moment: it reads the whole index (which data is stored in which file, and where exactly) into memory. We’ll work on that next.


This could be achieved by checking for the presence of lock files in the repo, optionally after running `restic unlock` to remove stale locks.
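One possible sketch of that lock check as a small shell helper (the repo URL in the example, the 60-second retry interval, and the backup paths are assumptions, not from the thread; it assumes the repo password is provided separately, e.g. via `RESTIC_PASSWORD`):

```shell
# Block until no other client holds a lock on the repo, then proceed.
wait_for_unlock() {
    repo="$1"
    restic -r "$repo" unlock                  # remove stale locks from crashed runs
    # "restic list locks" prints one ID per lock still held; wait them out.
    while [ -n "$(restic -r "$repo" list locks 2>/dev/null)" ]; do
        sleep 60
    done
}

# Example usage (hypothetical repo URL and paths):
# wait_for_unlock sftp:backup@repo-host:/srv/restic/repo
# restic -r sftp:backup@repo-host:/srv/restic/repo backup /home /etc
```

Note that `restic unlock` only removes locks it considers stale, so this won’t kick out a backup that is genuinely still running; there is still a small race window between the check and the backup starting.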