Help me optimize my Restic backups

Hello all!

I’ve been using restic for a couple of years now to push some local backups up to S3 storage. Our current backup structure works as follows:

  1. Backup server uses rsync to pull a copy of the remote filesystem to a local storage drive. The directory tree looks like:

/backup/machine1
/backup/machine2
etc.

  2. Once all of the above rsync runs finish, the backup server fires off restic against the /backup directory, pushing the entire tree up to S3. This dataset is approximately 2.7 TB, but much of it is duplicated (since we are backing up the entire OS).

The problem I’m running into right now is that our S3 costs have grown to the point where we need to prune some older backups. I tried to do that over the weekend and ran into a lot of trouble: restic’s memory footprint grew outrageously large, to the point that it was still getting OOM-killed even on an m6i.4xlarge EC2 instance (64 GB of RAM). On top of that, the prune takes so long to complete that we would end up skipping backup cycles, since the prune locks the repository (we currently run backups every Monday, Wednesday, and Friday evening).

Does anyone have suggestions on how to optimize this? One thing I thought of is, instead of backing up the entire /backup in one shot, running individual snapshots for each machine. But I’m not sure how well deduplication would work in that scenario, and I also worry we might end up crossing over into the next backup cycle if we do that.
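For illustration, the per-machine idea could be sketched roughly as below. This is only a sketch under assumptions: the repository URL is a placeholder, the directory names mirror the /backup layout described above, and `--host` is used purely to label each snapshot with its machine name. Since restic deduplicates across all snapshots in a single repository, splitting the run like this should not by itself increase storage, but that is worth verifying on a test repository.

```python
import os


def build_backup_commands(repo, machine_dirs):
    """Return one `restic backup` argument list per machine directory."""
    commands = []
    for path in machine_dirs:
        # Use the directory name (e.g. "machine1") as the snapshot's host label.
        host = os.path.basename(os.path.normpath(path))
        commands.append(["restic", "-r", repo, "backup", "--host", host, path])
    return commands


if __name__ == "__main__":
    # Hypothetical repository URL; replace with the real one.
    repo = "s3:s3.amazonaws.com/example-bucket/restic"
    for cmd in build_backup_commands(repo, ["/backup/machine1", "/backup/machine2"]):
        # Printed rather than executed; pass each list to subprocess.run to run it.
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` one after another, so a failure on one machine does not hide the others.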

Which restic version are you using?

The prune I’m currently running is using 0.14.0. I just tried to launch it with GOGC=20, since I saw references to that perhaps helping with high memory usage.
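A minimal sketch of launching the prune that way, for reference. `GOGC` is the Go runtime's garbage collection target, and lowering it trades CPU time for a smaller peak heap. The optional `--max-repack-size` flag is, as far as I know, available in 0.14.0 and caps how much data a single prune run repacks, which might help keep one run inside the window between backups. The repository URL and the `50G` value are placeholders, not recommendations.

```python
import os


def build_prune_invocation(repo, gogc=20, max_repack_size=None, base_env=None):
    """Return (env, cmd) for running `restic prune` with a lowered GOGC."""
    # Copy the environment so the caller's environment is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["GOGC"] = str(gogc)
    cmd = ["restic", "-r", repo, "prune"]
    if max_repack_size is not None:
        # Optional cap on repacked data per run (assumed value, e.g. "50G").
        cmd += ["--max-repack-size", max_repack_size]
    return env, cmd


if __name__ == "__main__":
    # Hypothetical repository URL; replace with the real one.
    env, cmd = build_prune_invocation(
        "s3:s3.amazonaws.com/example-bucket/restic", gogc=20, max_repack_size="50G"
    )
    print("GOGC=" + env["GOGC"], " ".join(cmd))
```

To actually run it, something like `subprocess.run(cmd, env=env, check=True)` would do.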

Can you provide more information on the size of the repository and the number of snapshots? The total size, number of files in the repository and the size of the index folder would be particularly interesting.
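One possible way to collect those numbers, sketched under assumptions: the repository URL and the S3 path are placeholders, and the index size is taken with the AWS CLI's `s3 ls --recursive --summarize` rather than with restic itself. `restic snapshots --json` prints a JSON array with one object per snapshot, so the snapshot count is just the array length.

```python
import json


def info_commands(repo, s3_url):
    """Return commands whose output describes the repository."""
    return [
        ["restic", "version"],
        ["restic", "-r", repo, "snapshots", "--json"],
        # Size of the stored (deduplicated) data in the repository.
        ["restic", "-r", repo, "stats", "--mode", "raw-data"],
        # Object count and total size of the index folder (hypothetical S3 path).
        ["aws", "s3", "ls", s3_url.rstrip("/") + "/index/", "--recursive", "--summarize"],
    ]


def count_snapshots(snapshots_json):
    """Count snapshots from the output of `restic snapshots --json`."""
    return len(json.loads(snapshots_json))


if __name__ == "__main__":
    # Hypothetical repository and bucket names; replace with the real ones.
    for cmd in info_commands(
        "s3:s3.amazonaws.com/example-bucket/restic", "s3://example-bucket/restic"
    ):
        print(" ".join(cmd))
```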

How far did the prune command get? v0.15.0, which has just been released, should reduce the memory requirements quite a bit (probably 30%, maybe more), and it should also improve performance a bit further.

If all snapshots end up in the same repository, it won’t make a difference for prune whether there is a single large snapshot or multiple smaller ones.

Maybe you should reconsider adding something like the `repoinfo` command (restic/restic pull request #2543 by aawsome, "Add new command `repoinfo`") to restic. For support cases like this, it is much easier to ask for “please give the output of restic ...”. Just my 2ct…
