Hello all!
I’ve been using restic for a couple of years now to push local backups up to S3 storage. Our current backup setup works as follows:
- Backup server uses rsync to pull a copy of the remote filesystem to a local storage drive. The directory tree looks like:
/backup/machine1
/backup/machine2
etc.
- Once all of the above rsyncs finish, the backup server runs restic against the /backup directory, pushing the entire tree up to S3 as a single snapshot. This dataset is approximately 2.7 TB, but much of it is duplicated (since we are backing up the entire OS of each machine).
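For context, the job looks roughly like this (hostnames, repo URL, and rsync options here are illustrative placeholders, not our exact config):

```shell
#!/bin/bash
set -euo pipefail

# Pull a full copy of each remote filesystem into the local tree.
for host in machine1 machine2; do
    rsync -aAX --delete "root@${host}:/" "/backup/${host}/"
done

# One restic snapshot covering the entire /backup tree.
restic -r s3:s3.amazonaws.com/our-backup-bucket backup /backup
```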
The problem I’m running into right now is that our S3 costs have grown to the point where we need to prune out some older backups. I tried to do that over the weekend and ran into a lot of trouble: restic’s memory footprint got outrageously large, to the point that even an m6i.4xlarge EC2 instance (64 GB of RAM) was still having restic get OOM-killed. On top of that, the prune takes so long to complete that we would end up needing to skip backup cycles, since the prune locks the repository (we currently run backups every Monday, Wednesday, and Friday evening).
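The prune I was attempting was along these general lines (the retention window here is illustrative, not our actual policy):

```shell
# Drop snapshots older than the retention window, then repack/delete
# the unreferenced data. It's this prune phase that blows up on memory
# and holds the repository lock for hours.
restic -r s3:s3.amazonaws.com/our-backup-bucket \
    forget --keep-within 90d --prune
```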
Anyone got any suggestions on how to optimize this? One thing I thought of is, instead of snapshotting the entire /backup tree in one shot, running individual snapshots for each machine - but I’m not sure how well deduplication would work across snapshots in that scenario, and I also worry the backups might run over into the next backup cycle if we do that.
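Concretely, the per-machine idea would look something like this (tags and paths are just a sketch of what I have in mind):

```shell
#!/bin/bash
set -euo pipefail

# One snapshot per machine instead of one giant snapshot.
# Tagging each snapshot would let us forget/prune per machine later
# (e.g. "restic forget --tag machine1 ...").
for host in machine1 machine2; do
    restic -r s3:s3.amazonaws.com/our-backup-bucket \
        backup --tag "${host}" "/backup/${host}"
done
```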