Snapshots Dependencies - Backup vs Replication Tool

burtbailey · November 26, 2022, 10:38pm

I have been reading through the Restic documentation and it sounds fantastic. I have a few Linux servers that I currently back up via BackupPC, but it’s a pain to set up, and it’s time for me to replace the backup server, so I am looking at options.

The word “snapshot” is used a lot. That concerned me somewhat, because I am looking for long-term, to-disk storage, not a replication solution. Do the snapshots have a child/parent relationship? If my one-year-old backup were corrupted, would all the later ones be unusable? If so, what protections are in place to guard against something like this? I appreciate any information you may share.

Last, I will probably need to just try it, but is Restic a viable option for large backups, around 9TB?

nicnab · November 27, 2022, 1:47pm

I have been using restic for a few years now and find it very reliable. Not only backing up but also restoring, which I have done a few times already. In fact I rely so much on it that when I switch my personal box, I just install Linux and then copy stuff over from the mounted snapshot that I need.

Yes, they do. Each snapshot can have a parent, and when you back up the same path from the same client, the last one is automatically used as the parent.

If one of restic’s repository files is corrupted, every backed-up file that has a part stored in it will be broken. That includes all files that contain the same part, because restic deduplicates even parts of files.

If you want to guard against corruption, simply rsync the whole restic repo to a second location and regularly run restic check --read-data (or restic check --read-data-subset) on the repos.
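
The regular check can be kept cheap by reading a different slice of the repository on each run. A minimal sketch in Python, assuming a repository at the hypothetical path /srv/restic-repo and a weekly schedule; it only builds the command line using restic's --read-data-subset=n/t form and does not run anything:

```python
import datetime


def check_command(repo, today, parts=5):
    """Build a `restic check` command that reads one slice of the repo data.

    The slice rotates with the ISO week number, so after `parts` weekly
    runs every slice has been read once.
    """
    week = today.isocalendar()[1]      # ISO week number, 1..53
    part = (week - 1) % parts + 1      # rotate through 1..parts
    return ["restic", "-r", repo, "check", f"--read-data-subset={part}/{parts}"]


if __name__ == "__main__":
    # /srv/restic-repo is a placeholder, not a path from this thread.
    print(" ".join(check_command("/srv/restic-repo", datetime.date.today())))
```

The returned list can be handed to subprocess.run(cmd, check=True) from a cron job or systemd timer. The choice of five parts is arbitrary.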

My biggest repo is somewhere north of 2TB, but I don’t see a reason why 9TB shouldn’t be possible. I think there is some limit regarding cache and RAM, but that might have already been addressed in the current version. Just search the forum and you’ll find discussions about it.

burtbailey · November 27, 2022, 7:55pm

Thanks for the information. I will have to decide if that is the path for me or not. Backups are something I want to be really reliable in case the sources break.

doscott · November 27, 2022, 11:31pm

I would like to clarify that each snapshot is independent. While there is a parent relationship in terms of a backup to optimize performance, a snapshot is just information on your file system (as defined in your backup command) at the point in time the snapshot is taken, not the data at that point in time.
Thus, if a snapshot from one year ago is corrupted, none of the data is corrupted, and if that data still existed when other snapshots were made, it is easily recoverable. However, if data from one year ago is corrupted, it doesn’t matter how many snapshots have been made: the data is gone.
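
The distinction between snapshot metadata and stored data can be illustrated with a toy model. This is a sketch only and not restic's actual repository format: the ToyRepo class, its method names, and the fixed 4-byte chunks are all assumptions made for illustration (restic uses content-defined chunking). What it does share with restic is the idea of storing each chunk once under its SHA-256 hash.

```python
import hashlib


class ToyRepo:
    """Toy content-addressed store: NOT restic's real format, just the idea."""

    def __init__(self):
        self.blobs = {}      # blob id (SHA-256 hex) -> chunk bytes
        self.snapshots = {}  # snapshot name -> {file path: [blob ids]}

    def backup(self, name, files, chunk_size=4):
        tree = {}
        for path, data in files.items():
            ids = []
            for i in range(0, len(data), chunk_size):
                chunk = data[i:i + chunk_size]
                blob_id = hashlib.sha256(chunk).hexdigest()
                self.blobs.setdefault(blob_id, chunk)  # deduplicate: store once
                ids.append(blob_id)
            tree[path] = ids
        self.snapshots[name] = tree

    def restore(self, name, path):
        out = b""
        for blob_id in self.snapshots[name][path]:
            chunk = self.blobs[blob_id]
            if hashlib.sha256(chunk).hexdigest() != blob_id:
                raise ValueError(f"blob {blob_id[:8]} is corrupted")
            out += chunk
        return out


repo = ToyRepo()
repo.backup("2021", {"a.txt": b"unchanged"})
repo.backup("2022", {"a.txt": b"unchanged", "b.txt": b"new file"})

# Losing the old snapshot's metadata does not affect the newer snapshot.
del repo.snapshots["2021"]
assert repo.restore("2022", "a.txt") == b"unchanged"

# Corrupting a stored blob breaks that file in every snapshot referencing it.
first_blob = repo.snapshots["2022"]["a.txt"][0]
repo.blobs[first_blob] = b"XXXX"
```

After the last line, restoring a.txt raises an error no matter which snapshot is used, while b.txt, which shares no chunk with it, still restores.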

rawtaz · November 28, 2022, 12:05am

I was just about to clarify snapshots, that they are just a point in time and generally not something you have to consider much. Aside from that, you really can’t get much more solid backups than with restic. It’s backup software that not only lets you verify the integrity of your backups, but can even indicate to you when your own hardware is faulty (this has happened numerous times). If this isn’t enough for you, I’m not sure what would be. So if I were you, I’d just start trying it out.

nicnab · November 28, 2022, 7:55am

Thanks @doscott and @rawtaz for clarifying. What I really wanted to say (and I guess what @burtbailey asked): a snapshot is not a full backup every time. So if a file was backed up a year ago, hasn’t changed since then and is corrupted, it’s gone.

NoahD · May 12, 2023, 5:17pm

If the data is that important to you, I’d probably make use of two repos: one for daily backups and another for weekly ones. That way they are independent of each other. Maybe put the repos on different storage devices.

Hence the expression “Don’t put all your eggs in one basket.”

Restic makes this real easy via scripts.
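
A two-repo setup along these lines might be scripted as follows. This is a minimal sketch in Python: both repository paths are hypothetical, the Sunday schedule is an arbitrary choice, and the function only builds the restic backup command lines rather than running them.

```python
import datetime

DAILY_REPO = "/mnt/disk-a/restic-daily"    # hypothetical path
WEEKLY_REPO = "/mnt/disk-b/restic-weekly"  # hypothetical path, separate device


def backup_commands(source, today):
    """Return the restic commands to run today for the given source path."""
    cmds = [["restic", "-r", DAILY_REPO, "backup", source]]
    if today.weekday() == 6:  # Sunday: also back up to the weekly repo
        cmds.append(["restic", "-r", WEEKLY_REPO, "backup", source])
    return cmds


if __name__ == "__main__":
    for cmd in backup_commands("/home", datetime.date.today()):
        print(" ".join(cmd))
```

Each command list can be passed to subprocess.run(cmd, check=True). For unattended runs the repository password has to be supplied non-interactively, for example through the RESTIC_PASSWORD_FILE environment variable.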

kapitainsky · May 12, 2023, 6:27pm

100% true. Restic seems to be solid and stable but you will never know 100%. I use two different cloud backup programs. In general if you really care about your data these are steps I follow:

Use RAID so that your system stays up during the rebuild from a disk failure.
Filesystem snapshots so that “oh shit I accidentally deleted the whole folder” can be rolled back.
Nearline backups (automatic) so that you can quickly recover from a complete server failure, ransomware attack, etc.
Offsite backups (automatic) so that you can recover, in a few days, from fire or flood or theft (this is what Backblaze, AWS Glacier, etc. are good at).
And, if you’re really paranoid, offsite backups (manual, suitcase full of HDDs) so that you aren’t at the mercy of the cloud backup company if things go really wrong.