Force Full Backup after X time

dhilgarth · May 6, 2019, 12:21pm

To improve restore time and resilience, I would like to implement something like the “Force Full Backup after X” option that duplicity supports.
The rationale is this:

If I do incremental backups for years, a restore will take a really long time, as it has to read all the changes since the initial full backup (is this actually correct with restic?)
With multiple, independent full backups over time, corruption of a file in the backup’s storage location doesn’t potentially destroy all data after the date that file belongs to.
I can just store these old backups in cold storage.

Does this make sense or am I missing something?

dionorgua · May 6, 2019, 12:39pm

There is no ‘incremental’ concept in restic at all. Every backup is a ‘full’ snapshot that represents the whole filesystem tree at backup time. Restoring the ‘latest’ snapshot and a ‘10 years old’ snapshot should take about the same amount of time.

As for file corruption: restic splits file content into blobs and stores every blob only once. So corruption of a blob will corrupt every snapshot that references it. This is the price you pay for deduplication. As a workaround, you can run restic check --read-data periodically and keep more than one mirror of your backup.
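
To make the deduplication point concrete, here is a toy sketch of a content-addressed store. It is not restic's actual repository format; the class `DedupRepo` and its methods are made up for illustration, and the only thing it borrows from restic is the idea of storing each blob once under its SHA-256 hash.

```python
import hashlib


class DedupRepo:
    """Toy content-addressed store: each blob is stored once, keyed by its SHA-256."""

    def __init__(self):
        self.blobs = {}      # blob id -> bytes
        self.snapshots = {}  # snapshot name -> list of blob ids

    def backup(self, name, chunks):
        """Record a snapshot; chunks already present in the store are not stored again."""
        ids = []
        for chunk in chunks:
            blob_id = hashlib.sha256(chunk).hexdigest()
            self.blobs.setdefault(blob_id, chunk)
            ids.append(blob_id)
        self.snapshots[name] = ids

    def damaged_snapshots(self):
        """Names of snapshots referencing a blob whose content no longer matches its id."""
        bad = {bid for bid, data in self.blobs.items()
               if hashlib.sha256(data).hexdigest() != bid}
        return sorted(name for name, ids in self.snapshots.items()
                      if bad.intersection(ids))


repo = DedupRepo()
repo.backup("2018", [b"photos", b"notes v1"])
repo.backup("2019", [b"photos", b"notes v2"])

# Simulate bit rot in the one blob that both snapshots share.
shared = hashlib.sha256(b"photos").hexdigest()
repo.blobs[shared] = b"phXtos"

print(repo.damaged_snapshots())  # ['2018', '2019']
```

Both snapshots are "full" in the sense that each lists every blob it needs, yet one damaged blob breaks both, which is why a second, independent copy of the repository matters.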

dhilgarth · May 6, 2019, 1:09pm

Thanks for the explanation, it makes sense.

restic check --read-data will not really help as it can only detect that something went wrong. And as you said, that will potentially affect all snapshots, not just the ones in the future.

So, to get an independent backup, I would need to create a new repository, right?

dionorgua · May 6, 2019, 2:48pm

That depends… Sure, you can have two different repositories, but you will need to back up every machine twice.

I decided to have one repository ‘on-site’ (with good connectivity) and just mirror the existing repo to B2. With this setup I get much faster repository maintenance (check + prune over the local LAN).

Of course, it’s still possible that data corruption will be replicated to B2. If you really want to handle this, you can use the fact that restic never changes existing files in the repository. I see two possible solutions (but I don’t use either of them right now):

Somehow adjust the replication script to avoid overwriting existing files.
Ask B2 to keep multiple file versions for longer than the restic check --read-data interval, so that if a check fails you’ll be able to find the correct file version on the remote.

Or you can just use something like ZFS/btrfs or another filesystem with data checksumming to make sure errors are not replicated.
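
The first option above (a replication step that never overwrites) could look roughly like the following. This is only a sketch for a mirror between two local directories; the function name `mirror_without_overwrite` is made up, and it deliberately ignores deletions (for example after a prune), which a real replication script would also have to handle.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_without_overwrite(src: Path, dst: Path):
    """Copy files that are missing from dst and never overwrite existing ones.

    Returns (copied, conflicts): relative paths that were newly copied, and
    relative paths that exist on both sides with different content.
    """
    copied, conflicts = [], []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src)
        target = dst / rel
        if not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(rel.as_posix())
        elif not filecmp.cmp(path, target, shallow=False):
            # Same name, different bytes: one side is damaged. Keep the mirror's copy.
            conflicts.append(rel.as_posix())
    return copied, conflicts
```

A non-empty `conflicts` list is the signal to investigate: since repository files are not expected to change once written, a mismatch suggests that one of the two copies has been corrupted.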

Dj0k3 · May 6, 2019, 3:27pm

You can use the --force flag when backing up. It will “force re-reading the target files/directories”, as it says in the manual. It is not the same as a new full backup, but you can use it. You could also restore the latest snapshot from time to time and compare the files with the source files. It is not ideal, but it can give you the peace of mind you may be looking for. check --read-data --read-data-subset also gives you another way to verify backup integrity, as mentioned here by @fd0.

cdhowie · May 6, 2019, 3:29pm

If you use rclone to sync the repository to B2, you just need to add the --immutable flag to achieve this.

--immutable  Do not modify files. Fail if existing files have been modified.

dhilgarth · May 6, 2019, 3:56pm

Great, thanks a lot for all of your suggestions. I think that’s what I am going to do:

Have a backup to a local destination with restic and sync that with rclone to two different cloud backends with the immutable flag.
Use the --force flag once a month to ensure that no changes were missed by restic
Run check --read-data --read-data-subset periodically
(Perform a restore periodically)

dionorgua · May 6, 2019, 7:14pm

Ah… Thanks a lot! I’m actually using a custom script around rclone that tries to minimize the number of B2 transactions by providing a list of files to upload. I will try to add --immutable to it.

You should use either --read-data or --read-data-subset, not both. They run the same verification; --read-data-subset just limits it to a subset of the pack files.
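
One way to use the subset option is to rotate through the subsets so that every pack file is eventually read. A minimal sketch, assuming the n/t form of --read-data-subset (subset n out of t, counted from 1); the helper `check_command` and the repository path are made up for illustration:

```python
def check_command(repo, run_index, total_subsets):
    """Build a `restic check` command line that reads one of `total_subsets` groups.

    `run_index` is any counter that grows by one per scheduled run (0, 1, 2, ...).
    """
    if total_subsets < 1:
        raise ValueError("total_subsets must be at least 1")
    n = run_index % total_subsets + 1  # subsets are numbered from 1
    return ["restic", "-r", repo, "check", f"--read-data-subset={n}/{total_subsets}"]


# Hypothetical repository path; four weekly runs cover the whole repository once.
for week in range(4):
    print(" ".join(check_command("/srv/restic-repo", week, 4)))
```

The returned list can be passed to `subprocess.run` as is. With weekly runs and four subsets, each pack file gets read about once a month.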

dhilgarth · May 7, 2019, 6:22am

Right, thanks for the hint

dhilgarth · May 7, 2019, 6:22am

A periodic restore could be done with this solution, which restores only part of the data, similar to how --read-data-subset checks only part of it.

Dj0k3 · May 7, 2019, 2:42pm

Yes, it is very similar. I like restic-runner, it’s a nice script. But my question is: what is the benefit of restoring and using cmp versus using --read-data-subset?

dhilgarth · May 9, 2019, 6:30am

These are two different abstraction levels, I think.
--read-data-subset reads a subset of the repository’s pack files and verifies them, but it is still all “inside” restic.
The restore only reads a few files, but it reads them and writes them to disk, so it happens outside of restic and tests that an actual restore is in fact possible.
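
A restore test of this kind can be sketched as a byte-for-byte comparison of a random sample, much like running cmp on a few files. The function `compare_sample` is made up for illustration. It assumes `restored` points at the restored copy of the same directory tree as `source`, and that the sampled files have not changed since the snapshot was taken (otherwise they show up as mismatches).

```python
import filecmp
import random
from pathlib import Path


def compare_sample(source: Path, restored: Path, sample_size: int, seed=None):
    """Compare a random sample of source files with their restored copies.

    Returns the sorted relative paths that are missing or differ in the restore.
    """
    files = sorted(p.relative_to(source) for p in source.rglob("*") if p.is_file())
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    mismatches = []
    for rel in sample:
        copy = restored / rel
        # shallow=False forces a content comparison instead of trusting size/mtime.
        if not copy.is_file() or not filecmp.cmp(source / rel, copy, shallow=False):
            mismatches.append(rel.as_posix())
    return sorted(mismatches)
```

An empty result means every sampled file was restored intact. Anything else is worth a closer look before relying on the backup.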

Dj0k3 · May 9, 2019, 5:02pm

That’s what I thought. So it is a matter of trust at some level: if you trust that restic is doing its work well, then --read-data should be enough, but if you want to go further, you need to restore files and run your own tests. Seems alright. I might do it every X months or something like that.