Force Full Backup after X time

#1

To improve restore time and to improve resilience, I would like to implement something like the “Force Full Backup after X” that duplicity supports.
The rationale is this:

  • If I do incremental backups for years, a restore will take a really long time, as it has to read all the changes since the initial full backup (is this actually correct with restic?)
  • With multiple independent full backups over time, corruption of a file in the backup storage location cannot destroy all data backed up after the date that file was written.
  • I can just move these old backups to cold storage.

Does this make sense or am I missing something?

#2

There is no ‘incremental’ concept in restic at all. Every backup is a ‘full’ snapshot that represents the whole filesystem tree at backup time. Restoring the ‘latest’ snapshot and a ‘10 years old’ snapshot should take the same amount of time.

As for file corruption: restic splits file content into blobs and stores every blob only once. So a corrupted blob damages every snapshot that references it. This is the price you pay for deduplication. As a workaround, you can run restic check --read-data periodically and keep more than one mirror of your backup.
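
To make the deduplication trade-off concrete, here is a toy sketch of a content-addressed store. It is not restic's code: the BlobStore class is made up for illustration, and it uses tiny fixed-size chunks where restic uses content-defined chunking. It only shows why one damaged blob can break several snapshots at once.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only (an assumption)


class BlobStore:
    """Toy content-addressed store: each distinct chunk is kept exactly once."""

    def __init__(self):
        self.blobs = {}  # blob ID (SHA-256 hex digest) -> chunk bytes

    def put(self, chunk):
        blob_id = hashlib.sha256(chunk).hexdigest()
        # A chunk that is already stored is not written a second time.
        self.blobs.setdefault(blob_id, chunk)
        return blob_id

    def snapshot(self, data):
        """Store data chunk by chunk; return the list of blob IDs describing it."""
        return [
            self.put(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)
        ]

    def verify(self, snapshot):
        """True only if every referenced blob still hashes to its own ID."""
        return all(
            hashlib.sha256(self.blobs[blob_id]).hexdigest() == blob_id
            for blob_id in snapshot
        )


store = BlobStore()
old = store.snapshot(b"AAAABBBBCCCC")  # first backup
new = store.snapshot(b"AAAABBBBDDDD")  # later backup, only the last chunk changed

shared = set(old) & set(new)           # blobs referenced by both snapshots
damaged = next(iter(shared))
store.blobs[damaged] = b"XXXX"         # simulate corruption of one stored blob

print(store.verify(old), store.verify(new))
```

Both snapshots fail verification after a single blob is damaged, because both point at the same stored chunk. A second, independent copy of the repository is what protects against that.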

#3

Thanks for the explanation, it makes sense.

restic check --read-data will not really help, as it can only detect that something went wrong. And as you said, the damage will potentially affect all snapshots, not just future ones.

So, to get an independent backup, I would need to create a new repository, right?

#4

That depends… Sure, you can have two different repositories. But you will need to back up every machine twice.

I decided to have one repository ‘on-site’ (with good connectivity) and just mirror the existing repo to B2. With this setup I’m getting much faster repository maintenance (check+prune over the local LAN).

Sure, it’s still possible that ‘data corruption’ will be replicated to B2. If you really want to handle this, you can try to use the fact that restic never changes existing files in the repository. So I see two possible solutions (but I don’t use them right now):

  • Somehow adjust the replication script to avoid overwriting existing files.
  • Ask B2 to keep multiple file versions for longer than the restic check --read-data interval. That way, if a check fails, you’ll be able to find the correct file version on the remote.

Or you can just use something like ZFS/btrfs or another filesystem with data checksumming to make sure errors are not replicated.

#5

You can use the --force flag when backing up. It will “force re-reading the target files/directories”, as it says in the manual. It is not the same as a new full backup, but you can use it. You could also restore the latest snapshot from time to time and compare the files with the source files. It is not ideal, but it can give you the peace of mind you might be looking for. check --read-data --read-data-subset also gives you another way to verify backup integrity, as mentioned here by @fd0.

#6

If you use rclone to sync the repository to B2, you just need to add the --immutable flag to achieve this.

--immutable  Do not modify files. Fail if existing files have been modified.
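
For anyone replicating with their own script instead of rclone, the "never overwrite an existing file" idea could look roughly like this. It is a minimal sketch, not how rclone implements --immutable: the function name mirror_immutable is made up, it works on two local directories, and it does not remove files that were deleted from the source (for example by prune).

```python
import filecmp
import shutil
from pathlib import Path


def mirror_immutable(src, dst):
    """Copy files missing from dst and never overwrite existing ones.

    Returns the relative paths of files that exist on both sides but differ,
    which should never happen for a healthy repository.
    """
    conflicts = []
    for src_file in sorted(p for p in Path(src).rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = Path(dst) / rel
        if not dst_file.exists():
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)
        elif not filecmp.cmp(src_file, dst_file, shallow=False):
            # Same name, different content: keep the mirror copy, report it.
            conflicts.append(rel.as_posix())
    return conflicts
```

A non-empty result means a file changed after it was first mirrored. Since restic does not modify existing repository files, that points to corruption on one side, and the untouched copy in the mirror is the one to inspect.
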
#7

Great, thanks a lot for all of your suggestions. I think this is what I am going to do:

  • Have a backup to a local destination with restic and sync that with rclone to two different cloud backends with the immutable flag.
  • Use the --force flag once a month to ensure that no changes were missed by restic
  • Run check --read-data --read-data-subset periodically
  • (Perform a restore periodically)
#8

Ah… Thanks a lot! I’m actually using a custom script around rclone that tries to minimize the number of B2 transactions by providing a list of files to upload. I will try to add --immutable to it.

You should use --read-data OR --read-data-subset, not both. They perform the same kind of verification; --read-data-subset just limits it to a subset of the repository’s pack files.
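
One way to schedule the partial check is to rotate through the subsets so that the whole repository gets read over several runs. The sketch below only builds the command line and does not run it. The helper names, the weekly rotation, and the default of 4 subsets are my own assumptions; the repository location is expected to come from the environment (e.g. RESTIC_REPOSITORY).

```python
import datetime


def subset_for(day, total=4):
    """Map the ISO week number of `day` to an n/t subset, cycling 1..total."""
    week = day.isocalendar()[1]
    n = (week - 1) % total + 1
    return f"{n}/{total}"


def check_command(day, total=4):
    """Build the argv for a partial data check; nothing is executed here."""
    return ["restic", "check", f"--read-data-subset={subset_for(day, total)}"]


print(check_command(datetime.date.today()))
```

The resulting list can be handed to subprocess.run from a weekly cron job. With total=4, every subset is read roughly once a month.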

#9

Right, thanks for the hint

#10

The periodic restore could be done with this solution, which restores only a few files, similar to how --read-data-subset checks only part of the repository.

#11

Yes, it is very similar. I like restic-runner, it’s a nice script. But my question is: what is the benefit of restoring and using cmp versus using --read-data-subset?

#12

It’s two different abstraction levels, I think.
--read-data-subset reads part of the repository’s data, but it all stays “inside” restic.
The restore reads only a few files, but it reads them and writes them to disk, so it happens outside of restic and tests that an actual restore is in fact possible.
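
The "outside of restic" comparison can be sketched without cmp as well. This is only an illustration: the function names are made up, and it assumes you have already restored a snapshot somewhere and point `restored` at the directory that corresponds to `source`. Files that changed since the snapshot was taken will naturally show up as different.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in 1 MiB blocks to keep memory use flat."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def compare_trees(source, restored):
    """Compare every regular file under source with its restored counterpart."""
    report = {"missing": [], "different": []}
    for src_file in sorted(p for p in Path(source).rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        restored_file = Path(restored) / rel
        if not restored_file.is_file():
            report["missing"].append(rel.as_posix())
        elif file_digest(src_file) != file_digest(restored_file):
            report["different"].append(rel.as_posix())
    return report
```

An empty report means every source file was found in the restore with identical content, which is the end-to-end proof that --read-data-subset alone cannot give.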

#13

That’s what I thought. So it is a matter of trust at some level: if you trust that restic is doing its work well, then --read-data should be enough, but if you want to go further, you need to restore files and run your own tests. Seems alright. I might do it every X months or something like that.