New verify-randomly command in restic-runner

alphapapa · November 18, 2018, 5:11am

Hi friends,

FYI, I’ve just added a verify-randomly command to restic-runner (https://github.com/alphapapa/restic-runner#verify-randomly-n). It will be familiar to former Obnam users. For example:

# Restore and compare 5 random files from latest snapshot in the 
# local-disk repo in the me/music backup set with verbose output.
$ restic-runner -v --repo local-disk --set me/music --compare verify-randomly 5
LOG (2018-11-17 23:12:06): REPO:local-disk SET:me/music
LOG (2018-11-17 23:12:09): VERIFYING 5 files from snapshot e4d70c15...
repository 06d38433 opened successfully, password is correct
restoring <Snapshot e4d70c15 of [/home/me/Music] at 2018-11-17 05:09:03.317062072 -0600 CST by @localhost> to /tmp/tmp.2Ad6SF3MbA
LOG (2018-11-17 23:12:13): COMPARING with live versions...
VERBOSE: Comparing file: /home/me/Music/a.mp3
VERBOSE: Comparing file: /home/me/Music/b.mp3
VERBOSE: Comparing file: /home/me/Music/c.mp3
VERBOSE: Comparing file: /home/me/Music/d.mp3
VERBOSE: Comparing file: /home/me/Music/e.mp3
LOG (2018-11-17 23:12:14): verify-randomly FINISHED.  Duration: 8s

I hope someday Restic has this functionality built-in, but until then, this works well.

fd0 · November 18, 2018, 9:57am

Nice idea. As far as I understand it, the script randomly selects a few files that should be unmodified (or that come from the latest snapshot just made?) and tests whether they can be restored and contain the same content. Is that about right?

For reference: there’s something similar, on a different level, already integrated in restic check. When you specify --read-data --read-data-subset 1/10, restic will read the first tenth of the files in the backend and verify their SHA256 hashes. Calling --read-data-subset 2/10 will then read the second tenth, and so on, until eventually you’ve downloaded and checked all files in the backend.
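As a rough sketch of how one might walk through all the subsets over time (an illustration only, not something restic provides): the hypothetical helper subset_for_day below maps a day-of-year number to a k/n value, so any ten consecutive days within a year cover all ten subsets. The function name and the day-based rotation are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of restic): map a day number to "k/n".
# The 10# prefix forces base 10, because `date +%j` pads with zeros
# (e.g. "089"), which bash would otherwise try to parse as octal.
subset_for_day() {
    local day=$((10#$1)) n=$2
    echo "$((day % n + 1))/$n"
}

# Print today's subset out of 10; the value can then be passed to
# --read-data-subset as in the check command above.
subset_for_day "$(date +%j)" 10
```

Compared with picking a subset at random, a rotation like this never repeats a subset before all the others have been checked.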

alphapapa · November 18, 2018, 10:03am

Basically, yes:

The snapshot defaults to the latest one, but can be specified.
The number of files defaults to 10, but can be specified.
Without the --compare option, the files are restored to a temporary directory, and if restic exits without errors, so does the script.
With the --compare option, the restored files are compared to the live files at the original location with cmp. If any fail, the error is printed, and the script exits with the number of files that differ.
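The comparison step described above could be sketched roughly like this. It is a simplified illustration, not the actual restic-runner code: the function name compare_restored is made up, and it assumes the restored copy of each file sits at its full original path underneath the restore directory.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not restic-runner's real implementation.
# compare_restored RESTORE_DIR FILE...
#   Compares each live FILE (absolute path) with its restored copy at
#   RESTORE_DIR/FILE using cmp, and returns the number of files that
#   differ or are missing (exit statuses cap at 255).
compare_restored() {
    local restore_dir=$1
    shift
    local differing=0 file
    for file in "$@"; do
        # cmp prints a message and exits non-zero on any difference.
        if ! cmp -- "$restore_dir$file" "$file"; then
            differing=$((differing + 1))
        fi
    done
    return "$differing"
}
```

For example, compare_restored /tmp/tmp.2Ad6SF3MbA /home/me/Music/a.mp3 would return 0 if the restored a.mp3 matches the live one.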

Thanks, I wasn’t aware of --read-data-subset.

How are subsets chosen? i.e. if I specify 1/10, which tenth is chosen? Would it be possible for restic to choose a random subset of a given ratio? e.g. restic check --read-random-subset 0.1 to read and check 10% of backend files, chosen at random?

Thanks for your work on restic!

fd0 · November 18, 2018, 11:27am

Good question, the code is here

The idea is: for each data file in the repo, take the first byte of the ID (the first two hex characters of the file name), compute its remainder modulo n, and compare that with k-1. So, for --read-data-subset k/n and the first byte C of the file name, the file is read and checked iff C % n == k-1. Examples:

--read-data-subset 1/10 will e.g. read the files starting with 00 (0x00 % 10 == 0) and 14 (0x14 % 10 == 0), but not ff (0xff % 10 == 5)
--read-data-subset 2/23 will e.g. read the files starting with 2f (0x2f % 23 == 1), but not 10 (0x10 % 23 == 16)
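To make the rule concrete, here is a small bash sketch of the same check. It only mirrors the arithmetic described above and is not restic’s actual code; the function name in_subset is made up for this example.

```bash
#!/usr/bin/env bash
# Sketch of the selection rule above (not restic's implementation).
# in_subset NAME K N
#   Succeeds iff a data file whose name starts with the two hex
#   characters at the front of NAME falls into subset K of N.
in_subset() {
    local name=$1 k=$2 n=$3
    local first_byte=$((16#${name:0:2}))   # two hex chars -> 0..255
    (( first_byte % n == k - 1 ))
}

# Check the prefixes from the first example against subset 1/10.
for prefix in 00 14 ff; do
    if in_subset "$prefix" 1 10; then
        echo "$prefix: read"
    else
        echo "$prefix: skipped"
    fi
done
```

Since the first byte only takes values 0 to 255, the subsets are not exactly equal in size when n does not divide 256, and for n above 256 some subsets would be empty under this rule.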

It’s possible to implement random subset selection within restic, but you can also do it in the shell. Mind the +1: $RANDOM % 10 yields 0 to 9, while subsets are numbered starting from 1.

$ restic check --read-data --read-data-subset $(($RANDOM % 10 + 1))/10