Nice idea. As far as I understand it, the script randomly selects a few files that should be unmodified (or from the latest snapshot just made?) and tests whether they can be restored and still contain the same content. Is that about right?
For reference: there’s something similar already integrated into restic check, on a different level: when you specify --read-data --read-data-subset 1/10, restic will read the first tenth of the files in the backend and verify their SHA256 hashes. Calling --read-data-subset 2/10 will then read the second tenth, and eventually you’ve downloaded and checked all files in the backend.
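For illustration, a small loop can walk through all ten subsets in separate runs (a sketch only; depending on the restic version, --read-data may not need to be given alongside --read-data-subset):

```bash
# Verify the whole repository piecewise: each iteration reads and hashes
# one tenth of the pack files in the backend.
for i in $(seq 1 10); do
    restic check --read-data-subset "$i/10"
done
```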
The snapshot defaults to the latest one, but can be specified.
The number of files defaults to 10, but can be specified.
Without the --compare option, the files are restored to a temporary directory, and if restic exits without errors, the script does too.
With the --compare option, the restored files are compared to the live files at the original location with cmp. If any comparison fails, the error is printed and the script exits with the number of files that differ (a rough sketch of this flow follows below).
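For concreteness, here is a minimal sketch of that flow. This is not the actual script; the option handling, the restic ls --json / jq filtering, and the path layout are all assumptions:

```bash
#!/usr/bin/env bash
# Sketch: restore a few random files from a snapshot and optionally compare
# them with the live files. Assumes restic, jq, shuf and cmp are available.
set -euo pipefail

snapshot=${1:-latest}   # snapshot ID, defaults to the latest one
count=${2:-10}          # number of files to test, defaults to 10
compare=${3:-}          # pass "--compare" to diff against the live files

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Pick $count random regular files from the snapshot's listing.
mapfile -t files < <(restic ls --json "$snapshot" \
    | jq -r 'select(.type == "file") | .path' \
    | shuf -n "$count")

failures=0
for f in "${files[@]}"; do
    restic restore "$snapshot" --target "$tmpdir" --include "$f"
    if [ "$compare" = "--compare" ] && ! cmp -s "$tmpdir$f" "$f"; then
        echo "content differs: $f" >&2
        failures=$((failures + 1))
    fi
done

# Exit status = number of differing files, as described above.
exit "$failures"
```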
Thanks, I wasn’t aware of that.
How are the subsets chosen? I.e., if I specify 1/10, which tenth is chosen? Would it be possible for restic to choose a random subset of a given ratio, e.g. restic check --read-random-subset 0.1 to read and check 10% of the backend files, chosen at random?
The idea is: for each data file in the repo, take the first byte of the ID (the first two characters of the file name), compute its remainder modulo n, and compare that to k-1. So, for --read-data-subset k/n and first byte C of the file name, the file is read and checked iff C % n == k-1. Examples (a small shell check of this rule follows below):
--read-data-subset 1/10 will, e.g., read the files starting with 00 (0x00 % 10 == 0) and 14 (0x14 % 10 == 0), but not ff (0xff % 10 == 5)
--read-data-subset 2/23 will, e.g., read the files starting with 2f (0x2f % 23 == 1), but not 10 (0x10 % 23 == 16)
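Here is a small shell check of that rule (the function name is made up; it only mirrors the C % n == k-1 computation):

```bash
# in_subset PREFIX K N: succeeds iff a pack file whose name starts with the
# two hex characters PREFIX belongs to subset K/N under the rule above.
in_subset() {
    local first_byte=$((16#${1:0:2}))   # first byte of the file name as an integer
    local k=$2 n=$3
    (( first_byte % n == k - 1 ))
}

in_subset 14 1 10 && echo "14... is in subset 1/10"      # 0x14 % 10 == 0
in_subset ff 1 10 || echo "ff... is not in subset 1/10"  # 0xff % 10 == 5
```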
It’s possible to implement this within restic, but you can also do it with a bit of shell (mind the +1).
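A sketch of what such a snippet could look like, using bash's RANDOM (an assumption about the approach, not necessarily the original one-liner):

```bash
# Pick a random subset index k in 1..10 (RANDOM % 10 gives 0..9, hence the +1)
# and read-check that tenth of the backend files.
k=$(( RANDOM % 10 + 1 ))
restic check --read-data-subset "$k/10"
```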