Should I run restic check?

cberni · December 10, 2018, 12:40am

I run restic backup every day and restic forget --prune once a week (on Sunday).
Should I run restic check? When?

cdhowie · December 10, 2018, 1:33am

I’m honestly not aware of anything check does (without options) that prune does not. (Ping to @fd0 to confirm this post; I don’t want to spread misinformation, and if I’m wrong about anything here I’d like to correct my knowledge.)

Comparing a no-options check with prune: both operations check linkage of all snapshots and trees (prune has to do this to figure out what objects can be deleted), and both operations check the validity (but not integrity!) of all packs (prune will delete packs without a valid header as invalid/incomplete).

(Side note: prune also includes a rebuild-index operation.)

check does offer a --read-data option, which reads all objects in the repository and makes sure that they are not damaged. This kind of full read is generally referred to as “scrubbing”. If you store your repository on an HDD, you should probably do this weekly, unless the repo is stored on a redundant RAID; in that case, rely on your operating system’s (hopefully automated weekly) RAID scrubbing process instead. If you are using cloud storage, --read-data is likely unnecessary, as the provider hopefully does their own scrubbing.

tl;dr:

- A no-options check is mostly redundant after running prune so if you regularly run prune then a no-options check is unnecessary.
- check --read-data should be performed weekly on HDDs, only if the HDD is not part of a RAID with a level providing redundancy.
  - RAIDs with redundancy should be scrubbed automatically by the OS. Consult your OS documentation to be sure.
  - Cloud storage should not need to be scrubbed at all, as the provider should scrub on their end.
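
The tl;dr can be written down as a small decision function. This is only a sketch of the reasoning above, not anything restic provides: the `Storage` labels and the function name are made up for illustration, and the retention options for forget are left out, as they are in the thread.

```python
from __future__ import annotations

from enum import Enum


class Storage(Enum):
    """Hypothetical labels for where the repository lives."""
    SINGLE_HDD = "single_hdd"          # plain disk, no redundancy
    REDUNDANT_RAID = "redundant_raid"  # RAID level with redundancy, scrubbed by the OS
    CLOUD = "cloud"                    # provider is expected to scrub on their end


def weekly_restic_commands(storage: Storage) -> list[list[str]]:
    """Return the restic invocations for the weekly maintenance run."""
    # prune already covers what a bare `check` would do, so no plain check here.
    # Retention flags for forget are omitted, as in the thread.
    commands = [["restic", "forget", "--prune"]]
    # Only a non-redundant local disk gets a full scrub from restic itself.
    if storage is Storage.SINGLE_HDD:
        commands.append(["restic", "check", "--read-data"])
    return commands


for kind in Storage:
    print(kind.value, weekly_restic_commands(kind))
```

Each inner list could be handed to `subprocess.run` as-is; returning the commands instead of running them keeps the decision easy to inspect.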

cberni · December 10, 2018, 2:46am

I’m using B2 to back up…

cdhowie · December 10, 2018, 3:43am

Then I would not worry about running check unless another command reports an error.

- prune will accomplish the same things (and more) as a bare check.
- B2 will do their own scrubbing.
- Scrubbing (check --read-data) will fetch all data in the repository, and cost you the B2 egress rate of $0.01 per GB of scrubbed data.
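
To put a number on that last point, here is a rough estimate of what a full scrub would cost at the $0.01 per GB rate quoted above. The 500 GB repository size and the weekly schedule are assumptions chosen for illustration, and the function name is made up; current B2 pricing may differ.

```python
def scrub_egress_cost_usd(repo_size_gb: float, usd_per_gb: float = 0.01) -> float:
    """Estimated download cost of one full `check --read-data` run."""
    if repo_size_gb < 0:
        raise ValueError("repo_size_gb must be non-negative")
    # A full scrub downloads the whole repository once.
    return round(repo_size_gb * usd_per_gb, 2)


# Hypothetical 500 GB repository, scrubbed weekly for a year.
per_run = scrub_egress_cost_usd(500)
per_year = round(per_run * 52, 2)
print(f"per run: ${per_run:.2f}, per year: ${per_year:.2f}")
# prints "per run: $5.00, per year: $260.00"
```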

fd0 · December 10, 2018, 9:37am

I confirm that you understood everything correctly. The original idea was to have an operation (check) which does not modify the repo and can be run to find out if everything is good.

fbarbeira · December 10, 2018, 11:45am

That’s very interesting, because every Sunday I used to run “check” after “forget --prune”. After reading this thread, I understand that I can safely remove the “check” command. That is good to hear, because the “check” operation is CPU hungry.

The only thing I did not fully understand is why the “check” operation is very CPU intensive while “forget --prune” is not. Maybe “check” uses multiple cores and “forget --prune” does not?

fd0 · December 12, 2018, 7:14am

The reason is that forget isn’t optimized well (yet) and mostly runs single-threaded, so you won’t notice the CPU usage that much. On the other hand, check is concurrent and does things in parallel as fast as possible.

fbarbeira · December 12, 2018, 12:14pm

Ok, thanks for the explanation

Ataraxy · August 4, 2019, 8:07am

I’m a little confused…

If I understand this thread, then a check after either a prune or forget --prune is totally unnecessary.

Why then would the documentation say:

“It is advisable to run restic check after pruning, to make sure you are alerted, should the internal data structures of the repository be damaged.”

quinncom · November 3, 2023, 11:54pm

Is anyone available to address the concern @Ataraxy raised? I too am confused why the docs say “It is advisable to run restic check after pruning” when the advice earlier in this thread says the opposite. Perhaps this is a documentation bug?

MichaelEischer · November 4, 2023, 2:14pm

prune indeed performs most of the checks also executed by check (verifying file sizes, the index and the snapshot structure). As prune also writes new files, it could in theory introduce new problems which would be detected by a subsequent check run. However, at least in somewhat recent restic versions, prune seems to be reliable enough that running check afterwards is not necessary.

It’s essentially outdated documentation and should be toned down a bit.

check --read-data (or one of its variants that read only a part of the data) also verifies the actual content of the data files. That can help detect bitrot that has somehow occurred on the repository storage.
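
As a sketch of how a partial read could be spread over several runs: the flag is not named in this thread, but to my knowledge restic's check accepts `--read-data-subset=n/t`, which reads the n-th of t groups of data files. Assuming that is the variant meant, the helper below (a hypothetical name, and the choice of four groups is arbitrary) picks a different group each ISO week, so the whole repository gets read roughly once every four weeks.

```python
from __future__ import annotations

import datetime


def read_data_subset_command(day: datetime.date, groups: int = 4) -> list[str]:
    """Build a `restic check` command that reads one of `groups` subsets."""
    if groups < 1:
        raise ValueError("groups must be at least 1")
    week = day.isocalendar()[1]       # ISO week number, 1..53
    subset = (week - 1) % groups + 1  # cycles through 1..groups
    # Around the turn of the year the same group can come up twice in a row.
    return ["restic", "check", f"--read-data-subset={subset}/{groups}"]


print(read_data_subset_command(datetime.date.today()))
```

Deriving the group from the calendar keeps the job stateless; a counter stored next to the backup script would avoid the repeat at the year boundary.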