Should I run restic check?


#1

I run restic backup every day and restic forget --prune once a week (on Sundays).
Should I run restic check? When?


#2

I’m honestly not aware of anything check does (without options) that prune does not. (Ping to @fd0 to confirm this post; I don’t want to spread misinformation, and if I’m wrong about anything in this post I’d like to correct my knowledge.)

Comparing a no-options check with prune: both operations check linkage of all snapshots and trees (prune has to do this to figure out what objects can be deleted), and both operations check the validity (but not integrity!) of all packs (prune will delete packs without a valid header as invalid/incomplete).

(Side note: prune also includes a rebuild-index operation.)

check does offer a --read-data option, which reads every object in the repository and verifies that it is not damaged. On disk storage this kind of operation is generally referred to as “scrubbing”. If your repository is on a single HDD, you should probably do this weekly. If it is stored on a redundant RAID, rely on your operating system’s (hopefully automated weekly) RAID scrubbing process instead. If you are using cloud storage, --read-data is likely unnecessary, as the provider hopefully does their own scrubbing.
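
If a full weekly scrub of a large HDD repository takes too long, one option is to read a different slice of the packs each week. This is only a sketch: it assumes your restic version supports check --read-data-subset=n/t (not mentioned elsewhere in this thread), and scrub_group is a hypothetical helper name.

```shell
# Hypothetical helper: map a week number to a "n/t" subset argument.
# $1 = week number (e.g. from `date +%V`, may have a leading zero)
# $2 = total number of groups the repository is split into
scrub_group() {
    week=${1#0}                      # strip a leading zero so 08/09 are not read as octal
    total=$2
    echo "$(( week % total + 1 ))/$total"
}

# Example invocation (commented out so that sourcing this file runs nothing):
# restic check --read-data-subset="$(scrub_group "$(date +%V)" 4)"
```

With four groups, every pack would be read roughly once every four weeks instead of once a week.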

tl;dr:

  • A no-options check is mostly redundant after running prune, so if you regularly run prune, a no-options check is unnecessary.
  • check --read-data should be performed weekly on HDDs, but only if the HDD is not part of a RAID with a level providing redundancy.
    • RAIDs with redundancy should be scrubbed automatically by the OS. Consult your OS documentation to be sure.
    • Cloud storage should not need to be scrubbed at all, as the provider should scrub on their end.
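
The summary above could be turned into a weekly job along these lines. Treat it as a rough sketch: the function name, the SCRUB switch, and the retention policy (--keep-daily 7 --keep-weekly 4) are placeholders I made up, and the repository location and password are assumed to come from the usual RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE environment variables.

```shell
# Hypothetical weekly maintenance routine; adjust the retention policy to taste.
weekly_maintenance() {
    # forget --prune already checks snapshot/tree linkage and pack validity,
    # so no separate bare `restic check` is run here.
    restic forget --keep-daily 7 --keep-weekly 4 --prune || return 1

    # Scrub only when the repository lives on a single, non-redundant HDD.
    # Leave SCRUB unset for redundant RAIDs and cloud storage.
    if [ "${SCRUB:-no}" = "yes" ]; then
        restic check --read-data || return 1
    fi
}
```

Calling weekly_maintenance from a Sunday cron job, with SCRUB=yes set only on machines that back up to a plain HDD, would match the advice in this post.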

#3

I’m using B2 to back up…


#4

Then I would not worry about running check unless another command reports an error.

  • prune will accomplish the same things (and more) as a bare check.
  • B2 will do their own scrubbing.
  • Scrubbing (check --read-data) will fetch all data in the repository, and cost you the B2 egress rate of $0.01 per GB of scrubbed data.
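
To put a number on that last point, here is a quick back-of-the-envelope helper. The default rate is the $0.01 per GB quoted above and may well be out of date, so check current B2 pricing; scrub_cost_usd is just an illustrative name.

```shell
# Rough egress cost of one full `check --read-data` run.
# $1 = repository size in GB
# $2 = price per GB in USD (defaults to the 0.01 quoted in this thread)
scrub_cost_usd() {
    awk -v gb="$1" -v rate="${2:-0.01}" 'BEGIN { printf "%.2f\n", gb * rate }'
}
```

For example, scrub_cost_usd 500 prints 5.00, so a weekly scrub of a 500 GB repository would come to roughly $20 a month at that rate.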

#5

I confirm that you understood everything correctly. The original idea was to have an operation (check) which does not modify the repo and can be run to find out if everything is good.


#6

That’s very interesting, because every Sunday I used to run “check” after “forget --prune”. After reading this thread I understand that I can safely remove the “check” command. That is good to hear, because the “check” operation is CPU hungry.

The only thing I did not fully understand is why the “check” operation is very CPU intensive while “forget --prune” is not. Maybe “check” uses multiple cores and “forget --prune” does not?


#7

The reason is that forget isn’t optimized well (yet) and mostly runs single-threaded, so you won’t notice the CPU usage that much. On the other hand, check is concurrent and does things in parallel as fast as possible.


#8

Ok, thanks for the explanation :grinning: