Do you use check --read-data?

CodingChipmunk · January 2, 2025, 5:42am

I run a normal check after each prune, as the manual recommends.

What about check --read-data? How often do you do that, if at all? Curious about best practice. Thanks!

kapitainsky · January 2, 2025, 6:28am

Depends on how important your data is for you and how much you can afford (in terms of time and maybe download cost). Cats’ pictures vs fat bitcoin wallet backups might warrant different strategies.

If you can run it every time you run a backup, great, but that is rarely the case. Otherwise you have to find your own balance: you can use partial checks (--read-data-subset) and tune them to your budget (time and money).
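As a rough sketch of the partial-check forms: restic documents --read-data-subset as accepting a group (n/t), a percentage, or a size. The helper name subset_flag below is made up for illustration and is not part of restic; it only validates the value and prints the flag.

```shell
#!/bin/sh
# Hypothetical helper (not part of restic): validate a subset spec and
# print the flag to pass to `restic check`.
subset_flag() {
  # Accept n/t (e.g. 1/7), a percentage (e.g. 2.5%), or a size (e.g. 500M).
  if printf '%s\n' "$1" | grep -Eq '^([0-9]+/[0-9]+|[0-9]+(\.[0-9]+)?%|[0-9]+[KMGT])$'; then
    printf '%s\n' "--read-data-subset=$1"
  else
    printf 'unsupported subset spec: %s\n' "$1" >&2
    return 1
  fi
}

# Usage (assumes restic and the repository are already configured):
#   restic check "$(subset_flag 5%)"
```

A smaller subset costs less time and download per run, at the price of taking longer to cover the whole repository.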

Generally speaking, the more you check, the more likely you are to detect potential errors early.

I think that even more important than checking (which still has to be done) is making sure that you do not rely on a single backup instance.

rawtaz · January 2, 2025, 9:10am

I store my backups on ZFS, which happens to also sit on top of Ceph, so I hardly ever run --read-data. But I do it once in a while anyway, and luckily I have never had any corruption.

For other backups, where I haven’t got this nice filesystem, I generally run --read-data once a month as part of the monthly maintenance of the systems.

CodingChipmunk · January 2, 2025, 10:28pm

Thanks for sharing @rawtaz and @kapitainsky!

I found this from another backup software, Kopia:

If you are unable to, or do not want to, regularly run kopia snapshot verify --verify-files-percent=100, then it is recommended to at least run kopia snapshot verify --verify-files-percent=1 […]. If you run this command daily, statistically over the course of a year you have a roughly 98% likelihood to have tested 100% of your backed up data.

For restic users that would be: check --read-data-subset=1%.
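A quick back-of-the-envelope check of that figure, as a sketch only: it assumes each daily run reads an independent, uniformly random 1% of the data, which is a simplification. Under that assumption, a given piece of data is missed on every one of 365 days with probability 0.99^365, so the expected share read at least once is one minus that. The function name expected_coverage is made up for this example.

```shell
#!/bin/sh
# Expected share (in percent) of the repository read at least once after
# `days` runs, ASSUMING each run reads an independent, uniformly random
# `pct` percent of the data.
expected_coverage() {
  awk -v pct="$1" -v days="$2" \
    'BEGIN { printf "%.1f\n", (1 - (1 - pct / 100) ^ days) * 100 }'
}

expected_coverage 1 365   # daily 1% checks for a year, prints 97.4
```

So the quoted number is better read as "each piece of data has roughly a 97% chance of having been tested" than as a guarantee that everything was read.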

rawtaz · January 3, 2025, 12:25am

That’s documented here: Working with repositories — restic 0.17.3 documentation

tjh · January 3, 2025, 2:50am

I store my backups with remote third parties (Backblaze, BorgBase), so yeah, every night I check a percentage of the data, so that in theory I will have checked all of it every week.
I’ve never had any issues except for the bug introduced in, I think, 0.16.3 with max compression, and the data check picked that up. I thought I was having storage issues, though; it wasn’t until a few days later that I read it was a bug in the compression library.

Like @rawtaz says though, I think it depends on how much you know/trust your storage layer to report errors.

If your data is important enough to back up, I don’t think it hurts to check every now and then that it’s still in a healthy state.

creativeprojects · January 3, 2025, 5:51pm

Like the others said, it depends on your storage medium.

I run my backup to 3 locations:

a local backup on a NAS (zfs)
one on Azure storage
one on a storage VPS

I trust the first one (zfs) as I run a scrub regularly, so I would know pretty quickly when my disks are failing.

I also trust the second one (Microsoft Azure losing data would make the headlines!)

For the third one (VPS), I do run a check daily. I have a configuration that generates a --read-data-subset restic flag for every day of the week. At the end of the week it has checked the whole repository (from 1/7 to 7/7).
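A minimal sketch of that kind of weekday rotation in plain shell (not the actual configuration described above, which is not shown in this thread). The function name weekday_subset is made up; date +%u prints the ISO weekday, 1 for Monday through 7 for Sunday.

```shell
#!/bin/sh
# Map a weekday number (1 = Monday ... 7 = Sunday) to a subset spec.
# With no argument, today's weekday from `date +%u` is used.
weekday_subset() {
  day=${1:-$(date +%u)}
  printf '%s/7\n' "$day"
}

# Usage (assumes restic and the repository are already configured):
#   restic check --read-data-subset="$(weekday_subset)"
```

Run daily, this walks through 1/7 to 7/7, so a missed day leaves that seventh unchecked until the following week.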

fede · January 3, 2025, 6:37pm

I check 5% of the data from all backups once a week using a script.

nicnab · January 4, 2025, 10:26am

Standard hardware here. Every Sunday morning, I check part (date +%W + 1) of 54, that is, the calendar week number plus one, out of 54 parts. The +1 is there because date can report the first week of the year as 0, and the 54 because the last week of the year can be 53 (plus one).

Technically this doesn’t cover 100% of the repo over a year, but it’s pretty close, and even on big repos my machine has all of Sunday to check 1/54 of the data.
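One way the week-based part number could be computed in shell, as a sketch and not necessarily how it is done above. The function name week_subset is made up. date +%W is zero-padded, so a value such as 08 would be rejected as invalid octal by shell arithmetic unless the leading zero is stripped first.

```shell
#!/bin/sh
# Turn the week of the year (00..53, as printed by `date +%W`) into a
# subset spec "part/54", where part = week + 1.
week_subset() {
  week=${1:-$(date +%W)}
  # Strip one leading zero so "08" and "09" are not parsed as octal.
  week=${week#0}
  printf '%s/54\n' "$(($week + 1))"
}

# Usage (assumes restic and the repository are already configured):
#   restic check --read-data-subset="$(week_subset)"
```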

noeck · January 4, 2025, 5:04pm

FWIW, I use this script with restic check. It checks the next chunk whenever you call it:

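#!/usr/bin/env bash
# Remembers the last chunk checked and verifies the next one on each run.
LASTFILE=/path/to/file
NCHUNKS=50  # number of chunks
# First run: seed the state file so the rotation starts at chunk 1.
test -f "$LASTFILE" || echo "$NCHUNKS" > "$LASTFILE"
NCHUNK=$(< "$LASTFILE")
NCHUNK=$(( (NCHUNK % NCHUNKS) + 1 ))
# Record the chunk only if the check succeeded, so a failed chunk is retried.
restic check --read-data-subset "$NCHUNK/$NCHUNKS" && echo "$NCHUNK" > "$LASTFILE"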

webcaptcha · January 11, 2025, 3:12pm

According to the docs, check --read-data will download all pack files in the repo. Does that mean it just downloads the whole repo as it is? And what happens next? How is the data checked? Manually?

rawtaz · January 11, 2025, 3:32pm

What does this question even mean? The entire purpose of check --read-data is that check verifies not only the integrity of the repository structure, but also the integrity of the actual backed up data stored in the repository. There’s no manual step needed in addition to this.

kapitainsky · January 11, 2025, 4:10pm

Without --read-data, restic only checks the repository structure and whether all the files it expects are in the repo; it does not read the contents of the pack files.

With --read-data, every file will be downloaded and an attempt will be made to “restore” it, effectively checking that not a single byte is corrupted. I put restore in quotes as no actual restore is performed: nothing is written to disk, it is all done in memory and discarded. Without such functionality you would have to physically restore all repo snapshots in order to check for potential corruption.
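In practice this means a script only has to look at the exit status of the check, which is non-zero when restic finds a problem. A minimal sketch, where verify_repo and the CHECK_CMD override are names made up for this example (the override just lets the function be tried out without a repository):

```shell
#!/bin/sh
# Run the full data check and report the result based on its exit status.
# CHECK_CMD is a hypothetical override for dry runs; by default the real
# command `restic check --read-data` is executed.
verify_repo() {
  if ${CHECK_CMD:-restic check --read-data}; then
    echo "repository OK"
  else
    echo "repository check FAILED" >&2
    return 1
  fi
}

# Usage (assumes restic and the repository are already configured):
#   verify_repo || echo "investigate the backup storage"
```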