Data integrity testing

Hi folks,

It's the owner of a stupidly big archive here, and I want to make sure I'm testing integrity regularly and correctly. I plan on scripting this and will share it with everyone once it's done. I'm a sucker for automation, including emailing results, and I like to use the sendEmail tool, so expect that soon.

Anyway, with such a HUGE archive (over 200TB), how do you think I should best handle data integrity tests? I run backups every day, with sometimes up to 1TB of updates or changes. I'd like to randomly test about 1% of the archive daily, and after the first-of-the-month prune and forget runs, maybe 5%.
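Since you plan on scripting it anyway, a minimal sketch of that 1%-daily / 5%-after-prune rotation could look like this (the repo path is hypothetical, and the 1%/5% split is just the plan above; `restic check --read-data-subset` is the real flag):

```python
import datetime
import subprocess

def subset_for(day_of_month: int) -> str:
    """Pick the --read-data-subset value: a bigger random sample
    right after the monthly prune/forget, a small one otherwise."""
    return "5%" if day_of_month == 1 else "1%"

def build_check_command(repo: str, day_of_month: int) -> list:
    # For 'x%' values restic picks the pack files to read at random.
    return ["restic", "-r", repo, "check",
            "--read-data-subset", subset_for(day_of_month)]

if __name__ == "__main__":
    cmd = build_check_command("/mnt/tank/restic-repo",  # hypothetical path
                              datetime.date.today().day)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run the check
```

From there it's easy to capture the output and feed it to sendEmail.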

The servers doing this work aren't used for anything other than data storage and aren't heavily hit by anything. They're running TrueNAS SCALE with ZFS, have stacks of RAM, a load average of 1.63, 1.52, 1.36, and 48 cores available.

Thoughts?

restic check --read-data-subset x%

Where x = the percentage, but is it RANDOM?

My worry is that if I put 1%, it would just test the same first 1% of the data every time.

from restic check -h:

--read-data-subset subset read a subset of data packs, specified as 'n/t' for specific part, or either 'x%' or 'x.y%' or a size in bytes with suffixes k/K, m/M, g/G, t/T for a random subset
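For reference, the accepted formats can be illustrated with a small validator (a sketch; the regex simply mirrors the grammar quoted above, it is not taken from restic itself):

```python
import re

# Mirrors the formats from `restic check -h`:
#   'n/t'           -> a specific part, e.g. 3/52
#   'x%' or 'x.y%'  -> a random subset by percentage
#   size + k/K/m/M/g/G/t/T suffix -> a random subset by size
SUBSET_RE = re.compile(
    r"^(\d+/\d+"          # n/t
    r"|\d+(\.\d+)?%"      # x% or x.y%
    r"|\d+[kKmMgGtT])$"   # size in bytes with suffix
)

def is_valid_subset(spec: str) -> bool:
    """True if spec looks like a valid --read-data-subset value."""
    return SUBSET_RE.match(spec) is not None
```

Note that only the `n/t` form is deterministic; the percentage and size forms pick a random subset each run.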

Like @alexweiss said, use the n/t approach. Depending on when your machines are used the least, you can probably even check bigger parts.

What I do with a storage server that has little to do on weekends: every Sunday I have it check part weeknumber of 52 (e.g. 8/52, then 9/52, and so on). It's not 100% of course, because things change, but that way you should make your way through the whole repo within the year.

You might adapt that to your use case and check dayofmonth/30 or even dayofweek/7 or something like that.
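That rotation could be sketched like this (an assumption-laden sketch: ISO week numbers can reach 53, so week 53 is folded into part 52 here, and days 31 into part 30):

```python
import datetime

def weekly_subset(date: datetime.date) -> str:
    """Map the ISO week number onto one of 52 equal parts of the repo."""
    week = date.isocalendar()[1]   # 1..53
    return f"{min(week, 52)}/52"

def monthly_subset(date: datetime.date) -> str:
    """The dayofmonth/30 variant; days 30 and 31 both map to part 30."""
    return f"{min(date.day, 30)}/30"
```

The resulting string is passed straight to `restic check --read-data-subset`.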

Hello, on Linux or Windows?

If you want to be a little more paranoid, have your backup script pick one changed file and restore it, then compare the restored file to the original. I did this with a non-restic backup of a small MS SQL Server database, doing a backup in prod and a restore into development daily for a year, just to see if the backup and restore were stable. It was fine.
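A sketch of that restore-and-compare spot check (repo, snapshot id, and path are hypothetical; `restic restore` with `--include` and `--target` are real flags, but restic recreates the file's full path under the target directory, so check the join logic against your setup):

```python
import filecmp
import subprocess
import tempfile

def files_identical(a: str, b: str) -> bool:
    """Byte-for-byte comparison of two files (no stat-only shortcut)."""
    return filecmp.cmp(a, b, shallow=False)

def verify_one_file(repo: str, snapshot: str, path: str) -> bool:
    """Restore a single file from a snapshot into a temp dir,
    then compare it to the live copy."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            ["restic", "-r", repo, "restore", snapshot,
             "--include", path, "--target", tmp],
            check=True,
        )
        # path is assumed absolute, so it nests directly under tmp
        return files_identical(path, tmp + path)
```

One caveat: a file that changed between the snapshot and the comparison will look like a mismatch, so this works best on files you know are quiescent.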

I don’t get this. To my understanding you would check 1/52 = 1.92% of the whole repository the first Sunday of the year, then 2/52 = 3.85% (which would need double the time), then the third Sunday 3/52 = 5.77% (3x the time of the first Sunday), and so on. On the last Sunday of the year you would check 52/52 = 100% of your repository, which means checking everything… why that way?

Check this explanation out. It’s not a division: “3/52” means “the third of 52 equal parts”, not “3 out of 52 parts”.

Ahh, that’s interesting. Thanks.
