Data integrity testing

Hi folks,

It's the owner of a stupidly big archive here, and I want to make sure I'm testing integrity regularly and correctly. I plan on scripting this and will share it with everyone once it's done. I'm a sucker for automation, including emailing results, and I like to use the sendEmail tool, so expect that soon.

Anyway, with such a HUGE archive (over 200TB), how do you think I should best handle data integrity tests? I run backups every day, with sometimes up to 1TB of updates or changes. I'd like to randomly test about 1% of the archive daily, and after the first-of-the-month prune and forget runs, maybe 5%.
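Since you plan on scripting it anyway, a minimal sketch of that 1%-daily / 5%-after-prune rotation could look like this (the repo path is hypothetical, and the 1%/5% split is just the plan above; `restic check --read-data-subset` is the real flag):

```python
import datetime
import subprocess

def subset_for(day_of_month: int) -> str:
    """Pick the --read-data-subset value: a bigger random sample
    right after the monthly prune/forget, a small one otherwise."""
    return "5%" if day_of_month == 1 else "1%"

def build_check_command(repo: str, day_of_month: int) -> list:
    # For 'x%' values restic picks the pack files to read at random.
    return ["restic", "-r", repo, "check",
            "--read-data-subset", subset_for(day_of_month)]

if __name__ == "__main__":
    cmd = build_check_command("/mnt/tank/restic-repo",  # hypothetical path
                              datetime.date.today().day)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run the check
```

From there it's easy to capture the output and feed it to sendEmail.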

The servers doing this work aren't used for anything other than data storage and aren't heavily hit by anything. They're running TrueNAS SCALE with ZFS, have stacks of RAM, a load average of 1.63, 1.52, 1.36, and 48 cores available.

Thoughts?

restic check --read-data-subset x%

Where x = the percentage, but is it RANDOM?

My worry is that if I put 1%, it would just test the same first 1% of the data every time.

from restic check -h:

--read-data-subset subset read a subset of data packs, specified as 'n/t' for specific part, or either 'x%' or 'x.y%' or a size in bytes with suffixes k/K, m/M, g/G, t/T for a random subset
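For reference, the accepted formats can be illustrated with a small validator (a sketch; the regex simply mirrors the grammar quoted above, it is not taken from restic itself):

```python
import re

# Mirrors the formats from `restic check -h`:
#   'n/t'           -> a specific part, e.g. 3/52
#   'x%' or 'x.y%'  -> a random subset by percentage
#   size + k/K/m/M/g/G/t/T suffix -> a random subset by size
SUBSET_RE = re.compile(
    r"^(\d+/\d+"          # n/t
    r"|\d+(\.\d+)?%"      # x% or x.y%
    r"|\d+[kKmMgGtT])$"   # size in bytes with suffix
)

def is_valid_subset(spec: str) -> bool:
    """True if spec looks like a valid --read-data-subset value."""
    return SUBSET_RE.match(spec) is not None
```

Note that only the `n/t` form is deterministic; the percentage and size forms pick a random subset each run.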

Like @alexweiss said, use the n/t approach. Depending on when your machines are used the least, you can probably even check bigger parts.

What I do with a storage server that has little to do on weekends: every Sunday I have it check part weeknumber of 52 (e.g. 8/52, then 9/52, and so on). It's not 100% of course, because things change, but that way you should make your way through the whole repo within the year.

You might adapt that to your use case and check dayofmonth/30 or even dayofweek/7 or something like that.
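That rotation could be sketched like this (an assumption-laden sketch: ISO week numbers can reach 53, so week 53 is folded into part 52 here, and days 31 into part 30):

```python
import datetime

def weekly_subset(date: datetime.date) -> str:
    """Map the ISO week number onto one of 52 equal parts of the repo."""
    week = date.isocalendar()[1]   # 1..53
    return f"{min(week, 52)}/52"

def monthly_subset(date: datetime.date) -> str:
    """The dayofmonth/30 variant; days 30 and 31 both map to part 30."""
    return f"{min(date.day, 30)}/30"
```

The resulting string is passed straight to `restic check --read-data-subset`.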

Hello, on Linux or Windows?

If you want to be a little more paranoid, have your backup script pick one changed file and restore it, then compare the restored file to the original. I did this with a non-restic backup of a small MS SQL Server database, doing a backup in prod and a restore into development daily for a year, just to see if the backup and restore were stable. It was fine.
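A sketch of that restore-and-compare spot check (repo, snapshot id, and path are hypothetical; `restic restore` with `--include` and `--target` are real flags, but restic recreates the file's full path under the target directory, so check the join logic against your setup):

```python
import filecmp
import subprocess
import tempfile

def files_identical(a: str, b: str) -> bool:
    """Byte-for-byte comparison of two files (no stat-only shortcut)."""
    return filecmp.cmp(a, b, shallow=False)

def verify_one_file(repo: str, snapshot: str, path: str) -> bool:
    """Restore a single file from a snapshot into a temp dir,
    then compare it to the live copy."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            ["restic", "-r", repo, "restore", snapshot,
             "--include", path, "--target", tmp],
            check=True,
        )
        # path is assumed absolute, so it nests directly under tmp
        return files_identical(path, tmp + path)
```

One caveat: a file that changed between the snapshot and the comparison will look like a mismatch, so this works best on files you know are quiescent.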

I don’t get this. To my understanding you would check 1/52 = 1.92% of the whole repository the first Sunday of the year, then 2/52 = 3.85% (which would need double the time), then the third Sunday 3/52 = 5.77% (3x the time of the first Sunday), and so on. On the last Sunday of the year you would check 52/52 = 100% of your repository, which means checking everything… why that way?

Check this explanation out. It’s not a division: “3/52” means “the third of 52 equal parts”, not “3 out of 52 parts”.

Ahh, that’s interesting. Thanks.
