Hello,
To verify the integrity of my backups, I’m using the check command with the --read-data-subset=10% flag.
I know this selects a random subset of all packs to be checked, and that there's no guarantee that every pack will eventually be checked, no matter how many invocations are run.
Now, I also know about the n/t syntax, but even though the documentation doesn't say so, I have the feeling it could also leave some packs unchecked in the following situation:
- Invocation 1, 1000 packs in the repository, called with 1/10, so the first 100 packs are checked
- New backup, there are now 1200 packs in the repository
- Invocation 2, called with 2/10, so packs 121 to 240 are checked, missing those from 101 to 120
Am I right in my assumption here?
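To make the scenario concrete, here is a tiny Go sketch of the model I'm assuming (a stable ordering of the packs split into t equal, contiguous buckets). I don't know whether restic actually implements n/t this way, so take this purely as an illustration of my assumption:

```go
package main

import "fmt"

// bucketRange models my assumption above: the packs are put into a
// stable order and split into t equal, contiguous buckets; n/t then
// checks the n-th bucket. It returns the 1-based first and last pack
// index of that bucket. This is only the model behind my scenario,
// not necessarily how restic actually selects the subset.
func bucketRange(n, t, total int) (first, last int) {
	size := total / t
	return (n-1)*size + 1, n * size
}

func main() {
	// Invocation 1: 1000 packs, --read-data-subset=1/10
	f1, l1 := bucketRange(1, 10, 1000)
	fmt.Printf("1/10 of 1000 packs: %d..%d\n", f1, l1) // 1..100

	// Invocation 2: 1200 packs, --read-data-subset=2/10
	f2, l2 := bucketRange(2, 10, 1200)
	fmt.Printf("2/10 of 1200 packs: %d..%d\n", f2, l2) // 121..240, so 101..120 are skipped
}
```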
To avoid this situation, would it be possible to amend the x% method with a directed random selection? By that I mean that each pack gets assigned a check counter, and that on each invocation only the packs with the lowest counter value are considered for random selection. This would play out as follows:
- Invocation 1, 1000 packs in the repository, called with 10%, the selected packs get their counter set to 1.
- New backup, 1200 packs are now in the repository, the newly added packs get a counter set to 0.
- Invocation 2, called again with 10%, only those with their counter still set to 0 are considered for inclusion in the random draw.
If there are not enough packs at the lowest counter value, then packs at the lowest value + 1 are considered as well, and so on, until enough packs have been selected (or all remaining packs have been).
With this, I would have the assurance that every pack is checked at least once before any pack is checked a second time.
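To make the idea a bit more concrete, here is a rough Go sketch of the selection logic I have in mind. The Pack type and CheckCount field are hypothetical (as far as I know, restic doesn't store such a counter today); this is meant to illustrate the proposal, not to be a finished implementation:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Pack is a stand-in for a repository pack file; CheckCount is the
// hypothetical per-pack counter described above.
type Pack struct {
	ID         string
	CheckCount int
}

// selectForCheck picks `want` packs, always preferring those with the
// lowest CheckCount. Within one counter value the choice is random, so
// every pack is checked once before any pack is checked a second time.
func selectForCheck(packs []Pack, want int) []*Pack {
	// Group the packs by their current counter value.
	byCount := map[int][]*Pack{}
	counts := []int{}
	for i := range packs {
		c := packs[i].CheckCount
		if _, ok := byCount[c]; !ok {
			counts = append(counts, c)
		}
		byCount[c] = append(byCount[c], &packs[i])
	}
	sort.Ints(counts)

	selected := []*Pack{}
	for _, c := range counts {
		group := byCount[c]
		// Shuffle so the draw within one counter value is random.
		rand.Shuffle(len(group), func(i, j int) { group[i], group[j] = group[j], group[i] })
		for _, p := range group {
			if len(selected) == want {
				return selected
			}
			selected = append(selected, p)
		}
	}
	return selected // fewer packs than requested: take them all
}

func main() {
	packs := []Pack{{"a", 1}, {"b", 0}, {"c", 0}, {"d", 1}, {"e", 0}}
	for _, p := range selectForCheck(packs, 2) {
		p.CheckCount++ // checked packs get their counter bumped
		fmt.Println(p.ID, p.CheckCount)
	}
}
```

After a check run, the selected packs would simply get their counter incremented, as in main() above.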
Please do not hesitate to comment on this; I'm not sure I have covered every corner case here.