but it does not really answer my question: Might there be errors in the repository which are fixable but become unfixable when running prune or rebuild-index or forget or just about any other command?
check --read-data says the repo has errors if it finds “unused blob …”. This can be fixed by prune. But I really do not want to do any write operation if there is a serious error in the repository unless I am absolutely certain it is safe to do.
So what is the correct thing to do for a weekly automated cronjob for cleaning things up?
1. check --read-data | grep -v 'unused blob' | grep -v 'not referenced' | grep -v <any normal output>
2. If output remains, the repository has errors that need to be fixed.
3. Only if the repository has no errors, run prune.
4. Run check --read-data again; it should not find errors anymore.
5. Run forget.
6. Run prune again.
7. Run check, unless it is absolutely certain that forget and prune cannot introduce errors.
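To make the filtering idea concrete, here is a minimal sketch. The function name remaining_check_messages is made up for illustration, and the two patterns are just the ones mentioned above; I have not verified them against the exact wording restic prints, and patterns for the normal progress output would still have to be added.

```shell
#!/bin/sh
# Hypothetical helper: reads "restic check" output on stdin and prints only
# the lines that do NOT match the messages considered harmless in this post.
remaining_check_messages() {
    # grep -v exits with status 1 when nothing is left over; "|| true" keeps
    # that from looking like a failure to the caller.
    grep -v -e 'unused blob' -e 'not referenced' || true
}
```

In a cron script you would capture the check output into a variable first, pipe it through the function, and only go on to prune when the result is empty. Capturing it first also lets you look at the exit status of check itself, which a plain pipeline would hide.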
If the repo has a real error, what is recommended? For a defective disk with a damaged filesystem, I should make a full 1:1 copy of the disk and only try to fix the copy. Is that also true for a restic repository?
Making a copy is always recommended if errors are suspected. Note that in restic’s case, you can run cp -al repository repository-backup which will hard-link all of the files in the repository. Since restic never modifies files (only creates and deletes) you have a backup copy of the prior state in case things go sideways, but without consuming all of the extra disk space.
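If you want to convince yourself that the hard-link copy really protects against deletions, here is a small demonstration on a throwaway directory (not a real restic repository). It assumes GNU cp and GNU stat; the file names are invented for the example.

```shell
#!/bin/sh
# Build a tiny stand-in for a repository in a temporary directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/repository/data"
echo "pack contents" > "$workdir/repository/data/pack1"

# -a recurses and preserves attributes, -l creates hard links instead of
# copying file contents (GNU cp).
cp -al "$workdir/repository" "$workdir/repository-backup"

# Both paths refer to the same inode, so the data is stored only once.
orig_inode=$(stat -c %i "$workdir/repository/data/pack1")
copy_inode=$(stat -c %i "$workdir/repository-backup/data/pack1")

# Deleting the file in the working tree (as prune might) only removes one
# link; the backup copy still reaches the data.
rm "$workdir/repository/data/pack1"
surviving=$(cat "$workdir/repository-backup/data/pack1")

echo "same inode: $orig_inode = $copy_inode, surviving data: $surviving"
rm -rf "$workdir"
```

Note that hard links only work within one filesystem, and they do not protect against a file being modified in place. That is fine here only because restic creates and deletes files but never modifies them.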
So, one possible approach would be:
1. If a directory exists at the backup copy location, abort and do nothing because the last invocation failed or has not finished.
2. Run cp -al to make a backup copy of the repository.
3. Run forget.
4. Run prune.
5. Run check --read-data.
6. Delete the backup copy.
If any of steps 1 through 5 fails with a non-zero exit code, immediately abort the script and mail the admin. They’ll have both a backup copy and the working repository to investigate.
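As a rough sketch, the approach could look like the script below. The paths, the --keep-weekly 8 retention policy, and the function name maintain are placeholders I made up. It assumes GNU cp, that the repository password is supplied through the environment (for example RESTIC_PASSWORD_FILE), and that restic exits with a non-zero code on failure. It also treats a failed check as a reason to keep the backup copy.

```shell
#!/bin/sh
# Placeholder settings; override them through the environment.
RESTIC=${RESTIC:-restic}
REPO=${REPO:-/srv/restic/repository}
BACKUP=${BACKUP:-/srv/restic/repository-backup}

maintain() {
    # Step 1: a leftover backup copy means the last run failed or is still
    # running, so do nothing.
    if [ -e "$BACKUP" ]; then
        echo "backup copy $BACKUP exists, refusing to run" >&2
        return 1
    fi

    # Step 2: hard-link copy of the repository (must be on the same filesystem).
    cp -al "$REPO" "$BACKUP" || return 2

    # Steps 3 to 5: stop at the first failure and leave the backup copy in
    # place so the admin can investigate.
    "$RESTIC" -r "$REPO" forget --keep-weekly 8 || return 3
    "$RESTIC" -r "$REPO" prune || return 4
    "$RESTIC" -r "$REPO" check --read-data || return 5

    # Step 6: everything succeeded, so the backup copy is no longer needed.
    rm -rf "$BACKUP"
}
```

Call maintain at the end of the script and send the mail when it returns non-zero; the return value tells you which step failed. Because a failed run leaves the backup copy behind, the next cron invocation stops at step 1 until someone has looked at it.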
One problem is that restic 0.9.6 exits with code 0 even if restic check finds errors. It did that at least once when I checked. But then I believe I saw a discussion in this forum where restic check reports errors that are really only warnings and can be ignored. Maybe it exits with a non-zero code for real errors? I will have to check that.