Which restic commands are always guaranteed not to make fixable errors in the repository unfixable?

wrohdewald · December 3, 2019, 5:27am

I found https://forum.restic.net/t/unable-to-complete-check-fatal-repository-contains-errors/1244

but it does not really answer my question: Might there be errors in the repository which are fixable but become unfixable when running prune or rebuild-index or forget or just about any other command?

check --read-data says the repo has errors if it finds “unused blob …”. This can be fixed by prune. But I really do not want to do any write operation if there is a serious error in the repository unless I am absolutely certain it is safe to do.

So what is the correct thing to do for a weekly automated cronjob for cleaning things up?

1. Run check --read-data | grep -v 'unused blob' | grep -v 'not referenced' | grep -v any normal output
2. If output remains, the repository has errors that need to be fixed.
3. Only if the repository has no errors, run prune.
4. Run check --read-data again; it should not find errors anymore.
5. Run forget.
6. Run prune again.
7. Run check, unless it is absolutely certain that forget and prune cannot introduce errors.

If the repo has a real error, what is recommended? For a defective disk with a defective filesystem, I should make a full 1:1 copy of the disk and only try to fix the copy. Is that also true for a restic repository?

cdhowie · December 3, 2019, 5:43am

Making a copy is always recommended if errors are suspected. Note that in restic’s case, you can run cp -al repository repository-backup which will hard-link all of the files in the repository. Since restic never modifies files (only creates and deletes) you have a backup copy of the prior state in case things go sideways, but without consuming all of the extra disk space.
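To illustrate why the hard-linked copy protects the prior state, here is a small sketch on a throwaway directory (not a real repository). It assumes GNU cp, and hard links only work when the copy lives on the same filesystem as the original. The directory and file names are made up for the demo.

```shell
# Throwaway demo directory standing in for a repository (hypothetical names).
demo=$(mktemp -d)
mkdir "$demo/repository"
echo "pack data" > "$demo/repository/pack1"

# Hard-link copy: the directory tree is recreated, the files are linked.
cp -al "$demo/repository" "$demo/repository-backup"

# Both paths show the same inode number, so no file data was duplicated.
ls -i "$demo/repository/pack1" "$demo/repository-backup/pack1"

# Deleting the file in the working copy leaves the backup link intact.
rm "$demo/repository/pack1"
cat "$demo/repository-backup/pack1"   # prints "pack data"
```

This only helps because files are created and deleted but never modified in place; a program that rewrote a file in place would change both links at once.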

So, one possible approach would be:

1. If a directory exists at the backup copy location, abort and do nothing because the last invocation failed or has not finished.
2. Run cp -al to make a backup copy of the repository.
3. Run forget.
4. Run prune.
5. Run check --read-data.
6. Delete the backup copy.

If any step prior to 5 fails with a non-zero exit code, immediately abort the script and mail the admin. They’ll have both a backup copy and the working repository to investigate.
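The steps above could be sketched as a shell function like the one below. This is only a sketch under assumptions: the repository is a local directory, GNU cp is available, the repository password is already provided through restic's usual environment variables, and the RESTIC variable is a hypothetical hook for pointing at a different binary. The forget policy is not chosen here; whatever arguments follow the two paths are passed straight to restic forget. Mailing the admin is left to the caller, who can act on the non-zero return value.

```shell
# run_maintenance REPO BACKUP [forget arguments...]
# Returns non-zero as soon as a step fails and leaves the backup copy in place.
run_maintenance() {
    [ $# -ge 2 ] || return 2
    repo=$1
    backup=$2
    shift 2    # the remaining arguments go to "restic forget"

    # Step 1: a leftover copy means the last run failed or is still running.
    if [ -e "$backup" ]; then
        echo "backup copy $backup exists, aborting" >&2
        return 1
    fi

    # Step 2: hard-link copy of the repository (same filesystem required).
    cp -al "$repo" "$backup" || return 1

    # Steps 3 to 5: stop at the first non-zero exit code.
    "${RESTIC:-restic}" -r "$repo" forget "$@" || return 1
    "${RESTIC:-restic}" -r "$repo" prune || return 1
    "${RESTIC:-restic}" -r "$repo" check --read-data || return 1

    # Step 6: everything succeeded, so the copy is no longer needed.
    rm -rf "$backup"
}
```

A cron wrapper could call it as run_maintenance /path/to/repo /path/to/repo-backup followed by the desired forget options, and send mail when it returns non-zero. Note that this relies entirely on restic's exit codes, so it is only as reliable as they are.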

wrohdewald · December 5, 2019, 4:15pm

Sorry for not answering sooner.

I like your approach. To make it easier for me, I will extend my wrapper restaround in that direction so I can then do

restaround prof cpal && restaround prof forget … && restaround prof prune && restaround prof check --read-data && restaround prof rmcpal

One problem is that restic 0.9.6 returns exit code 0 even if restic check finds errors. It did that at least once when I checked. But then I believe I saw a discussion in this forum where restic check showed errors that are really only warnings and can be ignored. Maybe it exits with a non-zero code for real errors? I will have to check that.