Which restic commands are always guaranteed not to make fixable errors in the repository unfixable?

I found https://forum.restic.net/t/unable-to-complete-check-fatal-repository-contains-errors/1244

but it does not really answer my question: Might there be errors in the repository which are fixable but become unfixable when running prune or rebuild-index or forget or just about any other command?

check --read-data says the repo has errors if it finds “unused blob …”. This can be fixed by prune. But I really do not want to do any write operation if there is a serious error in the repository unless I am absolutely certain it is safe to do.

So what is the correct thing to do for a weekly automated cronjob for cleaning things up?

  1. check --read-data | grep -v 'unused blob' | grep -v 'not referenced' | grep -v any normal output
  2. if output remains, the repository has errors that need to be fixed
  3. only if the repository has no errors, run prune
  4. check --read-data again, it should not find errors anymore
  5. run forget
  6. prune again
  7. check # unless it is absolutely certain that forget and prune cannot introduce errors

If the repo has a real error, what is recommended? For a defective disk with a defective filesystem, I should make a full 1:1 copy of the disk and only try to fix the copy. Is that also true for a restic repository?

Making a copy is always recommended if errors are suspected. Note that in restic’s case, you can run cp -al repository repository-backup which will hard-link all of the files in the repository. Since restic never modifies files (only creates and deletes) you have a backup copy of the prior state in case things go sideways, but without consuming all of the extra disk space.
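To illustrate why the hard-link copy keeps the prior state, here is a small demonstration with a throwaway directory. It is only a sketch: the directory and the file name pack1 are made up, and it assumes GNU cp (for -l) with both directories on the same filesystem.

```shell
# Throwaway directory standing in for a local repository (hypothetical layout).
demo=$(mktemp -d)
mkdir "$demo/repository"
echo "pack data" > "$demo/repository/pack1"

# Hard-link copy: no file contents are duplicated.
cp -al "$demo/repository" "$demo/repository-backup"

# Simulate a later command deleting a file from the working repository.
rm "$demo/repository/pack1"

# The backup copy still holds the old content.
cat "$demo/repository-backup/pack1"   # prints: pack data
```

This protection relies on files only being created and deleted. A file modified in place would change in both copies, because both names point at the same data.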

So, one possible approach would be:

  1. If a directory exists at the backup copy location, abort and do nothing because the last invocation failed or has not finished.
  2. Run cp -al to make a backup copy of the repository.
  3. Run forget.
  4. Run prune.
  5. Run check --read-data.
  6. Delete the backup copy.

If any step up to and including 5 fails with a non-zero exit code, immediately abort the script and mail the admin. They’ll have both a backup copy and the working repository to investigate.
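The six steps above could be sketched as a shell function roughly like this. It is a minimal sketch, not a tested maintenance script: the function name maintain_repo and its return codes are made up for illustration, it assumes a local repository on a filesystem where GNU cp -al works, and it assumes the repository password is supplied through the environment (for example RESTIC_PASSWORD_FILE). It also treats a failing check as a reason to keep the backup copy.

```shell
#!/bin/sh
# maintain_repo REPO BACKUP [forget options...]
# Hypothetical helper: REPO and BACKUP are local paths chosen by the admin.
maintain_repo() {
    repo=$1
    backup=$2
    shift 2   # the remaining arguments are passed to "restic forget"

    # Step 1: a leftover backup copy means the last run failed or is unfinished.
    if [ -e "$backup" ]; then
        echo "backup copy $backup already exists, doing nothing" >&2
        return 1
    fi

    # Step 2: hard-link copy of the repository.
    cp -al "$repo" "$backup" || return 2

    # Steps 3 to 5: stop at the first failure and leave the backup copy in place.
    restic -r "$repo" forget "$@" || return 3
    restic -r "$repo" prune || return 4
    restic -r "$repo" check --read-data || return 5

    # Step 6: reached only if every step above exited with status 0.
    rm -rf "$backup"
}
```

A cronjob could then call something like maintain_repo /srv/repo /srv/repo-backup --keep-weekly 8 (paths and policy are placeholders) and mail the admin whenever the function returns non-zero. The return value says which step stopped the run.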

Sorry for not answering sooner.

I like your approach. To make it easier for me, I will extend my wrapper restaround in that direction so that I can then run:

restaround prof cpal && restaround prof forget … && restaround prof prune && restaround prof check --read-data && restaround prof rmcpal

One problem is that restic 0.9.6 returns exit code 0 even if restic check finds errors. It did that at least once when I checked. But then I believe I saw a discussion in this forum where restic check shows errors that are really only warnings and can be ignored. Maybe it exits non-zero for real errors? I will have to check that.
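One way to check this would be a small helper that runs a command and reports its exit status. This is just a sketch: report_status is a made-up name, and it says nothing about which status restic actually returns in which situation.

```shell
# report_status COMMAND [ARGS...]
# Runs the command, prints its exit status to stderr, and returns the same status.
report_status() {
    "$@"
    status=$?
    echo "exit status of '$*': $status" >&2
    return "$status"
}
```

Running report_status restic -r /path/to/repo check against a repository with known errors (the path is a placeholder) would show whether the status is really 0.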