This is a Python script. It runs “restic backup” against two local repos, one on an internal HD, the other on an external HD. I’ve just discovered that the internal HD repo has been failing for the past 7 hours, and no snapshots have been created over this period. The repo on the external HD is working fine.
Every time the script runs it fails in the same way, complaining about the same offending snapshot, e717c1417c.
On running “restic check” on the offending repo I get “no errors were found”.
I tried “restic forget”:
>restic -r "E:\Backups\restic\My documents" --verbose -p "D:\..." forget e717c1417c
Ignoring "e717c1417c4298792fa21938accb913e80c3a30ad631c95429ba2963d5901284", could not load snapshot: load <snapshot/e717c1417c>: invalid data returned
What’s the best thing to do here?
NB Seems slightly strange that this backup process involves “loading” a snapshot… or could this be a reference to the new snapshot? This seems unlikely given the repeated complaints about e717c1417c each time. So why should this prevent executing a new backup on what is reported as a healthy repo …?
Obviously a new snapshot needs to make reference to the repo’s existing blobs, but I’d have thought that if one snapshot was found to be invalid, but the overall repo was healthy, it would then just move on to another “reference snapshot”, and merely issue a warning about the dud one. Not that I’m claiming to understand anything about the nuts and bolts of this magnificent creation!
restic has to inspect all snapshots while looking for a parent snapshot for the new backup.
It would be possible to handle this error case more gracefully, but I’m not sure how much that’s worth. If an HDD corrupts snapshot files, it may well corrupt other parts of the repository too. Another downside of adding such a special case is that it complicates the code, which has to be weighed against the potential benefit.
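To make the trade-off concrete, here is a rough Python sketch of a parent search. It is not restic's actual code (restic is written in Go, and real snapshot files are encrypted). It assumes each snapshot is a plain JSON file with `hostname`, `paths` and `time` fields, and that all timestamps use the same UTC ISO 8601 format so they sort as strings. `load_plain_json` and `find_parent` are hypothetical names.

```python
import json
from pathlib import Path


def load_plain_json(path):
    """Hypothetical loader: parse one snapshot file as plain JSON.

    Raises ValueError (json.JSONDecodeError) if the content is invalid.
    """
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_parent(snapshot_dir, hostname, paths, tolerant=False):
    """Return the newest snapshot matching hostname and paths, or None.

    Every file has to be loaded before it can be compared, so a single
    unreadable file aborts the search unless tolerant=True.
    """
    best = None
    for snapshot_file in sorted(Path(snapshot_dir).iterdir()):
        try:
            snap = load_plain_json(snapshot_file)
        except ValueError as err:
            if not tolerant:
                raise  # strict mode: one bad snapshot stops the backup
            print(f"warning: skipping {snapshot_file.name}: {err}")
            continue
        if snap["hostname"] != hostname:
            continue
        if sorted(snap["paths"]) != sorted(paths):
            continue
        # Assumption: same-format UTC timestamps compare correctly as strings.
        if best is None or snap["time"] > best["time"]:
            best = snap
    return best
```

With `tolerant=False` the search stops at the first unreadable file, which is roughly the behaviour seen in this thread. With `tolerant=True` it warns and moves on, which is the extra special case being weighed up above.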
That’s rather unexpected. What size is the snapshot file “e717c1417c4298792fa21938accb913e80c3a30ad631c95429ba2963d5901284” on the filesystem?
In a quick test with a damaged snapshot, check behaves as expected for me.
You can manually delete the file named “e717c1417c4298792fa21938accb913e80c3a30ad631c95429ba2963d5901284” from the snapshots folder. But please keep a copy of that file first, as it might still be useful for debugging why check didn’t report an error.
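Since the backup script is already Python, the copy-then-delete step could be sketched as below. This assumes the standard local repository layout, where each snapshot is a single file under `snapshots` named after the SHA-256 hash of its contents. The helper names and the keep directory are hypothetical, not part of restic.

```python
import hashlib
import shutil
from pathlib import Path


def inspect_snapshot_file(repo, snapshot_id):
    """Report the file size and whether its SHA-256 matches its name."""
    path = Path(repo) / "snapshots" / snapshot_id
    data = path.read_bytes()
    return {
        "size": len(data),
        "sha256_matches_name": hashlib.sha256(data).hexdigest() == snapshot_id,
    }


def quarantine_snapshot_file(repo, snapshot_id, keep_dir):
    """Copy the snapshot file to keep_dir, then delete it from the repo."""
    src = Path(repo) / "snapshots" / snapshot_id
    keep_dir = Path(keep_dir)
    keep_dir.mkdir(parents=True, exist_ok=True)
    dst = keep_dir / snapshot_id
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    # Only delete the original once the copy has the same size.
    if dst.stat().st_size != src.stat().st_size:
        raise OSError(f"copy of {snapshot_id} is incomplete, not deleting")
    src.unlink()
    return dst


# Example call (the keep directory is a made-up location):
# repo = r"E:\Backups\restic\My documents"
# sid = "e717c1417c4298792fa21938accb913e80c3a30ad631c95429ba2963d5901284"
# print(inspect_snapshot_file(repo, sid))
# quarantine_snapshot_file(repo, sid, r"D:\restic-debug")
```

A size of zero or a hash mismatch would suggest the file itself was damaged on disk, which is the sort of detail worth reporting back along with the saved copy.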
As I say, this is an internal hard drive, so the “host” is W10. This repo has been operational for over a year, and to my knowledge such an error has never occurred before: I have arranged things so that any restic error results in an error file (icon) appearing on my desktop. I also check these repos each week by doing partial restores.