Need suggestions on how to recover my corrupted repository

mdw · February 15, 2020, 6:59pm

I’m having terrible trouble with my restic repo. It’s been perfect for over a year but now I seem to be in an unrecoverable state. Here is the history.

‘restic check’ returned many errors about a few files with blobs not found.

So I removed the snapshots associated with those blobs and ran ‘restic check’ again. As expected, check found unassociated blobs, but otherwise completed successfully. So I ran ‘restic prune’ to remove them. Prune could not load a particular tree and crashed. So I tried check again, and I now get hundreds of lines like these:

tree 774ac7d9bf1da11ba13dc0884790f27555ca6cc615adb3f087158713d0a7cf9c not found in repository
error for tree 5220509a:
tree 5220509ace0c84cfb856284415381e03874fc65a447b071eb7289a504ad5caf5 not found in repository
error for tree c7f50625:
tree c7f506254c9592edf8dbc8220429152b890d12d16b94358a6dabcb794200a306 not found in repository
error for tree 7dd7de39:
tree 7dd7de3944c7854d1a1b55e8c20873295a29c3527ccb454589f2757d16077dd3 not found in repository
error for tree 1ec423c3:
tree 1ec423c32c00de160727feaf0493b6a659289729e41417d798dfb540290c7fab not found in repository
error for tree 1613a443:
tree 1613a443c8e11d8cfdcc355517ffd8e6d97861e99239e2fb7bc99685486a542a not found in repository
Fatal: repository contains errors

Any idea how to proceed with this? I’m out of ideas.
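
With hundreds of such lines, it can help to reduce the check output to a list of distinct missing tree IDs before deciding what to do next. The following is only a minimal sketch: it assumes the output has been saved as text and that each affected tree appears on a line starting with "tree <64-character ID> not", as in the excerpt above. The function name missing_trees is made up for this illustration and is not part of restic.

```python
import re

# Matches lines such as "tree <64 hex chars> not found in repository".
# Only the start of the line is matched, so hard-wrapped lines still count.
MISSING_TREE = re.compile(r"^tree ([0-9a-f]{64}) not", re.MULTILINE)


def missing_trees(check_output):
    """Return the distinct tree IDs reported as missing, in first-seen order."""
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(MISSING_TREE.findall(check_output)))


# Two entries copied from the output above.
sample = """\
error for tree 5220509a:
tree 5220509ace0c84cfb856284415381e03874fc65a447b071eb7289a504ad5caf5 not found in repository
error for tree c7f50625:
tree c7f506254c9592edf8dbc8220429152b890d12d16b94358a6dabcb794200a306 not found in repository
Fatal: repository contains errors
"""

ids = missing_trees(sample)
print(len(ids), "distinct missing trees")
```

Comparing this list between two check runs shows whether the set of missing trees is stable or growing.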

MichaelEischer · February 15, 2020, 10:12pm

Do you have a log of the prune run? I’m not completely sure from your description whether prune failed while looking for used blobs or whether this happened while rewriting pack files. However, the repository shouldn’t end up damaged in either of these cases as prune only deletes old pack files after the new ones were written.

You could try to recreate the repository index by running restic rebuild-index. But please create a backup copy of the index folder in your repository to have a way back, in case things get worse.
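
One way to make that backup copy, sketched under the assumption that the repository directory is reachable as a normal local path (for example when run directly on the sftp server). The helper name backup_index and the paths in the usage comment are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_index(repo_dir, backup_root):
    """Copy <repo_dir>/index into a new timestamped folder under backup_root."""
    src = Path(repo_dir) / "index"
    if not src.is_dir():
        raise FileNotFoundError(f"no index folder at {src}")
    dest = Path(backup_root) / f"index-{time.strftime('%Y%m%d-%H%M%S')}"
    # copytree refuses to write into an existing destination,
    # so an earlier backup copy is never overwritten.
    shutil.copytree(src, dest)
    return dest


# Hypothetical usage, run before `restic rebuild-index`:
#   backup_index("/srv/restic-repo", "/srv/restic-index-backups")
```

If things get worse after rebuilding, the saved folder can be copied back into the repository in place of the new index files.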

mdw · February 16, 2020, 2:17pm

Thanks @MichaelEischer for the reply. Here’s what I did:

I ran restic rebuild-index. That completed successfully.
I then tried restic prune again. This time it worked just fine.
After that I tried a restic check, which also completed without error, ending with “no errors found”.
Finally, I tried a restic mount and checked that I could see files, etc. I know that this is not much of a test but it made me feel better.
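
For anyone who wants to repeat the first three steps as a script, here is a rough sketch. It assumes restic is on the PATH and that the repository and password are supplied through restic's usual environment variables (RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE); the names RECOVERY_STEPS and run_recovery are made up for this example. Newer restic releases call the first step restic repair index.

```python
import subprocess

# Same order as in the steps above: rebuild the index, prune, then verify.
RECOVERY_STEPS = [
    ["restic", "rebuild-index"],
    ["restic", "prune"],
    ["restic", "check"],
]


def run_recovery(run=subprocess.run):
    """Run each step in order; return the first failing command, or None."""
    for cmd in RECOVERY_STEPS:
        result = run(cmd)
        if result.returncode != 0:
            return cmd  # stop here so later steps do not run on a broken state
    return None


# Hypothetical usage:
#   failed = run_recovery()
#   if failed:
#       print("stopped at:", " ".join(failed))
```

The run parameter exists only so the sequence can be exercised with a stand-in function instead of touching a real repository.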

I did not do anything to repair the previous round of errors after my post and your reply yesterday. I should say: this is a 340GB repo, and three clients add to this repo every 4 hours (when the machines are up). I did not disable the restic backup commands (systemd services) on the clients, and all seemed to be adding files successfully, even though, in retrospect, I probably should have suspended the backups.

So, while very pleased that the repo is up and running and checks clean, I’m just mystified about what happened.

MichaelEischer · February 16, 2020, 10:37pm

Hmm, this sounds like some pack files were missing from the index. Did the check run from your first post complain about pack ... not referenced in any index?

Which backend are you using? I wonder whether this might be caused by restic not seeing all pack files when rebuilding the index.

The prune and check commands create an exclusive lock in the repository and thus backup runs should just fail when trying to get a lock.

mdw · February 17, 2020, 1:43am

restic check did complain about some data that was not part of any index and suggested a restic prune. Sorry, didn’t have my script command running at that point, so I can’t recall the exact error, but the check did finish without error. It was only after running prune that I got 100s of errors about tree not found.

I’m using the sftp backend from three clients.

Yes, thanks, I understand the exclusive locking. I was just remarking that if I knew that my repository was in trouble, it might not have been so smart to let the clients add to it.

In the end, everything worked out. I’ve only had a few glitches since starting to use restic a year ago. The one before this was solved with a rebuild-index.

MichaelEischer · February 17, 2020, 7:53pm

I’d suggest running restic check --read-data to recheck the whole repository and make sure that your backup is in fact ok.

The worst thing that can happen when backing up to a damaged repository is that the new snapshots miss a few files when they refer to missing blobs. In case restic erroneously misses files in the index, the worst thing that can happen is that the blob is added a second time to the repository. The nice thing about restic is that backups only add files, but never delete them. This ensures that a repository is not damaged any further.

I still wonder how the index got damaged during pruning. sftp should be able to list all files in a directory immediately after they were written… Do you still have a log of the failed prune run? It would be useful to know where exactly things went wrong.

mdw · February 17, 2020, 10:11pm

I’m running restic check --read-data now. It’s been running for nearly two hours and is only 65% done. But so far so good.

Sadly, I don’t have a log of the restic prune output that failed. All my production goes into log files, but I was trying to ‘fix’ this on the fly. Should have started script, I know. All I can recall was that there was a long Go runtime traceback. Yes, without the details, that is not helpful, I know. I’ll keep a record in the future.
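
As an alternative to remembering to start script, ad-hoc repair commands can be run through a small wrapper that always keeps a copy of the output. This is just a sketch; the function name run_logged and the log file name in the usage comment are hypothetical.

```python
import subprocess
import sys


def run_logged(cmd, log_path):
    """Run cmd, echo its combined stdout/stderr, and append it to log_path."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("$ " + " ".join(cmd) + "\n")
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # tracebacks go to stderr, so merge it in
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()  # exit code of the command


# Hypothetical usage:
#   run_logged(["restic", "prune"], "restic-repair.log")
```

Because stderr is merged into the same stream, a runtime traceback from a crashing command would end up in the log file as well.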

mdw · February 18, 2020, 2:05am

Okay, restic check --read-data completed successfully. And I ran another restic prune, just to make sure that whatever happened earlier did not happen again, and that completed correctly as well.

I sure wish I knew what happened previously. But I have no easy way of reconstructing that.