Corrupted repo, how to repair?


I started making regular backups to Hubic. It worked well until a few days ago, when, on another machine, getting the snapshot list would send restic into a big loop of read errors.
I then created an account on Wasabi, used rclone to copy the data over, and resumed making backups.

On the good side, Wasabi seems to be working more correctly than Hubic, and I haven’t hit a network error yet.

I’m using the word “correctly” because rclone can list files it can’t actually read. So I do expect and accept some data loss.

I could start a new repo, but…

  • I have 200 GB already uploaded, so making new backups now should be faster.
  • Failures also happen in real life, so it’s an opportunity to see how restic reacts when things go bad.

I would like to be able to have the check command succeed, by trimming trees that are incomplete or that reference incomplete files.

Result of restic check:

error for tree bc2e8dc5:
  tree bc2e8dc5: file "IUPnPCDSAdapter.cpp" blob 0 size could not be found
  tree bc2e8dc5: file "IUPnPCDSAdapter.cpp": metadata size (1600) and sum of blob sizes (0) do not match
  tree bc2e8dc5: file "UPnPDMS.cpp" blob 0 size could not be found
  tree bc2e8dc5: file "UPnPDMS.cpp": metadata size (3773) and sum of blob sizes (0) do not match
  tree bc2e8dc5, blob 8438c98b: not found in index
  tree bc2e8dc5, blob 789ff404: not found in index
Fatal: repository contains errors

My history so far:

restic rebuild-index
restic prune
restic check --check-unused --read-data
restic check
restic rebuild-index

This indeed looks like there’s still referenced data missing from the repo, so you’d need to find out which snapshots reference the missing data and then forget those snapshots. There’s no built-in way to do that yet; you can try the code in PR #1780 if you like.
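As a hedged sketch of that manual cleanup (assuming a restic version that supports `restic find --tree`; `SNAPSHOT_ID` is a placeholder for each ID the find step reports), the commands below are echoed so nothing touches the repo until you drop the leading `echo`:

```shell
# Dry-run sketch: commands are only printed, not executed.
# The tree ID comes from the `restic check` output above.
echo restic find --tree bc2e8dc5   # list snapshots that reference the broken tree
echo restic forget SNAPSHOT_ID     # repeat for every snapshot the find step listed
echo restic prune                  # drop the now-unreferenced pack data
echo restic check                  # re-check; remaining errors mean more bad trees
```

Once `restic find` has shown which snapshots are affected, remove the `echo` prefixes and run the commands for real.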

Besides the data that’s missing, you should be able to restore any other file and browse the files via the fuse mount.


Interesting; does this mean a pack file was likely lost/deleted from the storage device somehow?

And which of those error messages gives it away? Is it the “blob … not found in index”? I imagine that is correlated with “blob 0 size could not be found”, since if the blob is missing, its size is too? (Apparently the metadata records and expects a size of 1600.)

Just trying to understand so I can avoid this myself. :slight_smile:

That’s the most likely explanation. The restic check would have succeeded before rebuild-index, because the missing data was probably still listed in the old index, but the new index only contains data that’s actually there. So now it’s an issue that check reports.

Correct, the size could not be found because the blob could not be found. The warning about the size is just the consequence of missing data blobs.


Is this a good thing or a bad thing? :thinking: Although this theory makes it sound as if rebuild-index corrupted the repo, I think it’s actually a good thing? If I understand correctly: it means that rebuild-index exposed the corruption (missing blobs) that was undetected before? (Sorry for my relentless questions! So much to learn.)

It’s clearly a good thing, for the reason you already mentioned: without creating a new index, restic doesn’t know there’s a whole file missing. While we could add a check for the presence of a file and the file size, there’s no way to know whether the correct data is in the file, so the ultimate test would be to re-download and verify all files (with restic check --read-data), but that’s rather expensive. It would also have shown the error that a file is missing.

Btw, there’s also restic check --read-data --read-data-subset 1/10, not sure if you’re aware of it. It takes two numbers separated by a slash; for this run it’ll read the first tenth of all files. When you pass --read-data-subset 2/10 it’ll read the next tenth of the files. So you could download and check one tenth (or whatever fraction you chose) each week, and eventually you’ll have read all data in the repo, rather than all at once :slight_smile:
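For example, a minimal cron-style wrapper (my own sketch, not part of restic) could derive the subset from the ISO week number, so that every tenth of the repository gets read once over a ten-week rotation:

```shell
# Read one tenth of the repo per week: week 1 checks subset 1/10,
# week 2 checks 2/10, ... wrapping around after ten weeks.
week=$(date +%V)                  # ISO week number, e.g. "07"
week=${week#0}                    # strip a leading zero before doing arithmetic
subset=$(( (week - 1) % 10 + 1 )) # map 1..53 onto 1..10
echo "this week: restic check --read-data-subset ${subset}/10"
# restic check --read-data-subset "${subset}/10"   # uncomment to actually run
```

Run from cron once a week and the whole repository is verified over ten weeks.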


I have fiber, so doing restic check --read-data is feasible.

Would it be desirable to sanitize snapshots by removing trees/data that are no longer available?

I recently proposed a PR to do exactly this:

Feel free to test it out!

As this thread seems to be read quite often: this check is now included (it was added in the 0.12.0 release).