Corrupted repo, how to repair?


I started making regular backups to Hubic. It worked well until a few days ago, when, on another machine, getting the snapshot list would send restic into a big loop of read errors.
I then created an account on Wasabi, used rclone to copy the data over, and resumed making backups.

On the good side, Wasabi seems to be working more correctly than Hubic, and I haven’t hit a network error yet.

I’m using the word “correctly” because rclone can list files it can’t actually read. So I do expect and accept some data loss.

I could start a new repo, but…

  • I have 200 GB already uploaded, so making new backups now should be faster.
  • Failures also happen in real life, so it’s an opportunity to see how restic reacts when things go bad.

I would like to be able to have the check command succeed, by trimming trees that are incomplete or that reference incomplete files.

Result of restic check:

error for tree bc2e8dc5:
  tree bc2e8dc5: file "IUPnPCDSAdapter.cpp" blob 0 size could not be found
  tree bc2e8dc5: file "IUPnPCDSAdapter.cpp": metadata size (1600) and sum of blob sizes (0) do not match
  tree bc2e8dc5: file "UPnPDMS.cpp" blob 0 size could not be found
  tree bc2e8dc5: file "UPnPDMS.cpp": metadata size (3773) and sum of blob sizes (0) do not match
  tree bc2e8dc5, blob 8438c98b: not found in index
  tree bc2e8dc5, blob 789ff404: not found in index
Fatal: repository contains errors

My history so far:

restic rebuild-index
restic prune
restic check --check-unused --read-data
restic check
restic rebuild-index

This indeed looks like there’s still referenced data missing from the repo, so you’d need to find out which snapshots reference the missing data and then forget those snapshots. There’s no built-in way to do that yet; you can try the code in PR #1780 if you like.
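As a hedged sketch of that manual cleanup (assuming a restic version that supports `restic find --tree`; `SNAPSHOT_ID` is a placeholder for each ID the find step reports), the commands below are echoed so nothing touches the repo until you drop the leading `echo`:

```shell
# Dry-run sketch: commands are only printed, not executed.
# The tree ID comes from the `restic check` output above.
echo restic find --tree bc2e8dc5   # list snapshots that reference the broken tree
echo restic forget SNAPSHOT_ID     # repeat for every snapshot the find step listed
echo restic prune                  # drop the now-unreferenced pack data
echo restic check                  # re-check; remaining errors mean more bad trees
```

Once `restic find` has shown which snapshots are affected, remove the `echo` prefixes and run the commands for real.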

Besides the data that’s missing, you should be able to restore any other file and browse the files via the fuse mount.


Interesting; does this mean a pack file was likely lost/deleted from the storage device somehow?

And which of those error messages gives it away? Is it the “blob … not found in index”? I imagine that is correlated with “blob 0 size could not be found”, since if the blob is missing, its size is too? (Apparently the metadata records and expects a size of 1600.)

Just trying to understand so I can avoid this myself. :slight_smile:

That’s the most likely explanation. The restic check would have succeeded before rebuild-index, because the missing data was probably still listed in the old index, but the new index only contains data that’s actually there. So now it’s an issue that check reports.

Correct, the size could not be found because the blob could not be found. The warning about the size is just the consequence of missing data blobs.


Is this a good thing or a bad thing? :thinking: Although this theory makes it sound as if rebuild-index corrupted the repo, I think it’s actually a good thing? If I understand correctly: it means that rebuild-index exposed the corruption (missing blobs) that was undetected before? (Sorry for my relentless questions! So much to learn.)

It’s clearly a good thing, for the reason you already mentioned: without creating a new index, restic doesn’t know there’s a whole file missing. While we could add a check for the presence of a file and the file size, there’s no way to know whether the correct data is in the file, so the ultimate test would be to re-download and verify all files (with restic check --read-data), but that’s rather expensive. It would also have shown the error that a file is missing.

Btw, there’s also restic check --read-data --read-data-subset 1/10, not sure if you’re aware of it. It takes two numbers separated by a slash; for this run it’ll read the first tenth of all files. When you pass --read-data-subset 2/10 it’ll read the next tenth of the files. So you could download and check one tenth (or whatever fraction you chose) each week, and eventually you’ll have read all data in the repo, rather than all at once :slight_smile:
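For example, a minimal cron-style wrapper (my own sketch, not part of restic) could derive the subset from the ISO week number, so that every tenth of the repository gets read once over a ten-week rotation:

```shell
# Read one tenth of the repo per week: week 1 checks subset 1/10,
# week 2 checks 2/10, ... wrapping around after ten weeks.
week=$(date +%V)                  # ISO week number, e.g. "07"
week=${week#0}                    # strip a leading zero before doing arithmetic
subset=$(( (week - 1) % 10 + 1 )) # map 1..53 onto 1..10
echo "this week: restic check --read-data-subset ${subset}/10"
# restic check --read-data-subset "${subset}/10"   # uncomment to actually run
```

Run from cron once a week and the whole repository is verified over ten weeks.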


I have fiber, so doing restic check --read-data is feasible.

Would it be desirable to sanitize snapshots by removing trees/data that are no longer available?

I recently proposed a PR to do exactly this:

Feel free to test it out!

As this thread seems to be read quite often: this check is now included (it was added in the 0.12.0 release).