One data file corrupted - now what?

I occasionally run an external consistency check of my repository data by calculating the SHA-256 hash (sha256sum) of each file and comparing it to the filename. (Not as good as restic check --read-data, but much, much faster.)
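For anyone who wants to reproduce this kind of check, here is a minimal sketch in Python. It assumes every file under the directory you point it at is named after the SHA-256 digest of its contents, so it is meant for subdirectories such as data/ rather than the repository root. The function names are made up for illustration.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_files(root: Path) -> List[Path]:
    """List files under root whose name differs from their SHA-256 digest.

    Assumption: every file below root is named by its own digest.
    """
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and sha256_of(path) != path.name
    )
```

Calling find_mismatched_files(Path("/path/to/repo/data")) would return the suspicious files; an empty list means every name matched its contents.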

Today, I found an error. One of my data files is corrupted. Only one. Index and snapshot files are OK.

At this point, I don’t want to focus on HOW it became corrupted… I want to understand how to return my repository to a healthy status.

  • Is there a process for identifying the data that is contained in that data file, and removing references to it?
  • Or can I identify which files in which snapshots are affected?

Thanks!

If you can take a backup of the repository (using hard links or LVM/btrfs/ZFS snapshots to avoid actually copying all of the data), you could try running restic prune. It’s entirely possible that this pack isn’t even used. If so, prune will just discard it and we don’t have to dig any further.
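A hard-link backup of a local repository could look roughly like the sketch below. It assumes the repository is a plain directory and that the backup location is on the same filesystem; hardlink_copy is a hypothetical helper, not part of restic.

```python
import os
import shutil
from pathlib import Path


def hardlink_copy(repo: Path, backup: Path) -> Path:
    """Mirror repo into backup, hard-linking files instead of copying data.

    Both paths must be on the same filesystem, and backup must not exist yet.
    """
    # copytree recreates the directories and calls os.link(src, dst) per file.
    shutil.copytree(repo, backup, copy_function=os.link)
    return backup
```

Hard links protect against files being deleted from the original directory, which is what matters here. They do not protect against a file being modified in place, because both names point at the same data.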

Thanks for responding!

It’s a good idea, but no luck. It’s in use. restic prune completed successfully, but this file survived, and it’s still corrupted.

Any convenient ways to identify which snapshots use it, and which files are impacted by this corruption?

I noticed that I can restic cat snapshot [snapshotId] which gives me a tree.

Then I can restic cat blob [treeId] which gives me subtrees.

I also found my packfile in an index. It contains 3 blobs.

I guess I could walk each of these subtrees until I find a reference to these blobs. Is there a better way?
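The manual walk could be sketched like this in Python. It assumes the JSON printed by restic cat blob for a tree has a "nodes" list whose entries carry "name", "type", "subtree" (for directories) and "content" (the blob IDs of a file); those field names are my reading of the repository format, so verify them against your own output. Both function names are hypothetical.

```python
import json
import subprocess
from typing import Callable, Iterator, Set


def load_tree_with_restic(tree_id: str) -> dict:
    """Fetch one tree via `restic cat blob`; repository and password are
    expected to come from the usual restic environment variables."""
    result = subprocess.run(
        ["restic", "cat", "blob", tree_id],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def find_affected_files(
    tree_id: str,
    bad_blobs: Set[str],
    load_tree: Callable[[str], dict],
    prefix: str = "",
) -> Iterator[str]:
    """Yield the path of every file below tree_id that uses a bad blob."""
    for node in load_tree(tree_id).get("nodes", []):
        path = f"{prefix}/{node['name']}"
        if node.get("type") == "dir" and node.get("subtree"):
            # Directories point at another tree; recurse into it.
            yield from find_affected_files(
                node["subtree"], bad_blobs, load_tree, path
            )
        elif bad_blobs.intersection(node.get("content") or []):
            yield path
```

Starting from the tree ID of each snapshot and passing the blob IDs listed for the pack in the index, this yields the affected paths per snapshot. Passing the loader in as a parameter also makes it easy to add caching, since many snapshots share subtrees.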

If you move the pack out of the repository and then run restic rebuild-index, either restic check or restic prune (I forget which) should dump a list of which snapshots depend on the objects in that pack.
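Moving the pack aside (rather than deleting it) might be done as in the sketch below. It assumes a local repository where packs live at data/<first two hex characters of the ID>/<full ID>; quarantine_pack and the quarantine directory are hypothetical names.

```python
import shutil
from pathlib import Path


def quarantine_pack(repo: Path, pack_id: str, quarantine: Path) -> Path:
    """Move data/<first two hex chars>/<pack_id> out of the repository."""
    source = repo / "data" / pack_id[:2] / pack_id
    if not source.is_file():
        raise FileNotFoundError(source)
    quarantine.mkdir(parents=True, exist_ok=True)
    target = quarantine / pack_id
    # Keep the damaged pack around in case it is needed for later analysis.
    shutil.move(str(source), str(target))
    return target
```

After the move, restic rebuild-index is the next step, so that the index no longer lists the blobs from that pack.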

Thanks for the continued advice.

Unfortunately, this still didn’t get me to my goal (identifying snapshots that rely upon the corrupted packfile). After removing the bad pack and rebuilding the index:

  • restic check reports the trees and hashes that have missing blobs, but not the snapshots. (sample output below)
  • restic prune fails. (“number of used blobs is larger than number of available blobs”)

Appreciate the guidance, and open to trying other methods to identify the impact of this corrupted pack.

    # restic -r . check
    using temporary cache in /tmp/restic-check-cache-12345678 
    repository 12345678 opened successfully, password is correct
    created new cache in /tmp/restic-check-cache-12345678 
    create exclusive lock for repository
    load indexes
    check all packs
    check snapshots, trees and blobs
    error for tree 12345678:
      tree 12345678: file "filename" blob 117 size could not be found
      tree 12345678: file "filename" blob 119 size could not be found
      tree 12345678: file "filename" blob 120 size could not be found
      tree 12345678 , blob 3e191caf: not found in index
      tree 12345678 , blob 9ab1a5fa: not found in index
      tree 12345678 , blob 5ed8479f: not found in index
    error for tree 9abcdef:
      tree 9abcdef: file "filename" blob 117 size could not be found
      tree 9abcdef: file "filename" blob 119 size could not be found
      tree 9abcdef: file "filename" blob 120 size could not be found
      tree 9abcdef, blob 3e191caf: not found in index
      tree 9abcdef, blob 9ab1a5fa: not found in index
      tree 9abcdef, blob 5ed8479f: not found in index
    Fatal: repository contains errors
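To turn output like this into something usable, the "not found in index" lines can be grouped by tree. The sketch below assumes the line format shown above, which may differ between restic versions; missing_blobs_by_tree is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches e.g. "tree 9abcdef, blob 3e191caf: not found in index",
# tolerating a stray space before the comma.
MISSING_BLOB = re.compile(
    r"tree\s+([0-9a-f]+)\s*,\s*blob\s+([0-9a-f]+):\s*not found in index"
)


def missing_blobs_by_tree(check_output: str) -> Dict[str, List[str]]:
    """Map each tree ID to the blob IDs that check reported as missing."""
    result = defaultdict(list)
    for line in check_output.splitlines():
        match = MISSING_BLOB.search(line)
        if match:
            tree_id, blob_id = match.groups()
            result[tree_id].append(blob_id)
    return dict(result)
```

The resulting tree IDs are the ones to trace back to snapshots.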

If you use one of the latest beta builds, it will self-heal the repo if you can manage to make a new backup which contains the file(s) which in turn contain the missing blobs.

Maybe restic cat blob 12345678 and restic cat blob 9abcdef (these are the trees check reported errors for) can give you a hint about what files this may be…
However, I’m puzzled that the tree blob IDs should be exactly 12345678 and 9abcdef. Are these the real IDs, or did you replace them?

There is another option, if you are willing to compile a PR:


Then running restic repair will scan all snapshots for files and trees to repair. It will thus tell you which snapshots are affected and which files within these would be deleted or modified to repair your repo.

Again, once you have identified the affected files, the best option is to re-run a backup, provided you can still back up the damaged files again. This will automatically correct all affected snapshots!

Only if you no longer have these files should you either run repair without the dry run or simply delete (restic forget) the affected snapshots, if you can spare them. Then prune should work again and will leave a sane repository.

Thanks for the detailed response!

Comments:

  1. Self-healing and repair options sound interesting. I look forward to seeing them in an upcoming release!
  2. I replaced the blob IDs in my message. Pointless, no doubt, but wtf :)

The self-healing part already works in previous restic versions if you run backup --force ...; the new part is that restic should handle all cases completely automatically.

You can use restic find --tree treeID to search for the snapshots and paths that contain a certain tree.

Wow, both of those are really, really helpful - thank you!

restic find --tree would have been exactly what I needed last night. Not having that, I instead wrote a quick & dirty C# library to open the repository, parse the snapshots and walk the tree to determine which snapshots referenced my broken packfile.

Still, it was an interesting learning project. Now I know a lot more about the repo structure.

Backups deduplicate against the index, so you first need to remove the offending pack and run restic rebuild-index. Otherwise, future backups still won’t add the missing or corrupt data, since restic thinks the repository already has it.

Thanks to all of you for your help. My repository has now been repaired and is checking successfully.

In summary, here’s what I did:

  • Identified the snapshots that referenced my broken packfile
  • Determined that I could live without those snapshots
  • restic forget [all those snapshot IDs] --prune
  • restic check --read-data
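As a sketch, the last two steps could be scripted as below. It only uses the commands named in the list; cleanup_commands and run_all are hypothetical helpers, and the repository and password are assumed to come from the usual restic environment variables.

```python
import subprocess
from typing import List


def cleanup_commands(snapshot_ids: List[str]) -> List[List[str]]:
    """Build the forget-and-prune and verification commands, in order."""
    if not snapshot_ids:
        raise ValueError("no snapshot IDs given")
    return [
        ["restic", "forget", *snapshot_ids, "--prune"],
        ["restic", "check", "--read-data"],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Printing the result of cleanup_commands before passing it to run_all gives a chance to review the snapshot IDs first, since forget with --prune is destructive.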