I occasionally run an external check of my repository data consistency by calculating the
sha256sum of each file and comparing it to the filename. (Not as good as --read-data, but much, much faster.)
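For the curious, that external check can be sketched in a few lines of shell. The repository path is an assumption; the key fact is that restic names every pack file after the SHA-256 of its contents, so recomputing the hash and comparing it to the filename detects on-disk corruption:

```shell
#!/bin/sh
# Sketch of the external consistency check: every pack file in a restic
# repository is named after the SHA-256 of its contents, so a mismatch
# between the recomputed hash and the filename means on-disk corruption.
check_pack() {
    f="$1"
    expected=$(basename "$f")
    actual=$(sha256sum "$f" | cut -d' ' -f1)
    [ "$actual" = "$expected" ] || echo "CORRUPT: $f (hash is $actual)"
}

scan_repo() {
    # data/ holds packs in a fan-out of two-hex-digit subdirectories
    find "$1/data" -type f | while read -r f; do
        check_pack "$f"
    done
}
# usage: scan_repo /path/to/repo
```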
Today, I found an error. One of my data files is corrupted. Only one. Index and snapshot files are OK.
At this point, I don’t want to focus on HOW it became corrupted… I want to understand how to return my repository to a healthy status.
- Is there a process for identifying the data that is contained in that data file, and removing references to it?
- Or can I identify which files in which snapshots are affected?
If you can take a backup of the repository (using hard links or an LVM/btrfs/ZFS snapshot to avoid actually copying all of the data), you could try running
restic prune. It’s entirely possible that this pack isn’t even used; if so, prune will just discard it and we don’t have to dig any further.
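For the hard-link variant of that suggestion, a minimal sketch (paths are illustrative) is:

```shell
#!/bin/sh
# Cheap repository "backup" via hard links: cp -al duplicates the directory
# tree but shares the underlying data blocks, so it is fast and nearly free.
# (An LVM/btrfs/ZFS snapshot achieves the same thing at the filesystem level.)
clone_repo() {
    src="$1"; dst="$2"   # dst must not exist yet
    cp -al "$src" "$dst"
}
# Then experiment safely on the clone, e.g.:
#   restic -r /path/to/repo-clone prune
```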
Thanks for responding!
It’s a good idea, but no luck. It’s in use.
restic prune completed successfully, but this file survived, and it’s still corrupted.
Any convenient ways to identify which snapshots use it, and which files are impacted by this corruption?
I noticed that I can run
restic cat snapshot [snapshotId], which gives me a tree.
Then I can run
restic cat blob [treeId], which gives me subtrees.
I also found my packfile in an index. It contains 3 blobs.
I guess I could walk each of these subtrees until I find a reference to these blobs. Is there a better way?
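(For reference, the walk I had in mind could be sketched as below. It scrapes restic's JSON output with grep rather than a proper JSON parser, and it hasn't been run against a real repository, so treat it as purely illustrative; jq would be more robust.)

```shell
#!/bin/sh
# Rough sketch: recursively walk a restic tree, printing every subtree and
# data-blob ID it references. You can then grep the output for the blob IDs
# listed in the index entry for the damaged pack. Assumes the repository and
# password environment variables are already set.
walk_tree() {
    tree_id="$1"
    echo "tree $tree_id"
    json=$(restic cat blob "$tree_id")
    # file nodes carry "content":["<blob id>", ...]
    printf '%s' "$json" | grep -o '"content":\[[^]]*\]' \
        | grep -o '[0-9a-f]\{8,\}' | sed 's/^/blob /'
    # dir nodes carry "subtree":"<tree id>"
    for sub in $(printf '%s' "$json" \
        | grep -o '"subtree":"[0-9a-f]*"' | cut -d'"' -f4); do
        walk_tree "$sub"
    done
}
# usage: walk_tree <root tree ID from `restic cat snapshot <id>`>
```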
If you move the pack out of the repository and then run
restic rebuild-index, either
restic check or
restic prune (I forget which) should dump a list of which snapshots depend on the objects in that pack.
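A cautious version of that sequence might look like this (paths and the pack name are placeholders; the mv keeps the damaged pack recoverable rather than deleting it outright):

```shell
#!/bin/sh
# Quarantine a damaged pack instead of deleting it, then rebuild the index
# so restic notices the now-missing blobs. All paths/IDs are placeholders.
quarantine_pack() {
    repo="$1"        # repository root
    pack="$2"        # pack path relative to the repo, e.g. data/ab/abcd1234...
    quarantine="$3"  # somewhere outside the repository
    mkdir -p "$quarantine"
    mv "$repo/$pack" "$quarantine/"
    restic -r "$repo" rebuild-index
    restic -r "$repo" check   # should now report the trees with missing blobs
}
```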
Thanks for the continued advice.
Unfortunately, this still didn’t get me to my goal (identifying snapshots that rely upon the corrupted packfile). After removing the bad pack and rebuilding the index:
restic check reports the trees and hashes that have missing blobs, but not the snapshots. (sample output below)
restic prune fails. (“number of used blobs is larger than number of available blobs”)
Appreciate the guidance, and open to trying other methods to identify the impact of this corrupted pack.
# restic -r . check
using temporary cache in /tmp/restic-check-cache-12345678
repository 12345678 opened successfully, password is correct
created new cache in /tmp/restic-check-cache-12345678
create exclusive lock for repository
check all packs
check snapshots, trees and blobs
error for tree 12345678:
tree 12345678: file "filename" blob 117 size could not be found
tree 12345678: file "filename" blob 119 size could not be found
tree 12345678: file "filename" blob 120 size could not be found
tree 12345678, blob 3e191caf: not found in index
tree 12345678, blob 9ab1a5fa: not found in index
tree 12345678, blob 5ed8479f: not found in index
error for tree 9abcdef:
tree 9abcdef: file "filename" blob 117 size could not be found
tree 9abcdef: file "filename" blob 119 size could not be found
tree 9abcdef: file "filename" blob 120 size could not be found
tree 9abcdef, blob 3e191caf: not found in index
tree 9abcdef, blob 9ab1a5fa: not found in index
tree 9abcdef, blob 5ed8479f: not found in index
Fatal: repository contains errors
If you use one of the latest beta builds, it will self-heal the repo if you can manage to make a new backup containing the file(s) that in turn contain the missing blobs.
restic cat blob 12345678 and
restic cat blob 9abcdef (these are the trees check reported errors for) can give you a hint about what files this may be…
However, I’m puzzled that the tree blob IDs should be exactly
9abcdef. Are these the real IDs, or did you replace them?
There is another option, if you are willing to compile a PR:
will scan all snapshots for files and trees to repair. It will thus tell you which snapshots are affected and which files within these would be deleted or modified to repair your repo.
Again, after identifying the damage, the best option is to re-run a backup, provided you can manage to re-backup the damaged files. This will automatically correct all affected snapshots!
Only if you no longer have these files, either run repair without dry-run, or simply delete (
restic forget) the affected snapshots if you can spare them. Then prune should work again and will leave a sane repository.
Thanks for the detailed response!
- Self-healing and repair options sound interesting. I look forward to seeing them in an upcoming release!
- I replaced the blob IDs in my message. Pointless, no doubt, but why not.
The self-healing part already works in previous restic versions if you run
backup --force ..., the new part is that restic should handle all cases completely automatically.
You can use
restic find --tree treeID to search for snapshots and paths that contain a certain tree.
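Applied to the (placeholder) tree IDs from the check output earlier in the thread, that could be wrapped up as:

```shell
#!/bin/sh
# For each tree that `restic check` flagged, list the snapshots and paths
# that reference it. The tree IDs in the usage line are the placeholders
# from the sample output above.
find_damaged_trees() {
    repo="$1"; shift
    for tree in "$@"; do
        echo "== tree $tree =="
        restic -r "$repo" find --tree "$tree"
    done
}
# usage: find_damaged_trees /path/to/repo 12345678 9abcdef
```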
Wow, both of those are really, really helpful - thank you!
restic find --tree would have been exactly what I needed last night. Not having that, I instead wrote a quick & dirty C# library to open the repository, parse the snapshots and walk the tree to determine which snapshots referenced my broken packfile.
Still, it was an interesting learning project. Now I know a lot more about the repo structure.
Backups deduplicate against the index; first you need to remove the offending pack and run
restic rebuild-index, or future backups still won’t re-add the missing/corrupt data, since restic thinks the repository still has it.
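Pieced together, the self-heal sequence described in this thread would be roughly as follows (the source path is an assumption, and this presumes the damaged pack has already been moved out of the repository):

```shell
#!/bin/sh
# Self-heal sequence sketched from the thread: rebuild the index so restic
# forgets the quarantined pack's blobs, then force a fresh read of the
# original files so the missing data is re-added, then verify.
heal_repo() {
    repo="$1"; src="$2"
    restic -r "$repo" rebuild-index
    restic -r "$repo" backup --force "$src"
    restic -r "$repo" check
}
# usage: heal_repo /path/to/repo /home/data
```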
Thanks to all of you for your help. My repository has now been repaired and is checking successfully.
In summary, here’s what I did:
- Identified the snapshots that referenced my broken packfile
- Determined that I could live without those snapshots
- Ran restic forget [all those snapshot IDs] --prune
- Verified the result with restic check --read-data