Identify pack / specific file related to a corrupt blob

The machine that hosted my Restic repository is known to have some form of memory or other hardware failure. I was in the process of migrating to a new machine, but it appears something happened during the migration that led to more file corruption.

I ran restic check --read-data and got the following output:

restic check  --read-data
using temporary cache in /var/folders/f1/rkqb_s7533g8xyc4zy499g3h0000gn/T/restic-check-cache-3256317274
repository 121d28ef opened (version 2, compression level auto)
created new cache in /var/folders/f1/rkqb_s7533g8xyc4zy499g3h0000gn/T/restic-check-cache-3256317274
create exclusive lock for repository
load indexes
[0:03] 100.00%  29 / 29 index files loaded
check all packs
check snapshots, trees and blobs
error for tree 3a239693:
  decrypting blob 3a2396930013f7dd43e2f9c8d0beb9696b1e76c31484ff9c006c26a35e3cbafa failed: ciphertext verification failed
[2:43] 100.00%  3409 / 3409 snapshots
read all data
Pack ID does not match, want 07f36801ff044c1b1836c36da1c0f6b883dd692794164f720e878aedb49946c9, got 8a2398d54c055eead0e920d34b7ea3ddbc410f476c9203396a38c28d268495b1
Pack ID does not match, want 4c0383eaf0ba19150d35772bcb96efa33105bcf2875b0529486bc38e485b0eaf, got 73e1841998b6d1f65f86ced8727cd9c4390f90ede5ae58a53acc32c15d05c8a9
[2:35:42] 100.00%  68059 / 68059 packs
Fatal: repository contains errors

Luckily, I have daily disk snapshots of my restic repository, along with hashes of the snapshot files, and I know I have good copies of everything except the last two days of snapshots. So I believe I could replace a bad pack with a copy from a disk snapshot.

So my question – is there a way for me to identify which file(s) in the Restic repository are corrupted so I could replace them with known good copies?

I just spent the last few minutes trying to figure out exactly that! I found this thread that helped me locate a bad file.

I hope it helps!

Edit: Restic just found another hardware problem. check --read-data works fine on other machines, so it wasn’t a bad file. At least my repo is fine!


=== Note added afterwards: Skip all of this and jump to the “edits” section at the bottom for a better solution ===

Thanks! That tip helped me figure it out.

For others looking here’s what I did to figure this out:

First I ran restic list index

I then ran this from the command line:

list="
a77d9b4ebc39686dabe69a49a3667fc9d02e500198219792a5750b8a1829dbf7
...
<complete list of ids output by restic list index>
...
2fe144058a7d1d9b38057d85b3d62c5613fb901e3ae976c83a9fee7e52b70740
"

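# Search every index file for the corrupt blob ID and collect the matches.
# $list is left unquoted on purpose so the shell splits it into one ID per item.
for item in $list
do
  ./restic cat index "$item" | grep 3a2396930013f7dd43e2f9c8d0beb9696b1e76c31484ff9c006c26a35e3cbafa >> found.json
done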

where 3a2396930013f7dd43e2f9c8d0beb9696b1e76c31484ff9c006c26a35e3cbafa is the originally corrupted blob
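
The same search can be done without pasting the ID list by hand. This is only a rough sketch: it assumes restic is on your PATH and that the repository and password are already configured through the usual environment variables. The function name is made up for illustration.

```shell
# Sketch: print the ID of every index file that mentions a given blob ID.
# find_indexes_with_blob is a hypothetical helper name, not a restic feature.
find_indexes_with_blob() {
  blob_id=$1
  restic list index | while read -r index_id; do
    # </dev/null keeps restic from consuming the ID list on stdin.
    if restic cat index "$index_id" </dev/null | grep -q "$blob_id"; then
      printf '%s\n' "$index_id"
    fi
  done
}

# Example (not run here):
# find_indexes_with_blob 3a2396930013f7dd43e2f9c8d0beb9696b1e76c31484ff9c006c26a35e3cbafa
```

This only tells you which index files mention the blob; you still need to look inside them to find the pack.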

I then had a really big found.json, which I ran through a JSON prettifier (jq could work, but I used a plugin in VS Code). The JSON was so big that it helped to have VS Code: I searched for my corrupt blob ID and wrote down its line number, then searched for "blobs" and picked the "blobs" element with the highest line number that was still lower than the line number of my blob. The “id” next to that “blobs” element is the pack that contains the blob.
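
For anyone who would rather not count line numbers, the lookup can probably be done with jq directly. A minimal sketch, assuming jq is installed and that the index JSON has a top-level "packs" list whose entries carry an "id" and a "blobs" list (which matches what I saw in found.json). The function name is hypothetical.

```shell
# Sketch: read index JSON on stdin and print the ID of each pack that lists the blob.
# Assumed layout: {"packs": [{"id": "...", "blobs": [{"id": "...", ...}]}]}
pack_for_blob() {
  jq -r --arg blob "$1" '.packs[] | select(any(.blobs[]; .id == $blob)) | .id'
}

# Example (not run here):
# ./restic cat index <index id> | pack_for_blob <blob id>
```

jq accepts several JSON documents in a row, so this should also work on a file built by appending the output of multiple index files.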

That ID was the file name of the corrupt pack. I then did a find in the repository directory for that file name and found it at ./repo/data/07/07f36801ff044c1b1836c36da1c0f6b883dd692794164f720e878aedb49946c9. Once I found it, I grabbed a backed-up copy to replace it with and ran diff <source filename> <backed up filename> to make sure my backup copy was in fact different (the first backup I checked was not, which had me worried, but an earlier one was fine). One tip: check the creation date of the file in the repository so you know which backups could contain the file in the first place.

I then saved a copy of the bad file and replaced it with the backed up one and reran restic check and found I now had a clean repository.
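
The swap itself can be sketched as below. This is only an illustration of the step I described: the function name and the quarantine directory are made up, and it assumes you have already confirmed the replacement copy is good.

```shell
# Sketch: keep the corrupt pack for later inspection, then copy in the replacement.
# quarantine_dir is a hypothetical location outside the repository.
swap_pack() {
  bad_pack=$1
  good_pack=$2
  quarantine_dir=$3
  mkdir -p "$quarantine_dir" &&
    cp -p "$bad_pack" "$quarantine_dir/$(basename "$bad_pack").corrupt" &&
    # -f lets cp replace the destination even if it cannot be opened for writing.
    cp -pf "$good_pack" "$bad_pack"
}

# Example (not run here):
# swap_pack ./repo/data/07/<pack id> /path/to/backup/data/07/<pack id> ./quarantine
```

Each step only runs if the previous one succeeded, so the bad pack is not overwritten unless the saved copy was written first.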

======
Second edit afterwards…: I could have skipped this entire process and just focused on the --read-data output, which said Pack ID does not match, want 07... The first sha256 on each of those two lines is the file name of a corrupted pack. If I had started by finding non-corrupt versions of those two files, I could have skipped the whole script for finding the bad pack files (I wouldn’t have known which pack the corrupt blob was in, but I would have fixed the issue, as both packs were bad and the bad blob was in one of them).
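
As a rough sketch of that shortcut, the "want" IDs can be pulled out of saved check output and turned into file paths. It assumes the message format shown in my output above and a local repository laid out as data/<first two characters>/<pack id>, as in my case. The function names and check.log are placeholders.

```shell
# Sketch: extract the expected ("want") pack IDs from check output on stdin.
wanted_pack_ids() {
  sed -n 's/.*Pack ID does not match, want \([0-9a-f]\{64\}\), got .*/\1/p'
}

# Sketch: print where a pack file should live under a local repository directory.
pack_path() {
  repo_dir=$1
  pack_id=$2
  prefix=$(printf '%s' "$pack_id" | cut -c1-2)
  printf '%s/data/%s/%s\n' "$repo_dir" "$prefix" "$pack_id"
}

# Example (not run here):
# restic check --read-data 2>&1 | tee check.log
# wanted_pack_ids < check.log | while read -r id; do pack_path ./repo "$id"; done
```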

First edit afterwards: see next comment. restic check isn’t good enough if you suspect hardware failure caused disk corruption; run restic check --read-data instead. Also, use sha256sum on the pack instead of diff, as sha256sum should return the file name of the pack itself if it’s not corrupt.

As an aside, I did a plain restic check initially, but it appears that wasn’t good enough to find all the errors. I found another, unrelated error by doing another restic check --read-data. So if you think you have hardware failure, that might find additional corrupted file contents; be sure to run this too!

I got the additional output:

[16:16][Pallando:~] % restic check --read-data
using temporary cache in /tmp/restic-check-cache-425441198
repository 121d28ef opened (version 2, compression level auto)
created new cache in /tmp/restic-check-cache-425441198
create exclusive lock for repository
load indexes
[0:04] 100.00%  193 / 193 index files loaded
check all packs
check snapshots, trees and blobs
[1:52] 100.00%  3573 / 3573 snapshots
read all data
Pack ID does not match, want 4c0383eaf0ba19150d35772bcb96efa33105bcf2875b0529486bc38e485b0eaf, got 73e1841998b6d1f65f86ced8727cd9c4390f90ede5ae58a53acc32c15d05c8a9
[1:03:48] 100.00%  68610 / 68610 packs
Fatal: repository contains errors

Luckily, I found I had a correct version of the corrupt pack. Also note that in my earlier comment I was using sha1sum and diff to compare packs and find the good copy, but it turns out the file name of the pack IS the sha256sum, so that’s an easy way to validate a pack yourself. In this case I went to my ./repo/data/4c/4c0383eaf0ba19150d35772bcb96efa33105bcf2875b0529486bc38e485b0eaf file, ran sha256sum on it to see that it didn’t match the file name, and then checked earlier backups until I found a good copy from a prior date.
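
Here is a small sketch of that check, in case it saves someone the manual comparison. It relies only on the fact that a good pack's SHA-256 equals its file name. The function names are made up, and the fallback to shasum -a 256 is for systems without sha256sum (such as macOS).

```shell
# Sketch: compute the SHA-256 of a file with whichever tool is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# A pack is intact when its hash equals its own file name.
pack_is_intact() {
  [ "$(sha256_of "$1")" = "$(basename "$1")" ]
}

# Print the first candidate copy whose hash matches its name.
first_good_copy() {
  for candidate in "$@"; do
    if pack_is_intact "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example (not run here), with hypothetical backup locations:
# first_good_copy /backups/day1/repo/data/4c/<pack id> /backups/day2/repo/data/4c/<pack id>
```

Because the check uses only the file itself, you don't need a known-good reference copy to tell whether a given backup of the pack is usable.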
