Help debugging a blob invalid hash

u_x · September 4, 2021, 11:58pm

Hi,

Issue

I made a ~1T snapshot, but trying to copy the repository from another machine caused this after several hours (after importing about 260G of data):

LoadBlob(9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27) returned error blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27 returned invalid hash

Context

the host machine is an anemic mips machine, not very reliable
the snapshot took about a week to make
the backup machine is remote and much more reliable
the backup machine uses rest-server (append only)

Investigation

% restic -v stats
repository 390a6747 opened successfully, password is correct
scanning...
Stats in restore-size mode:
Snapshots processed:   1
   Total File Count:   157211
         Total Size:   991.686 GiB

% restic -v check
using temporary cache in /tmp/restic-check-cache-341950296
repository 390a6747 opened successfully, password is correct
created new cache in /tmp/restic-check-cache-341950296
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:07] 100.00%  1 / 1 snapshots
no errors were found

% restic -v find --blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27
Found blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27
 ... in file /cen/so/red.mp3
     (tree f73fb24fa4f8c0885452a51c3d97912efe44fd8f72907eda446bcada4463a309)
 ... in snapshot cd60b511 (2021-08-29 00:57:08)

I did check the integrity of the file on the host machine (compared to another reference of that specific file I had backed up somewhere else) and it’s correct.

Trying to figure out what the data looks like on the repository to compare how it is altered, but:

% restic -v cat blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27
repository 390a6747 opened successfully, password is correct
blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27 returned invalid hash
github.com/restic/restic/internal/repository.(*Repository).LoadBlob
	github.com/restic/restic/internal/repository/repository.go:210
main.runCat
	github.com/restic/restic/cmd/restic/cmd_cat.go:172
main.glob..func4
	github.com/restic/restic/cmd/restic/cmd_cat.go:27
github.com/spf13/cobra.(*Command).execute
	github.com/spf13/cobra@v1.2.1/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
	github.com/spf13/cobra@v1.2.1/command.go:974
github.com/spf13/cobra.(*Command).Execute
	github.com/spf13/cobra@v1.2.1/command.go:902
main.main
	github.com/restic/restic/cmd/restic/main.go:98
runtime.main
	runtime/proc.go:225
runtime.goexit
	runtime/asm_amd64.s:1371

Questions

Any chance I could force a dump of that blob anyway?
Can I copy the repo while skipping that invalid blob (to see if there are more)?
What could be the cause of such corruption?

MichaelEischer · September 5, 2021, 10:48am

You can use the restic debug examine command, see https://github.com/restic/restic/issues/828#issuecomment-706186047 for more details.

The copy command cannot skip invalid blobs, as that would essentially cause the new repository to be broken. What you can do is run restic check --read-data to let restic verify every single blob.

As the error is “invalid hash” and not a decryption error, this means that it most likely was a bitflip on the host creating the backup.
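To illustrate the distinction: a blob's ID is the SHA-256 hash of its plaintext content, so a blob that decrypts without an authentication error but no longer hashes to its ID was most likely altered before the ciphertext was authenticated. A minimal Python sketch of that check follows; `blob_matches_id` is a hypothetical helper for illustration, not part of restic.

```python
import hashlib


def blob_matches_id(blob_id: str, plaintext: bytes) -> bool:
    """Return True if the SHA-256 of the decrypted blob equals its ID."""
    return hashlib.sha256(plaintext).hexdigest() == blob_id.lower()


# Made-up example data, not taken from the repository in this thread.
good = b"example blob content"
blob_id = hashlib.sha256(good).hexdigest()

# Flip a single bit in the first byte to simulate memory corruption.
corrupted = bytes([good[0] ^ 0x01]) + good[1:]

print(blob_matches_id(blob_id, good))
print(blob_matches_id(blob_id, corrupted))
```

Any change to the plaintext, even one bit, yields a different hash, which is what the "invalid hash" error reports.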

u_x · September 5, 2021, 11:49am

Thanks, this is very helpful.

So I identified the pack with find --show-pack-id --blob 9f02880a:

Object belongs to pack fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73

Then extracted the pack with debug examine --extract-pack fdd48b5c364ad5004324312e10c78bc0101095de141022c8775d14485fd77e73

From here, I was in possession of a wrong-hash-f99b85dbc25b54e1fa16fe75f33118e4a347644f62602913c41907878e902f47.bin file.

Doing a binary diff with my reference doesn’t show a single bitflip as I was expecting but an aligned chunk of 32 bytes of difference:

[Image: binary diff between the extracted blob and the reference file (2021-09-05-133655-cheN9thu)]
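A comparison like this can also be scripted. The sketch below is a hypothetical helper (not the tool used in this thread) that reports each contiguous run of differing bytes as an (offset, length) pair, which makes an aligned 32-byte chunk easy to spot.

```python
from typing import List, Tuple


def diff_ranges(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) for each contiguous run of differing bytes.

    Only the common prefix length of the two inputs is compared.
    """
    ranges: List[Tuple[int, int]] = []
    limit = min(len(a), len(b))
    start = None
    for i in range(limit):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differences begins here
        elif start is not None:
            ranges.append((start, i - start))
            start = None
    if start is not None:
        ranges.append((start, limit - start))
    return ranges


# Made-up data: damage 32 bytes starting at offset 64.
reference = bytes(range(256))
damaged = bytearray(reference)
damaged[64:96] = bytes(x ^ 0xFF for x in reference[64:96])

print(diff_ranges(reference, bytes(damaged)))
```

With real files, the inputs would come from reading both files in binary mode. Real corruption can leave a few bytes unchanged by chance, so one damaged chunk may show up as several adjacent runs.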

For now I’ll assume the hardware is simply defective and will try to recover from it.

MichaelEischer · September 5, 2021, 1:44pm

That corruption pattern sounds like the bitflip occurred during encryption, but before calculating the authentication code for the encrypted ciphertext.

The simplest way to fix the repository would be to fix the content of the file with the wrong hash. Then remove the damaged pack file, run rebuild-index, and back up the extracted pack contents. Afterwards the repository should be fine.

u_x · September 6, 2021, 9:43pm

Followup story: Saving a restic backup the hard way

rawtaz · September 6, 2021, 10:10pm

That’s a superb writeup, very very cool! Thanks so much for sharing your story and in particular the details of how you figured it all out and manually patched your repository!

Eli6 · September 7, 2021, 6:53am

Great post!! I have similar errors and I am going to follow what you did this weekend!

u_x · September 8, 2021, 7:40am

Thanks.

Careful, this is not a tutorial, your mileage may vary. Typically, what the story doesn’t tell is that I got a 2nd similar bitflip in that same backup/snapshot later on. In that 2nd scenario, it was the last out of 4 blobs again, but it wasn’t starting at the beginning of the file, so I had to truncate the reference file with a skip option after identifying the correct offset (which I did simply by searching for a binary string).
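The offset search described above can be sketched in Python. This is an illustration under stated assumptions, not the procedure from the blog post: the function names and the 64-byte probe length are hypothetical, the first bytes of the damaged blob are assumed to be intact, and the blob length is assumed to be unchanged.

```python
import hashlib


def locate_blob(reference: bytes, damaged_blob: bytes, probe_len: int = 64) -> int:
    """Return the offset in `reference` where the blob starts, or -1.

    Searches for the first `probe_len` bytes of the damaged blob, assuming
    that leading region was not hit by the corruption.
    """
    return reference.find(damaged_blob[:probe_len])


def extract_replacement(reference: bytes, damaged_blob: bytes, blob_id: str) -> bytes:
    """Cut a blob-sized slice out of the reference and verify its SHA-256."""
    offset = locate_blob(reference, damaged_blob)
    if offset < 0:
        raise ValueError("probe bytes not found in reference file")
    candidate = reference[offset:offset + len(damaged_blob)]
    if hashlib.sha256(candidate).hexdigest() != blob_id:
        raise ValueError("candidate slice does not hash to the expected blob ID")
    return candidate


# Made-up data: a 4096-byte reference and a blob taken from offset 1000.
reference = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128))
blob = reference[1000:1512]
blob_id = hashlib.sha256(blob).hexdigest()

# Damage the last 32 bytes of the blob.
damaged = bytearray(blob)
damaged[480:512] = bytes(x ^ 0xFF for x in blob[480:512])

fixed = extract_replacement(reference, bytes(damaged), blob_id)
```

The final hash comparison is the important part: if the probe matches at the wrong place, the slice will not hash to the blob ID and the function refuses to return it.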

Just make sure you understand every step.

MichaelEischer · September 9, 2021, 7:39pm

@u_x I’ve a few small remarks on the blog post:
As you know the full blob id, I’d recommend to call find with the full id:

restic find --show-pack-id --blob 9f02880a10db561af97a6e1d69d3cc85936951fc6eb020c638f5422ea2268c27

After changing the content of the pack file, you also have to update the pack filename (and run rebuild-index). The pack file name is expected to match the sha256 hash of the pack file content. Right now restic check --read-data-subset 254/256 should report an error.
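As a minimal sketch of the naming rule just described, the following Python computes the SHA-256 of a file and compares it with the file's name. The helper names are hypothetical; only the rule itself (pack file name equals SHA-256 of the pack file content) comes from the thread.

```python
import hashlib
from pathlib import Path


def pack_id_of(path: Path) -> str:
    """Return the hex SHA-256 of the file content, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_correct_name(path: Path) -> bool:
    """Return True if the file name matches the hash of its content."""
    return path.name == pack_id_of(path)
```

If `has_correct_name` returns False for an edited pack file, `pack_id_of` gives the name the file would need to have before the index is rebuilt.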

The nice part about using the high-level repair workflow is that there are next to no special code paths involved. Actually the only special code is that which checks whether a file should be read again due to missing blobs. However, when just running backup on the folder which contains the extracted blobs, that is not relevant either. And the code to rebuild the repository index is also nearly identical to what is used for the prune command.

u_x · September 11, 2021, 2:08pm

As you know the full blob id, I’d recommend to call find with the full id

Yeah I actually did but it was easier to understand with the short version (more presentable for a blog post).

After changing the content of the pack file, you also have to update the pack filename (and run rebuild-index). The pack file name is expected to match the sha256 hash of the pack file content. Right now restic check --read-data-subset 254/256 should report an error.

Ah that’s correct. I just tried that: copied the pack file to its correct hash in the appropriate directory, and ran rebuild-index, but the check still fails: Pack ID does not match, want fdd48b5c, got 81b4816c. (81b4816c is the new name, both packs are present).

The nice part about using the high-level repair workflow is that there are next to no special code paths involved. Actually the only special code is that which checks whether a file should be read again due to missing blobs. However, when just running backup on the folder which contains the extracted blobs, that is not relevant either. And the code to rebuild the repository index is also nearly identical to what is used for the prune command.

Yeah as explained it was also a learning experiment, because I do not understand what the top level tools do. Typically, I should have made the indexing myself, because now I have no idea what changes were made and I have little clue how to move on…

MichaelEischer · September 11, 2021, 10:42pm

Do I understand you correctly that the pack file is now contained twice in the repository? Once with the old, wrong name and once with the new, correct one? If yes, then just remove the file with the old name, run rebuild-index, and your repository should be fine.

u_x · September 12, 2021, 7:44am

Do I understand you correctly that the pack file is now contained twice in the repository? Once with the old, wrong name and once with the new, correct one? If yes, then just remove the file with the old name, run rebuild-index, and your repository should be fine.

Yeah they are present twice, but since last time there is a new plot twist: the invalid blobs are also present in the tree now (I did make a new backup snapshot, which includes a directory with the invalid + valid blobs manually crafted, because I wanted to keep a trace of them).

rebuild-index was not enough to fix the problem. As far as I can tell there is a double reference somehow.

I guess I’m going to have to inspect the index file(s?) and try to fix that manually.

MichaelEischer · September 12, 2021, 10:29am

What errors does check report at the moment? I don’t see why it should be a problem to have a snapshot that includes the invalid+valid blobs. From restic’s perspective blobs are either stored in a pack file or not. And in the latter case, the backup will add invalid blobs as new blobs with their correct sha256 hash as blob id to the repository.

Assuming the pack file with the wrong filename is no longer in the repository, all references to that pack file should have been removed by rebuild-index (which, with restic >= 0.12.0, reports which changes were made). If that were not the case, check would report that a pack file is missing.

I’d strongly recommend not to manually modify the index files. This will most likely damage the repository, make check report errors, or prevent prune from working. If a plain rebuild-index does not work, you could try rebuild-index --read-all-packs, which recreates the index from scratch. If that doesn’t help, then the problem is very likely that some pack file is still messed up.

The index essentially contains only the information that’s also stored in the pack file headers. That is which blob exists at a certain position in a pack file.
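A simplified model of that relationship, assuming pack headers are available as plain Python data: the index is a lookup from blob ID to the places where the blob is stored, derived entirely from the headers. The type and function names here are hypothetical and this is not restic's on-disk format.

```python
from typing import Dict, List, NamedTuple, Tuple


class BlobLocation(NamedTuple):
    pack_id: str
    offset: int
    length: int


def build_index(
    pack_headers: Dict[str, List[Tuple[str, int, int]]]
) -> Dict[str, List[BlobLocation]]:
    """Derive blob ID -> locations from pack headers.

    `pack_headers` maps a pack ID to its (blob_id, offset, length) entries.
    """
    index: Dict[str, List[BlobLocation]] = {}
    for pack_id, entries in pack_headers.items():
        for blob_id, offset, length in entries:
            index.setdefault(blob_id, []).append(BlobLocation(pack_id, offset, length))
    return index


# Made-up IDs: the same blob is listed in an old and a new pack.
headers = {
    "oldpack": [("blobA", 0, 100)],
    "newpack": [("blobA", 0, 100), ("blobB", 100, 50)],
}
index_before = build_index(headers)

# Removing the old pack and rebuilding drops every reference to it.
del headers["oldpack"]
index_after = build_index(headers)
```

In this model, rebuilding after removing a pack is enough to make stale references disappear, which is why hand-editing the index is unnecessary.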

u_x · September 12, 2021, 11:14am

Oh my bad: I was keeping the original pack files because I thought they were needed, since I was keeping a copy of the broken dumps in the backup itself.

Removing them and re-indexing did indeed fix the problem, thanks!

I will run a complete read check just to make sure, and make an edit on the blog to mention the rename of the pack file + re-indexing.

Thanks again

MichaelEischer · September 12, 2021, 2:45pm

The important thing to understand is that restic doesn’t care in which pack a blob is stored, as long as the blob exists somewhere in the repository. And that is also the reason why the repository index is required in the first place. Otherwise restic wouldn’t know where to look for a blob.