Blob returned invalid hash check error

ajs · November 20, 2019, 5:00am

Hi,

I’ve been backing up to my restic repo on Backblaze B2 for about a year now. Just the other day, after my weekly restic prune/check, I got the following error during the check phase:

Running restic 0.9.5 compiled with go1.12.4 on linux/amd64

load indexes
check all packs
check snapshots, trees and blobs
error for tree d980d7b9:
  blob d980d7b9692633e1373381d736b9511c052fc47ee611514e3bf1fa807e7aa3e0 returned invalid hash
Fatal: repository contains errors

I’ve tried rebuilding the index, but that did not fix the issue. Can anyone help me figure out how to fix this error in my repo? I have around 600GB of data backed up to B2, and I’d prefer not to re-upload everything.

Thanks

764287 · November 20, 2019, 9:41am

Restic computes a hash of every blob it saves. The error indicates that the blob was somehow modified, either in transit or when it was written to the repository. Can you try to manually download the blob and verify the hash?
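A minimal sketch of that manual check, assuming you have already obtained the decrypted blob contents as bytes (for example by redirecting the output of `restic cat blob <id>` to a file, though restic may itself refuse to return a blob whose hash does not match). It relies on restic blob IDs being the SHA-256 of the plaintext content; the function name `blob_matches_id` is made up for illustration.

```python
import hashlib


def blob_matches_id(plaintext: bytes, blob_id: str) -> bool:
    """Return True if the SHA-256 of the decrypted blob equals its ID.

    `plaintext` is the decrypted blob content, `blob_id` the full
    64-character hex ID that restic reports for the blob.
    """
    return hashlib.sha256(plaintext).hexdigest() == blob_id.strip().lower()


# Hypothetical usage, with the blob contents saved to a local file first:
#   with open("blob.bin", "rb") as f:
#       print(blob_matches_id(f.read(), "d980d7b9692633e1...(full ID)"))
```

If this returns False for the blob named in the error, the stored content really is damaged and not just misreported by the check.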

If the hash is indeed invalid or incorrect, you could try deleting the blob and making a backup of the files which contain exactly this blob. TBH, I’m not sure if this works; maybe someone with more knowledge can confirm.

Another solution is to forget & prune every snapshot which contains this blob.

ajs · November 20, 2019, 9:31pm

Thanks for the reply. I was able to list all blobs and find that the corrupted one is a “tree” blob (as opposed to a data blob). By accident I found the snapshot the blob belonged to (I did a find on the blob ID and got an error saying it failed to open the bad snapshot). Is there a better way to determine which snapshots own a given blob?

Another concern of mine: when restic does a check, does it ever check the hash of data blobs, or just tree blobs? I am curious what caused this corruption. Could this be a Backblaze failure? A software bug? I don’t see any system errors on the backup host.

Thanks, I’m glad I was able to at least clear out the bad snapshot and resolve the check error.

cdhowie · November 21, 2019, 5:31am

Yes, this can work. However, you must first:

1. Remove the blob from the repository. If this is the only damaged object in its containing pack, there is no way I know of to retain the good objects and only discard the bad object.
2. Run restic rebuild-index to remove the bad object from the index. If you don’t do this, backups that would reintroduce the corrupt object will assume that the object is present and not add it. (This is how restic’s deduplication works.)
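To illustrate why the index has to be rebuilt first, here is a toy model of index-based deduplication. It is not restic's code; `ToyRepo`, `save_blob` and `rebuild_index` are hypothetical names, and the real rebuild-index works on pack files rather than an in-memory dict. The point is only that a stale index entry makes the save step skip a blob that is no longer actually stored.

```python
import hashlib


class ToyRepo:
    """Toy content-addressed store with a separate index (illustration only)."""

    def __init__(self):
        self.index = set()   # blob IDs the repo *believes* it has
        self.stored = {}     # blob ID -> bytes that are actually stored

    def save_blob(self, data: bytes) -> bool:
        """Store data unless the index already lists its ID. Returns True if written."""
        blob_id = hashlib.sha256(data).hexdigest()
        if blob_id in self.index:
            return False     # deduplicated: assumed to be present already
        self.stored[blob_id] = data
        self.index.add(blob_id)
        return True

    def rebuild_index(self) -> None:
        """Recreate the index from what is really stored."""
        self.index = set(self.stored)


repo = ToyRepo()
repo.save_blob(b"tree contents")
bad_id = hashlib.sha256(b"tree contents").hexdigest()
del repo.stored[bad_id]                     # remove the damaged object
print(repo.save_blob(b"tree contents"))     # False: stale index still lists it
repo.rebuild_index()
print(repo.save_blob(b"tree contents"))     # True: the blob is written again
```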

Nope, that’s exactly how you’re supposed to do it. If there is only one snapshot and you don’t need it anymore, forgetting that snapshot and pruning the repository should remove the bad object.

restic check by itself does not check any data blobs. restic check --read-data will, but note that this necessarily requires downloading the entirety of every pack. If your repository is large, be prepared to see significant egress traffic fees on your next B2 invoice.
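For anyone scripting this, a rough sketch of the forget, prune, check sequence as argument lists that could be handed to `subprocess.run`. It assumes the repository and password are supplied through the environment (e.g. `RESTIC_REPOSITORY`) and that the snapshot ID is known; the helper names `cleanup_commands` and `run_all` are invented for this example.

```python
import subprocess
from typing import List


def cleanup_commands(snapshot_id: str, read_data: bool = False) -> List[List[str]]:
    """Build the restic invocations: forget the snapshot, prune, then check.

    With read_data=True the final check also reads every pack, which
    means downloading the whole repository from the backend.
    """
    check = ["restic", "check"]
    if read_data:
        check.append("--read-data")
    return [
        ["restic", "forget", snapshot_id],
        ["restic", "prune"],
        check,
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Hypothetical usage; "abc123" stands in for the real snapshot ID:
#   run_all(cleanup_commands("abc123"))
```

Keeping `read_data` off by default mirrors the cost concern above: the full read is something to opt into deliberately.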

I would immediately burn the memtest86+ ISO to a CD from a different system, shut down the system that created the backup, and boot from this CD. (Or use a USB drive if it doesn’t have an optical drive.) The most likely culprit in my experience is bad RAM on the machine that created the backup.

ajs · November 21, 2019, 12:21pm

This would be my initial thought too, but this system uses ECC memory… no ECC memory errors reported from the BMC.

Does restic save the blob directly from memory to the Backblaze bucket, or does it get cached on local storage first? This is the only other point of failure I can think of, other than Backblaze itself or a restic/OS bug. While the data being backed up resides on a ZFS filesystem, I believe the restic cache directory is not on ZFS, just ext4 on SSD-backed media.

Thanks!

cdhowie · November 21, 2019, 8:06pm

A temporary local pack file is assembled in the temporary directory (/tmp by default) and then uploaded once a size threshold is passed.
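A toy sketch of that buffering pattern, to show where the local disk sits in the path. This is not restic's implementation (real packs hold encrypted blobs plus a header); `ToyPacker`, its `threshold` and the `upload` callback are all assumptions made for illustration.

```python
import tempfile


class ToyPacker:
    """Collect blobs in a temporary file and upload once a size threshold is reached."""

    def __init__(self, threshold: int, upload):
        self.threshold = threshold   # size in bytes that triggers an upload
        self.upload = upload         # callable taking the pack contents as bytes
        self._new_pack()

    def _new_pack(self) -> None:
        # Created in the default temp directory (TMPDIR, else typically /tmp).
        self.tmp = tempfile.TemporaryFile()
        self.size = 0

    def add_blob(self, data: bytes) -> None:
        self.tmp.write(data)
        self.size += len(data)
        if self.size >= self.threshold:
            self.flush()

    def flush(self) -> None:
        """Upload the current pack, if it holds anything, and start a new one."""
        if self.size == 0:
            return
        self.tmp.seek(0)
        self.upload(self.tmp.read())   # bytes are read back from local storage
        self.tmp.close()
        self._new_pack()


uploaded = []
packer = ToyPacker(threshold=10, upload=uploaded.append)
packer.add_blob(b"aaaa")        # below the threshold, stays in the temp file
packer.add_blob(b"bbbbbbb")     # threshold reached, pack is uploaded
packer.flush()                  # nothing pending, so no extra upload
print(len(uploaded))
```

In a design like this, every pack is written to and read back from the temporary directory before upload, so that filesystem is one more place where bytes could in principle be altered.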