Another case of number of used blobs is larger than number available blobs

deanroker123 · May 15, 2020, 2:11pm

Hi,
I have 2 repos on Wasabi, which I access through the rclone backend.

I have the same problem with both repos.
I have read other forum posts, but I can’t seem to find out how to identify the problem snapshots that have the missing blobs, or which files they affect.

I want to try and repair the repos rather than starting again, as it’s a lot of data.

I have a local restic repo with snapshots that were created at a similar time. Is there any way I can get the blobs back from there?

I have tried using the --force backup option to try and put the blobs back.

Could a corrupt cache be causing this problem?

Thanks
Dean

MichaelEischer · May 15, 2020, 5:55pm

The usual steps would be to run rebuild-index to make sure that the repository index only contains existing blobs. Afterwards, restic check should report missing blobs / trees. restic find --tree <tree-id> or restic find --blob <blob-id> will then complain about the affected snapshots. backup --force ... can only help if the check command also complained about missing blobs/trees.

Regarding a corrupt cache: Does restic complain about invalid/incomplete pack files? If yes, then you could try to remove the affected files from the local cache (in case they exist).

If you have a similar backup repository, then you could try the following PR which adds a command to copy snapshots. That would allow you to copy the similar snapshots which either repairs the damaged ones or you could just use the copied snapshots instead of the damaged ones.
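
For reference, the first two steps above (rebuild-index, then check) could be wrapped in a small shell function. This is only a sketch: diagnose_repo and the RESTIC override variable are hypothetical names, and it assumes the repository and password are already configured through restic's usual environment variables (RESTIC_REPOSITORY, RESTIC_PASSWORD_FILE or similar).

```shell
# Sketch only: diagnose_repo and RESTIC are illustrative names, not part of restic.
# Assumes the repository and password are configured via RESTIC_REPOSITORY etc.
RESTIC="${RESTIC:-restic}"

diagnose_repo() {
    log="${1:-check.log}"
    # Make the index list only blobs that really exist in the pack files.
    "$RESTIC" rebuild-index || return 1
    # check exits non-zero when it reports errors, which is the expected case
    # here, so the status is ignored and the full output is kept in the log.
    "$RESTIC" check > "$log" 2>&1 || true
    echo "check output saved to $log"
}

# Afterwards, look up each ID reported in the log by hand, for example:
#   restic find --blob <blob-id>
#   restic find --tree <tree-id>
```

Calling diagnose_repo check.log leaves the complete check output in one file, which is also handy when someone asks for the whole log.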

deanroker123 · May 27, 2020, 10:22am

Thanks Michael,

I will give it a go and report back.

Thanks

Dean

deanroker123 · May 27, 2020, 2:43pm

@MichaelEischer

I ran rebuild-index and restic check.
I get errors like:
error for tree 9178405e:
tree 9178405e: file "samba4_private.201219.tar.bz2" blob 1 size could not be found
tree 9178405e: file "samba4_private.201219.tar.bz2" blob 2 size could not be found
tree 9178405e, blob 1efe2f01: not found in index
tree 9178405e, blob 4ffcadb2: not found in index

and

error for tree 2f0bed8f:
tree 2f0bed8f: file "Cashflow.xlsx" blob 0 size could not be found
tree 2f0bed8f: file "Cashflow1.xlsx" blob 0 size could not be found
tree 2f0bed8f, blob 959fe88e: not found in index
tree 2f0bed8f, blob 03c94a4b: not found in index

When it says blob size 0 what does that mean?

Thanks
Dean

dionorgua · May 27, 2020, 3:13pm

The file Cashflow.xlsx is split into several blobs, and its first blob (the one with index 0) was not found. As far as I understand, you have lost just 4 blobs.

I don’t know how this happened, but if you can obtain the mentioned files as they were at that point in time (from the other backup you mentioned), just put them in some directory (not necessarily their usual place) and back them up to the same repo. If they are exactly the same, restic will save these blobs again and ‘recover’ the old snapshots.

deanroker123 · May 28, 2020, 8:54am

@dionorgua I think the file is an old version, so I won’t have access to it. I just hope the same blob isn’t used by lots of versions of the file.

@MichaelEischer I ran find --tree and it didn’t complain; it printed the following:
repository 233a6b90 opened successfully, password is correct
Found tree 79a120f72b9662a8bf2a3deee2efc3004d4169ea3a72cb2b6a2fdcc4af507769
 ... path /home/shares/unldocs/iris_backups/Data/SQLBAK
 ... in snapshot f79c4c1d (2020-01-26 23:40:07)

Should it complain?

Thanks

Dean

dionorgua · May 28, 2020, 11:00am

A bit more verbose:

If all the errors from restic check look exactly the same (apart from the file name and blob index), then only ‘data’ blobs are missing. That’s actually why restic find --tree works.

Are you getting only the messages mentioned above, or are some other blobs reported too? Could you please post the whole log of restic check?

To find out which snapshots are affected, just run the following 4 commands:

restic find --blob 1efe2f01
restic find --blob 4ffcadb2
restic find --blob 959fe88e
restic find --blob 03c94a4b

Restic will print ALL usages of these missing blobs. Something like:

Found blob 15128b5bc712baa8d6052d4f86f1161f24408bc313dd6765dcb32c567a0ec58c
 ... in file /tmp/test_data/a/big_dir/mesa-20.0.4/src/intel/compiler/brw_eu_util.c
     (tree 19470a46d2f57009cd6d40e910b2a8e14c5140411d32a5ce423baaeeb29222ea)
 ... in snapshot 88350c97 (2020-05-27 20:56:05)
Found blob 15128b5bc712baa8d6052d4f86f1161f24408bc313dd6765dcb32c567a0ec58c
 ... in file /tmp/test_data/a/big_dir/mesa-20.0.4/src/intel/compiler/brw_eu_util.c
     (tree c585b98dd5a411c1240fa04c3cfe09b0760d53b2ea47bf5822f04d1fac4efed2)
 ... in snapshot eae9cc08 (2020-05-28 13:43:10)

This gives you the snapshot, date and file path. If you run this command for ALL missing blobs, you’ll know all affected snapshots.
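
With many missing blobs, the find output gets long, so it may help to condense it into a per-snapshot count. The function below is a sketch under the assumption that the output keeps the " ... in snapshot <id> (<date> <time>)" shape shown above; affected_snapshots is a hypothetical helper name, not a restic command.

```shell
# Sketch: count how many "Found blob" hits each snapshot has in the output
# of `restic find --blob ...`. Reads the named files, or stdin if none given.
# Output columns: hits, snapshot ID, date, time; most affected snapshots first.
affected_snapshots() {
    sed -n 's/.*in snapshot \([0-9a-f][0-9a-f]*\) (\([^)]*\)).*/\1 \2/p' "$@" |
        sort | uniq -c |
        awk '{ print $1, $2, $3, $4 }' |
        sort -k1,1nr -k2,2
}
```

It can be fed directly from a pipe, for example restic find --blob 1efe2f01 4ffcadb2 | affected_snapshots, or pointed at a saved copy of the find output.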

If your other repo has these files as they were at that time, just restore them to local disk (to any directory) and then back them up again to the corrupted repo.
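
A rough sketch of that restore-then-backup round trip is below. recover_from_other_repo, the RESTIC override and all four arguments are hypothetical names for illustration. It assumes rebuild-index has already been run on the damaged repo (so the index no longer lists the lost blobs) and that credentials for both repos are available, for example via RESTIC_PASSWORD_FILE if they share a password.

```shell
# Sketch only: the function name and its arguments are illustrative.
RESTIC="${RESTIC:-restic}"

recover_from_other_repo() {
    good_repo=$1      # repo that still holds the file (e.g. the local one)
    damaged_repo=$2   # repo with the missing blobs
    snapshot=$3       # snapshot ID in the good repo from around the same time
    file_path=$4      # path of the affected file inside that snapshot

    # The restore location does not matter: blobs are identified by content.
    staging=$(mktemp -d) || return 1
    "$RESTIC" -r "$good_repo" restore "$snapshot" \
        --target "$staging" --include "$file_path" || return 1
    # If the restored content is byte-identical, this re-uploads the lost blobs.
    "$RESTIC" -r "$damaged_repo" backup "$staging" || return 1
    # Confirm whether the missing-blob errors are gone.
    "$RESTIC" -r "$damaged_repo" check
}
```

The staging directory is left in place for inspection, and the extra snapshot this creates in the damaged repo can be forgotten afterwards.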

If you don’t have access to those files, you can just forget these snapshots completely (as if there had been no backup at that time).

PS. I think that you can also try to ‘remove’ these corrupted files from snapshot using

But it’s not tested well…

deanroker123 · May 28, 2020, 12:52pm

So I ran find on one of the missing blobs, which belongs to another xlsx file.
It said it’s in a number of snapshots.
I checked and the file still exists in the same place and hasn’t been modified for ages.

Why didn’t the --force backup run fix this file?

Something weird is going on here.

Could the cache be causing a problem when I did the backup run?

dionorgua · May 28, 2020, 1:43pm

As far as I know, restic verifies that the cache is still valid (not removed). But to be 100% sure, just remove it locally.

rebuild-index and then backup --force should fix such issues (if the files are still present and not modified).

Are you sure that the file has not changed? Some time ago there was a topic here saying that MS Excel may silently modify a file (even if you don’t change it and don’t trigger a ‘save’ action). Even more, to hide this, Excel restores the file’s ‘modification time’ so that the change is not visible:

deanroker123 · May 28, 2020, 3:59pm

@dionorgua
It’s odd, as the file appears in backups going back to last year and in some relatively recent ones.

I have rebuilt the index, I have cleared the cache and I am trying a force backup again.

I’ll keep you posted.

Could going via the rclone backend cause a problem if there are network issues?
I have 570 blobs missing which seems like a lot.

deanroker123 · May 28, 2020, 5:14pm

@dionorgua
I finished that and now I only have 569 blobs missing.

That’s a lot of blobs to find overlapping snapshots in. Is there an easier way of doing it than running restic find for each blob? It takes a long time to do a find for each blob.

Thanks
Dean

dionorgua · May 28, 2020, 5:39pm

As far as I can see, you can search for multiple blobs in a single call:

restic find --blob blobA blobB blobC

It should be much faster than 569 calls with a single blob each.
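
To avoid typing hundreds of IDs, the blob IDs could be pulled out of a saved restic check log and handed to one find call. This is a sketch that assumes the log contains lines of the form "tree <id>, blob <id>: not found in index", as in the output quoted earlier in this thread; missing_blobs, find_missing_blobs and the RESTIC override are hypothetical names.

```shell
# Sketch only: helper names are illustrative, not part of restic.
RESTIC="${RESTIC:-restic}"

# Print each missing blob ID from a saved `restic check` log exactly once.
missing_blobs() {
    sed -n 's/^tree [0-9a-f]*, blob \([0-9a-f]*\): not found in index.*/\1/p' "$1" |
        sort -u
}

# Look up all missing blobs with a single `restic find --blob` call.
find_missing_blobs() {
    ids=$(missing_blobs "$1")
    if [ -z "$ids" ]; then
        echo "no missing blobs listed in $1"
        return 0
    fi
    # $ids is unquoted on purpose so that each ID becomes its own argument.
    "$RESTIC" find --blob $ids
}
```

For example, after restic check > check.log 2>&1, running find_missing_blobs check.log walks the snapshots once instead of once per blob.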

deanroker123 · May 28, 2020, 8:04pm

@dionorgua Excellent, I’m running it now.

Any ideas how so many blobs could go missing? Wasabi is supposed to be pretty reliable. I can only think there must be a bug somewhere?

doscott · May 28, 2020, 10:21pm

I have had similar problems in the past with Wasabi. I switched for a while to just doing an rclone sync of my local backup, with no issues. After a few weeks I started doing restic backups again using the rclone backend. After a couple of weeks I encountered trouble again, but a rebuild fixed things. These problems always occurred after a forget/prune operation. I have switched to the following sequence for prunes:
forget (without prune)
check
prune
check

It’s been a few weeks and I have yet to encounter any further problems. There were some suggestions that timing on Wasabi’s end may be a problem, and that may explain why the above sequence is avoiding the problem, or it may be just blind luck.
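
The four-step sequence above could be scripted so that prune only runs when the first check passes. A minimal sketch, assuming the repository and password come from the environment; careful_prune and the RESTIC override are hypothetical names, and the retention flags are passed through unchanged.

```shell
# Sketch only: careful_prune is an illustrative name, not a restic command.
RESTIC="${RESTIC:-restic}"

careful_prune() {
    # Pass retention flags such as --keep-daily 7 here, but not --prune.
    "$RESTIC" forget "$@" || return 1
    # Stop before pruning if the repository already has problems.
    "$RESTIC" check || return 1
    "$RESTIC" prune || return 1
    # Verify again once pack files have been rewritten or removed.
    "$RESTIC" check
}
```

For example, careful_prune --keep-daily 7 --keep-weekly 4 returns a non-zero status as soon as any step fails, so a cron job can alert on it.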

deanroker123 · May 28, 2020, 11:18pm

@doscott
Thanks for the info. Digging into it there seem to be a few snapshots that have more than one missing blob, and the same snapshots show up quite a lot.
I still have the logs from the backup and the prune operation from every run going back a year.
I can’t see any errors in the backup, forget or prune commands.

Very strange. I might have to go to the rclone method. The local repos are all fine.

dionorgua · May 29, 2020, 6:56am

I have no idea why it happens. Maybe Wasabi just lost a few pack files…
Personally I use local rest-server based storage, with an rclone mirror to Backblaze. In total ~3 machines are backed up to this repo, and the total repository size is usually around 2.5TB now. I just run forget --prune manually every 3-4 weeks (on the local repo). I have never observed such things myself…