FAIL: restic stats --mode raw-data

Hmm, when prune complains about an already existing lock, it won’t modify the repository. So maybe the repository corruption already happened earlier? Do you still have the log output of the last few prune runs and if yes, could you check whether you notice any unusual log output in them?

I don’t :frowning: I deleted them

I run a backup / check daily, forget / prune once a week. On Mar 14 backup / check reported no errors. My Mar 15 backup / check report is missing, so that may have been the beginning of the trouble.
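
Roughly, the schedule looks like this as a crontab sketch (paths and retention flags are placeholders, not my exact settings; `RESTIC_REPOSITORY` and `RESTIC_PASSWORD` are assumed to be available to cron):

```
# daily backup followed by a check
30 1 * * *  restic backup /data && restic check
# weekly forget + prune (retention flags are just an example)
30 3 * * 0  restic forget --keep-daily 7 --keep-weekly 4 --prune
```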

My Mar 15 forget / prune log produced

7 snapshots have been removed, running prune
counting files in repo
building new index for repo
[2:00:11] 100.00%  637887 / 637887 packs

incomplete pack file (will be removed): abc0577ecc347f1780fa3371b27799498d0c17e71eb77e9a037194c8d7d6bdaa
incomplete pack file (will be removed): abe612705c6d51b4540100adf77b0673dad355abc0147a3f0006f854c1b2d5a5
 ....
repository contains 637821 packs (3063321 blobs) with 3.036 TiB
processed 3063321 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 41 snapshots
[0:53] 100.00%  41 / 41 snapshots

found 2956149 of 3063321 data blobs still in use, removing 107172 blobs
will remove 66 invalid files
will delete 7685 packs and rewrite 3648 packs, this frees 44.556 GiB
[1:52:03] 100.00%  3648 / 3648 packs rewritten
 ....

On Mar 16 a backup / check produced:

Files:          63 new,   119 changed, 176872 unmodified
Dirs:            0 new,     3 changed,     0 unmodified
Added to the repo: 13.661 GiB

processed 177054 files, 2.621 TiB in 27:39
snapshot ff4983a1 saved
using temporary cache in /tmp/restic-check-cache-723633347
created new cache in /tmp/restic-check-cache-723633347
create exclusive lock for repository
load indexes
check all packs
pack 07f9e875: not referenced in any index
pack 04430c7e: not referenced in any index
pack 047a0f0c: not referenced in any index
....
59 additional files were found in the repo, which likely contain duplicate data.
You can run `restic prune` to correct this.
check snapshots, trees and blobs
error for tree caf20d03:
  tree caf20d03: file "The Fugitive.m4v" blob 140 size could not be found
  tree caf20d03, blob 52d680a7: not found in index
....
Fatal: repository contains errors

I followed this with a manual index rebuild, then a check which failed, then a forced backup, then a check which failed. I'm not sure at what point I got the "blobs larger than available" message, but it was during these manual steps.
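
The manual steps were roughly the following (repository settings omitted, backup path is a placeholder):

```
restic rebuild-index         # rebuild the index from the pack files in the repo
restic check                 # failed again
restic backup --force /data  # --force re-reads all files instead of trusting the parent snapshot
restic check                 # still failed
```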

My conclusion is that WASABI isn't suitable as a direct backup target for restic, but it works reliably with rclone, so I now rclone sync my local backup to WASABI instead.
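
The sync itself is a one-liner, something like this (remote and bucket names are placeholders):

```
# mirror the local restic repository into the Wasabi bucket
rclone sync /srv/restic-repo wasabi:my-restic-bucket
```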

That's also why I didn't respond to rawtaz's comment about causes outside restic: a backup program should be able to handle and recover from such causes at least as well as other programs out there.

Did you ever try doing a restore to see if it truly was a problem? I'd rather provide what help I can to restic to get this fixed than migrate to rclone. But that's a good alternative solution, thanks for mentioning it.
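
A test restore into a scratch directory would show whether the data is actually unreadable, e.g. something along these lines (IDs and the file name are taken from the logs above, the target path is a placeholder):

```
# try restoring the snapshot from the last backup run
restic restore ff4983a1 --target /tmp/restic-restore-test
# or just the file from the damaged tree
restic restore ff4983a1 --target /tmp/restic-restore-test --include "The Fugitive.m4v"
```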

I wish check actually told me which snapshots had a problem, so I could just delete those and know whether the whole backup was shot or only some snapshots were bad.
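
Something like `restic find` might narrow that down, assuming the restic version in use supports `--tree`/`--blob` lookups (IDs taken from the check output above):

```
# which snapshots reference the damaged tree?
restic find --tree caf20d03
# which snapshots contain the missing blob?
restic find --blob 52d680a7
# then drop just the affected snapshots
restic forget <snapshot-id>
```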

Look, we’re not having a fight here. But please answer the following:

I personally wouldn’t use Wasabi either, it seems rather unstable to me. But I could be wrong.

Question though: Have you tried using your Wasabi backend from restic via the rclone backend in restic? That is, restic using rclone using Wasabi.
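
Roughly, that would look something like this (remote name, bucket and path are placeholders; the Wasabi remote is configured in rclone first via `rclone config`):

```
# restic talks to rclone, rclone talks to Wasabi
restic -r rclone:wasabi:my-bucket/restic-repo init
restic -r rclone:wasabi:my-bucket/restic-repo backup /data
```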

Have you personally encountered any issues with Wasabi? I've been using Wasabi for multiple repositories for about a year and didn't encounter a single issue. Furthermore, I don't think I've seen a single issue report related to Wasabi. I might be wrong though. :wink:

No, I haven't used it personally, that's why I could very well be wrong :slight_smile: It just feels like there are reports now and then about problems with Wasabi, and I don't get the same feeling for other backends; that's why. I have the same feeling about Backblaze B2; I think there are more reports about it than about other backends, too. But take this with a grain of salt, it's just my personal feeling about it. If Wasabi works for you, by all means use it.

Can you look up the creation date of some of the packs that were reported as not referenced in any index in the March 16 check run? My assumption would be that these were created on March 15 during the prune run.
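
If you can still list the bucket, e.g. with rclone, something like this should show the modification times (remote and bucket names are placeholders; packs are stored under `data/<first two hex characters of the id>/`):

```
# pack 07f9e875... lives under data/07/ in the repository layout
rclone lsl wasabi:my-bucket/data/07/ | grep 07f9e875
```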

If those packs were indeed created during the March 15 prune, then restic quite likely ran into the eventual consistency offered by the Wasabi object storage (or quite likely by every S3-like storage). That consistency level can lead to curious behavior: even though a pack was successfully stored, it may not show up in a list request for the bucket that follows shortly afterwards. The pack will eventually (hence the name) show up, but that may take a few seconds. This affects the prune command insofar as the index rebuild after the packs have been rewritten may miss some packs. However, this is just a nuisance, as the problem is always fixable by running rebuild-index sometime later. I already have a solution in mind for how to fix the prune command so that it can properly cope with the reduced consistency level (just get rid of the problematic list request).
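
To illustrate the workaround (this is not the eventual fix in prune itself): if a check right after a prune complains about packs that are not referenced in any index, rerunning the index rebuild once the bucket listing has caught up should clear it:

```
restic check          # may report "pack ...: not referenced in any index" right after prune
restic rebuild-index  # picks up the packs the earlier list request missed
restic check          # the "not referenced" warnings should be gone now
```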

Regarding the data loss, my suspicion is that it could have something to do with the incomplete pack files reported during the prune run.

In case you still have logs going back to the prune run on March 8 (including that prune run), please check the following: the prune run on March 8 should have completed successfully. That would imply that the repository was in a known good state at that point, and thus any damaged pack file must have been introduced afterwards.

If that prune run was successful, then please check every backup log between the two prune runs (on March 8 and March 15) to see whether it was successful, that is, restic must report `snapshot <...> saved`. A successful backup run cannot leave incomplete pack files behind, as each upload failure causes the backup to abort. There's one more thing to consider: a single backup run can, by construction, only introduce up to SaveBlobConcurrency incomplete packs (that's the number of parallel uploads possible). By default, SaveBlobConcurrency is set to the number of CPUs of the host. Is that number, multiplied by the number of backup runs between the two prune runs, larger than or equal to 66?
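
As a rough back-of-the-envelope calculation (assuming one backup run per day between the two prunes, say 7 runs, and no flag overriding the default concurrency):

```
# SaveBlobConcurrency defaults to the number of CPUs, so e.g. on an 8-core host:
echo $(( $(nproc) * 7 ))   # 8 * 7 = 56, which would still be below 66
```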

Sorry, I used the same bucket to rclone my local restic repo into, so I can't check dates.

March 1 and March 8 prunes reported no errors.

My March 15 report is missing, which would indicate either that the backup completed but there were so many issues that the log was too large to mail, or that the backup didn't complete. WASABI had an incident report on March 16:
March 16, 2020 16:25 UTC

**[Resolved]** We have resolved an issue with the database infrastructure and all regions are processing S3 traffic at normal levels.

and perhaps that was the cause.

Using restic to back up to my NAS has been flawless. I am generally happy with the WASABI cost and lack of ingress/egress fees.

I did not try the rclone backend with restic because of WASABI's 90-day minimum retention policy; I knew from previous use that rclone sync worked well, and with 3 TB to back up or copy over, I went with what I knew worked. In 90 days I can try something different.

Do you have logs with which you could check whether any backup runs between March 8 and March 15 failed?

I had a few WASABI errors on the 9th, but all backups between Mar 8 and Mar 14 reported no errors and passed the checks.