Redownload Pack on Failed Verification

No worries. It takes me some time to respond as well. :grinning: I'll check out your branch, but I noticed that running the check with the --no-cache option seems to fix it. At least the error has not appeared in the past 4 checks.

@MichaelEischer, I have been running the check command from your branch regularly and I am waiting for the issue to occur again. So far no dice, and this is with the original memory too. I'll report back when it happens again and I have additional logs.


Maybe I am missing something, but is there a resolution? I ran the memory checks extensively. Before this happened, I had an error like the one above: there was an entry in my BIOS event log (Supermicro specifically) for an uncorrectable ECC error a few months ago that happened to coincide with a BIOS update. Anyway, I went through all the testing, and the memory would fail consistently in Memtest86+ 5.31 when using the Multi-Core Parallel mode. I managed to find one or two forum posts somewhere that mentioned this was more of a software issue. To be sure, I ran the Single Core, Round Robin, and Sequential tests for over 140 hours. My other devices didn't have that error though.

Anyways, I have been trying to restore some backup files. One was ~27 GB, another is ~67 GB. Both have failed with a ciphertext verification failure. I've tried different devices as well to try to rule out a memory issue.
This is using OneDrive for Business with rclone as the backend.
I also just tried backing up and restoring an ISO; it completed without errors, but it's only 4 GB.

Any thoughts?
I have run check, check --read-data, prune, and rebuild-index. Everything seems to come back fine or repaired, but restoring fails time and time againā€¦

You say that check --read-data completed without errors? Then you could try removing the local cache for the problematic repository; restic cache shows where the cache is located. Afterwards I'd expect the restore to succeed.
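For reference, inspecting and clearing the cache might look something like this (the repository URL `rclone:onedrive:backup` is a placeholder for your actual setup):

```shell
# List restic's local cache directories and which repositories they belong to
restic cache

# Remove stale cache directories that no longer belong to an existing repository
restic cache --cleanup

# Alternatively, bypass the local cache entirely for a single run
restic -r rclone:onedrive:backup check --no-cache
```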

I was still getting some errors; I didn't have time to note them all down. I ran those commands and it either kept turning up errors, or the download would always terminate once it reached 12 GB out of 27 GB. Eventually I tried deleting the file and reuploading it, and after that I was able to download the snapshot successfully.
The difficult part I am dealing with is the fact that this was a test file. However, I have set up a few cron jobs which will run weekly so I can make weekly backups of a server VM. These servers don't have unlimited storage, so I will really only have room for one large backup per week. I don't have the convenience of uploading the same file again afterwards, as it will be overwritten by a newer file.
Does that make sense? I will try to repair the repositories and see if the restores will be successful.
Thank you so much for your assistance and direction. I really appreciate it.
One more thing: since I also have access to another NAS device at another location, I will try to rsync the files over there as well so I have a copy. Since those drives are secure and we have access to them, I figure I'll just use rsync over SSH for now. The restic process is helpful when storing on a third-party server or some type of proprietary cloud serviceā€¦

That sounds a bit like a problem with the storage backend for the repository. Are you maybe running into some kind of rate limit on the onedrive side?

What would be a sign of a rate limit? Quite possibly.

When I was uploading a backup to the OneDrive rclone-backed restic repository, I was getting several errors from rclone saying:
Post request put error: activityLimitReached: throttleRequest: The request has been throttled.
When trying to restore the latest backup today, the size of the file is correct in the end, but the stats show an error: "Didn't finish writing GET request."
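If OneDrive throttling (`activityLimitReached`) is the culprit, one thing worth experimenting with is lowering the request rate of the rclone process that restic spawns, by overriding restic's `rclone.args` option. This is a sketch, not a verified fix: `rclone:onedrive:backup` is a placeholder repository, and the limit values are guesses to tune:

```shell
# Pass extra arguments to the rclone subprocess, limiting transactions
# per second so requests stay under OneDrive's rate limits
restic -r rclone:onedrive:backup restore latest --target /restore \
  -o rclone.args="serve restic --stdio --tpslimit 4 --tpslimit-burst 1"
```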

I can't recall the details right now, but there was also an EOF error earlier; I just don't see it now. It may have been during prune or check --read-dataā€¦

This is the latest attempt to restore.

Load(<data/96384576e7>, 8531018, 0) returned error, retrying after 461.576101ms: <data/96384576e7> does not exist
Load(<data/d59b4256c7>, 1368142, 3264608) returned error, retrying after 515.292858ms: <data/d59b4256c7> does not exist
Load(<data/688aee0bd1>, 1297807, 0) returned error, retrying after 376.77025ms: <data/688aee0bd1> does not exist
Load(<data/a56dfb498e>, 767332, 3837717) returned error, retrying after 391.040497ms: <data/a56dfb498e> does not exist
Load(<data/fbbf134c06>, 4887206, 0) returned error, retrying after 644.302458ms: <data/fbbf134c06> does not exist
Load(<data/c5cb513e71>, 5122663, 0) returned error, retrying after 430.90274ms: <data/c5cb513e71> does not exist
Load(<data/596b92f3cd>, 946038, 4153438) returned error, retrying after 690.271562ms: <data/596b92f3cd> does not exist
Load(<data/c25dd068db>, 3234432, 946550) returned error, retrying after 398.55613ms: <data/c25dd068db> does not exist
Load(<data/da6a86bf70>, 993187, 3234944) returned error, retrying after 697.180865ms: <data/da6a86bf70> does not exist
rclone: 2020/11/13 01:50:11 ERROR : data/d1/d1c8f134ca229698fdb26a26ce397efec09e67a80bad3a11e20992bf2e830be0: Didn't finish writing GET request (wrote 4464654/4472305 bytes): http2: stream closed

Do errors for the same filename, e.g. data/c5cb513e71, repeat multiple times? Do you see retries which wait for more than 10 seconds? If not, then restic has resolved the displayed errors automatically by retrying the failed requests.
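For context on why the delays in the log keep growing: retries like these typically use exponential backoff, where each attempt waits roughly twice as long as the previous one, with some random jitter, up to a cap. The following is only an illustrative sketch of that idea, not restic's actual implementation:

```python
import random

def backoff_delays(base=0.5, factor=2.0, max_delay=10.0, retries=5, jitter=0.5):
    """Yield exponentially growing retry delays with random jitter,
    capped at max_delay. Illustrative only; parameter values are made up."""
    delay = base
    for _ in range(retries):
        # Jitter spreads retries out so clients don't hammer the server in sync
        yield min(max_delay, delay * random.uniform(1 - jitter, 1 + jitter))
        delay *= factor
```

With these (made-up) defaults the first retry waits around half a second and later ones approach the 10-second cap, which matches the shape of the delays in the log above.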

You could also try to copy the repository to a local folder and try to restore from there, just to verify that the repository is intact.

@MichaelEischer
Thank you for the follow up. The error does not constantly repeat itself. At most there is one more repeat of the exact same filename, and it does not wait for more than 10 seconds. The retry delays stay well under a second.

Is there a specific way to restore the whole repository to a local folder?
I have tried restore --target with a local folder, but have not done the whole repository.
Thanks so much. At present I have mounted the repository on the local computer and am using rsync to transfer a vzdump .zst image for my Proxmox server backup. I want to see if I can restore without running into a corrupted-block errorā€¦

That looks like restic is able to read the pack files after one or two retries. So you should be able to ignore these errors.

You could also try to copy the repository to a local folder and try to restore from there, just to verify that the repository is intact.

My idea there was to just copy the repository files to some local storage and then let restic access it via its local backend (just specify the folder as repository path).
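Concretely, that could look like the following (the remote name, local paths, and snapshot ID are placeholders for your setup):

```shell
# 1. Copy the raw repository files from OneDrive to local storage
rclone copy onedrive:backup /srv/restic-repo-copy

# 2. Point restic at the local copy and verify it end to end
restic -r /srv/restic-repo-copy check --read-data

# 3. Restore from the local copy instead of the remote backend
restic -r /srv/restic-repo-copy restore latest --target /restore
```

If the restore succeeds from the local copy but not over rclone, that points at the network path rather than the repository contents.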

Thanks Michael. I am in the process of doing this now.
Every process I run completes much faster now that the repository is local.
These are the messages I get running restic prune:

counting files in repo
building new index for repo
[0:12] 100.00%  95403 / 95403 packs
repository contains 95403 packs (318661 blobs) with 475.186 GiB
processed 318661 blobs: 135799 duplicate blobs, 201.534 GiB duplicate
load all snapshots
find data that is still in use for 9 snapshots
[0:00] 100.00%  9 / 9 snapshots
found 182179 of 318661 data blobs still in use, removing 136482 blobs
will remove 0 invalid files
will delete 116 packs and rewrite 43209 packs, this frees 202.524 GiB
[2:43] 3.26%  1407 / 43209 packs rewritten

It doesn't usually complete, and when I run it again it shows a different number of packs needing to be rewritten. The amount of space to be freed goes up, but the number of packs to be deleted remains unchanged. The amount of space freed is surprising considering that my repository was only about 370 GB in size when I copied it over, and there are 9 snapshots, each being about 90 GB of zst files.

@Jarvar The pruning is another issue and does not fit into this topicā€¦
If you abort the prune, only new files have been created; they will simply be processed in the next prune run.

If your prune speed is too slow, consider using the latest beta and playing around with the new options that were added to prune.
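For example, the reworked prune (first available in the betas, released in restic 0.12.0) added options to limit how much data a single prune run repacks. The values below are just starting points to experiment with, not recommendations:

```shell
# Tolerate up to 10% unused data in the repository instead of
# repacking every partially-used pack file
restic prune --max-unused 10%

# Or cap the amount of data rewritten in a single prune run
restic prune --max-repack-size 50G
```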

I am wondering if the issue is with the actual backup files themselves, which are vzdumpā€¦vma.zst files from Proxmox backups, or if it's an issue with Microsoft OneDrive with rclone as the backend connecting it to restic.
When the backups are small, let's say a few GB (3ā€“4 GB tested so far), the restored file seems to work without issue. However, when I am dealing with backups that are 60ā€“100 GB or more, those files turn out corrupt. I tried restoring those files on the original computer which made the backups, before uploading them, so I know they work. However, when I restore the files to another computer remotely and then try to unpack the file, there is corruptionā€¦