Restic check --read-data fails

schmitch · January 26, 2023, 8:57am

hello,

we currently use restic backup and restic check --read-data on Windows.

We back up to two different SMB shares:

\\smbshare1\data
\\smbshare2\data

and we do this with the LocalSystem account.

At the end of the day we run restic backup && restic check --read-data for both repos (at different times, one at 20:00 and the other at 02:00, so they never run at the same time).
However, the second repository always prints a "Pack ID does not match" error. If I delete the second repository and let it be recreated, the check works the first time and fails with the same error the second time.

However, if I log in as an administrator, connect to both repositories and run restic check --read-data, it works just fine. Is there any way to debug the problem, or what exactly could the problem be?

MichaelEischer · January 26, 2023, 10:08pm

That sounds like an issue with the file permissions. Which restic version are you using? And what is the full error message?

schmitch · January 31, 2023, 7:44am

Hello,

thanks for your answer. In the meantime we changed our IIS pool from LocalSystem to Administrator and removed the restic cache, and then it looked fine for a few days.
However, now we have the problem once again (after about 3 days of working correctly).

The complete error message is as follows:

using temporary cache in C:\Users\ADMINI~1\AppData\Local\Temp\1\restic-check-cache-2356316313
repository 7fa1f5f4 opened (version 2, compression level auto)
created new cache in C:\Users\ADMINI~1\AppData\Local\Temp\1\restic-check-cache-2356316313
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:00] 100.00%  10 / 10 snapshots
read all data
Pack ID does not match, want 18ca21ecd4789ac6609f7f2a9eb565ea52dbac8174ec6491881112740166562a, got 5e32fdba1933aa64832998ecb550c3fe74030884298dd0ce4205643e3c62975c
[6:48] 100.00%  1550 / 1550 packs
Fatal: repository contains errors

We upgraded to the latest restic version because of this problem, so at the moment we are using:

restic 0.15.0 compiled with go1.19.5 on windows/amd64

(or at least it was the newest version)

the repository is a windows smb one, which sadly is not managed by myself.

MichaelEischer · February 4, 2023, 11:24am

Can you manually take a look at the file in data/18/18ca21ecd4789ac6609f7f2a9eb565ea52dbac8174ec6491881112740166562a in the repository (that’s the one check complained about) and compute its sha256 hash?
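
A minimal sketch of that manual check, assuming Python 3 is available on the machine that can reach the share. It relies on the fact that a restic pack file is named after the SHA-256 hash of its content; the helper names sha256_of_file and pack_matches_name are made up for this illustration and are not part of restic.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the lowercase hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large pack files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_matches_name(path: Path) -> bool:
    """Hypothetical helper: True if the digest equals the pack file name."""
    return sha256_of_file(path) == path.name.lower()
```

Calling pack_matches_name with the full path to data/18/18ca21ecd4789ac6609f7f2a9eb565ea52dbac8174ec6491881112740166562a inside the repository would return False if the content read over SMB does not hash to the file name, which is the same condition check --read-data reports.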

Is the smb connection authenticated and encrypted? I wonder whether the file could be corrupted while transferred over the network.

schmitch · February 6, 2023, 9:57am

I’m not sure if I still have data/18/18ca21ecd4789ac6609f7f2a9eb565ea52dbac8174ec6491881112740166562a, since we recreated the backup. It’s really strange: the first share works just fine, only the second fails (the second is in a different building, that’s the only difference we know of).

However at the moment we have the following pack error: Pack ID does not match, want 071ac34af9cc6193d561869ea450497e78f918dab2c62593409a168dd365d184, got 941595254f6884b41bd8d3153d824cd5af4c571f82da163d8d6679a27344b145

so I checked for data/5a/5a3213ec2cd60e0f171e73d4a14920e4a7fce3c6d6b59e5715c75fe298814871 and the shasum was:

071AC34AF9CC6193D561869EA450497E78F918DAB2C62593409A168DD365D184 (it’s uppercase because I used PowerShell’s Get-FileHash NAME -Algorithm SHA256)

it’s authenticated, but I’m not sure if it’s encrypted, we connect via net use so it might be unencrypted since that is the default I guess.

So when a file gets corrupted during transfer, there is no retry in restic? I.e. blobs are not checked after upload and uploaded again? I think it would be useful to have something like backup --retry 3 (but with a SHA-256 check).

What’s even stranger: I ran the integrity check manually, and sometimes the error I had on Friday night is gone when I try again on Monday morning.
So the integrity check directly after the backup failed, yet after waiting two days it looks fine.
To me that looks like a really strange error, especially because it only happens on a single repository.
Might it be related to the restic cache?

Even worse, when I retry it later it prints an error again, even though the backup succeeded.

I now have another error with:

Pack ID does not match, want d633ba73c13be701e81aca2a53cf9da848fdcecc5d2a62c2b836f733a061fb7d, got 988499a09fb8246ae4555ae93ca361367d8fdb47f1d9e5534d892bf75c1158ff

and the pack file is indeed incorrect: 988499A09FB8246AE4555AE93CA361367D8FDB47F1D9E5534D892BF75C1158FF

MichaelEischer · February 6, 2023, 10:12pm

Ah, sorry I meant SMB signing (Overview of Server Message Block signing - Windows Server | Microsoft Learn) and not just authenticated.

For restic to retry a corrupted upload, it first would have to know that the upload was corrupted. In fact uploads are already retried if they cause any error. But that has to rely on learning that something went wrong.

check uses a temporary cache and downloads all files again from the repository. So there is no cache which is reused between different check invocations.

schmitch:

I now have another error with:

Pack ID does not match, want d633ba73c13be701e81aca2a53cf9da848fdcecc5d2a62c2b836f733a061fb7d, got 988499a09fb8246ae4555ae93ca361367d8fdb47f1d9e5534d892bf75c1158ff

Did the old error disappear? Or do you now see the old and the new error? Does the 5a3213ec2cd60e0f171e73d4a14920e4a7fce3c6d6b59e5715c75fe298814871 pack file still exist? Is its hash now correct or still wrong?

MichaelEischer · February 6, 2023, 10:15pm

Assuming that the affected file still exists, but the error disappeared, then the most likely cause is a bad RAM module. In fact most “Pack ID does not match” errors turn out to be related to hardware problems.

schmitch · February 7, 2023, 7:56am

Ah, sorry I meant SMB signing (Overview of Server Message Block signing - Windows Server | Microsoft Learn) and not just authenticated.

I’m pretty sure that’s also not turned on, but I would need to check. Is it turned on by default on a Windows server? We are not part of the SMB server’s domain, so I’m pretty sure it is not enabled.

For restic to retry a corrupted upload, it first would have to know that the upload was corrupted. In fact uploads are already retried if they cause any error. But that has to rely on learning that something went wrong.

But doesn’t restic create SHA-256 hashes for files anyway? So wouldn’t it be possible to compute a SHA-256 hash on the remote and, if it does not match, retry the pack file (like rsync does)?

For restic to retry a corrupted upload, it first would have to know that the upload was corrupted. In fact uploads are already retried if they cause any error. But that has to rely on learning that something went wrong.

Ah OK, that is good to know, but I’m pretty sure those are just simple errors and not data corruption, right?

Did the old error disappear? Or do you now see the old and the new error? Does the 5a3213ec2cd60e0f171e73d4a14920e4a7fce3c6d6b59e5715c75fe298814871 pack file still exist? Is its hash now correct or still wrong?

The old error did disappear.

5a3213ec2cd60e0f171e73d4a14920e4a7fce3c6d6b59e5715c75fe298814871 exists and the hash is correct
yesterday I had the following error:

Pack ID does not match, want d633ba73c13be701e81aca2a53cf9da848fdcecc5d2a62c2b836f733a061fb7d, got 988499a09fb8246ae4555ae93ca361367d8fdb47f1d9e5534d892bf75c1158ff

Yesterday the SHA-256 of d6… was in fact 988…; today it is correct.

Assuming that the affected file still exists, but the error disappeared, then the most likely cause is a bad RAM module. In fact most “Pack ID does not match” errors turn out to be related to hardware problems.

Okay! Thank you very much for your extensive help. Problems like this are really hard to grasp, and it’s helpful to get a bit more insight.

MichaelEischer · February 9, 2023, 9:54pm

From what I can tell, it is a parameter that can be configured for each SMB server, it should be independent of domain memberships etc. It might be enabled by default for shares using SMB2/3, but I don’t know enough about Windows servers to say anything definite.

The SMB share is only accessible via the file interface, so we can’t run commands on the remote host?
With the described behavior it’s also unlikely to help: directly after writing the data, it is still stored in the page cache, so we’d just verify that but not what is actually written on disk.
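
To make that limitation concrete, here is a sketch of what a naive write-then-read-back verification could look like. It is not how restic works; write_verified is a hypothetical helper invented for this illustration, and it assumes the data fits in memory.

```python
import hashlib
from pathlib import Path


def write_verified(target: Path, data: bytes, attempts: int = 3) -> bool:
    """Write `data`, read it back, and retry if the hashes differ."""
    expected = hashlib.sha256(data).hexdigest()
    for _ in range(attempts):
        target.write_bytes(data)
        # Caveat: immediately after the write, this read is most likely
        # answered from a cache (client side or the server's page cache),
        # so it does not prove what ends up on the server's disk.
        if hashlib.sha256(target.read_bytes()).hexdigest() == expected:
            return True
    return False
```

A successful return here only shows that the cached copy matches. Corruption that happens later, for example when the server reads the file back into faulty memory, would go unnoticed by such a check.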

Pack ID mismatches can only occur if the pack file content is corrupted somewhere along the way. These are nearly exclusively caused by hardware issues.

schmitch:

The old error did disappear.

5a3213ec2cd60e0f171e73d4a14920e4a7fce3c6d6b59e5715c75fe298814871 exists and the hash is correct

yesterday I had the following error:
Pack ID does not match, want d633ba73c13be701e81aca2a53cf9da848fdcecc5d2a62c2b836f733a061fb7d, got 988499a09fb8246ae4555ae93ca361367d8fdb47f1d9e5534d892bf75c1158ff
Yesterday the SHA-256 of d6… was in fact 988…; today it is correct.

This only leaves two possible explanations: either the data is corrupted in memory at the server which hosts the SMB share or while transferring the data to the client running restic. The most likely variant is that there’s something wrong with the server which hosts the SMB share. With the given symptoms it’s very likely to be a faulty RAM module.
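
One way to gather more evidence for this kind of transient corruption is to hash the same pack file several times and see whether the result changes. The following is only a sketch, assuming Python 3 on the client; hash_repeatedly and summarize are hypothetical names and not part of restic.

```python
import hashlib
from collections import Counter
from pathlib import Path


def hash_repeatedly(path: Path, rounds: int = 5) -> Counter:
    """Hash the same file `rounds` times and count each distinct digest."""
    seen = Counter()
    for _ in range(rounds):
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        seen[digest.hexdigest()] += 1
    return seen


def summarize(path: Path, digests: Counter) -> str:
    """Compare the observed digests with the pack file name."""
    expected = path.name.lower()
    if set(digests) == {expected}:
        return "stable and correct"
    if len(digests) == 1:
        return "stable but wrong"
    return "unstable"
```

Reads that follow each other closely may all be served from the same cached copy, so runs spread over hours or days (as in the Friday versus Monday observation above) are more telling. A file on disk that hashes wrong on one day and correctly on another has not changed itself, which points at memory or transfer problems rather than at the stored data.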