Inconsistent errors with check --read-data

Hello everyone,

I’ve been using restic for the last half a year, and it had been working great until I started having issues a couple of weeks back. I’m worried my repo could have gotten corrupted.

On Jun 12th this happened:

+ /volume1/backups/backup-tools/restic/bin/restic --password-file=<CUT> -r rclone:jottacloud:backup check
using temporary cache in /tmp/restic-check-cache-911178758
create exclusive lock for repository
load indexes
Load(<index/ef519b76ae>, 0, 0) returned error, retrying after 720.254544ms: <index/ef519b76ae> does not exist
...
<CUT - hundreds messages as above>
error: error loading index 451f331a: <index/451f331a64> does not exist
error: error loading index ef519b76: <index/ef519b76ae> does not exist
...
<CUT - dozens messages as above>
Fatal: LoadIndex returned errors

Now I’ll admit I’m a restic noob, so I did a little bit of googling and started fixing things. After a while I thought the issue should be fixed, but restic check --read-data still returns errors. That seemed weird, because the issue really should’ve been fixed by then, so I ran the check again, and the Pack IDs that were returned were different; see below.

restic check --read-data run 1:

+ /volume1/backups/backup-tools/restic/bin/restic --password-file=<CUT> -r rclone:jottacloud:backup check --read-data --no-cache --limit-download 30720 --limit-upload 30720
repository fa48a53c opened successfully, password is correct
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[5:39] 100.00%  136 / 136 snapshots...
read all data
Pack ID does not match, want 3c5835eb, got 1137930f
Pack ID does not match, want 098b6d74, got 6b607d31
Pack ID does not match, want 2f00b044, got e18c47d6
Pack ID does not match, want 687c948a, got c63485ff
Pack ID does not match, want 035eefd0, got a3e811de
Pack ID does not match, want a06111ee, got d5e1c014
Pack ID does not match, want e8c165ff, got 33bdca13
Pack ID does not match, want 04319aa8, got 8844aea5
Load(<data/6a40ff5807>, 0, 0) returned error, retrying after 720.254544ms: <data/6a40ff5807> does not exist
Load(<data/7d384924ed>, 0, 0) returned error, retrying after 582.280027ms: <data/7d384924ed> does not exist
Load(<data/51436263f0>, 0, 0) returned error, retrying after 468.857094ms: <data/51436263f0> does not exist
Load(<data/5a375c4a7a>, 0, 0) returned error, retrying after 462.318748ms: <data/5a375c4a7a> does not exist
Pack ID does not match, want 7d384924, got 0d0164be
Pack ID does not match, want 30a76979, got 4d35136f
Load(<data/28a559ac44>, 0, 0) returned error, retrying after 593.411537ms: <data/28a559ac44> does not exist
Pack ID does not match, want c9f71c08, got 0cb853bd
rclone: 2022/06/23 23:36:33 ERROR : data/fa/fa7b2faa9acfb73c3fb2a5658e85715751b2d8e5e5d357586c11eae900fb2b1d: Didn't finish writing GET request (wrote 5766616/6884871 bytes): unexpected EOF
Load(<data/fa7b2faa9a>, 0, 0) returned error, retrying after 282.818509ms: unexpected EOF
Pack ID does not match, want 28a559ac, got 5b4545d1
Load(<data/e342c6a897>, 0, 0) returned error, retrying after 328.259627ms: <data/e342c6a897> does not exist
Pack ID does not match, want e342c6a8, got 8931b26a
Pack ID does not match, want 58c226a9, got 31827337
Pack ID does not match, want 3535b1d1, got 9e9db54f
Pack ID does not match, want 0b63fd02, got 236f18e0
Pack ID does not match, want 8f8142e8, got cacbdd61
Load(<data/b2315c92f7>, 0, 0) returned error, retrying after 298.484759ms: <data/b2315c92f7> does not exist
Pack ID does not match, want 00aa29ae, got bb442ccd
Pack ID does not match, want 9364d83b, got 4fadf06d
Pack ID does not match, want 40c4356c, got b6b7ca79
Pack ID does not match, want e532a8fd, got c1f0c8bc
Pack ID does not match, want bdcd7ac2, got aa525695
Pack ID does not match, want 05547bd6, got b8f74683
Pack ID does not match, want c815197d, got 5deadfce
Pack ID does not match, want 43209efe, got 8077ebf6
Pack ID does not match, want f3ca1988, got b90a4944
Pack ID does not match, want e49c3fbe, got 2f155072
Pack ID does not match, want 2fc88b8f, got 49a247b6
Pack ID does not match, want 817d8bb7, got c42bdaf1
Pack ID does not match, want f9180b29, got c1e43649
Pack ID does not match, want 8f3fc571, got 1be57579
Pack ID does not match, want 059f175f, got 8af23084
rclone: 2022/06/24 06:13:40 ERROR : data/50/501ef294dabb37c4d4621aa62c2177c077fb247d26e8068ce08b0703da5d4e47: Didn't finish writing GET request (wrote 5242328/8556846 bytes): unexpected EOF
Load(<data/501ef294da>, 0, 0) returned error, retrying after 400.45593ms: unexpected EOF
rclone: 2022/06/24 06:13:41 ERROR : data/7b/7b14cd3f5b5d875892fa9384167e8d066c3fc36a8b7ef7d686b4a11bf1ad4309: Didn't finish writing GET request (wrote 2615536/5550869 bytes): unexpected EOF
Load(<data/7b14cd3f5b>, 0, 0) returned error, retrying after 507.606314ms: unexpected EOF
Load(<data/b48fd0a7ed>, 0, 0) returned error, retrying after 656.819981ms: <data/b48fd0a7ed> does not exist
Load(<data/501ef294da>, 0, 0) returned error, retrying after 535.697904ms: <data/501ef294da> does not exist
Load(<data/7b14cd3f5b>, 0, 0) returned error, retrying after 660.492892ms: <data/7b14cd3f5b> does not exist
Load(<data/cd771185fc>, 0, 0) returned error, retrying after 409.029087ms: <data/cd771185fc> does not exist
Load(<data/e67f20e2f7>, 0, 0) returned error, retrying after 484.444922ms: <data/e67f20e2f7> does not exist
Load(<data/b48fd0a7ed>, 0, 0) returned error, retrying after 587.275613ms: <data/b48fd0a7ed> does not exist
Load(<data/501ef294da>, 0, 0) returned error, retrying after 892.239589ms: <data/501ef294da> does not exist
Load(<data/7b14cd3f5b>, 0, 0) returned error, retrying after 1.326470261s: <data/7b14cd3f5b> does not exist
Load(<data/cd771185fc>, 0, 0) returned error, retrying after 538.914789ms: <data/cd771185fc> does not exist
Load(<data/e67f20e2f7>, 0, 0) returned error, retrying after 527.390157ms: <data/e67f20e2f7> does not exist
Load(<data/b48fd0a7ed>, 0, 0) returned error, retrying after 968.480344ms: <data/b48fd0a7ed> does not exist
Pack ID does not match, want 3b833150, got 4bec284b
rclone: 2022/06/24 07:16:58 ERROR : data/11/11a856769d42230d25dc7e0bda025215d06ed27fe8ec2a8e496401f7a152f900: Didn't finish writing GET request (wrote 0/8781706 bytes): unexpected EOF
Load(<data/11a856769d>, 0, 0) returned error, retrying after 535.336638ms: unexpected EOF: got 0 instead of 8781706 bytes
rclone: 2022/06/24 07:17:00 ERROR : data/fd/fd9a21ed8528199b4f8f50332f04616763c4566b1b11b085273f61c3af43801d: Didn't finish writing GET request (wrote 3663176/5489542 bytes): unexpected EOF
Load(<data/fd9a21ed85>, 0, 0) returned error, retrying after 681.245719ms: unexpected EOF
Pack ID does not match, want e86ff830, got 7a69ed6c
Load(<data/04430a5876>, 0, 0) returned error, retrying after 396.557122ms: <data/04430a5876> does not exist
Pack ID does not match, want a5c54487, got 8c6cb897
Load(<data/0a686b382a>, 0, 0) returned error, retrying after 398.541282ms: <data/0a686b382a> does not exist
Pack ID does not match, want 4b10abc3, got d3bd2344
Pack ID does not match, want 32c45b4a, got cab6e0f3
rclone: 2022/06/24 09:27:40 ERROR : locks/f225df729853e9c8ed7e4ea698783ed14705fbe08fc247dfc8455acd8d2d1de4: Post request put error: Post "https://api.jottacloud.com/files/v1/allocate": read tcp 192.168.66.189:50782->185.179.130.26:443: read: connection reset by peer
rclone: 2022/06/24 09:27:40 ERROR : locks/f225df729853e9c8ed7e4ea698783ed14705fbe08fc247dfc8455acd8d2d1de4: Post request rcat error: Post "https://api.jottacloud.com/files/v1/allocate": read tcp 192.168.66.189:50782->185.179.130.26:443: read: connection reset by peer
Save(<lock/f225df7298>) returned error, retrying after 626.286518ms: server response unexpected: 500 Internal Server Error (500)
rclone: 2022/06/24 09:30:11 ERROR : data/e4/e4d8373a977877aa8958ba64144b1e28ed451b03240bf6e644b88b8985b00bba: Didn't finish writing GET request (wrote 1702032/4899529 bytes): read tcp 192.168.66.189:35410->185.179.130.30:443: read: connection reset by peer
Load(<data/e4d8373a97>, 0, 0) returned error, retrying after 353.291331ms: unexpected EOF
rclone: 2022/06/24 09:30:52 ERROR : data/2f/2f97c2310f6aa055c4409e17e21010142af4a9454e7251972676a25f882d6298: Didn't finish writing GET request (wrote 1572408/5781141 bytes): read tcp 192.168.66.189:35446->185.179.130.30:443: i/o timeout
Load(<data/2f97c2310f>, 0, 0) returned error, retrying after 682.667507ms: unexpected EOF
rclone: 2022/06/24 09:32:57 ERROR : locks/9818809f7d02e164e535476f314482b6d4f714e10e9b7a85134575682370f13b: Post request put error: Post "https://api.jottacloud.com/files/v1/allocate": dial tcp 185.179.130.26:443: i/o timeout
rclone: 2022/06/24 09:32:57 ERROR : locks/9818809f7d02e164e535476f314482b6d4f714e10e9b7a85134575682370f13b: Post request rcat error: Post "https://api.jottacloud.com/files/v1/allocate": dial tcp 185.179.130.26:443: i/o timeout
Save(<lock/9818809f7d>) returned error, retrying after 598.359583ms: server response unexpected: 500 Internal Server Error (500)
rclone: 2022/06/24 09:35:45 ERROR : data/de/de3c438a36f80f565766838c2c8e6d5ffd284d8aac97841d03673e6b54972c33: Didn't finish writing GET request (wrote 239056/5520398 bytes): read tcp 192.168.66.189:35466->185.179.130.30:443: read: connection reset by peer
Load(<data/de3c438a36>, 0, 0) returned error, retrying after 511.910153ms: unexpected EOF
rclone: 2022/06/24 09:37:01 ERROR : data/d3/d3843f85fd51ec5433bbb75e2d51ca78460c8b67ca4d0178280b98b672646b6a: Didn't finish writing GET request (wrote 4089848/7096812 bytes): unexpected EOF
Load(<data/d3843f85fd>, 0, 0) returned error, retrying after 264.151541ms: unexpected EOF
rclone: 2022/06/24 09:37:50 ERROR : locks/fa87f29be5c900c80d7abe08de45352e5bb763c3e4f7e4ff3bfb637ad98f2899: Post request put error: Post "https://api.jottacloud.com/files/v1/allocate": read tcp 192.168.66.189:50820->185.179.130.26:443: read: connection reset by peer
rclone: 2022/06/24 09:37:50 ERROR : locks/fa87f29be5c900c80d7abe08de45352e5bb763c3e4f7e4ff3bfb637ad98f2899: Post request rcat error: Post "https://api.jottacloud.com/files/v1/allocate": read tcp 192.168.66.189:50820->185.179.130.26:443: read: connection reset by peer
Save(<lock/fa87f29be5>) returned error, retrying after 329.164139ms: server response unexpected: 500 Internal Server Error (500)
rclone: 2022/06/24 09:39:08 ERROR : locks/fa87f29be5c900c80d7abe08de45352e5bb763c3e4f7e4ff3bfb637ad98f2899: Post request put error: Post "https://api.jottacloud.com/files/v1/allocate": dial tcp 185.179.130.26:443: i/o timeout
rclone: 2022/06/24 09:39:08 ERROR : locks/fa87f29be5c900c80d7abe08de45352e5bb763c3e4f7e4ff3bfb637ad98f2899: Post request rcat error: Post "https://api.jottacloud.com/files/v1/allocate": dial tcp 185.179.130.26:443: i/o timeout
Save(<lock/fa87f29be5>) returned error, retrying after 830.44008ms: server response unexpected: 500 Internal Server Error (500)
rclone: 2022/06/24 09:40:09 ERROR : data/de/de3c438a36f80f565766838c2c8e6d5ffd284d8aac97841d03673e6b54972c33: Didn't finish writing GET request (wrote 4299752/5520398 bytes): unexpected EOF
Load(<data/de3c438a36>, 0, 0) returned error, retrying after 1.106431215s: unexpected EOF
rclone: 2022/06/24 09:41:02 ERROR : data/03/037a0016563fb7c5cc7681082f3069bd4575c1f5d2a4ff4a72f614246b132767: Didn't finish writing GET request (wrote 3904987/5038669 bytes): unexpected EOF
Load(<data/037a001656>, 0, 0) returned error, retrying after 289.726811ms: unexpected EOF
rclone: 2022/06/24 09:42:33 ERROR : locks/cf69ddffad9c49ff01588bad4e333f990b77a07435da5e8e9ff7b36f899f574c: Post request put error: Post "https://api.jottacloud.com/files/v1/allocate": read tcp 192.168.66.189:50856->185.179.130.26:443: read: connection reset by peer
rclone: 2022/06/24 09:42:33 ERROR : locks/cf69ddffad9c49ff01588bad4e333f990b77a07435da5e8e9ff7b36f899f574c: Post request rcat error: Post "https://api.jottacloud.com/files/v1/allocate": read tcp 192.168.66.189:50856->185.179.130.26:443: read: connection reset by peer
Save(<lock/cf69ddffad>) returned error, retrying after 547.404299ms: server response unexpected: 500 Internal Server Error (500)
rclone: 2022/06/24 09:52:57 ERROR : locks/74639959b996db8b3dd8d3558f8741320646a2f8e72e6f1ed3e48a3b7b544ad4: Post request put error: Post "https://api.jottacloud.com/files/v1/allocate": dial tcp 185.179.130.26:443: i/o timeout
rclone: 2022/06/24 09:52:57 ERROR : locks/74639959b996db8b3dd8d3558f8741320646a2f8e72e6f1ed3e48a3b7b544ad4: Post request rcat error: Post "https://api.jottacloud.com/files/v1/allocate": dial tcp 185.179.130.26:443: i/o timeout
rclone: 2022/06/24 09:52:58 ERROR : data/61/61cafe4c9a29f7b28d5b387854c38719297aca1cd4579db189fd5c0df17a52e0: Didn't finish writing GET request (wrote 1748288/6112114 bytes): read tcp 192.168.66.189:35478->185.179.130.30:443: read: connection reset by peer
checkPack: Load: unexpected EOF102 packs...
Save(<lock/74639959b9>) returned error, retrying after 596.012294ms: server response unexpected: 500 Internal Server Error (500)
rclone: 2022/06/24 09:55:37 ERROR : data/3c/3c0c103aaa23fe87040561667d08e0a4234fe1293cf1cf058174a1ae5f4947d2: Didn't finish writing GET request (wrote 5464883/6235188 bytes): unexpected EOF
Load(<data/3c0c103aaa>, 0, 0) returned error, retrying after 400.76134ms: unexpected EOF
rclone: 2022/06/24 10:00:08 ERROR : data/28/28c57f41f05b4a64fa90ec2388953f0315c86446f52b713539b9dbdafdf5ab81: Didn't finish writing GET request (wrote 4077019/7225878 bytes): unexpected EOF
Load(<data/28c57f41f0>, 0, 0) returned error, retrying after 336.633119ms: unexpected EOF
rclone: 2022/06/24 10:01:58 ERROR : data/5f/5f6a146057618728a6426599f1a6f4eb2d487d9bdf9da5e6539eda4c095ecbc8: Didn't finish writing GET request (wrote 2136704/8078819 bytes): read tcp 192.168.66.189:35472->185.179.130.30:443: read: connection reset by peer
Load(<data/5f6a146057>, 0, 0) returned error, retrying after 520.549928ms: unexpected EOF
rclone: 2022/06/24 10:02:44 ERROR : data/3c/3c0c103aaa23fe87040561667d08e0a4234fe1293cf1cf058174a1ae5f4947d2: Didn't finish writing GET request (wrote 198056/6235188 bytes): read tcp 192.168.66.189:35606->185.179.130.30:443: read: connection reset by peer
Load(<data/3c0c103aaa>, 0, 0) returned error, retrying after 783.11668ms: unexpected EOF
Pack ID does not match, want 3a627aad, got ea815c1f
Pack ID does not match, want bf7dd7b4, got 965affd6
Pack ID does not match, want b764fc9f, got b337a737
Pack ID does not match, want 1145a662, got 2b7984ff
Pack ID does not match, want 5fc7b11f, got 86cb3f55
Pack ID does not match, want 9222e722, got cbaa82a9
Pack ID does not match, want 5318b984, got 99075905
Pack ID does not match, want 10d97b48, got 4c04b9d3
Pack ID does not match, want 118bef8c, got feb21c8e
Pack ID does not match, want faf77ba6, got 5140c7d6
Pack ID does not match, want 2aaba59a, got 30434fc6
Load(<data/ead243b13a>, 0, 0) returned error, retrying after 389.253811ms: <data/ead243b13a> does not exist
Pack ID does not match, want c8b46938, got 28917a35
Pack ID does not match, want 2a7f04b9, got dda57268
Pack ID does not match, want a73afd9c, got c9eca0fd
Pack ID does not match, want f3b47daf, got abc8232b
Pack ID does not match, want 051751c0, got e4090e2b
Pack ID does not match, want 2f793270, got aed14295
Pack ID does not match, want 8edfbe0f, got 22cb319f
Pack ID does not match, want 270a0ff5, got e05052f6
Pack ID does not match, want 8e7474bd, got 122824c3
Pack ID does not match, want adfcef20, got 060861ae
Pack ID does not match, want 3a3f1aae, got c6c70fdb
Pack ID does not match, want 803bbace, got 5663e567
[28:13:59] 100.00%  551102 / 551102 packs...
Fatal: repository contains errors

restic check --read-data run 2:

+ /volume1/backups/backup-tools/restic/bin/restic --password-file=<CUT> -r rclone:jottacloud:backup check --read-data --no-cache --limit-download 30720 --limit-upload 30720
repository fa48a53c opened successfully, password is correct
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[6:17] 100.00%  136 / 136 snapshots...
read all data
Pack ID does not match, want 000ec843, got c3a5bafe
Pack ID does not match, want 81028f6d, got 40c596a2
rclone: 2022/06/25 04:46:27 ERROR : data/f3/f335c93508ffe7b28ff81e50b9619d5ca2071131a39d6401a434ff820a780d40: Didn't finish writing GET request (wrote 2619880/5731308 bytes): unexpected EOF
Load(<data/f335c93508>, 0, 0) returned error, retrying after 720.254544ms: unexpected EOF
rclone: 2022/06/25 04:46:27 ERROR : data/94/942548e5a6c73e90fcfda42b9635960e994c4bf0f4f990ee89d4c642fb6dcbd2: Didn't finish writing GET request (wrote 3138528/5843396 bytes): unexpected EOF
Load(<data/942548e5a6>, 0, 0) returned error, retrying after 582.280027ms: unexpected EOF
rclone: 2022/06/25 04:46:27 ERROR : data/08/08318a96620b76201d97d6a375446e2234c488be3b9c61578bdb3d828ef4059c: Didn't finish writing GET request (wrote 1570688/5476091 bytes): unexpected EOF
Load(<data/08318a9662>, 0, 0) returned error, retrying after 468.857094ms: unexpected EOF
rclone: 2022/06/25 04:46:28 ERROR : data/c3/c37b789954a3ddac735ab3edb4e5d169c45b756fa1d5c7788bcad54f3bdd7d3a: Didn't finish writing GET request (wrote 1570312/5512225 bytes): unexpected EOF
Load(<data/c37b789954>, 0, 0) returned error, retrying after 462.318748ms: unexpected EOF
Load(<data/ebff81c7d3>, 0, 0) returned error, retrying after 593.411537ms: <data/ebff81c7d3> does not exist
Pack ID does not match, want 3190ac85, got 803c4008
Pack ID does not match, want 99edd304, got b3b28f6e
Pack ID does not match, want ec0a870f, got 122e5724
Pack ID does not match, want 519bbb96, got 15f8fece
Pack ID does not match, want 30b13824, got 5b7470a3
rclone: 2022/06/25 06:23:08 ERROR : data/20/202a973504d1632607eb7edcd7368250c49c77a38d7c88565c2b2ccdf4dfe03a: Didn't finish writing GET request (wrote 5760368/6395118 bytes): unexpected EOF
Load(<data/202a973504>, 0, 0) returned error, retrying after 282.818509ms: unexpected EOF
Pack ID does not match, want 4375e07b, got 0f4f0f39
[25:49:46] 100.00%  551102 / 551102 packs...
Fatal: repository contains errors

Now if I understand this correctly, that shouldn’t be the case. If you grep the Pack IDs and compare them (I did), you’ll see that the sets of “wrong” Pack IDs from the two runs don’t overlap at all. So my best guess is that the errors aren’t caused by actual issues with the repo, but by network blips or my cloud provider being shitty.
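
For anyone who wants to repeat the comparison, here is a minimal Python sketch of it. The function name `wanted_pack_ids` and the two short log excerpts are only illustrative; it assumes the mismatch lines look exactly like the ones in the output above and that you have saved each run's full output somewhere.

```python
import re

# Matches lines such as "Pack ID does not match, want 3c5835eb, got 1137930f".
MISMATCH = re.compile(r"Pack ID does not match, want ([0-9a-f]+), got ([0-9a-f]+)")


def wanted_pack_ids(log_text):
    """Return the set of expected ("want") pack ID prefixes reported as mismatched."""
    return {m.group(1) for m in MISMATCH.finditer(log_text)}


# A few lines copied from the two runs above; in practice, read the full saved logs.
run1 = """\
Pack ID does not match, want 3c5835eb, got 1137930f
Pack ID does not match, want 098b6d74, got 6b607d31
Load(<data/6a40ff5807>, 0, 0) returned error, retrying after 720.254544ms: <data/6a40ff5807> does not exist
"""
run2 = """\
Pack ID does not match, want 000ec843, got c3a5bafe
Pack ID does not match, want 81028f6d, got 40c596a2
"""

ids1 = wanted_pack_ids(run1)
ids2 = wanted_pack_ids(run2)
repeated = ids1 & ids2  # packs that failed in both runs
print(f"run 1: {len(ids1)} packs, run 2: {len(ids2)} packs, repeated: {sorted(repeated)}")
```

An empty intersection means no pack failed in both runs, which is what I saw with the full logs.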

That being said, I’m of course quite worried that my repo got corrupted. It’s a little bit shy of 3 TB of data, so it’s not really feasible for me to download everything and check manually (not to mention, how much time it’d take).

So my questions are:

  • Is my understanding correct, are the errors caused by network/cloud provider?
  • What’s the best way for me to validate the repo’s correctness that gives consistent results?

Thanks for all the help!

Hello again, I’ve spent the last week running more tests & reading more. I figured that in order to rule out networking errors, I’d download the whole repo and then run check --read-data locally. Sounded like a good idea, so I did that, and the results are a bit confusing.

I’ve run the check --read-data command 5 times (twice I had to stop it midway, but that’s irrelevant), and I’ve got 5 completely different results :expressionless: It’s mostly the Pack ID does not match error, with an occasional pack X contains 1 errors: [blob 0: ciphertext verification failed]. None of the checks completed successfully, but one of them almost did, with just one error, so my guess is that if I ran enough checks, eventually one of them would pass.

Now, my conclusion is:

  • The repo seems to be fine. Each pack has been successfully validated at least once, since I get different errors with every run, and pack IDs don’t repeat.
  • The issues are most likely caused by bit flips. Originally, when I was running the command over the network, I was getting quite a few of them. Running locally it’s just a handful, which makes sense.

And, questions I have:

  • Is the above assessment correct, especially regarding the repo correctness?
  • If the answer to the previous point is “yes”, are those bitflips I’m observing something normal (like, cosmic rays happen, my repo is almost 3TB so random bitflips are unavoidable), or is the problem more serious, meaning hardware errors?
  • If “this is normal”, what’s the best way to run a full check in a consistent way? Because running it a number of times and comparing pack IDs seems wrong.
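
One way to look at a local copy independently of restic is sketched below, in Python. It relies on the fact that restic names each pack file after the SHA-256 hash of its contents, and on the data/<two hex chars>/<full ID> layout visible in the rclone paths above. The function names `sha256_of` and `check_packs` are made up for this sketch, and it only checks the outer file hash, not the individual blobs. It re-reads every mismatching pack to see whether the mismatch reproduces.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large packs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_packs(repo_dir, rereads=2):
    """Yield (pack_name, verdict) for every pack whose first hash differs from its name.

    The verdict is "transient" if any re-read matches the name, else "persistent".
    """
    for pack in sorted(Path(repo_dir, "data").glob("*/*")):
        if not pack.is_file() or sha256_of(pack) == pack.name:
            continue
        ok_on_retry = any(sha256_of(pack) == pack.name for _ in range(rereads))
        yield pack.name, "transient" if ok_on_retry else "persistent"
```

Something like `for name, verdict in check_packs("backup-tmp"): print(name[:8], verdict)` would list the mismatches. A "transient" verdict points at the machine doing the reading rather than at the stored data. A "persistent" one is less conclusive than it sounds, because the re-read may be served from the OS page cache and repeat the same flipped bit.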

For the record, I was running all commands via SSH on my Synology NAS. AFAICT the NAS is fine, but I’m not completely sure how to check it more thoroughly.

And, of course, the output of my 5 check runs (note that one of them ended with a broken pipe because I had to kill the session, but I don’t think it matters since the errors reported so far were inconsistent anyway):

$ /volume1/backups/backup-tools/restic/bin/restic --password-file=<CUT> -r backup-tmp/ check
using temporary cache in /tmp/restic-check-cache-109571841
repository fa48a53c opened successfully, password is correct
created new cache in /tmp/restic-check-cache-109571841
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:23] 100.00%  136 / 136 snapshots...
no errors were found

$ /volume1/backups/backup-tools/restic/bin/restic --password-file=<CUT> -r backup-tmp/ check --read-data
using temporary cache in /tmp/restic-check-cache-1134635854
repository fa48a53c opened successfully, password is correct
created new cache in /tmp/restic-check-cache-1134635854
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:22] 100.00%  136 / 136 snapshots...
read all data
Pack ID does not match, want e796edb4, got 22819612
Pack ID does not match, want 3d2c4b41, got 4d389104
Pack ID does not match, want c4ee8021, got 27057caa
Pack ID does not match, want 779bc5d7, got e2cb86e7
[12:31:04] 100.00%  551102 / 551102 packs...
Fatal: repository contains errors

$ /volume1/backups/backup-tools/restic/bin/restic --password-file=<CUT> -r backup-tmp/ check --read-data
using temporary cache in /tmp/restic-check-cache-703405674
repository fa48a53c opened successfully, password is correct
created new cache in /tmp/restic-check-cache-703405674
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:19] 100.00%  136 / 136 snapshots...
read all data
Pack ID does not match, want ad567f74, got 5692a7c1
Pack ID does not match, want 38daf897, got 7ec525c6
Pack ID does not match, want cbcd1d1e, got 89392f07
pack a39d48ba contains 1 errors: [blob 2: ciphertext verification failed]
Pack ID does not match, want bd1e82ce, got 09c2d58c
client_loop: send disconnect: Broken pipe.

$ /volume1/backups/backup-tools/restic/bin/restic --password-file=<CUT> -r backup-tmp/ check --read-data --no-lock
using temporary cache in /tmp/restic-check-cache-1067055194
repository fa48a53c opened successfully, password is correct
created new cache in /tmp/restic-check-cache-1067055194
load indexes
check all packs
check snapshots, trees and blobs
[0:22] 100.00%  136 / 136 snapshots...
read all data
pack 1048ebdb contains 1 errors: [blob 0: ciphertext verification failed]
Pack ID does not match, want 048da962, got 6b9d4407
Pack ID does not match, want 9adce8b0, got a0e0b2ba
[12:30:28] 100.00%  551102 / 551102 packs...
Fatal: repository contains errors

$ /volume1/backups/backup-tools/restic/bin/restic --password-file=<CUT> -r backup-tmp/ check --read-data --no-lock
using temporary cache in /tmp/restic-check-cache-39037144
repository fa48a53c opened successfully, password is correct
created new cache in /tmp/restic-check-cache-39037144
load indexes
check all packs
check snapshots, trees and blobs
[0:20] 100.00%  136 / 136 snapshots...
read all data
Pack ID does not match, want c98a7697, got ba5df4d8
[12:35:57] 100.00%  551102 / 551102 packs...
Fatal: repository contains errors

You should try to run the tests from a different PC. Your results strongly look like hardware errors (RAM or CPU) on the system you are testing!

In order to check your PC, run something like memtest86+ on it!

Thanks, I’ll see how much I can do, given it’s not a PC we are talking about but a NAS :slight_smile:

I’ll also run the check from another machine and we’ll see.

These tests don’t cover everything. I used to get integrity errors, as can be seen in my post history. SMART, memtest86, fsck, etc., running for full days, found nothing.

I tested with ZFS and still got integrity errors. At that point, I suspected hardware issues. I changed the computer and so far haven’t got any integrity errors.

Clearly it was a hardware problem!

I love it when restic randomly finds hardware issues :smiley:

With the check results it looks like 1 bitflip / TB, which is far higher than I’d be comfortable with. My guess would be that something in the range of 1 bitflip per petabyte would be more standard (that is a somewhat random guess inspired by a recent Facebook paper stating “Memory corruption is common at [Exa]scale.”)
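
As a rough sanity check of that figure, here is the arithmetic as a small Python sketch. The error counts are taken from the three local check --read-data runs above that ran to completion. The 2.5 TB read per run is an assumption based on the repo size mentioned in the thread, and each reported error is counted as one event even though it could stem from more than one flipped bit.

```python
# Error counts from the three local check --read-data runs above that ran to completion.
errors_per_run = [4, 3, 1]
# Assumption: roughly 2.5 TB read per run (the thread says "over 2.5TB" / "shy of 3 TB").
tb_per_run = 2.5

total_errors = sum(errors_per_run)
total_tb = tb_per_run * len(errors_per_run)
rate = total_errors / total_tb
print(f"{total_errors} errors over {total_tb} TB read = {rate:.2f} errors per TB")
```

With these numbers it comes out at about 1.07 errors per TB read.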

If it was 1 bit/TB, by now many people using different applications would have reported it. I don’t see many integrity error complaints.

Still, @MichaelEischer any plans for adding RS error correction code or similar to Restic?

There are no specific plans at the moment.

Erasure coding would only solve data corruption while it is stored and not corruption while processing. E.g. bitflips in memory wouldn’t be covered by erasure coding.

Hello everyone, thank you all for your replies. Apologies it took me a while to get back here - but just so you know, it wasn’t because I ignored this issue or anything - your suggestions were very reasonable, so I went on to test things very thoroughly. Unfortunately, as my repo is over 2.5TB, one full check took between 20 and 50 hours (depending on how I was testing), and I did quite a few of them, which is why I’m only responding now.

So what are the results? Well, it seems you were all most likely right (unsurprisingly) about the memory, but there are some weird results I don’t understand.

So the test was to run check --read-data on 3 machines: the NAS (via SSH), my modern-ish Windows PC (I actually did it twice on the PC, once with WSL2 and once with PowerShell), and a somewhat aged Linux laptop. For each of the devices, I ran the test at least twice, once using my cloud provider, and once using the local copy of the repo I’d made - actually it was more like a “local/LAN copy”, as I don’t have 2.5TB free on either my PC or my laptop, so I used smb-mounted drives there.

The results were as follows:

  • On NAS I’m consistently getting Pack ID errors - not a huge amount of them, but always some - no matter what repo I’m using (remote/local)
  • On the PC & Laptop I’m getting a clean check if I use the smb-mounted repo. I still get Pack ID errors when I use the “remote/cloud” repo. This is true for Linux, PowerShell & WSL.
  • As always, the Pack ID errors are random. I haven’t checked them all, but the ones I’ve checked never repeat.

So the first thing to take from here is that you were probably right, my NAS’ RAM is bad.

The second thing, however, is that check --read-data also seems to be very unreliable when used over the internet. Is this expected? FWIW, my current ISP is pretty shitty, so maybe that’s a factor too, but I’d assume a shitty ISP would mean things break sooner, at the network layer, so badly downloaded packets would be discarded and would never end up in restic? I’m confused.

My first question right now is - if checking over the internet is so unreliable, is there a more robust way to do it? I’d like to run a full check occasionally, like every few months, and it doesn’t seem trivial.

My second question is, what do you recommend I do with the NAS? Sadly, it’s a cheaper Synology model, and I can’t swap the RAM. It’s possibly also no longer under warranty, although I’d need to check - but even if it’s covered by warranty, how would I make my case to the manufacturer? Memcheck works without any issues, so the only proof is that “a random (from Synology’s point of view) piece of software, restic, fails” - not a strong case I’m afraid :expressionless: Also, buying a whole new unit isn’t really an option at the moment. Any advice?

Thanks again for all the comments, and thanks in advance for future replies!