Welcome to the forum @jlduprat, and thanks for the comprehensive report!
While I don’t have the time right now to dig into this issue, I’m certain that “dropped packets” as the error source is very unlikely:
- When a packet is dropped, the TCP stack will make sure it is resent
- Connections to B2 always use HTTPS, so the TLS layer ensures data integrity. “Data is missing in the middle of a connection” is a fatal error in TLS, so the connection would have been aborted
- When saving data to B2 does not work, restic just retries the request until it succeeds or the time is up
I can understand that. Please be aware that over the last five years, using restic has revealed a number of hardware issues that our users were not previously aware of. Do you have the option to run memtest on the machine for at least a few hours?
You’re also right about the two kinds of errors:
This is an error on the outermost level: restic requested a file from B2 for which it knows the SHA256 hash of the contents, but got something with a different hash back. The data might have been modified at rest, during transit, in memory of the machine, or even at backup time before it could be saved to B2.
This one is much different: restic requested the pack `82b0562f` and the hash of the contents matched the file name, so the data was not modified in transit or at rest. But a part of the (encrypted) data has been modified, which could only have happened between restic encrypting the data (a so-called `blob`; pack files contain one or more of these) and saving the data to a temp file before uploading to B2. In theory, it could also have happened during `check`, but you can easily test that for yourself:
- Find the complete filename for the pack:
restic list packs | grep '^82b0562f'
- Download the pack and check its hash:
restic cat pack <ID> | sha256sum
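If `sha256sum` is not available on the machine, the same comparison can be sketched in Python. This is only an illustration, not how restic does it internally: it assumes the pack has first been saved to a local file (for example by redirecting the output of the `restic cat pack <ID>` command above), and the function names `sha256_of_file` and `pack_matches_id` are made up for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large pack files do not need to fit in RAM.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack_matches_id(path, pack_id):
    """Return True if the file's SHA-256 equals the full pack ID (hex string)."""
    return sha256_of_file(path) == pack_id.strip().lower()
```

Calling `pack_matches_id("pack.bin", "<ID>")` with the full ID from the first step returns `True` when the downloaded contents hash to the pack's file name; the local file name `pack.bin` is just a placeholder.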
If the modification did not happen during `check`, I can only imagine that it happened within restic, at backup time. This leaves two possibilities:
- A bug in restic or the Go compiler/runtime
- A hardware issue on the machine running restic (RAM, CPU, storage?)
While I cannot rule out either of those (and there may be a bug), I think the second one is more likely: many other people are running restic without data integrity issues, some with even more data (up to several tens of terabytes, I was told), and we’ve seen similar issues in the past which were indeed caused by hardware problems.
So, would you mind running memtest and reporting back?