How to debug restic integrity errors?

MichaelEischer · September 7, 2021, 9:21pm

An interrupted network connection or backup in general only cause “incomplete pack file” warnings. Background processes could cause restic to read inconsistent data, but that won’t damage the repository. Permissions problems on the backend would completely prevent reading some files from the backend which leads to other error messages. Permissions problems during backup would have been reported properly and also cannot lead to check errors. Problems with restic are always a possibility, but the kind of error messages you see are, in my experience, usually caused by bit flips in hardware.

Restic is designed to keep the backup repository intact no matter when during a backup or other operations it or the network connection is interrupted (there are a few caveats with sftp which can require manually deleting some incomplete files in rare cases. But the errors you see are something completely different).

The first error type could be caused by data corruption in the backend storage or by bitflips during the backup.
The “Blob ID does not match” and “invalid character” errors can only be caused by bitflips on the host running the backup. They cannot be caused by some random bitflips in the storage backend (no matter whether it is a BTRFS raid or something completely different).

To trigger the last two error types, the bitflips must be introduced during or prior to encryption of the blobs. And that process runs completely in memory on the client which creates the backup. The “invalid character” error is a bit strange, as it can only be introduce either during the JSON encoding or while calculating the blob hash (which happens before encrypting the blob). In any case this hints at a memory or CPU problem. I’d recommend running prime95 to check whether the system stays stable under load.

Which kernel version runs on the client? We had some problems with data corruption and a kernel bug in the past, see How to fix "failed: ciphertext verification failed" running 'restic 'prune' - #4 by fd0 .