Study of repository damage types

tl;dr Jump forward to “Combined Results” or “Discussion” for the results. But please read the “Limitations” section before jumping to conclusions.

Repository damage study

Backups are the standard method to protect against data loss due to hardware or software problems. But just like the original data, a backup repository itself can also be damaged. A recent issue regarding integrity errors reported by prune caused me to wonder how frequently such problems show up and which types of damage exist.

This systematic review of recent reports of damaged repositories aims to answer this question.

Methodology

Data is gathered by searching the restic Github issues and the restic forum for reports of damaged repositories. I have only included reports for restic >= 0.12.0 as it includes a rewrite of the prune algorithm including many bugfixes and thus the results are not comparable with earlier versions. Restic 0.12.0 was released on 2021-02-14 and thus older topics/issues will be ignored. To gain a full picture, there will be limited attempts to find older issues which were hijacked with later problem reports.

The search term selection and validation is discussed below. Using the final search terms, the issues are categorized based on the reported type of damage, including the used restic version, or are marked as ignored if the issue is not relevant for this review. Ignored issues include, for example, feature requests, using an old restic version, crashes/fixed bugs, running out of space, duplicate problem reports, wrong backend behavior, usage questions and repository damages caused by the user. Repository damages caused by the user explicitly include damage caused by other tools, not restic. Fixed backend problems or those unrelated to repository corruption are ignored. At the end, the results are categorized based on the damage type. The categories are not explicitly predefined, but are developed iteratively to ensure that similar issues are categorized consistently.

As a general prefilter, Github issues before #3280 are ignored due to their age. Similarly, forum posts before 2021-02-14 are excluded. The latest issue at the time of this review is #3817; forum posts are included up to 2022-07-08.
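The prefilter amounts to two range checks. The following is a minimal sketch using the bounds stated above; the constant and function names are hypothetical and only serve as illustration. Older issues that were hijacked with later reports are not covered by it and have to be handled separately.

```python
from datetime import date

# Bounds taken from the text above; all names are illustrative only.
FIRST_ISSUE = 3280               # Github issues before this number are ignored
LAST_ISSUE = 3817                # latest issue at the time of the review
FIRST_DAY = date(2021, 2, 14)    # release date of restic 0.12.0
LAST_DAY = date(2022, 7, 8)      # last forum day covered by the review


def issue_in_scope(number: int) -> bool:
    """Return True if a Github issue number passes the age prefilter."""
    return FIRST_ISSUE <= number <= LAST_ISSUE


def post_in_scope(created: date) -> bool:
    """Return True if a forum post date passes the age prefilter."""
    return FIRST_DAY <= created <= LAST_DAY
```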

Search terms

The initial list of search terms is created by scavenging through prune/check and selecting warnings related to detected integrity errors. To detect crashed executions of these commands, the search term “runtime.goexit” is included, as it indicates the presence of a stacktrace.

prune:

  • prune error
  • prune “runtime.goexit”
  • prune “not found in the index”
  • prune “does not match real size”
  • prune “pack files which are missing from the repository”
  • prune “returned invalid hash”

Validating the search terms against the Github issues shows that prune error returns a superset of all following search terms and that the number of issues is still reasonable. Searching just for “prune” primarily increases the amount of noise, except for two issues with hanging prune runs due to backend problems which are not relevant for this review.
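The superset validation can be pictured as a set comparison over the issue numbers returned by each search. This is only a sketch of the idea: the function name is hypothetical, and the issue numbers in the usage example are placeholders, not real search results.

```python
def missed_by_broad(broad, narrow_results):
    """Return, per narrow search term, the issues the broad search did not return.

    An empty result means the broad search is a superset of all narrow ones.
    """
    missed = {}
    for term, found in narrow_results.items():
        extra = found - broad
        if extra:
            missed[term] = extra
    return missed


# Placeholder issue numbers, purely for illustration.
broad = {1, 2, 3, 4}
narrow = {"term a": {1, 2}, "term b": {3}}
print(missed_by_broad(broad, narrow))
```

If the returned dictionary is empty, the narrower terms add nothing and can be dropped in favor of the broad one.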

check:

  • check error
  • “repository contains errors”
  • check “runtime.goexit”

Validating the search terms against the Github issues shows that check error returns a superset of all following search terms. Comparing the set of returned issues, check error only seems to increase the noise compared to prune error. The same applies to searching just for “check”.

backup:

  • “storing the file again”
  • “the repository could be damaged”

bitflips:

  • “ciphertext verification failed”

maybe storage or bitflips or upload:

  • “hash does not match id”

Collected Data

Github Issues

Excluded issues are marked with “ignore”.
Included issues either report the used restic version or “unknown”.

Raw data

Issues were collected using the following commands

gh issue list -S 'prune error sort:created-desc' -s all -L 60
gh issue list -S '"ciphertext verification failed" sort:created-desc' -s all -L 60
gh issue list -S ' "hash does not match id" sort:created-desc' -s all -L 60

To include hijacked Github issues, also include older issues which explicitly specify a recent restic version:

// without time limit
gh issue list -S 'prune error 0.12.0 sort:created-desc' -s all -L 60
gh issue list -S 'prune error 0.12.1 sort:created-desc' -s all -L 60
gh issue list -S 'prune error 0.13.0 sort:created-desc' -s all -L 60
gh issue list -S 'prune error 0.13.1 sort:created-desc' -s all -L 60
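The version-specific searches above all follow one pattern. A small sketch that generates the same argument lists could look as follows; the helper name is hypothetical, and the flags are exactly the ones used in the commands above.

```python
VERSIONS = ["0.12.0", "0.12.1", "0.13.0", "0.13.1"]


def gh_command(search: str, limit: int = 60) -> list:
    """Build the argument list for one `gh issue list` search."""
    return [
        "gh", "issue", "list",
        "-S", f"{search} sort:created-desc",
        "-s", "all",
        "-L", str(limit),
    ]


# One command per restic version, matching the four lines above.
commands = [gh_command(f"prune error {version}") for version in VERSIONS]
for command in commands:
    print(command)
```

Each list can be passed to `subprocess.run(command, check=True)` from within a checkout of the restic repository, assuming the `gh` CLI is installed and authenticated.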
- "prune error"
#3808 ignore - feature request
#3800 0.13.1 data corruption - "ciphertext verification failed"
#3766 unknown many missing blobs - local; 0.13.1 one missing blob - rclone/pcloud
#3765 ignore - old restic version
#3763 ignore - crash
#3755 ignore - backend problem - REST filesystem issue
#3732 ignore - usage
#3724 ignore - out of space
#3703 ignore - feature request
#3676 0.12.1 one missing blob - sftp
#3662 ignore - feature request
#3652 ignore - feature request
#3650 ignore - feature request
#3646 backend problem - S3 incomplete list
#3631 ignore - old restic version
#3620 ignore - feature request
#3609 ignore - crash
#3606 ignore - old restic version
#3583 repository format limitation
#3582 ignore - out of space
#3581 0.12.1 one missing blob - B2
#3559 ignore - feature request
#3551 0.12.0 data corruption - disappeared after upgrade
#3545 ignore - feature request
#3543 ignore - backend problem - B2 idempot. delete
#3541 ignore - backend problem - B2 idempot. delete
#3518 ignore - feature request
#3498 ignore - feature request
#3495 0.12.1 many missing blobs - sftp
#3491 ignore - feature request
#3486 0.12.1 many missing blobs - SMB
#3476 0.12.1 one missing tree blob - B2
#3471 0.12.0 missing pack files - S3
#3466 ignore - crash
#3464 ignore - feature request
#3461 0.12.0 backend problem - local/SMB pack size mismatch
#3455 ignore - duplicate
#3448 ignore - usage
#3435 0.12.0 backend problem - local/SMB hash mismatch
#3400 0.12.0 many missing blobs - ?
#3384 0.12.0 many missing blobs - ?
#3381 ignore - bug
#3370 ignore - old restic version
#3365 0.12.0 many missing blobs - REST bug
#3348 0.12.0 one missing tree blob - Swift
#3342 ignore - old restic version
#3339 ignore - bug
#3336 ignore - out of space
#3302 ignore - bug
#3295 ignore - feature request
#3289 ignore - out of space
#3288 ignore - old restic version

// old issues
#828  ignore - feature request
#1078 ignore - feature request
#1153 ignore - out of space
#1450 ignore - backend problem - broken S3 setup
#1986 ignore - duplicate
#1999 ignore - duplicate
#2562 ignore - backend problem - B2 idempot. delete
#2659 0.12.0 backend problem - local/SMB pack size mismatch etc
#2673 ignore - backend problem - B2 slow upload
#2736 ignore - feature request
#3213 ignore - old restic version - REST bug
#3260 ignore - old restic version
#3265 0.11.0 backend problem - S3 upload retry
#3266 backend problem - B2 broken data
#3268 backend problem - B2 broken data
#3272 ignore - old restic version
#3273 ignore - old restic version


- "ciphertext verification failed"
#3430 broken key

- "hash does not match id": nothing new

Restic Forum

Forum topics are only reported under the first search term that returned them. The search term check error was skipped as it was not useful for Github issues. Besides that all other listed search terms were used.

Raw data
https://forum.restic.net/search?q=prune%20error%20after%3A2021-02-13%20order%3Alatest_topic

ignore - usage https://forum.restic.net/t/using-forget-for-the-first-time-triggering-backup-triggers-a-full-backup-why/5176
ignore - usage https://forum.restic.net/t/restic-check-hangs-reports-no-error/5140
0.13.1 - prune with broken locking https://forum.restic.net/t/data-seems-to-be-missing-after-repack-uncompressed/5136
ignore - backend problem - B2 incomplete listing https://forum.restic.net/t/restic-check-errors-with-b2-and-a-huge-repo-whats-next/5125
ignore - usage https://forum.restic.net/t/restic-prune-with-backblaze-returns-errors-b2-b2errdeleted/5114
ignore - out of space https://forum.restic.net/t/is-there-any-hope-when-restic-reports-no-space-left-on-device/5063
ignore - bug / old restic version https://forum.restic.net/t/help-in-recovery-a-corrupted-repository/5044
ignore - old restic version https://forum.restic.net/t/extremely-slow-pruning/5028
ignore - repository damage by user https://forum.restic.net/t/cant-browse-mounted-repo/5014
ignore - usage https://forum.restic.net/t/prune-on-backblaze-b2-leads-to-no-space-error/4986
ignore - usage https://forum.restic.net/t/restic-copy-diff-repair/4938
ignore - backend problem - S3 misconfiguration https://forum.restic.net/t/frequently-corrupt-snapshots-prune-errors-missing-clues/4913
ignore - usage https://forum.restic.net/t/solved-no-matching-id-found-for-prefix-after-update-to-0-13-0/4897
0.12.1 data corruption - local hash mismatch https://forum.restic.net/t/newbie-here-pack-id-does-not-match-errors-how-to-fix/4882
0.12.1 backend problem - S3 incomplete list https://forum.restic.net/t/fatal-packs-from-index-missing-in-repo/4869
ignore - off topic https://forum.restic.net/t/a-restic-client-written-in-rust/4867
0.12.1 corrupt cache https://forum.restic.net/t/tree-could-not-be-loaded/4860
ignore - backend problem - S3 slow delete https://forum.restic.net/t/very-slow-restic-prune/4841
0.12.1 data corruption - "ciphertext verification failed" https://forum.restic.net/t/requesting-strategy-check-for-recovery-from-bad-repository/4837
ignore - backend problem - REST permissions https://forum.restic.net/t/blob-not-remove-with-rest-server/4830
0.12.1 many missing blobs, one missing tree blob - REST https://forum.restic.net/t/unfixable-not-found-in-index-fixed-not-found-in-repo-issues-on-prune/4749
ignore - usage https://forum.restic.net/t/customize-pruning-for-lot-of-usage/4741
ignore - usage https://forum.restic.net/t/how-to-remove-unused-blobs/4738
ignore - usage https://forum.restic.net/t/unresolved-pack-f8c6065a-not-referenced-in-any-index-error/4626
ignore - old restic version https://forum.restic.net/t/listing-snapshots-is-slow/4611
0.12.1 backend problem - many missing blobs - rclone/pCloud https://forum.restic.net/t/repo-corruption/4598
unknown hardware issue - massive data corruption, hash mismatches https://forum.restic.net/t/multiple-errors-after-restic-check-cmd/4542
unknown corrupt cache https://forum.restic.net/t/restic-copy-of-repo-crashed-now-root-dir-is-full-what-to-do/4531
ignore - bug https://forum.restic.net/t/too-many-open-files/4503
unknown data corruption - "ciphertext verification failed" https://forum.restic.net/t/fatal-repository-contains-errors-how-best-to-respond/4498
ignore - out of space https://forum.restic.net/t/my-backup-hdd-is-completely-full-now-how-to-prune/4461
ignore - backend problem - local async. preempt https://forum.restic.net/t/backup-failure-after-check-prune-succeed/4418
ignore - backend problem - B2 idempot. delete https://forum.restic.net/t/pruned-the-copy-destination-now-copy-fails/4395
ignore - backend problem - B2 hang https://forum.restic.net/t/restic-hangs-with-b2-since-2021-10-01/4390
0.12.1 hardware issue - massive data corruption https://forum.restic.net/t/how-to-debug-restic-integrity-errors/4324
0.12.1 data corruption - invalid hash (once) https://forum.restic.net/t/help-debugging-a-blob-invalid-hash/4318
ignore - usage https://forum.restic.net/t/script-works-manually-but-not-on-cron/4315
ignore - backend problem - REST permissions https://forum.restic.net/t/error-when-creating-a-backup-via-the-rest-server/4290
0.12.1 backend problem - local FS damage https://forum.restic.net/t/files-not-found-in-index/4255
0.12.0 data corruption - "ciphertext verification failed" https://forum.restic.net/t/damaged-tree-blob-what-to-do-now/4190
ignore - usage https://forum.restic.net/t/unexpected-costs-with-backblaze-b2-backend/4176
ignore - usage https://forum.restic.net/t/fatal-repository-contains-errors-after-check-unused/4175
0.12.0 data corruption - "ciphertext verification failed" (once) https://forum.restic.net/t/automatically-remove-unused-blobs/4128
0.12.0 one missing tree blob - ? https://forum.restic.net/t/prune-fails-on-missing-tree/3981
0.12.0 various damages https://forum.restic.net/t/persistent-repository-corruption-and-data-loss/3974
0.12.0 backend problem - many missing blobs - rclone/pCloud https://forum.restic.net/t/prune-fails-with-data-missing-how-to-recreate-data-packs/3946
ignore - bug - B2 https://forum.restic.net/t/out-of-content-errors/3908
ignore - usage https://forum.restic.net/t/check-for-improve-script-for-webserver-with-mysql/3841
ignore - backend problem - B2 idempot. delete https://forum.restic.net/t/rebuild-index-and-prune-fail-when-trying-to-delete-an-index-that-doesnt-exist/3794
0.12.0 many missing blobs - S3 https://forum.restic.net/t/restic-failing-with-id-hex-string-not-found-in-repository/3789
ignore - usage https://forum.restic.net/t/howto-make-sure-that-the-backups-are-restorable/3777
ignore - old restic version https://forum.restic.net/t/trying-to-restore-snapshots-and-getting-unknown-blob/3759
ignore - usage https://forum.restic.net/t/efficient-check-for-unchanged-source/3729
ignore - old restic version https://forum.restic.net/t/pack-contains-errors-ciphertext-verification-failed/3724
ignore - backend problem - local permissions https://forum.restic.net/t/index-19b6de8c23-does-not-exist-after-forget-prune/3719
ignore - experimental restic version https://forum.restic.net/t/prune-crash-hash-does-not-match/3672
ignore - usage https://forum.restic.net/t/raw-copy-repositories-from-a-restic-rest-server/3671
ignore - usage https://forum.restic.net/t/restic-prune-runtime-cannot-allocate-memory/3660
backend problem - rclone/pcloud incomplete files https://forum.restic.net/t/connection-errors-during-backup-with-restic-rclone/3639
ignore - old restic version https://forum.restic.net/t/weird-check-errors/3623
ignore - old restic version (broken index from 0.9.6) https://forum.restic.net/t/error-while-triying-to-prune-a-huge-repo/3622
ignore - experimental restic version https://forum.restic.net/t/getting-rid-of-nil-subtrees/3602
Linux 5.2-5.4 kernel bug https://forum.restic.net/t/fatal-load-index-xxxxxxxxx-invalid-data-returned/3596
ignore - old topic https://forum.restic.net/t/very-old-lock-and-concerns-about-restic-unlock/2829
ignore - old topic https://forum.restic.net/t/backup-clients-and-sleep-interruptions/1904
ignore - old topic https://forum.restic.net/t/o-b2-connections-n-seems-to-have-no-impact/1260

https://forum.restic.net/search?q=prune%20%22runtime.goexit%22%20after%3A2021-02-13%20order%3Alatest_topic
ignore - backend problem - rclone/gdrive list error https://forum.restic.net/t/stacktrace-on-restic-prune/4404

//nothing new
https://forum.restic.net/search?q=prune%20%22not%20found%20in%20the%20index%22%20after%3A2021-02-13%20order%3Alatest_topic
https://forum.restic.net/search?q=prune%20%22does%20not%20match%20real%20size%22%20after%3A2021-02-13%20order%3Alatest_topic
https://forum.restic.net/search?q=prune%20%22pack%20files%20which%20are%20missing%20from%20the%20repository%22%20after%3A2021-02-13%20order%3Alatest_topic
https://forum.restic.net/search?q=prune%20%22returned%20invalid%20hash%22%20after%3A2021-02-13%20order%3Alatest_topic

https://forum.restic.net/search?q=%22repository%20contains%20errors%22%20after%3A2021-02-13%20order%3Alatest_topic
unknown hardware issue - frequent data corruption https://forum.restic.net/t/inconsistent-errors-with-check-read-data/5168
0.13.1 data corruption - "ciphertext verification failed" https://forum.restic.net/t/help-with-repository-errors/5165
unknown data corruption - hash mismatch https://forum.restic.net/t/how-to-remove-damaged-pack-file/5120
0.12.1 hardware issue - missing/damaged packs https://forum.restic.net/t/repository-contains-errors-what-snapshot-file-is-affected/4527

https://forum.restic.net/search?q=check%20%22runtime.goexit%22%20after%3A2021-02-13%20order%3Alatest_topic
ignore - usage https://forum.restic.net/t/restic-uses-more-than-4gb-ram-on-14gb-backup-than-crashes/4584
ignore - usage https://forum.restic.net/t/should-rebuild-index-check-if-rest-server-is-in-append-only-mode/3585

https://forum.restic.net/search?q=%22ciphertext%20verification%20failed%22%20after%3A2021-02-13%20order%3Alatest_topic
ignore - usage https://forum.restic.net/t/cant-init-backblaze-b2-repo/5052
ignore - feature request https://forum.restic.net/t/automatically-remove-damaged-pack-files/4679
ignore - old restic version https://forum.restic.net/t/unable-to-create-lock-in-backed-ciphertext-verification-failed/4520

//nothing new
https://forum.restic.net/search?q=%22hash%20does%20not%20match%20id%22%20after%3A2021-02-13%20order%3Alatest_topic

https://forum.restic.net/search?expanded=true&q=%22the%20repository%20could%20be%20damaged%22
ignore - bug https://forum.restic.net/t/parallel-b2-uploads-error-tree-is-not-known-on-slight-overlap/4859
ignore - bug https://forum.restic.net/t/why-dont-use-tags-to-find-latest-snapshot-when-backup/4824

//nothing new
https://forum.restic.net/search?expanded=true&q=%22storing%20the%20file%20again%22

Combined Results

After filtering out the ignored entries, the items are grouped into data loss, which is defined as data that has disappeared, and data corruption. The latter is subdivided based on whether the probable cause is in hardware or software.
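The filtering and grouping step can be sketched in a few lines. The sketch below only handles the Github entries of the raw data (lines starting with an issue number); forum entries and combined lines such as the one for #3766 would need extra handling. All names are hypothetical, and entries without a stated version are returned with `None` as version.

```python
import re
from collections import defaultdict

# Matches Github entries such as "#3676 0.12.1 one missing blob - sftp".
ENTRY = re.compile(r"^#(?P<issue>\d+)\s+(?P<rest>.+)$")
VERSION = re.compile(r"^(?P<version>\d+\.\d+\.\d+|unknown)\s+(?P<label>.+)$")


def parse_entry(line):
    """Split a raw-data line into (issue, version, label).

    Returns None for ignored entries and for lines that are not Github entries.
    """
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest")
    if rest.startswith("ignore"):
        return None
    issue = int(match.group("issue"))
    versioned = VERSION.match(rest)
    if versioned:
        return issue, versioned.group("version"), versioned.group("label")
    return issue, None, rest


def group_by_label(lines):
    """Group the issue numbers of all included entries by damage type."""
    groups = defaultdict(list)
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            issue, _version, label = entry
            # The part before " - " is the damage type, the rest is the backend.
            groups[label.split(" - ")[0]].append(issue)
    return dict(groups)
```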

Limitations

tl;dr Don’t claim that integrity problems of repositories are frequent or not. There’s no data for such claims.

The collected data shows a massive reporting bias between Github issues and the restic forum depending on the problem type. For example, missing data blobs are nearly always reported on Github (as prompted by the error message), whereas data corruption related issues are mostly reported on the forum. Prompting user reports in some cases but not others also has the potential to massively skew the data collection. That is, different types of damage that are reported equally often can actually occur with significantly different frequencies.

We have no idea at all how many users and repositories there are. Judging from the number of stars on Github, the number of users alone is in the 10k+ / 100k+ range. As by far not all repository problems are reported, the numbers in the following are an incomplete sample out of a user population of unknown size. In other words, there is no way to determine whether a problem is frequent or rare.

Data loss

There are 3 reports of a single missing data blob and 4 reports of a single missing tree blob; both types are spread across various backend types. 8 issues report multiple missing blobs.

There seems to be a surprising pattern regarding missing blobs: there are quite a few reports of a single missing blob and also of several missing blobs, but nothing in between. A possible explanation is that single blob losses are caused by bitflips during prune, whereas multiple missing blobs are likely a result of lost pack files.
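To illustrate why a single bitflip could plausibly show up as exactly one missing blob: restic addresses blobs by the SHA-256 hash of their content, so an ID with one flipped bit no longer matches any index entry. The sketch below is purely illustrative. The index is modeled as a plain dictionary, the helper name is hypothetical, and nothing here reproduces the actual prune code.

```python
import hashlib


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    out = bytearray(data)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


blob = b"example blob content"
blob_id = hashlib.sha256(blob).digest()

# Toy index: blob ID -> name of the pack file that holds the blob.
index = {blob_id: "pack-a"}

# Simulate a bitflip in the ID while it is held in memory.
damaged_id = flip_bit(blob_id, 5)

print(blob_id in index)      # the intact ID is found
print(damaged_id in index)   # the damaged ID looks like a missing blob
```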

Besides that there are 2 reports of missing files using S3 and 2 reports of incomplete file listings from S3.

Raw data
#3766 0.13.1 one missing blob - rclone/pcloud
#3676 0.12.1 one missing blob - sftp
#3581 0.12.1 one missing blob - B2
#3476 0.12.1 one missing tree blob - B2
#3348 0.12.0 one missing tree blob - Swift
0.12.0 one missing tree blob - ? https://forum.restic.net/t/prune-fails-on-missing-tree/3981
0.12.1 one missing tree blob - REST https://forum.restic.net/t/unfixable-not-found-in-index-fixed-not-found-in-repo-issues-on-prune/4749

#3766 unknown many missing blobs - local
#3495 0.12.1 many missing blobs - sftp
#3486 0.12.1 many missing blobs - SMB
#3400 0.12.0 many missing blobs - ?
#3384 0.12.0 many missing blobs - ?
#3365 0.12.0 many missing blobs - REST bug
0.12.1 many missing blobs - REST https://forum.restic.net/t/unfixable-not-found-in-index-fixed-not-found-in-repo-issues-on-prune/4749
0.12.0 many missing blobs - S3 https://forum.restic.net/t/restic-failing-with-id-hex-string-not-found-in-repository/3789

#3471 0.12.0 missing pack files - S3
#3265 0.11.0 backend problem - S3 upload retry

// usually recoverable
#3646 backend problem - S3 incomplete list
0.12.1 backend problem - S3 incomplete list https://forum.restic.net/t/fatal-packs-from-index-missing-in-repo/4869

Data Corruption (Software)

Samba shares on Linux seem to be a major cause of problems (3 issues), followed by pCloud via rclone (3 issues), B2 returning broken data (2 issues) and local filesystem corruption (1 issue). The B2 issue was only temporary and has been fixed in the meantime.

Raw data
#3461 0.12.0 backend problem - local/SMB pack size mismatch
#3435 0.12.0 backend problem - local/SMB hash mismatch
#2659 0.12.0 backend problem - local/SMB pack size mismatch etc
0.12.1 backend problem - local FS damage https://forum.restic.net/t/files-not-found-in-index/4255
#3266 backend problem - B2 broken data
#3268 backend problem - B2 broken data
backend problem - rclone/pcloud incomplete files https://forum.restic.net/t/connection-errors-during-backup-with-restic-rclone/3639
0.12.1 backend problem - many missing blobs - rclone/pCloud https://forum.restic.net/t/repo-corruption/4598
0.12.0 backend problem - many missing blobs - rclone/pCloud https://forum.restic.net/t/prune-fails-with-data-missing-how-to-recreate-data-packs/3946

There have also been issues related to an AVX-bug in the Linux Kernel 5.2-5.4, data corruption which disappeared after updating restic and incomplete files in the local cache used by restic. The cache issue should largely be fixed since 0.13.0.

Raw data
// fixed by updates
Linux 5.2-5.4 kernel bug https://forum.restic.net/t/fatal-load-index-xxxxxxxxx-invalid-data-returned/3596
#3551 0.12.0 data corruption - disappeared after upgrade
0.12.1 corrupt cache https://forum.restic.net/t/tree-could-not-be-loaded/4860
unknown corrupt cache https://forum.restic.net/t/restic-copy-of-repo-crashed-now-root-dir-is-full-what-to-do/4531

Data Corruption (Hardware)

The most frequent errors are related to encryption (7 issues), usually resulting in “ciphertext verification failed”. This is followed by apparent hardware issues causing massive data corruption (4 issues) and hash mismatches for pack files (2 issues).

The ciphertext verification errors usually indicate a bitflip, especially when the pack content matches its hash. In that case the only explanation for an invalid ciphertext are bitflips (and possibly software bugs).
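The following sketch shows how a pack file can match its hash while a blob inside fails authentication: the bit is flipped after the MAC was computed but before the pack hash is calculated. This is a simplified model, not restic's implementation. HMAC-SHA256 stands in for the Poly1305-AES MAC that restic uses, no actual encryption takes place, and the key and function names are made up for the example.

```python
import hashlib
import hmac

KEY = b"demo key, not a real restic key"


def seal(ciphertext: bytes) -> bytes:
    """Append a MAC to the ciphertext (HMAC-SHA256 as a stand-in)."""
    return ciphertext + hmac.new(KEY, ciphertext, hashlib.sha256).digest()


def verify(sealed: bytes) -> bool:
    """Check the trailing 32-byte MAC against the ciphertext."""
    ciphertext, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(KEY, ciphertext, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


sealed = seal(b"encrypted blob bytes")

# Simulate a bitflip in memory after the MAC was computed ...
buffer = bytearray(sealed)
buffer[0] ^= 0x01
damaged = bytes(buffer)

# ... but before the pack hash is calculated over the (damaged) buffer.
pack_id = hashlib.sha256(damaged).hexdigest()
stored = damaged  # what ends up in the repository

pack_hash_ok = hashlib.sha256(stored).hexdigest() == pack_id
mac_ok = verify(stored)
print(pack_hash_ok, mac_ok)
```

The stored file is exactly what was hashed, so storage and transfer are ruled out as the cause, yet the blob no longer authenticates.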

Hash mismatches for pack files can be caused either by data corruption in the stored file or by a miscalculated hash, once again due to a bitflip in memory or during computation. Since restic 0.13.0, most backends receive the expected hash for the uploaded file, which could partially prevent this damage type.

Raw data
0.12.1 data corruption - Local hash mismatch https://forum.restic.net/t/newbie-here-pack-id-does-not-match-errors-how-to-fix/4882
unknown data corruption - hash mismatch https://forum.restic.net/t/how-to-remove-damaged-pack-file/5120

#3800 0.13.1 data corruption - "ciphertext verification failed"
0.13.1 data corruption - "ciphertext verification failed" https://forum.restic.net/t/help-with-repository-errors/5165
0.12.1 data corruption - "ciphertext verification failed" https://forum.restic.net/t/requesting-strategy-check-for-recovery-from-bad-repository/4837
unknown data corruption - "ciphertext verification failed" https://forum.restic.net/t/fatal-repository-contains-errors-how-best-to-respond/4498
0.12.0 data corruption - "ciphertext verification failed" https://forum.restic.net/t/damaged-tree-blob-what-to-do-now/4190
0.12.0 data corruption - "ciphertext verification failed" (once) https://forum.restic.net/t/automatically-remove-unused-blobs/4128
0.12.1 data corruption - invalid hash (once) https://forum.restic.net/t/help-debugging-a-blob-invalid-hash/4318

// broken hardware
unknown hardware issue - frequent data corruption https://forum.restic.net/t/inconsistent-errors-with-check-read-data/5168
unknown hardware issue - massive data corruption, hash mismatches https://forum.restic.net/t/multiple-errors-after-restic-check-cmd/4542
0.12.1 hardware issue - missing/damaged packs https://forum.restic.net/t/repository-contains-errors-what-snapshot-file-is-affected/4527
0.12.1 hardware issue - massive data corruption https://forum.restic.net/t/how-to-debug-restic-integrity-errors/4324

Unique Issues

There is a small number of issues with various causes.

Raw data
#3583 repository format limitation
#3430 broken key
0.12.0 various damages https://forum.restic.net/t/persistent-repository-corruption-and-data-loss/3974
0.13.1 prune with broken locking https://forum.restic.net/t/data-seems-to-be-missing-after-repack-uncompressed/5136

Discussion

The main causes of repository damage seem to be backend problems (9 issues), disappeared pack files (8 issues), missing single blobs (7 issues combined) and encryption errors (7 issues). The remaining 18 issues have various causes.

As encryption errors and the massive data corruption cases account for 11 of 49 issues, this could indicate that quite a few of the other problems are also caused by bitflips, but just not in an immediately obvious way.
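For reference, the arithmetic behind these numbers, using only the counts stated in this review (the dictionary keys are just labels):

```python
# Issue counts as stated in the discussion above.
counts = {
    "backend problems": 9,
    "disappeared pack files": 8,
    "missing single blobs": 7,
    "encryption errors": 7,
    "various causes": 18,
}
total = sum(counts.values())

# Encryption errors plus the 4 cases of massive data corruption.
bitflip_suspects = counts["encryption errors"] + 4
share = bitflip_suspects / total

print(f"{bitflip_suspects} of {total} issues ({share:.0%})")
# prints "11 of 49 issues (22%)"
```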

The backend problems and several of the various issues have causes outside of restic, which must be fixed there. Regarding the disappeared pack files, the cause is so far unknown. It might be possible to use erasure coding across multiple files to recover lost files, but without knowing the root cause this has no guarantee of success. The cause of missing just a single blob is similarly unclear, although it might be possible to introduce some sanity checks to let prune abort in case of bitflips. Some of the encryption errors could probably be caught by verifying a pack file before uploading it. This should ideally happen after calculating the pack file hash, as that makes it more likely that the verified data is what is actually sent to the backend.


Would you say that running `restic check --read-data-subset 15%` once a day (thus, verifying the whole thing once a week) would prevent most of these issues?

The “data loss” category is already detected by running a plain check. Some of the data corruptions (like truncated files) are also detected. Depending on how massive the data corruption is, it can become detectable even without using --read-data.

That said, checking each file on average every week is pretty aggressive. But it allows you to learn rather quickly when some data is corrupted. However, depending on the used hardware, there’s a really high chance that most repositories will never contain corrupt data. I haven’t seen data corruption in any of my repositories so far. So it’s a tradeoff between the overhead of frequent checks, how fast you’d learn about data corruption and whether/how frequently it occurs.
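A rough calculation shows what "on average" means here. It assumes that the percentage form of --read-data-subset picks an independent random subset of pack files on every run; if that assumption holds, a given pack file can still be skipped for a whole week. The function names are hypothetical.

```python
def miss_probability(fraction: float, runs: int) -> float:
    """Chance that a given pack file is read in none of `runs` checks,
    assuming every check picks an independent random subset."""
    return (1.0 - fraction) ** runs


def expected_coverage(fraction: float, runs: int) -> float:
    """Expected share of pack files read at least once after `runs` checks."""
    return 1.0 - miss_probability(fraction, runs)


print(f"missed after 7 daily runs:  {miss_probability(0.15, 7):.1%}")
print(f"missed after 30 daily runs: {miss_probability(0.15, 30):.1%}")
```

Under this assumption roughly a third of the pack files are not read within any given week, while the chance of a file going unread for a month drops below 1%. The `n/t` form of the option (for example `--read-data-subset=1/7` up to `7/7` on consecutive days) splits the repository into fixed groups and so covers every pack file once per cycle.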

One more thing: check can only detect corruption but not prevent it. If supported by a backend, during a backup (or any other operation which uploads data) restic informs it about the hash a file is expected to have. That increases the chance that files are stored correctly or that the whole backup fails if not.
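The idea of announcing the expected hash can be modeled with a toy backend that refuses data which does not match. This is a sketch of the principle only: the class and function names are made up, the storage is a plain dictionary, and SHA-256 is used for simplicity even though the hash algorithm a real backend accepts depends on the service.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when the received bytes do not match the announced hash."""


def store(received: bytes, announced_sha256: str, storage: dict) -> None:
    """Toy backend: keep the data only if it matches the announced hash."""
    actual = hashlib.sha256(received).hexdigest()
    if actual != announced_sha256:
        raise HashMismatch(f"expected {announced_sha256}, got {actual}")
    storage[announced_sha256] = received


storage = {}
pack = b"pack file content"
announced = hashlib.sha256(pack).hexdigest()

store(pack, announced, storage)          # intact upload is accepted

try:
    store(pack + b"!", announced, storage)   # damaged in transit
except HashMismatch:
    print("upload rejected, nothing corrupt was stored")
```

A rejected upload makes the operation fail visibly instead of leaving a silently damaged file in the repository.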