Advice for broken repo

Many months ago I ran restic check on a 2 TB repo (possibly for the first time) and got errors.

Sorry for my incomplete memory (I have not yet found my notes about what exactly I did back then):

The first thing I did with restic was rebuild the index. I’m not 100% sure what I did next (I definitely ran further index rebuilds, and maybe some other operation if the program suggested it). At some point I decided to take a backup of the repo before doing anything further, because I had the impression I might be on the wrong path.

For lack of time, storage space and money (and because I have an intact repo with a backup of my most important data), I made no further progress on the issue.

Now I have an SSD big enough to continue working on it (while still being short of time).

I copied the broken repo to my 4 TB SSD and compared the filenames with the hashes I got using sha256sum.

What I found is that there are ~200 pack files with non-matching hashes, and all of them seem to have been written during successive backups I had taken on different days.
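For anyone who wants to repeat this comparison: restic names each pack file after the SHA-256 of its contents, so the check can be scripted. Below is a minimal Python sketch, assuming a local repository with the usual `data/` directory. The function names `sha256_of_file` and `find_mismatched_packs` are made up for illustration and are not part of restic.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_packs(repo: Path) -> list[tuple[Path, str]]:
    """List (pack file, actual hash) pairs whose file name differs from the content hash.

    Assumption: pack files live below repo/data/ and are named by their SHA-256.
    """
    mismatched = []
    for pack in sorted((repo / "data").rglob("*")):
        if pack.is_file():
            actual = sha256_of_file(pack)
            if actual != pack.name:
                mismatched.append((pack, actual))
    return mismatched
```

Calling `find_mismatched_packs(Path("/path/to/repo-copy"))` (the path is a placeholder) only reads files, so it should be safe to run against a copy of the repo.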

Additionally I ran restic check --read-data with maximum verbosity:
a) with the current index directory contents
b) after replacing the index directory with tarred states that I had evidently created during my initial repair attempts, before copying the repo to another device.

From a) I got (current=newest index-dir contents):


using temporary cache in /tmp/restic-check-cache-...
enter password for repository: 
create exclusive lock for repository
load indexes
check all packs
pack ...: not referenced in any index

[...similar lines...]

166 additional files were found in the repo, which likely contain duplicate data.
This is non-critical, you can run `restic prune` to correct this.
check snapshots, trees and blobs
error for tree ...:
  id ... not found in repository
error for tree ...:
  tree ...: file "SOMElnkFILE" blob ... not found in index
error for tree ...:
  id ... not found in repository
error for tree ...:
  tree ...: file "SOMEmailboxFILE" blob ... not found in index
  tree ...: file "SOMEmailboxFILE" blob ... not found in index
  tree ...: file "SOMEmailboxFILE" blob ... not found in index
  tree ...: file "SOMEmailboxFILE" blob ... not found in index

[...similar blocks...]

error for tree ...:
  id ... not found in repository

[...similar errorblocks for trees of these 2 types ("blob not found in index" and "id not found in repository")...]

[0:03] 100.00%  ... / ... snapshots

read all data
Pack ID does not match, want ...., got ....

[...similar lines... 24 in total, not ~200 as above with sha256sum]


[1:51:05] 100.00%  ... / ... packs

Fatal: repository contains errors

Result of b) (after restoring older index backup):

using temporary cache in /tmp/restic-check-cache-...
enter password for repository: 
create exclusive lock for repository
load indexes
Load(<index/...>, 0, 0) returned error, retrying after ...ms: invalid data returned
error: error loading index ...: load(<index/...>): invalid data returned
Fatal: LoadIndex returned errors

Result of b) (after restoring NEWER index backup):

using temporary cache in /tmp/restic-check-cache-...
enter password for repository: 
create exclusive lock for repository
load indexes
check all packs
pack ...: not referenced in any index

[...similar lines...]

300 additional files were found in the repo, which likely contain duplicate data.
This is non-critical, you can run `restic prune` to correct this.
check snapshots, trees and blobs
error for tree ...:
  tree ...: file "SOMEmd5FILE05" blob ... not found in index
  tree ...: file "SOMEmd5FILE06" blob ... not found in index
error for tree ...:
  tree ...: file "SOMEmd5FILE05" blob ... not found in index
  tree ...: file "SOMEmd5FILE06" blob ... not found in index
error for tree ...:
  tree ...: file "SOMEpdf" blob ... not found in index

[...similar lines...]

error for tree ...:

[...similar details about blobs not found in index as above...]

[...further blocks about trees with blobs not found in index]   

error for tree ...:
  tree ...: file "SOMEpropertiesFILE" blob ... not found in index
error for tree ...:
  id ... not found in repository
error for tree ...:
  id ... not found in repository
[0:03] 100.00%  .../ ... snapshots

read all data
Pack ID does not match, want ..., got ...
[further similar lines]
[1:50:55] 100.00%  ... / ... packs

Fatal: repository contains errors

Is it sensible to try working with the restored index directory contents, or could the possible inconsistencies make things even worse?

Any idea what might have corrupted the repo?

How would you recommend proceeding (if at all)?

Many thanks in advance!

Versions used for backups:
restic_0.9.6_windows_amd64.exe
restic_0.11.0_windows_amd64.exe (the files with mismatched hashes are dated before the update to this version)

Version used for check back then and rebuilding index:
restic_0.11.0_windows_amd64.exe

Version now used for investigation:
restic 0.16.0 in Linux

…I’ve found my notes from last year…

Last Backup

s:\>restic_0.11.0_windows_amd64.exe backup --cache-dir MYCACHEDIR\ -r SMYREPO\ --tag MYTAG1 --tag MYTAG2 --tag MYTAG3  DIR2BACKUP1\ DIR2BACKUP2\
enter password for repository:
repository SOMEID opened successfully, password is correct

Files:         ... new,     ... changed, ... unmodified
Dirs:           ... new,     ... changed,  ... unmodified
Added to the repo: ... GiB

processed ... files, ... GiB in ...
snapshot SNAPSHOTID saved

s:\>


1st Check showing issues

s:\>restic_0.11.0_windows_amd64.exe check --cache-dir MYCACHEDIR\ -r MYREPO\ --read-data-subset 1/20
using temporary cache in MYCHECKCACHEDIR\restic-check-cache-...
enter password for repository:
repository SOMEID opened successfully, password is correct
created new cache in MYCHECKCACHEDIR\restic-check-cache-...
create exclusive lock for repository
load indexes
check all packs
pack ...: does not exist
[...similar messages...]
check snapshots, trees and blobs
read group #1 of ... data packs (out of total ... packs in .. groups)
Load(<data/...>, 0, 0) returned error, retrying after ...ms: open \\?\MYREPO\data\dc\...: The system cannot find the file specified.
[...similar messages...]
checkPack: Load: open \\?\MYREPO\data\dc\...: The system cannot find the file specified.
Load(<data/...>, 0, 0) returned error, retrying after 5...ms: open \\?\MYREPO\data\00\...: The system cannot find the file specified.
[...similar messages...]
checkPack: Load: open \\?\MYREPO\data\00\...: The system cannot find the file specified.
Load(<data/...>, 0, 0) returned error, retrying after ...ms: open \\?\MYREPO\data\c8\...: The system cannot find the file specified.
[...similar messages...]
checkPack: Load: open \\?\MYREPO\data\c8\...: The system cannot find the file specified.
[...]
Pack ID does not match, want ..., got ...
..interrupted at
[35:07] 54.20%  ... / ... packs


Another Check

[command like 1st Check]
[..output similar..]
[1:02:01] 100.00%  ... / ... packs
Fatal: repository contains errors

s:\>

chkdsk /f (ruined index?)

???

Check after chkdsk /f

error loading index xxxxxx, invalid data returned
Fatal: LoadIndex returned errors

Rebuild-Index

S:\>restic_0.11.0_windows_amd64.exe rebuild-index -r MYREPO\
enter password for repository:
repository SOMEID opened successfully, password is correct
found 5 old cache directories in ... run `restic cache --cleanup` to remove them
counting files in repo
[5:58:22] 100.00%  ... / ... packs
finding old index files
saved new indexes as
remove ... old index files
[0:00] 100.00%  ... / ... files deleted

S:\>

Check after Rebuild-Index

Fatal: repository contains errors
(similar messages to those during "Another Check" (see above), but THE NUMBER OF ERRORS INCREASED BY ABOUT A FACTOR OF 2)

That’s what made me decide to not do anything further before taking a backup of the repo.

Conclusions by now:

  • chkdsk /f might be able to do harm
  • ALWAYS take a backup of a broken repo BEFORE doing anything further that might involve write access to the backup media (in this case a USB HDD with SMR)

Any ideas on

  • how to recover as much as possible out of that broken repo
  • how to avoid similar issues in the future (in addition to always taking a backup FIRST)

or other comments are welcome.

Many thanks!

I think hardware problems are the most common cause of repository errors. Check your hardware, and follow this procedure if it’s relevant:

Additional question:
Is there a way to find out which files (list of “full_path/filename in snapshot XYZ”) are affected?
(other than testing by trying to restore each snapshot)
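A partial answer can be pulled from the check output already quoted in this thread, since it names the files whose blobs are missing. The sketch below is a rough Python helper, assuming the check output was saved to a text file and that the line formats are exactly those shown above. The function name `summarize_check_log` is hypothetical. It yields bare file names only, not full paths or snapshot IDs, so it narrows the search rather than answering the question completely.

```python
import re
from collections import Counter

# Line formats taken from the check output quoted in this thread.
MISSING_BLOB = re.compile(r'tree ([0-9a-f]+): file "(.*)" blob ([0-9a-f]+) not found in index')
MISSING_ID = re.compile(r"id ([0-9a-f]+) not found in repository")
PACK_MISMATCH = re.compile(r"Pack ID does not match, want ([0-9a-f]+), got ([0-9a-f]+)")


def summarize_check_log(lines):
    """Collect affected file names, missing IDs and mismatched pack IDs from check output."""
    files = Counter()        # file name -> number of missing blobs reported
    missing_ids = set()      # IDs reported as "not found in repository"
    damaged_packs = set()    # expected pack IDs whose content hash differs
    for line in lines:
        match = MISSING_BLOB.search(line)
        if match:
            files[match.group(2)] += 1
            continue
        match = MISSING_ID.search(line)
        if match:
            missing_ids.add(match.group(1))
            continue
        match = PACK_MISMATCH.search(line)
        if match:
            damaged_packs.add(match.group(1))
    return files, missing_ids, damaged_packs
```

A possible use is `files, ids, packs = summarize_check_log(open("check.log", encoding="utf-8"))`, where `check.log` is a placeholder for wherever the output was saved.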

Thank you for your hint.

@everyone

  • Is it too late for Route 1 with my already rebuilt index?
  • I think Route 1 and Route 2 fit best for minimum data loss. Which one would you try first?

A general piece of advice: if I had a problem with a broken repository, I would first create a local copy, and then a copy of that copy to use as a playground. Then try option 1.
If it doesn’t solve the problem, throw the playground away, create a new one and try the next option.
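The playground workflow could be scripted roughly like this. It is only a sketch: `make_playground` and both paths are hypothetical names, and it assumes the pristine copy and the playground are plain local directories with enough free space for a full second copy.

```python
import shutil
from pathlib import Path


def make_playground(pristine_copy: Path, playground: Path) -> Path:
    """Throw away any existing playground and recreate it from the pristine copy."""
    if playground.resolve() == pristine_copy.resolve():
        # Guard against deleting the only good copy by mistake.
        raise ValueError("playground must not be the pristine copy itself")
    if playground.exists():
        shutil.rmtree(playground)  # discard the result of the previous attempt
    shutil.copytree(pristine_copy, playground)
    return playground
```

Each repair attempt then runs only against the directory returned by `make_playground`, and the pristine copy is never opened for writing.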

Please upgrade everything to restic 0.16.0. The restic versions are quite old and we’ve improved the reliability of the storage backends in the meantime.

This is likely a case of shooting the messenger (don’t :wink: ). The likely chain of events is that the filesystem was damaged for some reason and chkdsk just exposed the mess left behind by the underlying problem.

Without the information from the index, it (likely) won’t be possible to salvage the pack files that don’t match their hash. Thus, it’s probably best to just follow the steps from the documentation: Troubleshooting — restic 0.16.3 documentation (requires restic 0.16.0).
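As a reading aid, here is the order of commands as I understand the troubleshooting page for restic 0.16.0. Treat it as an assumption and verify it against the documentation before running anything. The Python sketch only builds and prints the command lines and executes nothing. `repair_plan` and `print_plan` are hypothetical helper names, and the repository path is a placeholder that should point at a playground copy.

```python
import shlex


def repair_plan(repo: str) -> list[list[str]]:
    """Return the assumed command order; nothing is executed here."""
    return [
        ["restic", "-r", repo, "check", "--read-data"],            # find out what is damaged
        ["restic", "-r", repo, "repair", "index"],                 # rebuild the index from the pack files
        ["restic", "-r", repo, "repair", "snapshots", "--forget"],  # drop unrecoverable data from snapshots
        ["restic", "-r", repo, "check", "--read-data"],            # confirm the repository is consistent again
    ]


def print_plan(repo: str) -> None:
    """Print each command as a shell-quoted line for review."""
    for command in repair_plan(repo):
        print(shlex.join(command))
```

Calling `print_plan("/path/to/playground")` lists the four commands so that each one can be reviewed and run by hand.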