Recover corrupted repo


#1

Hi,

First of all, this is just my test repo, so I don’t care about its contents. I’m just testing what happens when something goes wrong.

So I’ve performed a few restic backup operations to a new repo. Then I started deleting various files inside the repo (to simulate certain disk/fs failures).

  1. I’ve ignored the config file and the keys dir because, as far as I understand, they are pretty small and usually immutable. So it’s possible to keep a few more copies of them in a safe place.

  2. Removed one file in index. restic check now prints tons of errors like “pack XXX: not referenced in any index”. Running restic rebuild-index fixes everything. So all data is safe.

  3. Removed one file in the snapshots dir. Unfortunately such a repo passes restic check and even restic check --read-data without any errors :frowning: . One particular snapshot just disappeared from restic snapshots. It’s probably no longer possible to recover data from this snapshot. And even worse, if restic prune is called periodically (I have a restic check && restic prune cron job for this), unique data from this snapshot will be purged from the repo…

  4. Removed one file from the data dir. restic check now fails, which is very good. It’s also possible to partially restore a snapshot: restic restore prints an error message for every file that it’s unable to restore.

  5. Instead of removing, I modified one of the files in data (without changing the file size). restic check shows no errors, while restic check --read-data fails, which is good. restic prune reports “will remove 1 invalid files” but returns exit code 0. restic check after the prune then fails. So my check && prune cron job will only fail the next time it runs. Maybe it’s better to use check && prune && check instead.
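
The check && prune && check idea from item 5 could be sketched roughly as follows. This is only an illustration, assuming restic is on the PATH and the repository and password are supplied through the environment; `run_restic` and `check_prune_check` are hypothetical helper names, not part of restic.

```python
import subprocess
from typing import Callable, Sequence


def run_restic(args: Sequence[str]) -> int:
    """Run restic with the given arguments and return its exit code.

    Assumes the repository and password are configured via the environment.
    """
    return subprocess.run(["restic", *args]).returncode


def check_prune_check(run: Callable[[Sequence[str]], int] = run_restic) -> bool:
    """Mirror `restic check && restic prune && restic check`.

    Stops at the first step with a non-zero exit code, like a shell && chain,
    so a repository that only breaks after pruning is reported in the same run.
    """
    for step in (["check"], ["prune"], ["check"]):
        if run(step) != 0:
            return False
    return True
```

A cron wrapper could call `check_prune_check()` and send a notification when it returns False. Note that a plain check does not read the pack contents, so a modified pack file may still only be noticed by the final check.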

I really don’t like what happens with the snapshots dir. If the repository location is not 100% trusted, then it’s better to back up the snapshot files somewhere… I think restic should handle this somehow, maybe by keeping an encrypted snapshot list somewhere else.
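
Until something like that exists, a vanished snapshot file could be detected from the outside with a small local manifest. The following is a rough sketch, assuming a repository on a local filesystem path; the manifest file and all function names are hypothetical and not a restic feature.

```python
import json
from pathlib import Path
from typing import Set


def list_snapshot_ids(repo: Path) -> Set[str]:
    """Return the file names currently present in the repo's snapshots dir."""
    return {p.name for p in (repo / "snapshots").iterdir() if p.is_file()}


def missing_snapshots(repo: Path, manifest: Path) -> Set[str]:
    """Return snapshot files recorded in the manifest but absent from the repo."""
    known = set(json.loads(manifest.read_text())) if manifest.exists() else set()
    return known - list_snapshot_ids(repo)


def update_manifest(repo: Path, manifest: Path) -> None:
    """Record the current snapshot file names in the local manifest."""
    manifest.write_text(json.dumps(sorted(list_snapshot_ids(repo))))
```

The intended use would be to call `missing_snapshots` before each prune and `update_manifest` after each successful backup. Snapshots removed on purpose with forget would also show up as missing, so the manifest would need to be refreshed after a forget as well.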

Btw, is there any way to recover a repository with a corrupted file in the data dir? I know that it might be possible to just restic backup the same location again, and if the files were not changed on disk, restic will save them again and “fix” previous snapshots.

But what happens if the data was changed? Is there any way to just ‘fix’ the repo (by removing all unrecoverable files) so that it passes restic check --read-data?


#2

Hi, and welcome to the forum!

It’s a great idea to thoroughly test a system before trusting it with your data. Did you discover the restic design document yet? It describes the basic data structures and ideas behind restic and the restic repository layout.

I’ll try to respond to some of your test cases:

  1. Index files are basically just a shortcut to learn what’s stored within each of the files in data/ (called pack files). By reading the headers of all the pack files, the index can be rebuilt without a problem. That’s what restic rebuild-index does. Since this is a rather expensive operation, it is only done when rebuild-index is run manually.

  2. Removing files in the snapshots directory is serious, as you’ve discovered. It’s also mentioned in our threat model that you can find in the last section of the design document, here: https://github.com/restic/restic/blob/master/doc/design.rst#threat-model
    We could implement checks for vanishing snapshot files, but that’s also what the forget operation does. I’m not sure what we can do further if we assume that attackers are able to delete files. After all, they could also just overwrite the data in the repository with an older version of that repository, so all data saved in between is lost. Do you have any idea on how to improve that?

  3. That’s also a correct observation: restic check won’t detect modified files, because reading all the data is an expensive operation. If you trust the server, you can e.g. run sha256sum on the server to detect bit rot. But if you don’t trust the server, the only way to make sure that the data is unmodified is downloading and checking it. That’s what restic check --read-data does.
    We tried to find a middle ground between usability and paranoia here :slight_smile:
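
For the trusted-server case, the sha256sum idea can be sketched in a few lines. This is a minimal illustration, assuming a repository on a local path and relying on the fact that each pack file is named after the SHA-256 hash of its contents; the function names are made up for the example.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified_packs(repo: Path) -> List[Path]:
    """Return files under data/ whose content hash differs from their name."""
    data_dir = repo / "data"
    return sorted(
        p for p in data_dir.rglob("*")
        if p.is_file() and sha256_of(p) != p.name
    )
```

This only detects modified or bit-rotted files. It does not notice files that are missing entirely, and it proves nothing if the server itself is not trusted.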

As for recovering a corrupted file in the data dir: no, how would that be possible? There are plans to add error correction to the pack files so that accidental bit rot can be corrected (up to some level of corruption), but that hasn’t been implemented yet.

As long as there are snapshots which reference data that is not there any more (for whatever reason), restic check will complain.


#3

Hi,

Thanks. Yes, I know about the design doc. But for now I don’t care about an ‘attacker’ that tries to remove/replace files. I was thinking about disk/file corruption or loss.

Again, in case of data corruption, a single signed file with the snapshot list would provide a way to at least detect such a thing.

As for replacing files with an older version: if such a ‘snapshot list’ has some sort of timestamp (or just a revision number), it’ll be possible to cache it locally (once the metadata cache is ready), check that remote timestamp >= cached timestamp, and warn/fail if this is not true.
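
The comparison itself would be tiny. Here is a sketch of the idea, assuming a purely hypothetical integer revision stored with the snapshot list; restic has no such field today, and the names below are invented for the example.

```python
from typing import Optional


class RollbackDetected(Exception):
    """Raised when the remote snapshot list is older than the cached one."""


def accept_remote_revision(remote: int, cached: Optional[int]) -> int:
    """Return the revision to cache locally, or fail if the remote went backwards.

    `cached` is None on the first run, when nothing has been cached yet.
    """
    if cached is not None and remote < cached:
        raise RollbackDetected(
            f"remote revision {remote} is older than cached revision {cached}"
        )
    return remote
```

The same check works with a timestamp instead of a revision number, as long as it only ever increases.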

Maybe I was not clear enough. I was not asking about recovering pack files, but about fixing the repository to make it consistent again, just to be able to continue backups to this repo and keep the history and content of the other files/snapshots. Pretty much the same thing as fsck for a filesystem, e.g. removing files from existing snapshots or marking them as ‘known to be missing’.


#4

This last point was also something I was wondering about: if you detect bit rot in a pack file, through the sha256 value not matching the filename, what should you do then (other than sound alarms about the integrity of the storage)? Ideally, if the files represented in that pack are still in the live data set, and unchanged, the next backup would recreate the pack file.


#5

I can confirm that just deleting the broken pack file and triggering the backup again fixes the repository (as long as the source data is unchanged).


#6

Excellent – thank you for confirming!