Error while trying to prune a huge repo

Hello everyone,
First of all, thanks for this nice piece of software and for the help the community gives to everyone.

The environment:

  • I was on restic 0.9.6 and upgraded to restic 0.12.0 to be able to run a more efficient prune.
  • I am running prune locally. I back up to a remote machine, but I have shell access and restic on that machine too, so I decided to prune there (I prefer to spend resources on the machine where the repo is located).
  • The repo is quite big: almost 8T of data on disk.
  • I have already run a forget command.

The command (executed from the directory where the repo is located):
restic -r . --cache-dir /foo/bar prune -vv

The output:
repository XXXX opened successfully, password is correct
created new cache in /foo/bar
loading indexes...
loading all snapshots...
finding data that is still in use for 181 snapshots
[3:33:13] 100.00% 181 / 181 snapshots
searching used packs...
collecting packs for deletion and repacking
will remove pack 8c41d0dd as it is unused and not indexed
will remove pack 8ce999df as it is unused and not indexed
[...]
will remove pack f3c27e3b as it is unused and not indexed
pack ec908d7a: calculated size 1744769 does not match real size 4224015
[3:32] 51.58% 894714 / 1734614 packs processed
Fatal: pack size does not match calculated size from index

I'm not sure how to continue from here. Is this a corrupted repository?
Can I do something to repair it?

Any help is appreciated


I'd try

restic rebuild-index

then

restic check --read-data

If you find errors, you can, for example, search for the affected trees by doing:

restic find --tree ABCD1234

It will then print out all the snapshots that reference that tree. You can then run restic forget on those snapshot IDs. Afterwards, your repository should pass another restic check --read-data and you're safe to prune.


@dantefff You have a pack file whose actual size (as reported by listing the files in your repository) differs from the size calculated from the index entries. The file should be located in /data/ec/ec908d7a....
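To find the full file behind a short pack ID from the prune output, something like the following might help. It is only a sketch: find_pack is a hypothetical helper name, and it assumes a local repository with the default data/<first two hex characters>/<full pack ID> layout.

```shell
#!/bin/sh
# find_pack REPO SHORT_ID
# Prints every file under REPO/data whose name starts with SHORT_ID.
# Assumption: default local layout data/<first two hex chars>/<full pack ID>.
# Returns 0 if at least one file was found, 1 otherwise.
find_pack() {
    repo=$1
    short_id=$2
    prefix=$(printf '%s' "$short_id" | cut -c1-2)
    found=1
    for f in "$repo/data/$prefix/$short_id"*; do
        # An unmatched glob stays literal, so confirm the file really exists.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            found=0
        fi
    done
    return "$found"
}
```

Run from the repository directory, find_pack . ec908d7a would print the path of the pack file reported in the error, if it is present.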

Unfortunately, this file is needed (else prune would simply delete it), so yes, your repo is corrupt.

Before trying to repair, you should run a restic check (and if access to your repo is cheap, even with --read-data) to see what error that reports. It should also report at least the same file size mismatch.

If access to your repo is expensive, manually download those corrupt files and run a sha256sum to check if the file is really corrupt. This is automatically done if you run check with --read-data.
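The manual check can be scripted, since a pack file is named after the SHA-256 of its contents. This is a minimal sketch: verify_pack is a hypothetical helper name, and it assumes GNU sha256sum is available on the machine.

```shell
#!/bin/sh
# verify_pack FILE
# Succeeds if the SHA-256 of FILE's contents equals FILE's own name,
# which is how restic names pack files. Assumes GNU sha256sum.
verify_pack() {
    file=$1
    expected=$(basename "$file")
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $expected"
    else
        echo "MISMATCH: $expected has sha256 $actual"
        return 1
    fi
}
```

A matching hash only shows that the pack file itself is intact. It says nothing about whether the index describes that file correctly.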

rebuild-index helps if those files are valid but the index isn't correct (for whatever reason). If the files are not valid, it "just" removes them and the blobs they contain from the index. Hence, if the pack files are corrupt, you should still see errors during a check (now errors that blobs are missing).

Running check --read-data is always a good idea, but note that --read-data downloads all files. If this is expensive, I wouldn't do that for a large repo. As written above, you can download suspicious pack files and manually check the sha256. There is also

which allows you to only give specific files to check.

An even better first try is to check whether you can redo your backups for those snapshots. If you have run rebuild-index and blobs are now missing, those blobs will be added again when you run a backup, provided they are still present in some files on your hard disk.

If that is not the case, then your repo is corrupt and cannot be completely repaired without losing some data. You can forget the affected snapshots or use this not-yet-reviewed PR, which finds the snapshots for you and tries to salvage as much data as possible from them.


Thanks a lot @akrabu and @alexweiss for your instructions.
If I have understood right:

  • As I have direct local access to the repository, I can easily run sha256sum on the ec908d7a... file (indeed, the sha256sum looks OK to me: it is the same as the file name).
  • As access to my repo is cheap (I can do it from an in-house local machine), I'm now running restic -r . --cache-dir /foo/bar check --read-data
  • restic check --read-data will find index errors and also data integrity errors (sha256sum mismatches).
  • If only index errors are found, would a restic rebuild-index be enough to sanitize the repo?
  • If there are also data integrity errors, after a restic rebuild-index I can run a new backup, and if the blobs are still in the original data they will be used to repair the damaged backups. Is this right?
  • After making a new backup, if there are still some missing blobs (how can I see that? With a new restic check?), I can try to recover the damaged data with PR aawsome:new-repair-command or, if I can live with it, forget the damaged snapshots located with restic find --tree ec908d7a

Is this right?

Doing a check --read-data is even better in your situation

check --read-data reads all pack files and does even more checks than only a SHA256. It also decrypts all files, checks the blob hashes and compares them with the index.

In general, if check reports errors you do not expect, you should first try to find their root cause. They might indicate hardware problems or other severe issues that you definitely want to investigate before continuing to rely on your backups!

rebuild-index does what the name stands for: It rebuilds the index. The point is that check can do many checks only if the index correctly represents the pack files. So after rebuild-index, always run another check.

Not exactly. If you have blobs missing from the index those will be re-saved during a backup run. So if the only errors that remain after rebuild-index can be healed by re-saving blobs (and those blobs are still generated by the data to backup), your repository will be healed. In this case, backup will print out some warnings. Also make sure you again run another check after that backup you assume should heal the repo.

Exactly. check will report missing blobs, and once you remove the snapshots that need those blobs, your repo is in a sane state. The PR creates new snapshots which only rely on blobs that are present in the index and can remove "defective" snapshots (if you explicitly request it).

After you reached a sane state (and made sure that nothing is missing), you can run a prune to remove remaining unused data.
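Put together, the order of operations described above could be sketched as follows. This is only an illustration under assumptions: repair_and_prune and the RESTIC variable are hypothetical names, the optional re-run of the backup is left out, and the function stops at the first failing step so that prune never runs on a repository that check still complains about.

```shell
#!/bin/sh
# RESTIC is a hypothetical override for the binary to call.
# Setting RESTIC=echo gives a dry run that only prints the arguments.
RESTIC=${RESTIC:-restic}

# repair_and_prune REPO CACHE_DIR
repair_and_prune() {
    repo=$1
    cache=$2
    # 1. Rebuild the index from the pack files that are actually present.
    "$RESTIC" -r "$repo" --cache-dir "$cache" rebuild-index || return 1
    # 2. Check again. Append --read-data here if reading the repo is cheap.
    #    If errors remain, stop and deal with them before going further.
    "$RESTIC" -r "$repo" --cache-dir "$cache" check || return 1
    # 3. Only a repository that passed the check gets pruned.
    "$RESTIC" -r "$repo" --cache-dir "$cache" prune
}
```

For the setup in this thread the call would be repair_and_prune . /foo/bar, run from the repository directory.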


Thanks a lot for your explanation. Just for completion, my repository only encountered that pack size mismatch error.

As you suggested in your comment, I ran a disk check and everything looks good to me. Could the pack size mismatch error have been caused by network issues?

Anyway, I rebuilt the index, ran another check, and now I'm pruning as expected. Thanks a lot.

The interesting part about that error message is that the index did not contain all blobs which exist in the pack file (the calculated size is less than the real size). So it's not the pack file which is incomplete but rather the repository index. In restic 0.9.6 it was possible for some of the blobs of a pack file to be listed in one index and the rest in another index. So maybe here only the first index was uploaded and the backup got interrupted afterwards.

I'm not sure whether that's what has happened here, but it would be a possible scenario.
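The reasoning about the two sizes can be written down as a small helper. This is a sketch only: explain_mismatch is a hypothetical name, and the "calculated larger than real" branch is my own assumption about the opposite case, not something confirmed in this thread.

```shell
#!/bin/sh
# explain_mismatch CALCULATED REAL
# CALCULATED: pack size derived from the index entries.
# REAL: pack size as found in the repository.
explain_mismatch() {
    calculated=$1
    real=$2
    if [ "$calculated" -lt "$real" ]; then
        # The pack holds more than the index knows about.
        echo "index is likely incomplete: $((real - calculated)) bytes of the pack are not indexed"
    elif [ "$calculated" -gt "$real" ]; then
        # Assumption: a pack shorter than the index expects suggests a damaged file.
        echo "pack file is likely truncated: $((calculated - real)) bytes are missing"
    else
        echo "sizes match"
    fi
}
```

With the numbers from the error message, explain_mismatch 1744769 4224015 reports 2479246 bytes of the pack that the index does not account for.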
