Pack ID does not match - Repository corrupted?

Hello,
Last night I ran a backup for which I forgot to exclude a directory (~500 MB). I did not want to leave that data in the repository, so I ran forget on that snapshot only. After that I ran a prune; there were no errors:

resticbkp@picloud:~ $ restic -r repository/morgoth prune
repository 8fe564d8 opened successfully, password is correct
counting files in repo
building new index for repo
[0:11] 100.00%  17932 / 17932 packs
repository contains 17932 packs (161945 blobs) with 84.270 GiB
processed 161945 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 53 snapshots
[0:34] 100.00%  53 / 53 snapshots
found 158592 of 161945 data blobs still in use, removing 3353 blobs
will remove 0 invalid files
will delete 113 packs and rewrite 0 packs, this frees 512.030 MiB
counting files in repo
[0:01] 100.00%  17819 / 17819 packs
finding old index files
saved new indexes as [7db67b69 c2b34ea6 04c757fd 1b3093b5 aa617f3d 93915c80]
remove 92 old index files
[0:00] 100.00%  113 / 113 packs deleted
done
resticbkp@picloud:~ $

I then ran a check --read-data and got 3 "Pack ID does not match" errors:

Pack ID does not match, want a3f75476, got 7015676c
Pack ID does not match, want 005a0a28, got 23cc97ee
Pack ID does not match, want 2118dddb, got 31bd9e31

At this point I was still working locally on my Raspberry Pi 4. Just to be sure, I stopped the Pi and connected the disk to my PC (Fedora Linux 31) for further verification.

I ran fsck and it did not find any errors. I ran check --read-data again and it found the same packs in error. I am using restic 0.9.6 on both systems. I normally back up my Linux PC to the Pi using rclone and SFTP.

In the past months I have run check --read-data on a few occasions; I check the repositories on a monthly basis. I also checked this repository when I moved it to a new SSD. All of the previous checks showed no errors. The first errors appeared after the forget and prune that I did today.

I was able to find the 3 packs with restic find --pack <id>. I won’t paste the output here because the blobs from those packs appear in many files. All I can say is that those packs were found in snapshots no older than 2 weeks.

I rebuilt the index and ran a new check and I got errors on the same packs.

I tried to verify the checksums: the filenames do not match the checksums. If I remember correctly, they should match:

23cc97eebcfa1b15ab275a25fb186273e4f3fc8d5580526a5a96fcc24487418e  ./morgoth/data/00/005a0a28be2de035325a6d71e53249369e7ea735dafed564e3e91a8ae40d7b99
7015676ce037f8c33f323f4886a85c5bb6c1dc2a9b7b20753b86e44dfe1f696e  ./morgoth/data/a3/a3f754764caff69ae4f8ff31d9ad029ac3252d034a5203b2bdf472d5d27b4885
31bd9e31aaceb9a72917f50103063052658564f40673e35de8503192e10a6dec  ./morgoth/data/21/2118dddbfbfbfb99aabf2f04a851ddee5dcf65d334cf0ffa6aa2ccc4e9f42d6d
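For reference, a pack file’s name in a restic repository is the SHA-256 hash of the file’s contents, so a mismatch like the above can be reproduced with a small shell loop (a sketch; the ./morgoth/data path is the one from the listing above):

```shell
# Check a pack file: its filename should equal the SHA-256 of its contents
# (this is how restic names pack files).
check_pack() {
  want="$(basename "$1")"
  got="$(sha256sum "$1" | awk '{print $1}')"
  if [ "$want" = "$got" ]; then
    echo "OK   $1"
  else
    echo "BAD  $1 (want $want, got $got)"
  fi
}

# Scan the repository's data folder, if present (path from the listing above).
if [ -d ./morgoth/data ]; then
  find ./morgoth/data -type f | while read -r f; do check_pack "$f"; done
fi
```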

I do remember my Pi crashing twice this month with this new SSD, but there was no application accessing the repository at the time. I also had to run fsck to fix errors afterwards. I wonder if it could be a bad SSD, but I have no idea how to verify that.
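One way to look for a failing SSD is to read its SMART health data and run a drive self-test with smartmontools (a sketch; /dev/sda is a placeholder for the actual device node of the repository SSD):

```shell
# Sketch: query SMART health data and start an extended self-test.
# /dev/sda is a placeholder for the repository SSD; requires smartmontools.
sudo smartctl -a /dev/sda          # health status, error counters, wear indicators
sudo smartctl -t long /dev/sda     # start an extended self-test (runs in the background)
# Once the test has had time to finish, read the self-test log:
sudo smartctl -l selftest /dev/sda
```

Note that SMART results can look clean even when a drive, cable, or USB bridge is intermittently corrupting data, so a clean report does not fully rule out the hardware.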

Any idea what is happening? Is there any way to recover/repair this repository?

Thanks

The prune run which you’ve posted shouldn’t be the reason for the damaged pack files: restic just deleted a bunch of packs, but didn’t rewrite any. As the pack error shows up on both your Raspberry Pi and the PC, it looks like the pack files were somehow damaged on disk. But as restic does not modify already existing packs, I would assume that the actual culprit is something else.

As prune seems to be able to access all tree blobs, the damaged packs probably contain data blobs. You could try the following, which should be able to recover the missing blobs and repair the repository if the corresponding files are still in the folders you back up:

  • Create a copy of the index folder of your repository along with the three damaged packs.

  • Remove the three packs from the data folder.

  • Run restic rebuild-index to recreate the index and thus remove the removed packs from it.

  • Run restic backup --force ... to let restic rescan the backup folders.

If restic found all the missing data blobs, then restic check --read-data should complete successfully now.
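The steps above could look roughly like this (a sketch only; the repository path repository/morgoth and the pack paths are taken from the earlier output, and /path/to/backup/folders is a placeholder):

```shell
REPO=repository/morgoth

# 1. Keep a safety copy of the index and the three damaged packs.
mkdir -p repair-backup
cp -r "$REPO/index" repair-backup/
cp "$REPO/data/00/005a0a28"* "$REPO/data/a3/a3f75476"* "$REPO/data/21/2118dddb"* repair-backup/

# 2. Remove the damaged packs from the data folder.
rm "$REPO/data/00/005a0a28"* "$REPO/data/a3/a3f75476"* "$REPO/data/21/2118dddb"*

# 3. Recreate the index, then rescan the backup folders.
restic -r "$REPO" rebuild-index
restic -r "$REPO" backup --force /path/to/backup/folders

# 4. Verify the repository.
restic -r "$REPO" check --read-data
```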

Otherwise you could also remove the affected snapshots and let prune cleanup the repository afterwards.

I had in mind to try your last proposal when I came back from work, but I decided to try the first one instead. I checked the list of files returned by find --pack again to make sure all the files were still present on my system.

So I made a backup copy of the index folder and the packs, rebuilt the index, and ran a backup with the --force flag. There were no errors during the backup.

I then ran a check --read-data and the output was really not nice to see. Here’s an extract:

resticbkp@picloud:~/repository $ restic check --read-data
using temporary cache in /tmp/restic-check-cache-634341475
repository 8fe564d8 opened successfully, password is correct
created new cache in /tmp/restic-check-cache-634341475
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
error for tree a9ae2d9d:
  tree a9ae2d9d: file ".bash_history" blob 0 size could not be found
  tree a9ae2d9d: file ".viminfo" blob 0 size could not be found
  tree a9ae2d9d, blob 3e316573: not found in index
  tree a9ae2d9d, blob fc7b2c45: not found in index
error for tree b8cc2130:
  tree b8cc2130: file "trims.prefs" blob 0 size could not be found
  tree b8cc2130, blob a2584c95: not found in index
error for tree 3418bd68:
  tree 3418bd68: file "gwenviewrc" blob 0 size could not be found
  tree 3418bd68: file "okularrc" blob 0 size could not be found
  tree 3418bd68, blob ff7c7632: not found in index
  tree 3418bd68, blob 66244af9: not found in index
error for tree c96ad946:
  tree c96ad946: file "QtProject.conf" blob 0 size could not be found
  tree c96ad946: file "gtkrc" blob 0 size could not be found
  tree c96ad946: file "gtkrc-2.0" blob 0 size could not be found
  tree c96ad946: file "gwenviewrc" blob 0 size could not be found
  tree c96ad946: file "kdialogrc" blob 0 size could not be found
  tree c96ad946: file "korgacrc" blob 0 size could not be found
  tree c96ad946: file "okularrc" blob 0 size could not be found
  tree c96ad946: file "plasma-org.kde.plasma.desktop-appletsrc" blob 0 size could not be found
  tree c96ad946: file "plasmashellrc" blob 0 size could not be found
  tree c96ad946, blob 15c474dc: not found in index

I decided to stop there and move on to the next step. I deleted all snapshots made in the last 2 weeks; these were the snapshots where the damaged packs were found. There were no errors with the forget and prune commands:

resticbkp@picloud:~/repository $ restic forget 9c3e0c37 29f5d6b1 ec396bdd
repository 8fe564d8 opened successfully, password is correct
removed snapshot 9c3e0c37
removed snapshot 29f5d6b1
removed snapshot ec396bdd

resticbkp@picloud:~/repository $ restic prune
repository 8fe564d8 opened successfully, password is correct
counting files in repo
building new index for repo
[0:01] 100.00%  17925 / 17925 packs
repository contains 17925 packs (161940 blobs) with 84.244 GiB
processed 161940 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 51 snapshots
[0:35] 100.00%  51 / 51 snapshots
found 157361 of 161940 data blobs still in use, removing 4579 blobs
will remove 0 invalid files
will delete 131 packs and rewrite 0 packs, this frees 584.381 MiB
counting files in repo
[0:01] 100.00%  17794 / 17794 packs
finding old index files
saved new indexes as [2787f0ed e8d72b41 2456834a bc57d04c 66614b4e 5db54182]
remove 8 old index files
[0:00] 100.00%  131 / 131 packs deleted
done

Finally I ran check --read-data and this time it completed without any errors!

resticbkp@picloud:~/repository $ restic check --read-data
using temporary cache in /tmp/restic-check-cache-898284538
repository 8fe564d8 opened successfully, password is correct
created new cache in /tmp/restic-check-cache-898284538
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
read all data
[1:57:45] 100.00%  17794 / 17794 items
duration: 1:57:45
no errors were found
resticbkp@picloud:~/repository $ 

I still have no clue what happened here. I checked all the backup logs for this repository and found no errors at all. Fortunately I have 2 other repositories, but I’m worried that a repository can get damaged for no visible reason. As a safety measure, maybe I should make an extra backup using a different tool (borg, duplicati).

As a precaution I will run a memory test on my computer. I remember reading some posts about problems that could be related to hardware issues.

Ah, well, with dot files such as .bash_history there’s not much of a chance that the file version from the backup is still on disk. And as restic currently does not have the capability to salvage blobs from damaged packs or to remove files/directories from a snapshot, I’m afraid that deleting the damaged snapshots was, for now, the only way to repair the repository.

I still wonder how the packs got damaged though. Such damaged repositories have been frequently caused by hardware problems, but there might also be some bugs lurking in restic.

I ran a memory test with memtest overnight. It completed 8 passes and no errors were found. I also ran a few self-tests on the SSD itself. Again, no errors.

So it’s unclear what happened. All backup logs for this repository completed without a single warning or error, and the hardware tests do not show any problems. To be honest, I’m starting to be really afraid to use restic for my backups, because problems appear in the repository for no apparent reason.

First off, I understand your concern. Personally I wouldn’t feel worried that it’s a bug in restic (for various reasons), but I totally understand that you might be a bit worried about what you’re seeing.

Assuming it’s a hardware issue (which I’d think it is), the tricky part is figuring out what it is. Memory issues are the easy ones, but other hardware can go bad as well, such as SSDs. I’d also be wary of USB controllers. Even cables have been known to cause corruption in certain cases (not suggesting that’s the cause here though).

That said, I’m just curious about the following:

  • You said the affected packs are from snapshots no older than two weeks. Do you know if you ran any checks within those two weeks, before the time that you backed up the snapshots you later removed?

  • What SSD are you using, and is its firmware up to date, do you think?

  • What self-test did you run on the SSD?

  • How is the SSD containing your repository connected to the Pi and your computer?

To be honest, I don’t think I ran any checks on the repository in the last 2 weeks. In that period I made only 3 snapshots. Since restic didn’t return any warnings or errors, I did not bother checking the repository at that time. I usually check the repositories every month, that’s every 5-6 snapshots.

The SSD for the Pi is a SanDisk SSD Plus 480 GB. I’ll have to verify the firmware; it’s usually done through the SanDisk SSD Dashboard, but that’s recommended only for internal storage.

I just realized I’ve been a bit stupid: I did run a self-test on an SSD … but on my main computer, not on the one in the Pi :man_facepalming:

Normally this repository is accessed over SSH. However, for maintenance tasks like check, I run restic locally. I always make sure to use the same restic version.

edit: I ignored the SanDisk "recommendation". I temporarily connected the disk over USB only to check the firmware version. The disk has the latest firmware available.

Thanks for the info. I guess we will never know for sure what caused those bad packs. Hardware issues can be intermittent and not always reproducible at will :frowning:

Since you have two other repos, I’d just continue backing up with such a good strategy and make sure to check them once in a while. If one were to show problems, you’d have the others.

I’m still suspicious of the drive or the USB connection though :male_detective:

That is exactly what I was thinking. I’ll try to see if I can find hardware faults, but it could be very hard to find.

I think the best strategy for now is to continue to use multiple repositories and check them regularly. I’ll also see if I can have extra backups with a different tool (e.g. borg).

Anyway, thanks a lot for the help and advice :slight_smile:

Yep, perfect. Using multiple repositories and two separate tools very much increases your resilience.