Pruning a fresh repo, corruption?

Hi,

I have tried the following:

  • Create a new repository in the new version 2 repository format.
  • Create an initial snapshot.
  • Creating the initial snapshot took several days, since it ran over a slowish upload link and was interrupted several times. The original size of the source files was about 570 GB; the compressed size was 409 GB.
  • After the initial snapshot finished, a batch job runs every two hours to a) create a new snapshot and b) run restic forget on the repo (a sketch of this job follows the snapshot list below).
  • restic forget is run with the following arguments: --keep-hourly 5 --keep-daily 40 --keep-weekly 9 --keep-monthly 9
  • After a few hours I interrupted the batch job to have a look around. This is the output of restic snapshots:
repository 2b5afae3 opened (repository version 2) successfully, password is correct
ID        Time                 Host                      Tags        Paths
--------------------------------------------------------------------------
b1f72e3d  2022-09-30 10:31:16  fortunamajor.xxx.local              /etc
                                                                     /root
                                                                     /var

10240e90  2022-10-03 09:17:18  fortunamajor.xxx.local              /etc
                                                                     /root
                                                                     /var

729718c2  2022-10-03 11:21:58  fortunamajor.xxx.local              /etc
                                                                     /root
                                                                     /var
--------------------------------------------------------------------------
3 snapshots

You can see the initial snapshot plus the two later ones created by the batch job. I have the full logs of the two later ones (plus the forget operation) but omit them here for brevity. Neither shows any errors, but I can include them if necessary.
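
For reference, a minimal sketch of what the two-hourly batch job does (the repository and password settings are placeholders, not the literal script):

#!/bin/sh
# Two-hourly job, as described above. RESTIC_REPOSITORY and
# RESTIC_PASSWORD_FILE are assumed to be set in the environment;
# the paths match the snapshots listed above.
restic backup /etc /root /var
restic forget --keep-hourly 5 --keep-daily 40 --keep-weekly 9 --keep-monthly 9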

Then, I ran restic prune on the repo. Again, I have the full log if necessary, but there were no errors indicated.

Then I did this:

  • Run restic check --read-data on the repo; I interrupted that command to have a look at disk usage.
  • Run restic check --read-data on the repo again; this time it ran to completion, though obviously after a long time.
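
As an aside, I believe restic can also verify just a subset of the pack files, which would have been much faster on a repo this size, e.g.

restic check --read-data-subset=1/10

but I wanted the full check here.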

Unfortunately, there were errors shown:

using temporary cache in /tmp/restic-check-cache-4251669863
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:10] 100.00%  3 / 3 snapshots

read all data
Pack ID does not match, want 7145d9f6be3b3982f14c4eea65d7955820a0b3cadca28e69fbe5bffa8b83ee4e, got 76c0fe665461705b18ea19c659be3d485dda02ca42e0b08ed22015e3e249f2ea
Pack ID does not match, want c3e6469dc6bb9822d7d21407ee4c28f9e1a4946ffaff4abf758bea71da318785, got dbe806445bf1f33e1ea7421a68f219303487b4834a247fad7095db74a83b5bd1
Pack ID does not match, want a7ff9fcbacd6bd49df99615ea0d7c2826fc0fa3e22e381e3d7f603e04885ba71, got 26f9037a18ceb6155fd5811cc571ff4bdda1499b8be47ecd7a3d065d355d47fb
Pack ID does not match, want d991bae11f97f02013c983f5c93066a64556a580d27509449a69746816f4a100, got c950146a4123ae5f723d900671b259d8be50bfa3d8629afc2103c5dbcae8f024
Pack ID does not match, want ea3292d4061fcb943545ae5208731389ff642fcd4779f5c0582555652bb33d7f, got 9c5d7fd54eacef4953344dfebbc8bece797196201bd8b8d421b26b2882601758
Pack ID does not match, want eb4ca650ecce8d592c1a9a3aaf2c52de1d3aa3ee459b6e8d238be9166530e8f3, got f442942d92e070cb6faf4ff33860a91240857450b2181c725826d81ebac236a5
Pack ID does not match, want 4ce429384b59976abd56e9d574e92f22a37f8a33596b9bfb2a54afcc03df0313, got 754f3b04ae5cc32d0b725614d4b4611ddc9900afee974f43494135dd07f81390
Pack ID does not match, want dd5585dc890bf6b4af8b7241ef97fcf552350a044e900cfff8244b725ebc8a5d, got eeef9f59b7079a624a375316c76fd5602f4d5614faf5c91675991d0677b4c487
Pack ID does not match, want 3ef1f3ef50cbc6f2c0f2a1fe370e59b136a6f379a4517b9f391f6d745527a5dd, got e44f68586df3942eaa09b5af1ddbfad5e141a8e0dc62f6c06544b953ee89c142
Pack ID does not match, want be0ad698c5d8de86adbcf8306d77274620b07643b2133b40706a315d441b8547, got 87927d07cc389b187a8467eb58e3fc7dd7dc3b5c2812c1b0f9e631cb20cc5c8f
Pack ID does not match, want 9f22ab0802dc28d0f9d0d458fad66f62ac3ea51b46ec00b757cc1d3bd6b7f815, got a6f7fd6d38bd98e7f573f8a937ca7f241a278620f409a9887c089f8321a6d161
Pack ID does not match, want cf2ea4cdc0045009b6eaef274431cc7e032e04c1a5be63f101ceb54622b748c8, got 1de3c6e8ee2051a8a61ab79414672f522d47aa5d7b98057af6bde706cafa72a4
[3:07:04] 100.00%  24719 / 24719 packs

Fatal: repository contains errors
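
If I understand the repository format correctly, a pack ID is the SHA-256 of the pack file's contents and doubles as the file name under data/ in the repo, so a mismatch like the above should be reproducible by hand (the repo path here is a placeholder):

# The "want" ID is the pack's file name; recomputing the hash of the
# file contents should yield the same ID, but for these packs it won't.
sha256sum /path/to/repo/data/71/7145d9f6be3b3982f14c4eea65d7955820a0b3cadca28e69fbe5bffa8b83ee4e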

I don’t understand how this is possible: a fresh repo showing corruption after a few simple operations? Also, is this repairable?

Thanks a lot for your input,

I presume you are using restic 0.14.0; can you verify? What is your repository string/URL?

Yes, it is restic 0.14.0.

The repo string is a bit involved. Let us call the backup source machine “S” and the backup target machine “T”. Backups are initiated from “T” by ssh-ing into “S” and opening a port back to “T”, like this:

ssh -A -R 9999:127.0.0.1:22 "S" /path/to/backup/script

/path/to/backup/script then runs the backup on “S”, using the repo string sftp://root@localhost:9999//q/rdir as the destination. /q/rdir is the restic repository directory on “T”. The various public/private key pairs needed to authenticate all of this are in the right places.
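
Spelled out, the moving parts fit together roughly like this (the second command is my reconstruction of what the script does, not its literal contents):

# On "T": open a reverse tunnel so "S" can reach "T"'s sshd on
# localhost:9999, then start the backup script on "S".
ssh -A -R 9999:127.0.0.1:22 "S" /path/to/backup/script

# On "S", inside /path/to/backup/script: talk back through the tunnel.
restic -r sftp://root@localhost:9999//q/rdir backup /etc /root /var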

I ran the pruning operation and restic check on “T”, but that should not matter much, or should it?

thanks,

The result should be identical assuming that both machines work correctly.

Having corrupted data after such a short time, and with a relatively small repository, usually indicates a hardware problem. Could you run a memory test and maybe prime95 or similar to check that the memory and CPU work properly?
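
For example, something along these lines (tool choice and numbers are only suggestions):

# Test 4 GiB of RAM for 3 passes from within the running system;
# a memtest86+ run from boot media is more thorough.
sudo memtester 4G 3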

You called it right, Michael: one of the memory chips in that machine (“T”) is indeed flaky, so obviously all bets are off.

In a way, that is a relief; I would have hated to find a bug that puts the integrity of all those restic backups out there into question …

best,


You are not alone! There are plenty of cases here where restic uncovered hardware problems.