Pruning a fresh repo, corruption?

Hi,

I have tried the following:

  • Create a new repository using the new version 2 repository format.
  • Create an initial snapshot.
  • Creating the initial snapshot took several days, since it went over a slowish upload link and was interrupted several times. Original size of the source files was about 570 GB, compressed size 409 GB.
  • After the initial snapshot was finished, a batch job runs every two hours to a) create a new snapshot and b) run restic forget on the repo.
  • restic forget is run with the following arguments: --keep-hourly 5 --keep-daily 40 --keep-weekly 9 --keep-monthly 9 (a sketch of the batch job follows the snapshot listing below).
  • After a few hours I interrupted the batch job to have a look around. This is the output of restic snapshots:
repository 2b5afae3 opened (repository version 2) successfully, password is correct
ID        Time                 Host                      Tags        Paths
--------------------------------------------------------------------------
b1f72e3d  2022-09-30 10:31:16  fortunamajor.xxx.local              /etc
                                                                     /root
                                                                     /var

10240e90  2022-10-03 09:17:18  fortunamajor.xxx.local              /etc
                                                                     /root
                                                                     /var

729718c2  2022-10-03 11:21:58  fortunamajor.xxx.local              /etc
                                                                     /root
                                                                     /var
--------------------------------------------------------------------------
3 snapshots

You can see the initial snapshot plus two later ones, which were created by the batch job. I have the full log of the two later ones (plus the forget operation) but omit it here for brevity. Neither shows any errors, but I can include them if necessary.
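
For reference, the batch job essentially boils down to something like this (simplified sketch; the repository URL and password file here are placeholders, only the backup paths and forget arguments are the real ones):

#!/bin/sh
# Simplified two-hourly batch job; repository URL and password file are placeholders.
export RESTIC_REPOSITORY="sftp://user@backuphost//path/to/repo"
export RESTIC_PASSWORD_FILE="/root/.restic-passwd"

# a) create a new snapshot
restic backup /etc /root /var

# b) apply the retention policy
restic forget --keep-hourly 5 --keep-daily 40 --keep-weekly 9 --keep-monthly 9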

Then, I ran restic prune on the repo. Again, I have the full logs, if necessary, but there were no errors indicated.

Then I did this:

  • Run restic check --read-data on the repo; I interrupted that command to have a look at disk usage.
  • Run restic check --read-data on the repo again; this time it ran to completion, which of course took a long time.
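
For completeness, the check invocation was essentially the following (repository path is a placeholder; the --read-data-subset line is just an aside showing a faster partial verification, not what I actually ran):

# full verification, reads and re-hashes every pack file (run twice as described above)
restic -r /path/to/repo check --read-data

# faster spot check: read only one fifth of the pack files per run
restic -r /path/to/repo check --read-data-subset=1/5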

Unfortunately, there were errors shown:

using temporary cache in /tmp/restic-check-cache-4251669863
create exclusive lock for repository
load indexes
check all packs
check snapshots, trees and blobs
[0:10] 100.00%  3 / 3 snapshots

read all data
Pack ID does not match, want 7145d9f6be3b3982f14c4eea65d7955820a0b3cadca28e69fbe5bffa8b83ee4e, got 76c0fe665461705b18ea19c659be3d485dda02ca42e0b08ed22015e3e249f2ea
Pack ID does not match, want c3e6469dc6bb9822d7d21407ee4c28f9e1a4946ffaff4abf758bea71da318785, got dbe806445bf1f33e1ea7421a68f219303487b4834a247fad7095db74a83b5bd1
Pack ID does not match, want a7ff9fcbacd6bd49df99615ea0d7c2826fc0fa3e22e381e3d7f603e04885ba71, got 26f9037a18ceb6155fd5811cc571ff4bdda1499b8be47ecd7a3d065d355d47fb
Pack ID does not match, want d991bae11f97f02013c983f5c93066a64556a580d27509449a69746816f4a100, got c950146a4123ae5f723d900671b259d8be50bfa3d8629afc2103c5dbcae8f024
Pack ID does not match, want ea3292d4061fcb943545ae5208731389ff642fcd4779f5c0582555652bb33d7f, got 9c5d7fd54eacef4953344dfebbc8bece797196201bd8b8d421b26b2882601758
Pack ID does not match, want eb4ca650ecce8d592c1a9a3aaf2c52de1d3aa3ee459b6e8d238be9166530e8f3, got f442942d92e070cb6faf4ff33860a91240857450b2181c725826d81ebac236a5
Pack ID does not match, want 4ce429384b59976abd56e9d574e92f22a37f8a33596b9bfb2a54afcc03df0313, got 754f3b04ae5cc32d0b725614d4b4611ddc9900afee974f43494135dd07f81390
Pack ID does not match, want dd5585dc890bf6b4af8b7241ef97fcf552350a044e900cfff8244b725ebc8a5d, got eeef9f59b7079a624a375316c76fd5602f4d5614faf5c91675991d0677b4c487
Pack ID does not match, want 3ef1f3ef50cbc6f2c0f2a1fe370e59b136a6f379a4517b9f391f6d745527a5dd, got e44f68586df3942eaa09b5af1ddbfad5e141a8e0dc62f6c06544b953ee89c142
Pack ID does not match, want be0ad698c5d8de86adbcf8306d77274620b07643b2133b40706a315d441b8547, got 87927d07cc389b187a8467eb58e3fc7dd7dc3b5c2812c1b0f9e631cb20cc5c8f
Pack ID does not match, want 9f22ab0802dc28d0f9d0d458fad66f62ac3ea51b46ec00b757cc1d3bd6b7f815, got a6f7fd6d38bd98e7f573f8a937ca7f241a278620f409a9887c089f8321a6d161
Pack ID does not match, want cf2ea4cdc0045009b6eaef274431cc7e032e04c1a5be63f101ceb54622b748c8, got 1de3c6e8ee2051a8a61ab79414672f522d47aa5d7b98057af6bde706cafa72a4
[3:07:04] 100.00%  24719 / 24719 packs

Fatal: repository contains errors

I don’t understand how this is possible: a fresh repo showing corruption after a few simple operations? Also, is this repairable?

Thanks a lot for your input,

I presume you are using restic 0.14.0; can you verify? What is your repository string/URL?

Yes, it is restic 0.14.0.

The repo string is a bit involved. Let us call the backup source machine "S" and the backup target machine "T". Backups are initiated from "T" by ssh-ing into "S" and opening a port back to "T", like this:

ssh -A -R 9999:127.0.0.1:22 "S" /path/to/backup/script

/path/to/backup/script then runs the backup on "S" using the repo string sftp://root@localhost:9999//q/rdir as the destination for the backup. /q/rdir is the name of the restic repo directory on "T". The various public/private keypairs to authenticate all of this are in the right places.
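
Putting the two halves together, the flow is roughly this (sketch only; the actual script does a bit more, and password handling is omitted):

# On "T": forward port 9999 on "S" back to sshd on "T", then start the backup script on "S"
ssh -A -R 9999:127.0.0.1:22 "S" /path/to/backup/script

# On "S", inside /path/to/backup/script: back up through the reverse tunnel into the repo on "T"
restic -r sftp://root@localhost:9999//q/rdir backup /etc /root /var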

I ran the pruning operation and restic check on "T", but that should not matter much, or should it?

thanks,

The result should be identical assuming that both machines work correctly.

Corrupted data after such a short time and with a relatively small repository usually indicates a hardware problem. Each pack file is named after the SHA-256 hash of its contents, so a mismatch means the data changed after it was written. Could you run a memory test and maybe prime95 or similar to check that the RAM and CPU work properly?
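
For example, assuming the memtester and stress-ng packages are available, something along these lines (a boot-time memtest86+ run is more thorough for the RAM):

# lock and test ~2 GB of RAM for 3 passes (needs root)
sudo memtester 2048M 3

# load all CPU cores for 10 minutes
stress-ng --cpu 0 --timeout 10m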

You called it right, Michael: one of the memory chips in that machine ("T") is indeed flaky, so obviously all bets are off.

In a way that is a relief; I would have hated to find a bug that puts the integrity of all those restic backups out there into question …

best,

You are not alone! There are plenty of cases here where restic has uncovered hardware problems.