Fix broken snapshot

Hello,

restic check reports that there is an error and that I should follow Troubleshooting — restic 0.17.3 documentation.

I did all the steps (apart from step 2), but restic check still reports an error.

$ restic check --read-data
using temporary cache in /tmp/restic-check-cache-753204985
create exclusive lock for repository
repository d0b94d05 opened (version 2, compression level auto)
created new cache in /tmp/restic-check-cache-753204985
load indexes
[0:06] 100.00%  3 / 3 index files loaded
check all packs
7 additional files were found in the repo, which likely contain duplicate data.
This is non-critical, you can run `restic prune` to correct this.
check snapshots, trees and blobs
error: failed to load snapshot cce5e7a4: LoadRaw(<snapshot/cce5e7a420>): invalid data returned
[0:16] 100.00%  246 / 246 snapshots
read all data
[3:37:46] 100.00%  42806 / 42806 packs

The repository is damaged and must be repaired. Please follow the troubleshooting guide at https://restic.readthedocs.io/en/stable/077_troubleshooting.html .

Fatal: repository contains errors

Since a full backup was already made (optional step 4), I am fine with just getting rid of the broken snapshot (assuming that is the problem). How do I fix this repo?

Kind regards

Start by running restic forget cce5e7a4, or if that doesn’t work because restic is unable to load that snapshot, you can also just delete that snapshot’s file in the snapshots/ folder in the repository.
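For the record, the two routes look roughly like this (the repository path below is a placeholder; the snapshot ID is the one from your check output):

```shell
# Preferred: let restic remove the snapshot record (repo path is hypothetical).
restic -r /path/to/repo forget cce5e7a4

# Fallback if restic cannot even load the snapshot: delete the raw file.
# Snapshot files are named by their full SHA-256 ID, so a prefix glob works.
rm /path/to/repo/snapshots/cce5e7a4*
```

After removing the snapshot by hand, a later prune will clean up any data that is no longer referenced.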

Any idea why the data in that snapshot is invalid? Something caused it, so you might want to treat this as something potentially bigger and debug it a bit before removing the snapshot.


Please provide some information on your setup, i.e. the restic version, storage backend, etc.

There should be a file at snapshots/cc/cce5e7a420[...] in the repository. What does shasum -a256 snapshots/cc/cce5e7a420[...] return? Is there anything suspicious about that file?
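The reason this check is useful: restic names every repository file after the SHA-256 hash of its contents, so the filename and the hash must match. A throwaway demo of that invariant (no real repository involved; sha256sum is the Linux equivalent of shasum -a 256):

```shell
tmp=$(mktemp -d)
printf 'pretend snapshot contents' > "$tmp/file"

# Name the file after its own SHA-256 hash, as restic does.
hash=$(sha256sum "$tmp/file" | cut -d' ' -f1)
mv "$tmp/file" "$tmp/$hash"

# Verify: re-hash the file and compare with its name.
rehash=$(sha256sum "$tmp/$hash" | cut -d' ' -f1)
if [ "$rehash" = "$hash" ]; then echo OK; else echo CORRUPT; fi

rm -rf "$tmp"
```

If the re-computed hash of your snapshot file does not match its name, the file's contents were changed or truncated after upload.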


I already deleted the snapshot via rm. (And now everything works again.)

The broken repository is stored on my NAS on a RAID 1 with three drives: two WD drives (different models) and one drive whose manufacturer I don't know right now. The underlying filesystem is btrfs.

I push new snapshots using sftp.

But this repository is just a secondary one. Each (external) system pushes to its own secondary repository, and there is a primary repository (also on the NAS) that pulls snapshots from them (via the copy command). This is done to mitigate ransomware attacks.
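For context, the pull step is a plain restic copy. A sketch with hypothetical paths and password files (if I remember correctly, the --from-repo/--from-password-file flags exist since restic 0.14; older versions used --repo2):

```shell
# All repository paths and password files below are hypothetical.
restic -r /backups/primary \
  --password-file /etc/restic/primary.pass \
  copy \
  --from-repo /backups/secondary/host1 \
  --from-password-file /etc/restic/secondary.pass
```

Running this on the NAS side (pull) means the external systems never hold credentials for the primary repository, which is the point of the ransomware mitigation.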

I recently had (actually still have) problems with the restic process being killed due to heavy memory usage. Maybe that had an impact on the corrupted secondary repository? The process performs copy, forget and prune on the secondary repositories. I have paused this process for now.

I am currently running restic check --read-data on the primary repository to verify that it is not broken. Afterwards I am going to investigate the OOM problem.

You’ve again dodged the question regarding the restic version used. Recent versions mostly ensure that files are added atomically to the repository, so a file should either be fully uploaded or not present at all. An OOM kill therefore shouldn’t result in damaged snapshots (unless, maybe, the NAS itself crashes).

OOM shouldn’t :tm: be able to damage the repository, but either way it’s still important to investigate.