How to recover from "snapshot does not exist"?

moriraaca · August 23, 2023, 6:39am

Hello all,

I have a weird problem with my restic setup. Everything was working fine until recently, when things went south.

I ran the check command recently, and it failed with a bunch of <index/f9fdb5db6a> does not exist errors. In an attempt to fix things I tried running rebuild-index and prune, but now the error has changed and the situation has gotten worse…

When I run rebuild-index, it does what it does, and ends with the result 46 / 46 files deleted. The thing is, if I run the same command again, it again says it’s deleted 46 files… the number doesn’t change no matter how many times I run it, so it seems to me that the files aren’t actually deleted.
When I run prune, it spams the <snapshot/b95bbfb725> does not exist error, and eventually fails with a SIGSEGV (example exceptions attached at the end) - this SIGSEGV happens every time, it’s not a fluke, although it looks a bit different each time.
When I run check, it spams <snapshot/b95bbfb725> does not exist again, although this time there are no exceptions, it just eventually fails (error: <snapshot/b95bbfb725> does not exist => Fatal: repository contains errors) because of the missing snapshot.
I’ve decided to just remove this snapshot to fix things, but when I run forget b95bbfb725 it again spams the same error <snapshot/b95bbfb725> does not exist, and eventually fails, so the snapshot isn’t removed.

I’m not sure what to do next. The funny thing is, the snapshot does seem to exist - I can see a file with that name in my cloud backend, I can access it, download it etc. Tbh, I don’t actually care about this snapshot - I do regular backups and I have plenty of snapshots that should be fine - but it seems I can’t even remove it.
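One way to tell whether a downloaded copy of the snapshot file is intact: restic names every repository file except config after the SHA-256 hash of its contents, so the hash of the download should equal the file name. A minimal sketch follows, assuming sha256sum is available on the machine. verify_repo_file is a hypothetical helper, not a restic command, and the file name must be the full 64-character ID, not the short b95bbfb725 prefix.

```shell
#!/bin/sh
# Sketch: compare a downloaded repository file's SHA-256 with its file name.
# verify_repo_file is a hypothetical helper name, not part of restic.
verify_repo_file() {
    file="$1"
    # The expected hash is the file name itself (full 64-character ID).
    expected=$(basename "$file")
    # sha256sum prints "<hash>  <path>"; keep only the hash.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$expected" = "$actual" ]; then
        echo "OK: $expected"
    else
        echo "MISMATCH: name=$expected content=$actual"
        return 1
    fi
}

# Usage (the path is a placeholder for wherever the file was downloaded):
# verify_repo_file ./snapshots/<full-snapshot-id>
```

A match only shows that the stored file is not corrupted. It says nothing about whether the backend reliably lists or serves it when restic asks.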

Any recommendations on how to proceed? I’d of course very much like to NOT lose all of my backup - losing this one snapshot is fine.

I appreciate all the help, thank you!

finding data that is still in use for 474 snapshots
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x7a4b80]

goroutine 27 [running]:
main.getUsedBlobs.func1({0xb9, 0x5b, 0xbf, 0xb7, 0x25, 0xc3, 0xad, 0x2a, 0x2a, 0x50, ...}, ...)
        /restic/cmd/restic/cmd_prune.go:558 +0x60
github.com/restic/restic/internal/restic.ForAllSnapshots.func2()
        /restic/internal/restic/snapshot.go:113 +0x1a0
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/build/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
        /home/build/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:54 +0x90

finding data that is still in use for 474 snapshots
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x7a4b80]

goroutine 15 [running]:
main.getUsedBlobs.func1({0xb9, 0x5b, 0xbf, 0xb7, 0x25, 0xc3, 0xad, 0x2a, 0x2a, 0x50, ...}, ...)
        /restic/cmd/restic/cmd_prune.go:558 +0x60
github.com/restic/restic/internal/restic.ForAllSnapshots.func2()
        /restic/internal/restic/snapshot.go:113 +0x1a0
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /home/build/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
        /home/build/go/pkg/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:54 +0x90

rawtaz · August 23, 2023, 9:46am

What system are you running this on/from, what restic version, what backend, what cloud provider?

Might want to run a memory test on the system.

moriraaca · August 23, 2023, 10:15am

Hello, thanks for the reply.

I’m running this on a Synology DS418 NAS, which is also where the data is stored. The backend is jottacloud via rclone.

The restic version is restic 0.13.1 compiled with go1.18 on linux/arm64 - I realize it’s not the most recent one, but I was a bit afraid to change it before I get my repo in order.

If that’s a memory problem, what are my options, since I unfortunately can’t replace the memory on this device (and there’s no ECC)?

rawtaz · August 23, 2023, 10:32am

IMO you can definitely upgrade to the latest restic version, it’s backwards compatible.

Sounds like there are problems with the backend/Jottacloud side of things, but I could be wrong (I’m not an expert on the internals of restic). Try it with the new version and see if things improve for starters?

moriraaca · August 24, 2023, 4:30pm

Ok, a little bit of update, this whole thing is a bit weird tbh - tl;dr the problems have vanished but I don’t know why…

What I did:

I’ve used restic 0.16.0 AND another machine which I’m fairly certain doesn’t have issues with RAM. Running the same commands, the results were pretty much identical, i.e. errors with the same snapshot mentioned - the only difference was that there was no SIGSEGV.
As I was running those commands, I noticed the rebuild-index command has changed to repair (index|snapshot), and as I was reading through the manual, I was reminded there’s a --read-all-packs switch. I ran repair snapshot (nothing significant happened) and then repair index --read-all-packs - it took a while, which is why I wasn’t posting any updates.
The repair index finished earlier today, and I’ve checked now and it seems all commands (including check and backup) work without any errors now.
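
For anyone retracing the steps above, here is a minimal sketch of the sequence for restic 0.16.0. It assumes RESTIC_REPOSITORY and RESTIC_PASSWORD (or RESTIC_PASSWORD_FILE) are already set in the environment. repair_repo is a hypothetical wrapper name, and in 0.16.0 the subcommand is spelled repair snapshots.

```shell
#!/bin/sh
# Sketch of the repair sequence described above (restic 0.16.0).
# Assumes the repository location and password are set in the environment.
# repair_repo is a hypothetical wrapper name, not a restic command.
repair_repo() {
    # Confirm which restic binary and version is in use.
    restic version || return 1
    # Rewrite snapshots that reference missing or damaged data.
    restic repair snapshots || return 1
    # Rebuild the index by reading every pack file; slow on a cloud backend.
    restic repair index --read-all-packs || return 1
    # Verify the repository structure again.
    restic check
}

# Usage:
# repair_repo
```

The function stops at the first failing step, so check only runs after both repair commands have succeeded.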

I’m not sure what happened. I can still see the b95bbfb725 snapshot on the list, so I’m not sure if repair index did anything important. My working theory is that it was indeed a problem with my backend provider, jottacloud (wouldn’t be the first time tbh, they are cheap but they occasionally have issues - usually they don’t last that long though).

Whatever it was - it’s fixed now, thanks for the help @rawtaz !