Error breaking all operations on restic: Unable to decode index, invalid character

Restic version

$ restic version
restic 0.11.0 compiled with go1.15.3 on linux/amd64

Problem
Hi! My backup process recently stopped, and when I checked why, I got this error:

Fatal: unable to decode index 36852324: DecodeIndex: invalid character '@' after object key

I figured something was wrong with the index, so I tried rebuilding it:

$ restic rebuild-index
repository e22ded59 opened successfully, password is correct
counting files in repo
List(data) returned error, retrying after 462.318748ms: XML syntax error on line 2: invalid UTF-8
List(data) returned error, retrying after 890.117305ms: XML syntax error on line 2: invalid UTF-8
[18:23] 5.79% 75999 / 1311492 packs
XML syntax error on line 2: invalid UTF-8

I then ran a check on the repo (I added spaces before .com and .org in the links so the forum would let me post this):

$ restic check
using temporary cache in /tmp/restic-check-cache-444495941
repository e22ded59 opened successfully, password is correct
created new cache in /tmp/restic-check-cache-444495941
create exclusive lock for repository
load indexes
panic: assignment to entry in nil map
goroutine 150 [running]:
github .com/restic/restic/internal/restic.IDSet.Insert(...)
/restic/internal/restic/idset.go:26
github .com/restic/restic/internal/checker.(*Checker).LoadIndex.func4(0x0, 0x0)
/restic/internal/checker/checker.go:172 +0x306
golang .org/x/sync/errgroup.(*Group).Go.func1(0xc00058c9c0, 0xc0001b0680)
/home/build/go/pkg/mod/golang.org/x/sync@v0.0.0-20200625203802-6e8e738ad208/errgroup/errgroup.go:57 +0x59
created by golang .org/x/sync/errgroup.(*Group).Go
/home/build/go/pkg/mod/golang.org/x/sync@v0.0.0-20200625203802-6e8e738ad208/errgroup/errgroup.go:54 +0x66

I tried running restic prune:

$ restic prune
repository e22ded59 opened successfully, password is correct
Fatal: unable to decode index f1ba7db2: DecodeIndex: invalid character '\x14' in string literal

And also restic recover:

$ restic recover
repository e22ded59 opened successfully, password is correct
load index files
Fatal: unable to decode index f1ba7db2: DecodeIndex: invalid character '\x14' in string literal

I think this might be related to forcibly stopping the backup process with killall restic. That hasn’t caused such problems before, but I can see how it could lead to issues. It seems to be something new, though, as I find nothing when googling the index errors I’m seeing.
Any ideas what else I could try to fix the situation? The backup itself is about 4.5 TB.

@markusli It looks like you’re having severe hardware problems:

The DecodeIndex error indicates a hardware problem on the host that created that index file. The index data appears to be broken, yet it is correctly authenticated. The most likely cause is a bit flip in hardware (CPU, bus, or RAM). There has also been a report of problems with older releases of the Linux 5.2 to 5.4 kernels.
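To illustrate why "broken but correctly authenticated" points at the host that wrote the file, here is a minimal sketch. It is not restic's code: restic's design document describes Poly1305-AES authentication, while this sketch substitutes HMAC-SHA256 from the Python standard library and skips encryption entirely. The helper names (`seal`, `verify`, `flip_bit`) and the sample payload are made up for the illustration.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def seal(key: bytes, payload: bytes) -> bytes:
    """Append a MAC over the payload (stand-in for authenticated encryption)."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, sealed: bytes) -> bool:
    """Return True if the trailing MAC matches the payload."""
    payload, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


key = os.urandom(32)
index_json = b'{"packs": []}'  # hypothetical, tiny index payload

# Case 1: the bit flips in memory on the backup host BEFORE the MAC is computed.
corrupted_before = seal(key, flip_bit(index_json, 10))

# Case 2: the bit flips AFTER sealing, e.g. on the network or on the storage server.
corrupted_after = flip_bit(seal(key, index_json), 10)

print(verify(key, corrupted_before))  # True: garbage, but validly authenticated
print(verify(key, corrupted_after))   # False: the damage is detected
```

Only the first case yields a file that passes authentication and then fails to decode, which is the pattern seen with the index files here.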

Regarding the XML syntax error: which backend do you use? That could be a network problem, although hardware problems are also a possible cause.

The panic in restic check looks like a hardware problem too. The missing map is created immediately beforehand, at line 170 in checker.go.

Killing restic can’t cause the errors you have seen.

restic rebuild-index should be able to repair the index. But before trying that again, please run some hardware stress tests and a memtest.

Thanks for the thorough reply!

I’m using ECC RAM on the machine and the tests returned no issues with RAM. I would guess that it could be a network error.
I tried running rebuild-index on the repo from another machine and that worked, but other things are still failing:

$ restic prune
repository e22ded59 opened successfully, password is correct
counting files in repo
building new index for repo
pack file cannot be listed 213dc7e28f7ceec6512393717050ba3bc3cf829229b1f288e93f55f9b30385c8: invalid type 27
pack file cannot be listed 22cbb998b3f189d58390b9cbb78ba387a910545ef7a8dea9f9a50780a59fcc66: invalid type 145
pack file cannot be listed 4dee22363b6e74abf3beae038d7892d59fbb438c0b5bcaeeb3ae3b5c1731a843: invalid type 22
pack file cannot be listed 7f850687b5ee0b4641b68d9c5e6881efb4c14ad4d9e673d0bdfa35b7be5c8c02: invalid type 248
pack file cannot be listed 87d69d0aa805b12befed4543a3951e6c75964a0a8f8ea4a1f43c415d274747d7: invalid type 173
pack file cannot be listed a6dbd553d8681346444f105c85494ccc7b4c034e3f124057a87ff0fd67be2bdc: invalid type 225
pack file cannot be listed b3eb139c96dd50a3bacf145fe5750d65ede817ca8a409f55358d776bad1441b3: invalid type 152
pack file cannot be listed c10e7b7024c24082faee1bc6a34542157b77dadbb65bdf941739bd13cfab11ad: ciphertext verification failed
pack file cannot be listed ce0953c35bdcfc9ca93c4170d4d475ab3cce432ee465de68b4cd8ac26d436da4: invalid type 135
pack file cannot be listed eee4f3f4032cf2ddaff477cf9b36081ef2f059e2219ceca521feb5aca77d2f7c: invalid type 86
[8:24:06] 100.00% 1311487 / 1311487 packs
repository contains 1311477 packs (16095308 blobs) with 6.373 TiB
processed 16095308 blobs: 1182 duplicate blobs, 773.489 MiB duplicate
load all snapshots
find data that is still in use for 8 snapshots
[0:09] 0.00% 0 / 8 snapshots
blob 695deee5d9d7934e3039d5c5b774ac1e994391f682d505035662c7d763bdae71 returned invalid hash
github.com/restic/restic/internal/repository.(*Repository).LoadBlob
/restic/internal/repository/repository.go:204
github.com/restic/restic/internal/repository.(*Repository).LoadTree
/restic/internal/repository/repository.go:723
github.com/restic/restic/internal/restic.FindUsedBlobs
/restic/internal/restic/find.go:19
github.com/restic/restic/internal/restic.FindUsedBlobs
/restic/internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
/restic/internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
/restic/internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
/restic/internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
/restic/internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
/restic/internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
/restic/internal/restic/find.go:31
main.getUsedBlobs
/restic/cmd/restic/cmd_prune.go:276
main.pruneRepository
/restic/cmd/restic/cmd_prune.go:158
main.runPrune
/restic/cmd/restic/cmd_prune.go:62
main.glob..func19
/restic/cmd/restic/cmd_prune.go:27
github.com/spf13/cobra.(*Command).execute
/home/build/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:826
github.com/spf13/cobra.(*Command).ExecuteC
/home/build/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
/home/build/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
main.main
/restic/cmd/restic/main.go:98
runtime.main
/usr/local/go/src/runtime/proc.go:204
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1374

I’ll link the output of restic check, as it’s very long.

I’m going to try the repair command added in this pull request and see what it can do.

I’m not sure that’s a good idea, as something very wrong is going on here. I don’t recall seeing errors like "pack file cannot be listed ...: invalid type 27" at any point in the last year. This error, and most of the ones from the previous forum post, can only be produced in one way: the host creating the backup must have corrupted the data while creating the pack files. This cannot happen due to network problems or problems on the storage server, so it is very likely a hardware problem. There are also a few Linux kernel versions that cause such problems.
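For readers wondering where "invalid type 27" comes from, here is a rough sketch. It assumes the simplified pack header layout from restic's design document for the original repository format: after decryption, each header entry is a 1-byte blob type (0 for data, 1 for tree), a 4-byte little-endian length, and a 32-byte blob ID. The function name `parse_header` and the sample entries are invented for the illustration; restic's actual parser is written in Go and does more checks.

```python
import struct

# One header entry: type (1 byte), length (4 bytes, little endian), blob ID (32 bytes).
ENTRY = struct.Struct("<BI32s")
BLOB_TYPES = {0: "data", 1: "tree"}


def parse_header(plaintext: bytes):
    """Parse an already decrypted pack header into (type, length, id) tuples."""
    if len(plaintext) % ENTRY.size != 0:
        raise ValueError("header length is not a multiple of the entry size")
    entries = []
    for offset in range(0, len(plaintext), ENTRY.size):
        type_byte, length, blob_id = ENTRY.unpack_from(plaintext, offset)
        if type_byte not in BLOB_TYPES:
            # The header decrypted and authenticated fine, but its content is nonsense.
            raise ValueError(f"invalid type {type_byte}")
        entries.append((BLOB_TYPES[type_byte], length, blob_id.hex()))
    return entries


good = ENTRY.pack(1, 512, bytes(32))   # a tree blob entry with a dummy all-zero ID
bad = ENTRY.pack(27, 512, bytes(32))   # same entry with a corrupted type byte

print(parse_header(good))
try:
    parse_header(bad)
except ValueError as err:
    print(err)
```

Because the type byte is only inspected after the header has passed authentication, a value such as 27 suggests that the header was already wrong when it was encrypted on the backup host.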

To recover, you’ll effectively have to remove all broken files and blobs from the repository. First, move the pack files that cannot be listed out of the repository (but keep them for now). Then use restic find --show-pack-id --tree 695deee5d9d7934e3039d5c5b774ac1e994391f682d505035662c7d763bdae71 e668ddf0074a831b51fe2e43eb3f5769f45dbf0ed68a8bc2bc6ebb1b03f0d3bd [...] to locate the pack files containing the trees that are reported to have an invalid hash. See restic issue #3023 ("Data blobs seem to be missing, aborting prune to prevent further data loss!") on GitHub for instructions on salvaging data from those pack files, then move those pack files out of the repository as well. Next, run rebuild-index again (preferably with a current beta version of restic, which should take only a few minutes instead of hours). Afterwards, run the normal restic backup tasks to try to recover the blobs that are now missing. Finally, you could try the repair command to repair the snapshots.
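Collecting the IDs of the pack files that cannot be listed is easy to get wrong by hand, so here is a small sketch that extracts them from saved prune output. The function names are hypothetical, and the path helper assumes the usual data/<first two hex characters>/<id> layout of a local repository; on a remote backend the files would have to be moved with that backend's own tools.

```python
import re

# Matches lines such as "pack file cannot be listed <64 hex chars>: <reason>".
PACK_ERROR = re.compile(r"^pack file cannot be listed ([0-9a-f]{64}): (.+)$")


def unlistable_packs(log_text: str) -> dict:
    """Map each unlistable pack ID to the reason restic reported for it."""
    found = {}
    for line in log_text.splitlines():
        match = PACK_ERROR.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


def local_pack_path(pack_id: str) -> str:
    """Relative path of a pack file, assuming the local repository layout."""
    return f"data/{pack_id[:2]}/{pack_id}"


# Two lines copied from the prune output above, plus one unrelated line.
sample_log = """\
building new index for repo
pack file cannot be listed 213dc7e28f7ceec6512393717050ba3bc3cf829229b1f288e93f55f9b30385c8: invalid type 27
pack file cannot be listed c10e7b7024c24082faee1bc6a34542157b77dadbb65bdf941739bd13cfab11ad: ciphertext verification failed
"""

for pack_id, reason in unlistable_packs(sample_log).items():
    print(local_pack_path(pack_id), "->", reason)
```

The sketch only prints candidate paths and moves nothing; review the list before moving any file out of the repository, and keep the moved files until the repository checks out cleanly.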