Error "error walking snapshot: walking tree" when calling "stats --mode restore-size" command

Hello,

I use restic + rclone to create backups within my application (KeyHelp - Server Control Panel). It runs fine on several thousand servers, but on this one server I’m running out of options for finding the root cause of the issue.

Scenario #1)

I created a new repository and created a backup, which ran fine with no errors. Afterwards I ran a “restic […] stats --mode restore-size” command. Even though the repository contains only the one snapshot created before, and no matter whether I set it up as a remote or a local backup, the command fails:

Command:

nice -n 10 sudo -u 'root' RCLONE_CONFIG='/backup-keyhelp/root/11_20260115_114913_240651/rclone' restic --cache-dir '/backup-keyhelp/root/cache/' --json --repo 'rclone:rclone-storage:/backup/repository-ec0i7/' --password-file '/backup-keyhelp/root/11_20260115_114913_240651/restic_password' stats --mode restore-size
Error could not load snapshots: context canceled

Response:

{"message_type":"exit_error","code":1,"message":"error walking snapshot: walking tree 32859ce40dd0c98a37a9eab7a7c93e64b36d5334bfab93910ef70d425a7e513d: read blob \u003ctree/75a9bff6\u003e from f8eec125: wrong data returned, hash is 695c29c991efaa9cf8674934023bf6abcc88dc439b0a71847662744a7090df3f"}

If I now wait a few minutes and run the command again, it works. Why does it work after some wait time, and why did it not work beforehand?

Some additional information:

This is the information I receive right after the backup:
Total processed: 413.46 GiB
Snapshot ID: ef95a8b5f72bececf0d1b7ec4b2320036122fd3a8540005aad376f062d83a690
Files: New: 3052285 / Changed: 0 / Untouched: 0
Directories: New: 191275 / Changed: 0 / Untouched: 0

Scenario #2)

I have a repository on the same server containing 19 snapshots of the same files. This time it is a remote repository, and here the “restic […] stats --mode restore-size” command never works and always produces errors like “error walking snapshot: walking tree … read blob … from … wrong data returned, hash is …”.

I have run the check command “restic […] check --read-data” → no issues reported

using temporary cache in /backup-keyhelp/root/cache/restic-check-cache-1690946878
create exclusive lock for repository
repository 4f3ab639 opened (version 2, compression level auto)
created new cache in /backup-keyhelp/root/cache/restic-check-cache-1690946878
load indexes
[0:01] 100.00% 21 / 21 index files loaded
check all packs
check snapshots, trees and blobs
[0:09] 100.00% 19 / 19 snapshots
read all data
[34:11] 100.00% 9114

I have tried to repair the index → same result

Switched RAM → same result

As I receive the same error as in the less complicated scenario #1, I guess that finding the root cause there will also help in the more complex scenario #2.

Thank you very much for taking your time to help or guide me in the right direction!

Restic Version: restic 0.18.1 compiled with go1.25.1 on linux/amd64
Rclone Version: rclone v1.72.1

  • os/version: debian 12.13 (64 bit)
  • os/kernel: 6.1.0-41-amd64 (x86_64)
  • os/type: linux
  • os/arch: amd64
  • go/version: go1.25.5
  • go/linking: static
  • go/tags: none

Does it always complain about the same tree ID or different ones? This very much looks like a bitflip somewhere in the hardware. (There are far more weird failure modes than just problematic RAM.)
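
One way to answer the tree ID question is to run the failing command several times and tally which IDs show up in the errors. The following is only a minimal sketch, assuming the error text keeps the "walking tree <id>: read blob ... from <pack>:" shape quoted above. The helper names extract_tree_id and extract_pack_prefix, and the REPO and PWFILE placeholders, are made up for illustration and are not part of restic.

```shell
#!/bin/sh
# Hypothetical helpers: they only parse the error text, they do not call restic.

# Print the 64-character tree ID named in a "walking tree <id>:" message.
# Reads restic's output on stdin; prints nothing if no such message is found.
extract_tree_id() {
    sed -n 's/.*walking tree \([0-9a-f]\{64\}\):.*/\1/p'
}

# Print the 8-character pack ID prefix named in "read blob <...> from <pack>:".
extract_pack_prefix() {
    sed -n 's/.*read blob .* from \([0-9a-f]\{8\}\):.*/\1/p'
}

# Usage sketch (REPO and PWFILE are placeholders for your own values):
#   for i in 1 2 3 4 5; do
#       restic --json --repo "$REPO" --password-file "$PWFILE" \
#           stats --mode restore-size 2>&1 | extract_tree_id
#   done | sort | uniq -c
```

If the tally shows the same tree ID on every run, that points at one specific blob or pack. If the IDs differ from run to run, that fits better with a transient fault on the machine doing the reading.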

You mean that command succeeds on server B, but always fails on server A? That would be another indicator that server A has some kind of defect.

“wrong data returned, hash is …” means that the encrypted data is correctly authenticated, but after decompression it yields the wrong data (or the hash calculation returns the wrong hash). It’s not possible for one server to succeed and the other one to fail unless there’s a problem that throws off the integrity check calculation. The data is likely fine.
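
To illustrate the kind of check behind that message: restic names each blob by the SHA-256 hash of its plaintext content, so a reader can recompute the hash and compare it with the ID it asked for. The sketch below mimics only that final comparison on an ordinary file using sha256sum. It is not restic's code, and verify_blob is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical illustration of a content-hash check, not restic's implementation.
# Usage: verify_blob <expected-sha256-hex> <file>
verify_blob() {
    expected=$1
    file=$2
    # Recompute the SHA-256 of the data as it was actually read.
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "ok"
    else
        # Any single flipped bit in the data (or in the hash computation
        # itself) ends up here, even if the stored copy is intact.
        echo "wrong data returned, hash is $actual"
        return 1
    fi
}
```

A mismatch only says that the bytes hashed on this machine at this moment differ from what the ID promises. It does not say where the difference came from, which is why the same repository can pass on one machine and fail on another.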

However, you could try whether the following helps:

restic repair packs $(restic list packs | grep f8eec125)

Make sure to run this on the server that does NOT return an error during check.