Error loading tree, check, prune and forget gives error - B2 backend

Hello.
I’m currently using restic to back up one of my servers to a B2 bucket.
The current version is:

restic 0.8.1
compiled with go1.9 on linux/amd64

I run a backup to the B2 bucket once per day from a cronjob and have done that for some time, about 65 days or so.
One day I took a look at my backup log file, which included these error messages:

using parent snapshot b873e05a
scan [/some/directories /etc /some/dir ]
[9:10] 16148 directories, 2568203 files, 1.198 TiB
scanned 16148 directories, 2568203 files in 9:10
error loading tree: id 6bcb8d696d445385e785825fd421e8e4940f51443b21046df615fea25c55d9bb not found in any index
error loading tree: id 3174bd2d39c868c3d6273268491cb31797fb68c1834f4405b3074ea8ec162437 not found in any index
error loading tree: id 5125a2f4d5b9f9314846ae9c92581ee72b5e2a53306ba6399e461205bc7cff4c not found in any index
error loading tree: id dc82bbdb6822d519d3a1994b0b3b8f92b52e2ca90083b039d77e9dc37fac95f6 not found in any index
error loading tree: id eae96b07589bb746cbdc7b2ae8bf53e8c84ce7605f06b8b07ae0a16ec4851663 not found in any index
error loading tree: id 13bab94fa142d3f86be1a5846452501df9857187080db0715900e33548147b61 not found in any index
error loading tree: id 7708d9041dca84b4980b5135e7cbf8e859d6c459ad4c6432e1001eb84eec1a9a not found in any index
error loading tree: id 08061502c3cbda255608f0e8cd9e89c586105889176d9fc1a73f02e297cab02f not found in any index
error loading tree: id 3fdff3b6fb1004f44f44321812ce8b488f07ffee7b9d96f9b779f993dbec7175 not found in any index
error loading tree: id d531bb8fe27e04243f0cf08a3560cfdc52fa304355a77b239c02bd2bfe6809ab not found in any index
error loading tree: id b53d76acb3d6dc06e257d7a729f60dd2957092220cf86eec08cd407721855e81 not found in any index
error loading tree: id d0cd660faa977da122b4f2c8cd9bd3d1742ddcf1b4d31cc2c8274638d34e2946 not found in any index
error loading tree: id eba736971ade77078b1e3271602c46499913094cab6774c207c55a734ca19440 not found in any index
error loading tree: id d6c7eca773b78b77a5ef3bd8d444cb6c39f16370c7086f0df41cb135aef60a86 not found in any index
error loading tree: id 1c17ac933c7ffc14f3672b935faa9741c080cd4fce422654b675887b4babdd7b not found in any index
error loading tree: id 769313060d0579ac103e736eb19cce72448324d74b5bad3069d3afbfc35dda40 not found in any index
error loading tree: id 8c20db03f3a120fe9417f1adbb601fd93e12819ed9d643d5023489ebf891312c not found in any index
error loading tree: id abf6f4dda016bf94317c2d25f751a552784a5951b9e5568492cc016a47c984ee not found in any index
error loading tree: id 3f634c2afa93baefd23eff6b61e2bc2b83c880237688197a0ffbb3c64f46247c not found in any index
error loading tree: id f097b0862db72c722760547e72860508874bd52950f2ac14f9eca2ea4a05c764 not found in any index
error loading tree: id 40b8feb923f3a53cc8729245588e1f0c0f3fb192f1b9559aac3d0359542a9717 not found in any index
error loading tree: id 8222e3b1ca448e4d270738c3a605930cf9a05de0d0b4c2372d1fbc740624d036 not found in any index
error loading tree: id 6489582e1b48869982aaefc6e1cb944d2dadd6f0a70345b374d84a4899393318 not found in any index
error loading tree: id dc08a19229b2480f87cd32d107adfc974cc4038c7b34cb04a78b5e3e92b13cb7 not found in any index
error loading tree: id bf78fb4688dcfc2489a6ddbfbcb6344957366b750d0adf4fab00fad5e6374d23 not found in any index
error loading tree: id 741d333f2ebaeccfed57c13a3c2d190136436d0039891a95dab93f83757e57b6 not found in any index
error loading tree: id 83cc664eb9a9c671333467a625f4391bd6636fd5363b3c618e7e977e85f28eca not found in any index
error loading tree: id d546cf196eb37ff1beaecac2db6ca608d49f9657ad1cf73fc5a3d7fedd7f4cf8 not found in any index
error loading tree: id c4191d285d95c2f21d2fb95aaa6e5aa3ec440845d0928f316a442bc2411f48de not found in any index
error loading tree: id f4207412d150b21f8021ec151b1dcd251d4b081a0a5bdfaacc329dc1563c3c96 not found in any index
[2:25:31] 100.00%  0B/s  1.198 TiB / 1.198 TiB  2584360 / 2584351 items  2 errors  ETA 0:00

duration: 2:25:31, 143.92MiB/s
snapshot 5a0ccecc saved

I then issued restic check --with-cache, as my current cache folder is a whopping 25 GB.
restic printed these messages:

tree 74e9c424, blob 92a60c53: not found in index
  tree 74e9c424, blob d00196cd: not found in index
  tree 74e9c424, blob f5539c08: not found in index
  tree 74e9c424, blob 55faff96: not found in index
  tree 74e9c424, blob ee86bfb8: not found in index
  tree 74e9c424, blob 34e34e48: not found in index
error for tree a2121f52:
  tree a2121f52, blob 8819e3ec: not found in index
  tree a2121f52, blob cae48783: not found in index
  tree a2121f52, blob 29dcb306: not found in index
  tree a2121f52, blob 3e9e5f22: not found in index
  tree a2121f52, blob 59c2b936: not found in index
  tree a2121f52, blob fbf71a80: not found in index
  tree a2121f52, blob 284b0d9b: not found in index
error for tree f241f8a3:
  tree f241f8a3, blob 9247e5e9: not found in index
  tree f241f8a3, blob f7537775: not found in index
  tree f241f8a3, blob 9247e5e9: not found in index
error for tree deeb292a:
  tree deeb292a, blob 7dc3874d: not found in index
  tree deeb292a, blob 87fa5eab: not found in index
  tree deeb292a, blob 7dc3874d: not found in index
error for tree ed527eff:
  id ed527eff34be32c70b32103b2fc9b51bc6a292638b905fb0ec76304ef1c56576 not found in any index
error for tree c4191d28:
  id c4191d285d95c2f21d2fb95aaa6e5aa3ec440845d0928f316a442bc2411f48de not found in any index
  
Fatal: repository contains errors

These are my first error messages using restic, though some days before I got these:

Save(<index/d98d139cc0>) returned error, retrying after 401.016119ms: Close: context canceled
Save(<index/d98d139cc0>) returned error, retrying after 775.403979ms: Close: context canceled
Save(<index/d98d139cc0>) returned error, retrying after 1.114714812s: Close: context canceled
Save(<index/d98d139cc0>) returned error, retrying after 1.779076911s: Close: context canceled
Save(<index/d98d139cc0>) returned error, retrying after 2.700214741s: Close: context canceled
Save(<index/d98d139cc0>) returned error, retrying after 4.05100594s: Close: context canceled
Save(<index/d98d139cc0>) returned error, retrying after 5.887637071s: Close: context canceled
Save(<index/d98d139cc0>) returned error, retrying after 11.953520901s: Close: context canceled
Save(<index/d98d139cc0>) returned error, retrying after 10.284198513s: Close: context canceled
Save(<index/d98d139cc0>) returned error, retrying after 18.066799439s: Close: context canceled

Anyway, I tried to forget and prune yesterday, issuing the following command:
restic forget --keep-daily 45 --keep-tag initial --keep-tag firstbackup --prune
It ran for about 26 hours, downloaded tens of GBs and gave me the following error messages:

18 snapshots have been removed, running prune
counting files in repo
building new index for repo
      [23:56:57] 99.70%  200724 / 201329 packs
[24:01:27] 100.00%  201329 / 201329 packs
repository contains 201329 packs (2979090 blobs) with 959.049 GiB
processed 2979090 blobs: 2132 duplicate blobs, 398.824 MiB duplicate
load all snapshots
find data that is still in use for 46 snapshots
id 6bcb8d696d445385e785825fd421e8e4940f51443b21046df615fea25c55d9bb not found in any index
github.com/restic/restic/internal/repository.(*MasterIndex).LookupSize
        src/github.com/restic/restic/internal/repository/master_index.go:54
github.com/restic/restic/internal/repository.(*Repository).LoadTree
        src/github.com/restic/restic/internal/repository/repository.go:625
github.com/restic/restic/internal/restic.FindUsedBlobs
        src/github.com/restic/restic/internal/restic/find.go:11
github.com/restic/restic/internal/restic.FindUsedBlobs
        src/github.com/restic/restic/internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
        src/github.com/restic/restic/internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
        src/github.com/restic/restic/internal/restic/find.go:31
github.com/restic/restic/internal/restic.FindUsedBlobs
        src/github.com/restic/restic/internal/restic/find.go:31
main.pruneRepository
        src/github.com/restic/restic/cmd/restic/cmd_prune.go:187
main.runForget
        src/github.com/restic/restic/cmd/restic/cmd_forget.go:237
main.glob..func8
        src/github.com/restic/restic/cmd/restic/cmd_forget.go:24
github.com/restic/restic/vendor/github.com/spf13/cobra.(*Command).execute
        src/github.com/restic/restic/vendor/github.com/spf13/cobra/command.go:698
github.com/restic/restic/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        src/github.com/restic/restic/vendor/github.com/spf13/cobra/command.go:783
github.com/restic/restic/vendor/github.com/spf13/cobra.(*Command).Execute
        src/github.com/restic/restic/vendor/github.com/spf13/cobra/command.go:736
main.main
        src/github.com/restic/restic/cmd/restic/main.go:69
runtime.main
        /usr/local/go/src/runtime/proc.go:185
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:2337

I had about 64 snapshots before this, and restic snapshots now tells me I have 46.
So it must have deleted some, but I’m not sure.

Is my repository damaged?
And does this mean that B2 isn’t a good backend for a production repository?

Thanks in advance.
Best regards,
Svavar O - Reykjavik - Iceland

Welcome to the forum!

From what you describe, it looks like at some point files in the repo either vanished or weren’t successfully uploaded without restic detecting it (for example, if the service did not return an error code but did not save the file either). Restic made a note (in a so-called “index file”) that a file was successfully uploaded, but it is not there any more. Unfortunately I can’t tell exactly what happened here, and it’s not the first time this has happened with B2.

The error messages you get during backup (error loading tree[...]) are not critical; restic will just re-upload the data it can’t find in the repo. When the missing data is still present on the local disk, the repo will even “heal” itself.

I’m also sorry that restic does not handle running prune on a repo with missing data well. That’s a case we still need to improve, and prune will also be made more data-efficient.

Running forget --prune will first remove snapshots and then run prune, so restic indeed removed snapshots, but the data is still there (since the prune step couldn’t complete properly).

What I’d like you to try now is this: please run restic rebuild-index and, when that has completed, run restic check again. Let’s see if the files are there and just not in the index. Please report back!

Thank you kindly for the quick response.
I will try that tomorrow and report back.

Thanx.

All is good now.
I issued a rebuild-index, which output the following:

restic rebuild-index
password is correct
counting files in repo
[21:46:52] 100.00%  201329 / 201329 packs
finding old index files
saved new index as 8842fb61
remove 244 old index files 

That was followed by a normal backup, with no errors.

Thanks for the feedback! Now I’d love to find out what went wrong here. In retrospect it looks like an index file has gone missing (or wasn’t uploaded properly).

Usually, restic works like this:

  • Backup data, save data in pack files in the repo
  • Upload a new index which describes the contents of the newly uploaded pack files
  • Upload a snapshot file which references the new data

The last step is only done when the previous steps were executed successfully.
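
To make the ordering concrete, here is a minimal sketch in Go (the language restic is written in). The names savePacks, saveIndex and saveSnapshot are hypothetical stand-ins, not restic’s real API; the only point it illustrates is that the snapshot is written last, and only if the earlier steps succeeded.

package main

import (
    "context"
    "log"
)

// Hypothetical stand-ins for restic's real upload steps.
func savePacks(ctx context.Context) error    { return nil } // 1. store file data in pack files
func saveIndex(ctx context.Context) error    { return nil } // 2. store an index describing those packs
func saveSnapshot(ctx context.Context) error { return nil } // 3. store the snapshot referencing the data

func backup(ctx context.Context) error {
    if err := savePacks(ctx); err != nil {
        return err
    }
    if err := saveIndex(ctx); err != nil {
        // A failed index upload is supposed to stop the run here,
        // before any snapshot is written.
        return err
    }
    return saveSnapshot(ctx) // only reached when the previous steps succeeded
}

func main() {
    if err := backup(context.Background()); err != nil {
        log.Fatal(err)
    }
}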

From the error message you pasted I can see that while uploading an index, the upload process was cancelled. One way to do that is to send SIGINT (Ctrl+C) to the process. Then the upload and the whole backup process are stopped, so the index file is not uploaded, and (more importantly) neither is the snapshot file.

So on the next run of restic backup, it re-uploads the data, because there’s no index file covering the new files uploaded during the cancelled backup run. There’s also no snapshot file for the previous run, so restic should behave just like before, uploading the data again.

At no point can I see where restic uploads a snapshot which references data not included in an index file. The index file is always uploaded first, and only then is the snapshot file uploaded. It’s really odd.

Did I miss anything? Any other ideas on how to debug this further?

All in all I must say that I’m proud that the repository format is so robust that your situation and similar ones can be easily resolved :slight_smile:

Uh, I’ve got an idea: you may have found a bug!

Can you please confirm that you did not actively cancel (with Ctrl+C or by sending SIGINT) the backup run which printed the error messages about “context canceled”?

Do you have the last couple of errors printed during that backup run? Was there a line containing the text “error saving preliminary index”?

Hi again.
Thanks for the detailed information.
Everything related to my restic usage is run through a cronjob and my backup script.
Otherwise, I always use the screen command for restic operations, so that if I lose the connection
the process is not killed. But no, I’m 100% sure that I did not send Ctrl+C or SIGINT.

So for my backup operations, everything is executed from a cronjob at about 02:00 in the morning.

My script looks like this:

#!/bin/bash
export RESTIC_PASSWORD="restic_repos_password"
#
export RESTIC_REPOSITORY="b2:bucketname"
export B2_ACCOUNT_ID="b2_account_id"
export B2_ACCOUNT_KEY="b2_account_key"
export TMPDIR="/tmp"
INCLUDEFILE="/etc/restic/includefiles.txt"
EXCLUDEFILE="/etc/restic/excludefiles.txt"
LOGFILE=/var/log/restic.log
#
echo "Starting incremantal backup..."`date` >> $LOGFILE
/usr/local/bin/restic backup -o b2.connections=12 --files-from $INCLUDEFILE --exclude-file $EXCLUDEFILE >> $LOGFILE 2>&1
echo "Ending incremental backup..."`date` >> $LOGFILE

My restic log file is available through pastebin: restic log file

Thanks for the confirmation, and congratulations: Based on your very detailed report, we’ve discovered a bug together :slight_smile:

What I haven’t told you yet is that during a backup, restic will upload “intermediate” index files, which cover the data files successfully uploaded up to that point in time. This is done so that when a backup is cancelled and re-run, restic can just download those index files on the second run. It then knows which data has already been uploaded successfully, so it does not have to start from scratch.
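
Here is a rough sketch of that idea, assuming hypothetical uploadPacks and flushIntermediateIndex helpers (this is not restic’s actual code): while the data files are being uploaded, an intermediate index is written from time to time, so that a restarted backup can tell which packs already made it to the repository.

package main

import (
    "log"
    "time"
)

// Hypothetical helpers, for illustration only.
func uploadPacks(done chan<- struct{}) {
    time.Sleep(3 * time.Second) // stands in for uploading the pack files
    close(done)
}

func flushIntermediateIndex() {
    // Writes an index covering the packs uploaded so far, so a later
    // run can skip them instead of starting from scratch.
    log.Println("saved intermediate index")
}

func main() {
    done := make(chan struct{})
    go uploadPacks(done)

    ticker := time.NewTicker(1 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            flushIntermediateIndex()
        case <-done:
            // The final index and the snapshot would be written here.
            return
        }
    }
}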

So, what happens here is that the backup process finishes while such an intermediate index upload is in progress. The upload is then cancelled (you can see the error message in line 396 of your log), but no error is passed back to the function that is about to finish the backup process. That function then uploads the snapshot and terminates successfully.
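
A simplified sketch of that failure mode (again, an illustration, not restic’s actual code): the intermediate index upload is still in flight when the backup finishes and cancels the shared context; the resulting “context canceled” error is only logged inside the goroutine and never handed back, so the snapshot gets written anyway.

package main

import (
    "context"
    "log"
    "time"
)

// saveIntermediateIndex stands in for the in-flight index upload.
func saveIntermediateIndex(ctx context.Context) error {
    select {
    case <-time.After(2 * time.Second): // how long the upload would take
        return nil
    case <-ctx.Done():
        return ctx.Err() // "context canceled"
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())

    done := make(chan struct{})
    go func() {
        defer close(done)
        if err := saveIntermediateIndex(ctx); err != nil {
            // The error is logged, but never returned to the code that
            // decides whether the snapshot may be written.
            log.Printf("Save(<index/...>) returned error: %v", err)
        }
    }()

    // The backup finishes and cancels the context while the index
    // upload is still running.
    cancel()
    <-done

    log.Println("snapshot saved") // written even though the index never made it to the repo
}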

We have now found a situation which I could not imagine this morning :slight_smile:

The upside is that the fix is easy to do; I’ll have a look later today. And there won’t be any data loss: you can easily recover from this by building a new index from scratch, which covers all uploaded files.

Thanks again, you’ve helped improve restic a lot!

Thank you for the detailed information about how restic works under the hood. :slight_smile:
I’m glad this situation can be used to improve restic.

Thank you again for your hard work.
Have a great weekend.

Best regards,
Svavar Orn
Reykjavik - Iceland

So just to be super clear: there was never a risk that this bug could cause data loss, neither for @svavaroe nor for anyone else?

That’s right, there was no risk of data loss. The whole issue is just an inconvenience.

The issue will be fixed in #1589

And it’s merged! Thanks again for the report and your patience!
