Prune error tree not found

MrMoronIV · July 6, 2018, 4:57pm

I’m running these steps daily:

restic backup (with my parameters)
restic unlock
restic forget --keep-daily 1 --keep-last 30 --prune

So, your main question is probably, why do you unlock? Well, the prune was complaining that the repository was locked after the backup finished (not always). Is the backup process not finished when the backup command is done? If that is the case, and force unlocking destroys the prune process, there isn’t really a way to automate backup and directly prune afterwards.

Since there are snapshots qualifying for pruning, I get errors:

processed 27414 files, 881.833 MiB in 1:17
snapshot dced3a81 saved
Backup Successful!
Unlocking backend as extra check
Unlocking backend
successfully removed locks
Starting cleanup of old backups
Applying Policy: keep the last 30 snapshots, 1 daily snapshots
1 snapshots have been removed, running prune
counting files in repo
building new index for repo
[0:10] 100.00%  230 / 230 packs

repository contains 230 packs (25881 blobs) with 804.155 MiB
processed 25881 blobs: 0 duplicate blobs, 0B duplicate
load all snapshots
find data that is still in use for 30 snapshots
tree 589981bde504257259c789b65e08d72a3927c68f0dc08d01fab6835bed35f3b4 not found in repository

There are lines omitted above and some were added by my script.

I can fix this by running check --read-data and then running the prune command again. However, the cause of the problem remains. I’m not sure whether unlocking is the cause, though. This ran on version 0.9.0-20.

Dj0k3 · July 6, 2018, 8:40pm

I get the restic unlock part. It happens to me too. Now I just run unlock at the top of my script and everything runs great. This issue happened to me recently and I fixed it by running restic rebuild-index; then I ran check --read-data just to be sure the repo was clean, and finally prune. Everything went back to normal. I hope this helps.

MrMoronIV · July 7, 2018, 9:47am

You run unlock before you start the backup? Isn’t the problem that the repository remains locked after the backup? How would that improve things?

fd0 · July 8, 2018, 9:40am

The issue lies within restic, specifically in how it indexes data internally. If it happens, run restic rebuild-index before prune. I’m aware that it’s a rather expensive operation (in terms of the time it takes to complete), but that’s how it is right now. Afterwards, you can run restic prune and everything should be alright.

During the earlier prune run, restic detected an error and aborted without first clearing the lock it had left in the repo. So in this case, running restic unlock once is needed. I don’t advise running it at the beginning of a backup script, though.
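The recovery order described above (unlock once, rebuild the index, prune again) could be scripted roughly as follows. This is only a sketch: prune_with_recovery is a made-up helper name, not something restic provides, and it assumes the repository location and password are already supplied through the environment.

```shell
#!/bin/sh
# Sketch: retry a failed prune after the recovery steps from this thread.
# prune_with_recovery is a hypothetical helper name, not part of restic.
# Assumes the repository and password are configured via the environment.

prune_with_recovery() {
    # First attempt: if prune succeeds there is nothing else to do.
    if restic prune; then
        return 0
    fi

    echo "prune failed, attempting recovery" >&2

    # The aborted prune may have left its lock behind; remove it once.
    restic unlock || return 1

    # Recreate the index from the files in the backend (can take a long time).
    restic rebuild-index || return 1

    # Second and final attempt; its exit status becomes the function's status.
    restic prune
}
```

A script would call prune_with_recovery after restic forget instead of passing --prune to forget. It retries on any prune failure, which may be broader than wanted.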

In general, restic unlock only removes lock files it considers “stale”, so even if you run it, it should probably be fine.

MrMoronIV · July 8, 2018, 3:32pm

The issue lies within restic

So does this also mean that if I wait, restic will be ‘fixed’ and the error will never happen again?
Or do I really need to build in a custom check to see if the prune command fails, then rebuild-index and then prune again?

Or maybe you can be more gentle in your prune command and rebuild-index yourself when part of the tree is not found instead of killing the prune process (maybe with a --force-rebuild flag for automated scripts)? I’m not really into custom scripting when the program itself is failing at what it does.

fd0 · July 9, 2018, 7:03am

Yes, I have plans for that. As with most other features, there’s no fixed timeline. I remember that there’s an issue for it on GitHub; if you like, you can try finding it, otherwise please just create a new one. Then we can track it and you can subscribe to it.

What happens here internally is that restic has two different data structures handling “index” files. These files contain information about which data blob is saved at what offset in which file in the backend. Restic reads the index files at each program start to learn which data is stored where. This is just a shortcut: each file in the repo ends with a “header” that also lists the file’s contents. So the index files can be recreated by reading the headers of all files.
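To look at what an index file records, something like the following sketch could be used. It relies on the restic list index and restic cat index subcommands; show_first_index is a made-up helper name, and the exact layout of the printed content is not described in this thread, so treat the output as something to explore rather than a fixed format.

```shell
#!/bin/sh
# Sketch: print the decoded content of the first index file in a repository.
# show_first_index is a hypothetical helper name, not part of restic.
# Assumes the repository and password are configured via the environment.

show_first_index() {
    # "restic list index" prints one index file ID per line; take the first.
    index_id=$(restic list index | head -n 1)

    if [ -z "$index_id" ]; then
        echo "no index files found" >&2
        return 1
    fi

    # "restic cat index ID" prints the decrypted content of that index file.
    restic cat index "$index_id"
}
```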

For safety, the prune operation won’t just use the index files in the repo, but rather create a new index from scratch, so it really knows what’s there. The problem/bug is that this new index isn’t fully integrated (yet), so during repacking, the index files from the repo are used again.

When still-referenced data is contained in a file that is present in the backend (so the data is there) but is somehow not listed in any index file, restic bails out.

The obvious solution is to use the newly created index for everything. I’m currently reworking index handling throughout restic (as it is likely also the reason for restic’s memory hunger), and this will likely get corrected in the process.

In the meantime, when the situation arises, it can be mitigated by creating a new index which covers all files in the backend before running prune. This is exactly what rebuild-index does.
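Since rebuild-index is expensive, an automated script might want to run it only when prune actually reports the error from the first post. The sketch below checks the prune output for the “not found in repository” text and rebuilds the index only in that case. The helper names needs_index_rebuild and prune_checked are made up for illustration, and matching on the message text is an assumption that may break if restic changes its wording.

```shell
#!/bin/sh
# Sketch: rebuild the index only when prune reports missing data.
# needs_index_rebuild and prune_checked are hypothetical helper names.
# Assumes the repository and password are configured via the environment.

# Succeeds if the given text contains the error message from this thread.
needs_index_rebuild() {
    case "$1" in
        *"not found in repository"*) return 0 ;;
        *) return 1 ;;
    esac
}

prune_checked() {
    # Capture stdout and stderr so the message can be inspected afterwards.
    output=$(restic prune 2>&1) && status=0 || status=$?
    printf '%s\n' "$output"

    [ "$status" -eq 0 ] && return 0

    if needs_index_rebuild "$output"; then
        # Clear the lock left by the aborted prune, rebuild, then retry once.
        restic unlock && restic rebuild-index && restic prune
        return
    fi

    # Any other failure is passed through unchanged.
    return "$status"
}
```

Failures with a different message are returned as-is, so unrelated problems are not hidden behind an automatic rebuild.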

Dj0k3 · July 9, 2018, 4:28pm

I don’t do that because my repo remains locked after a backup; my problem wasn’t about that. It was a user problem. See, I use restic for my laptop and for my wife’s. I’ve configured her laptop to do automatic backups with a script and a cron job. Sometimes she powered off her laptop before the backup had finished, so every time she turned it on again I had to unlock the repo. That’s why I added restic unlock at the beginning, and I haven’t had any problems with it. I know a backup doesn’t necessarily need an unlock first, but the script doesn’t run just a backup: like you, I have the check, forget, keep and prune in the same script.
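A script along these lines might look like the sketch below. It is not my actual script: run_backup_cycle is a made-up name, the backup path is passed in as a placeholder argument, and the retention flags are simply the ones from the first post. Note that fd0 advises against unlocking at the start of a backup script, so this fits only the case where nothing else uses the repo at the same time.

```shell
#!/bin/sh
# Sketch of a cron-driven cycle for a machine that may be powered off mid-backup.
# run_backup_cycle is a hypothetical name; the path argument and the retention
# flags (copied from the first post) are placeholders.
# Assumes the repository and password are configured via the environment.

run_backup_cycle() {
    backup_path=$1

    # Clear stale locks left behind by an interrupted earlier run.
    restic unlock || return 1

    # Stop at the first failing step so later steps never run on a bad state.
    restic backup "$backup_path" || return 1
    restic check || return 1

    # Apply the retention policy and prune in one step.
    restic forget --keep-daily 1 --keep-last 30 --prune
}
```

The cron job would then call run_backup_cycle with the directory to back up and act on its exit status.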