False positive error on prune operation

Hi there,

I sometimes get “tree 6900cc... not found in repository” errors on some of my big repositories when I try to prune.

The problem is that restic reports this error at the end and exits without touching the repository, probably for safety reasons.

A slightly unrelated note: since I get exit code 1 in most cases, I can’t distinguish this error from other failures (there is even a ticket for it: “Return different exit codes for different failures”, Issue #956 in restic/restic on GitHub) :man_shrugging:
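As long as the exit code is the same for every failure, one possible workaround is to look at the text restic prints instead. This is only a minimal sketch: it assumes the message has the form quoted above (“tree <id> not found in repository”) and that the output of prune has been captured as a string. The function name and the regular expression are hypothetical and not part of restic.

```python
import re
from typing import Optional

# Assumption: the message looks like the one quoted in this thread,
# "tree <hex id> not found in repository". The exact wording may differ
# between restic versions, so treat this pattern as a guess.
MISSING_TREE = re.compile(r"tree ([0-9a-f]+)\S* not found in repository")


def missing_tree_id(output: str) -> Optional[str]:
    """Return the (possibly shortened) tree ID if the output reports a missing tree."""
    match = MISSING_TREE.search(output)
    return match.group(1) if match else None
```

A caller could then treat exit code 1 plus a non-None result as “missing tree” and everything else as a different failure.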

But most of the time this “error” feels like a false alarm: I know that a missing file is unlikely, and when I run rebuild-index and then prune again, the issue disappears.

Currently I run rebuild-index before every prune operation to be safe.
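The workaround can be wrapped in a small script. The following is a rough sketch, not a recommended setup: it assumes restic is on the PATH and that the repository password is supplied through the environment. The function name and the injectable `run` parameter are made up for illustration, so that the sequence can be tried without a real repository.

```python
import subprocess
from typing import Callable, List, Sequence


def rebuild_then_prune(
    repo: str,
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run `restic rebuild-index`, then `restic prune`; stop at the first failure."""
    steps: List[List[str]] = [
        ["restic", "-r", repo, "rebuild-index"],
        ["restic", "-r", repo, "prune"],
    ]
    for cmd in steps:
        code = run(cmd)  # exit code of the restic process
        if code != 0:
            # Do not prune if rebuilding the index failed.
            return code
    return 0
```

Calling `rebuild_then_prune("/path/to/repo")` returns 0 only if both commands succeed; otherwise it returns the exit code of the step that failed.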

Do you have any insights? Or shall I open a ticket for this case? :thinking:

Thanks.

Can you post a more complete error log? The error message is probably returned somewhere inside restic.FindUsedBlobs. Aborting the prune operation is the only way to avoid data loss, as prune only keeps blobs which are still in use. And if restic fails to load a tree blob, then it can’t mark the blobs referenced by that folder as used.
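To illustrate why a single missing tree has to abort the whole operation, here is a simplified model of the marking step. It is not restic’s actual implementation: the dictionary layout, the function name and the exception type are assumptions made for this sketch only.

```python
from typing import Dict, Iterable, List, Set


class MissingTreeError(Exception):
    """Raised when a referenced tree cannot be loaded."""


def find_used_blobs(roots: Iterable[str], trees: Dict[str, dict]) -> Set[str]:
    """Walk all trees reachable from the snapshot roots and collect used blob IDs.

    `trees` maps a tree ID to {"data": [blob IDs], "subtrees": [tree IDs]}.
    """
    used: Set[str] = set()
    stack: List[str] = list(roots)
    while stack:
        tree_id = stack.pop()
        if tree_id in used:
            continue  # already visited via another snapshot or folder
        if tree_id not in trees:
            # Without this tree we cannot know which blobs it references,
            # so continuing could delete data that is still in use.
            raise MissingTreeError(f"tree {tree_id} not found in repository")
        used.add(tree_id)
        node = trees[tree_id]
        used.update(node["data"])
        stack.extend(node["subtrees"])
    return used
```

Everything not in the returned set would be a candidate for removal, which is why an incomplete walk cannot be tolerated.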

Did some backup operations fail before the error? The error message should only occur if an index file is lost somewhere during backup. However, in that case restic should never have created a snapshot for that backup run… (The implementation actually makes sure to complete the data upload first, then finishes all index files, and only after that creates the snapshot.) So the question is: what broke the index? Does the check command print any “not referenced in any index” warnings?

Sadly, I don’t have the error log anymore. I couldn’t find any other occurrence, since I now run rebuild-index by default. I’ll try to reproduce it and run the check command if I can catch it.

And no, I couldn’t find a failing backup operation. I specifically checked the logs of the node that owned the snapshot containing the “lost tree” mentioned above. It seems the backup completed without issues.

Could the parallel backups be a problem (since backup size is not small, multiple hosts might be backing up at the same time)?

Thanks

Parallel backups shouldn’t cause problems: each backup run will (with extremely high probability) use separate packs, index files and snapshots. And as the prune operation requests an exclusive lock, there should be no collisions with backup runs.

However, a backup run that completes while prune lists all packs in the repository could trigger such a warning. In that case the prune error should vanish when running prune twice…

Hmm, I am not sure I understand completely. Doesn’t prune acquire an exclusive lock before it even starts listing packs? And acquiring this lock should not be possible while any other lock exists, including the lock of a backup that is still completing.

Anyway, I’ll run some big forget rounds to see if I can catch a similar error again. I’ll notify if something weird shows up, thanks again :+1:

Sorry for the confusion. This was merely meant as a hypothesis for how such an error might show up. However, as you’ve noticed, it would only work if locking were ignored.
