Parallel B2 uploads - error: tree is not known - on slight overlap


We’ve put together a systemd service to upload a bunch of large files to B2 in parallel, and for the most part it’s working really well. We upload 6 files in parallel using stdin (they’re tar files being generated/exported by our local backup solution).

We occasionally see failures when one file finishes uploading less than 20 seconds after another one starts.

Example timeline:

02:00:00 file 1 starts uploading
03:00:00 file 2 starts uploading
03:00:10 file 1 finishes uploading
03:00:20 file 2 - error: tree 9d41e5f342c7158ca221d5c1a67e2792cf14a40805ca77d7acad24f80bfb626c is not known; the repository could be damaged, run `rebuild-index` to try to repair it 

File 2 then keeps uploading for about as long as it normally would, but exits with a non-zero code at the end, and no snapshot is created in the repo. When we retry the upload, there is no error (unless we are unlucky and hit the same case again, where another file finishes shortly after the retried upload starts).

It feels as if file 1’s process does something while wrapping up its upload that file 2’s process is not aware of, so the tree is not where file 2’s process expects to find it.

The command our script uses for uploading (the tar stream from our backup solution is piped into it) is:

| restic backup --quiet --stdin --stdin-filename "${filename}.tar" -o b2.connections=8
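For anyone trying to reproduce this setup, here is a minimal sketch of how six `producer | restic backup --stdin` pipelines might be run in parallel from one script while still capturing each pipeline’s exit status. It is an illustration, not our actual script: `export_tar` is a hypothetical stand-in for the backup solution’s export command, and `RESTIC_BIN` is an added variable so the restic binary can be swapped out. With `set -o pipefail`, a failing export also marks the job as failed, not just a failing restic.

```shell
#!/usr/bin/env bash
# Sketch only: run several "export | restic backup --stdin" pipelines in
# parallel and report how many of them failed.
# Assumptions (hypothetical, not from the original post):
#   - export_tar NAME writes a tar stream for NAME to stdout
#   - RESTIC_BIN may be overridden (e.g. with a stub for testing)
set -o pipefail

RESTIC_BIN="${RESTIC_BIN:-restic}"

# Upload one file: the tar stream is piped straight into restic.
upload_one() {
  local filename="$1"
  export_tar "$filename" |
    "$RESTIC_BIN" backup --quiet --stdin --stdin-filename "${filename}.tar" -o b2.connections=8
}

# Start one background upload per argument, then wait for each one and
# return the number of uploads that exited non-zero.
upload_all() {
  local -a pids=() names=()
  local name i failed=0
  for name in "$@"; do
    upload_one "$name" &
    pids+=("$!")
    names+=("$name")
  done
  for i in "${!pids[@]}"; do
    if ! wait "${pids[$i]}"; then
      echo "upload of ${names[$i]} failed" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
```

Calling `upload_all file1 file2 file3 file4 file5 file6` starts six uploads and returns the number that failed (0 if all succeeded), which the systemd unit could then use as its exit status.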

Because it’s hard to predict exactly how long each file will take, this is also hard to reproduce on demand; we only spotted the pattern by going back through the logs.
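Assuming the service’s stderr ends up in the systemd journal (the default for a systemd service), affected runs can be found by filtering for the error message. A small sketch; the unit name `b2-upload.service` in the example comment is hypothetical, and the pattern assumes restic’s 64-character hex tree IDs as seen in the error above:

```shell
#!/usr/bin/env bash
# Print log lines containing restic's "tree <64-hex-char id> is not known" error.
# Reads log text on stdin; exits non-zero (like grep) if nothing matches.
tree_errors() {
  grep -E 'error: tree [0-9a-f]{64} is not known'
}

# Example (hypothetical unit name; substitute your own service):
#   journalctl -u b2-upload.service -o short-iso | tree_errors
```

The ISO timestamps from `-o short-iso` make it easier to line each error up against the start and finish times of the other uploads.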

We’re using restic 0.12.1 on Debian 11.

That sounds a lot like the issue addressed by restic pull request #3570, "List snapshots before index" by MichaelEischer.

Thanks - I’ll follow the PR with interest.

We’ll automate retries in our script for now.
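One detail matters when retrying: with `--stdin`, the failed attempt has already consumed the tar stream, so a retry has to re-run the whole export-and-upload pipeline, not just `restic`. A minimal retry helper might look like this, assuming the command passed to it regenerates its own input on each attempt (`retry` is a hypothetical helper name, not part of restic):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of restic): run a command up to MAX times,
# sleeping DELAY seconds between failed attempts.
# The command must regenerate its own input on every attempt, because a
# stdin stream that restic has already consumed cannot be replayed.
retry() {
  local max="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, `retry 3 300 backup_one "$filename"`, where `backup_one` is a hypothetical function in the script that runs the full export-and-`restic backup --stdin` pipeline for one file. A fixed delay keeps the sketch simple; exponential backoff would be an easy extension.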