-o b2.connections=N seems to have no impact

You got it almost right: the setting -o b2.connections=N is there to limit the number of connections; it's just an upper bound. When we introduced it, restic used way too much concurrency, so for some users it was best to limit the number of outgoing connections in order to not congest the (small) upstream bandwidth. At the moment, the actual upload is done by the file worker threads in the archiver code, of which there are two. The long-term plan is to decouple the upload process from the file reading process, so we can have more uploads going on.
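For reference, the option is passed like any other extended option on the command line; a minimal sketch (bucket name, repo path and backup path are placeholders, and credentials are assumed to already be set in the environment):

restic -r b2:mybucket:myrepo -o b2.connections=10 backup /home/user/data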

That would be great @fd0! These days managing bandwidth is seldom the issue, but managing latency is. We're mostly using inter-cloud network bandwidth, and even my home upstream bandwidth is 1Gbit. With current restic parallelism and options, I can't even saturate 1% of my home upstream bandwidth to high-latency repos :dizzy_face:.

If I move back to NZ, where they are getting ready for 10Gbit consumer connections, it will be even worse: as you'd guess, the latency from NZ to anywhere else in the world is huge, so restic's max-parallelism speed from NZ to B2 would be ~0.1% of the available bandwidth :slight_smile:

It would be great to have configuration options to re-enable some of that original parallelism, even if the default options are conservative. That would be an important modernisation.


Hello,
Living in France, I have been happily using restic with Azure (Paris DC) for some time, and I have just started using B2 since they opened their Amsterdam DC, but I am seeing very slow transfers (around 2MB/s, whereas I get more than 12MB/s with Azure).
Backblaze support tells me I have to use multiple threads, but if I understand this discussion correctly, the b2.connections option won't help. I tried using restic with rclone, setting the --transfers option to a high number, but that didn't help either.
Can you confirm that, until restic implements the option to use more concurrency, my only option to improve the upload speed is to have restic back up to a local disk, then use rclone to transfer the files to B2?
Thank you for your feedback.

In short, yes, this seems to be the case. If you have the local disk space, backing up locally and uploading the files separately with rclone is fantastic.

Neither. This has been well covered in the past; you would do well to review the existing discussions.

In short, restic waits until a block is uploaded before moving on, so it effectively never makes multiple connections. If you back up to a local path and then rclone that local path you'll get the desired result (at the cost of local disk space).
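If you want to try that workaround, a rough sketch looks like this (local repo path, source path, rclone remote name and bucket are all placeholders to adapt to your setup):

restic -r /mnt/backup/restic-repo backup /home/user/data
rclone sync /mnt/backup/restic-repo B2remote:mybucket/restic-repo --transfers 32

The repository on B2 stays a normal restic repository, so you should still be able to point restic at b2:mybucket:restic-repo directly for restores.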

I am having a similar issue, although with restore, not backup. Restic completely ignores -o b2.connections=42 and only opens a single connection to Backblaze. This single connection is throttled to 25 megabits per second, so it is taking me HOURS to restore a single 20GB file. I verified this with lsof.

lsof -p 32232
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
restic 32232 root cwd DIR 8,0 4096 8011 /root
restic 32232 root rtd DIR 8,0 4096 2 /
restic 32232 root txt REG 8,0 23714488 95924 /usr/bin/restic
restic 32232 root mem REG 8,0 1924768 7409 /lib64/libc-2.12.so
restic 32232 root mem REG 8,0 159312 383 /lib64/ld-2.12.so
restic 32232 root 0u CHR 136,5 0t0 8 /dev/pts/5
restic 32232 root 1u CHR 136,5 0t0 8 /dev/pts/5
restic 32232 root 2u CHR 136,5 0t0 8 /dev/pts/5
restic 32232 root 3u IPv4 2661216015 0t0 TCP myserver:47870->206.190.215.16:https (ESTABLISHED)
restic 32232 root 4u 0000 0,13 0 19187 anon_inode
restic 32232 root 6w REG 8,0 463929534 1157221 /tmp/restore/mariadb_dump.sql

I have a 40Gbit downlink and I'm only able to pull 25 megabits per second.

To be clear, this is not entirely true. This setting is passed through to the underlying library, but more connections are only used if B2 is accessed concurrently by restic, and that concurrency is currently limited to 2 threads for backup.

However, I thought that 0.9.5 used more threads during restore.

Is this still current in JUNE 2020?

Is there any way we can force it to upload with more than 2 threads?
The slow speeds on B2 are freaking me out, but I would still like to use restic with it.


Regarding uploads, not much has changed. For larger files restic should run up to one pack upload for each CPU core. For smaller files the upload is effectively limited to 2 (the number of files read in parallel). This problem is tracked in

Regarding restore performance: the new restorer code, which has been merged into master for some time now but is not included in any release yet, should be able to download 8 packs in parallel (see workerCount).


WOW! Nice. I changed FileReadConcurrency to 6 and my I/O rate doubled. Upload to B2 also increased, as this conversation would suggest; I'm taking advantage of more upload concurrency now as well. Now to find my sweet spot for overall speed.
I'm not yet seeing any harm from the increased FileReadConcurrency. I don't mind making my HD hurt! :wink:
I look forward to the archiver and uploader split, so I can find my sweet spot for each independently.
I'd love to see FileReadConcurrency as a runtime option instead of hard-coded. This would still be a useful option even after the archiver and uploader split, so we can fine-tune it to our machines.
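For anyone who wants to experiment with this before it becomes a runtime option, the general approach is to rebuild restic with a different default. This is only a sketch; the exact file and default value may differ between restic versions, so grep for the name first:

git clone https://github.com/restic/restic.git
cd restic
grep -rn FileReadConcurrency internal/
# bump the default that the grep turns up, then rebuild:
go build ./cmd/restic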


I would so like to hear if anything has changed with regards to this. The extreme slowness bugs me. :smiley:

Understand that “upload bandwidth” is VERY much an issue for a large part of the world, so while 10Gbit uploads might be possible for you, it’s hardly universal.

It was just an example of where connections can have both high latency and high bandwidth. In those situations restic is unable to use even a tiny fraction of the bandwidth. It gets throttled by latency because it has little to no mechanism to handle it (for some operations).

Regardless of how slow your endpoint upload is, if your latency is high this will impact you, especially for operations like deleting blocks. You would also still hit this problem with any cloud → cloud backup, since cloud bandwidth keeps increasing, even though the latency cannot improve. As bandwidth improves, restic’s relative performance gets worse and worse. That’s why I flagged latency handling as an important modernization. Parallel and/or asynchronous operations are probably the low-hanging fruit to address this.

The new restore is massively more performant on high-latency connections; it really addressed the problem :+1: :heart_eyes: :partying_face:

It is operations like forget and prune that are the bottlenecks now. The easiest workaround right now is to keep backing up without forget/check/prune to one B2/S3 bucket for a while, then start over with a repo in a new bucket, and later delete the whole first bucket. Otherwise we have to suspend backups for a couple of days to allow a forget/prune pass to complete (since it requires a lock, and a prune on B2 can take 24-48 hours to complete for us).

Thanks for the tip about backup parallelism being limited to the core count. It would be great to un-shackle backup parallelism from the CPU core count. Not sure how CPU could ever be the bottleneck, except in the use case of local-to-local NAS/SAN backup?

Huh, did you check out the new options for prune? The option --max-unused unlimited tells restic to only remove files which are completely unused and just keep the rest (even if they contain both data that's still in use and unused data). If you don't want unused data lying around, you can instead limit the amount of data that gets re-uploaded via --max-repack-size 500M, which only limits the repacking step.
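For example (repository is a placeholder; pick whichever flag matches your goal):

restic -r b2:mybucket:myrepo prune --max-unused unlimited
restic -r b2:mybucket:myrepo prune --max-repack-size 500M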

If you try that then please report back!

Just out of curiosity: which version of restic do you use? Which step in the prune process takes so long?


We're using restic 0.12.0. I see 0.12.1 was released last month, so I will update. We are backing up to B2 with up to ~180ms network latency.

Thank you for those optimization ideas; they sound good because storage is often cheaper than time. What we've done to optimize things ourselves is split backups into silos, with each backup target having its own server-side repo and client-side cache. That means we can break up prunes into tasks that take no more than about 5 hours each.

The forget/prune time is heavily dominated by deleting things (>95% of total forget/prune time), which doesn't seem to be parallel at all. Although finding used snapshots and repacking takes more time per operation, there is not very much of that to do. However, if you are deleting 100,000 things at ~0.17 seconds per delete, that prune is going to take about 5 hours. With our silo approach we can break the 24-48 hours up into silos that take 5 hours or less and that we can run in parallel.

Here are the average restic B2 operation times we observe with ~180ms network latency:

Deleting snapshots for forget: 0.18s / delete
Deleting obsolete indexes: 0.16s / delete
Removing old packs: 0.16s / delete

Finding in use data: 0.35s / snapshot
Repacking pack: 0.6s / repack

The delete operation time seems to be basically the same as the network latency. If so, then if your latency is 10ms you can delete ~100 things/second, if your latency is 100ms you can delete ~10 things/second, and at our ~180ms latency we can delete ~5 things/second. Obviously bandwidth and CPU cores are not relevant here; everything is latency-bound. If we could run 20 delete operations in parallel we could probably reduce forget/prune time from ~5 hours to ~15 minutes.
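As a rough rule of thumb for serialized deletes: total delete time ≈ (number of objects to delete × per-delete latency) ÷ number of parallel delete workers. For example, 100,000 deletes × 0.18s with a single worker is ~5 hours; the same work spread over 20 workers would be ~15 minutes.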

There is also a trade-off between frequent smaller forget/prunes and doing them less often. Because we have to suspend backups to run prunes, we tend to do them less often, usually monthly.

Backups themselves are incremental with retained client-side caches and run frequently (down to every 15 mins for production). They're no problem; most take less than a minute to run since they only upload a couple of new files. Restores with the new improved restorer are pretty excellent and about as fast as you could hope for, I think. The forget/prune is slow, which is a problem because it needs an exclusive lock the whole time. Slow and non-exclusive would be no issue. Or exclusive and faster.

I’ll add your suggested optimization options for next forget/prune run and see what the impact is :eyes:

Using a VPS located closer to B2 is not an option for you?


The number of parallel delete operations is currently limited to 8, see

You could also try running prune with -o b2.connections=8 to get the full parallelism. The default connections limit is 5. This makes me wonder whether the operation times have to be multiplied by five, which would mean 0.8s/delete.
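Something like this, for example (repository is a placeholder):

restic -r b2:mybucket:myrepo -o b2.connections=8 prune --max-unused unlimited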


Thanks, I had the impression from this article and others that -o b2.connections=8 went to the B2 library but was basically ignored by restic operations. I'll definitely try -o b2.connections=8 to see if it helps. If we can uncap that hard-coded delete-worker limit, maybe we can make a dent in the times!

@764287 I considered running containers closer to the B2 data center just for the forget/prunes. But the restic cache necessarily lives in the countries where the data being backed up is. And we'd have to coordinate suspending the backups in those locations with the prune starting near B2. So possible, but a lot of stuff to make it happen.

Last month I tested 0.12.0 with high-latency (~180ms) access to B2. Backup and restore performance is great, but forget/prune performance is poor. This is due to how slowly restic can delete objects, and deleting objects is ~95% of the total time a forget/prune takes.

From other comments the problem is the small cap on the number of allowed delete connections/workers. It appears that currently every connection requires a dedicated worker, and the number of workers defaults to 5 and has a hard-coded maximum of only 8.

The result is that restic can only delete about 6 objects/second by default (at 180ms latency), so a prune that needs to delete 100,000 objects takes ~5 hours. The original results are quoted below:

I have done some further testing with 0.12.1 with new options the community suggested:

--max-unused unlimited
-o b2.connections=8

Here are the average restic B2 operation times we observe with ~180ms network latency:

Deleting snapshots for forget: 0.1s / delete
Deleting obsolete indexes: 0.09s / delete
Removing old packs: 0.1s / delete

Finding in use data: 0.16s / snapshot
Repacking pack: 1.3s / repack

Being able to increase the number of delete workers from 5 to 8 almost doubled the speed at which restic can delete objects! I saw an approximately linear speed-up with the number of connections/workers. This almost halves the total forget/prune times.

I would love to test with 16 or 32 connections/workers, but unfortunately the hardcoded limit of 8 workers also places a hard limit on the maximum speed of forget/prune operations. It would be great if this limit could be moved out of the code to a configuration setting. Or at least, in the first instance, to a build configuration.

It is not clear what --max-unused unlimited did, as the starting state wasn't identical to the previous test. It probably reduced the number of objects that needed repacking and deleting, but the effect wasn't dramatic. This was a smaller forget/prune, deleting ~40K objects, and there were only ~100 repacks required with --max-unused unlimited.