As I understand, there are two ways to back up to an rclone remote:
- Serving a REST server using rclone, then connecting to that REST server with restic.
- Using rclone:remote:path/dir/subdir/ as the repo and letting restic handle rclone.
Which of these performs better, and when is one preferable over the other?
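For reference, the two setups can be sketched roughly like this (`remote:backups` is a placeholder for whatever rclone remote you have configured; the REST server address is rclone's default):

```shell
# Option 1: run an rclone REST server yourself, point restic at it via TCP.
rclone serve restic --addr 127.0.0.1:8080 remote:backups &
restic -r rest:http://127.0.0.1:8080/ backup /data

# Option 2: let restic spawn rclone itself (HTTP/2 over stdio).
restic -r rclone:remote:backups backup /data
```

In option 2, restic internally runs `rclone serve restic --stdio` and talks to it over the pipe, which is what the discussion below is about.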
Based on my observation, TCP's native multiplexing works much better than HTTP/2's, so under identical conditions (say, 24 parallel transfers) the HTTP/REST backend is much faster than the rclone backend, which uses HTTP/2 via stdio. I've already tried rebuilding both rclone and restic with the latest net/http2 a couple of times; the changes are insignificant.
Here is some explanation of how it works and how I use restic with rclone:
- Restic sends data to the backend snapshot by snapshot, i.e. it iterates over snapshots, stores all packs in the backend, stores the index, and then stores the snapshot, and only then moves on to the next snapshot. So if you have a small number of packs per snapshot, higher concurrency won't give you much benefit no matter what protocol you use.
- Some (rclone) backends need noticeable time before they begin to accept data, so bigger packs mean better connection utilization.
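On the pack-size point: restic 0.14.0 and later expose a knob for this. A minimal sketch, assuming a recent restic and a hypothetical `remote:backups` rclone remote (64 MiB is an arbitrary example value, not a recommendation):

```shell
# Raise the target pack size so each upload carries more data,
# amortizing the per-connection startup cost of slow backends.
# --pack-size is given in MiB; restic's default target is around 16 MiB.
restic -r rclone:remote:backups backup /data --pack-size 64

# The same setting is also available as an environment variable:
export RESTIC_PACK_SIZE=64
```

Larger packs trade memory usage and deduplication granularity for fewer, fuller uploads, so it mainly helps on backends where connection setup dominates.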
So I mix these two approaches: I normally use the native rclone backend since it's just a handy way, but if I see that it takes too long or times out (I have higher-level timeouts on the backup copy operation), I initiate a manual transfer via the REST backend.
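A minimal sketch of that fallback workflow, assuming hypothetical repository paths, a hypothetical `remote:backups` rclone remote, GNU coreutils `timeout`, and restic ≥ 0.14 (which uses `copy --from-repo`):

```shell
#!/bin/sh
# Try the copy through restic's built-in rclone backend first;
# if it exceeds the time budget, retry through a local REST server.
SRC=/srv/restic-local    # hypothetical local source repository
DEST=remote:backups      # hypothetical rclone destination remote

if ! timeout 1h restic -r "rclone:$DEST" copy --from-repo "$SRC"; then
    # Fallback: serve the same remote over loopback TCP and copy via REST.
    rclone serve restic --addr 127.0.0.1:8080 "$DEST" &
    SERVE_PID=$!
    restic -r rest:http://127.0.0.1:8080/ copy --from-repo "$SRC"
    kill "$SERVE_PID"
fi
```

Both paths land in the same repository, so an interrupted rclone-backend copy can be resumed over REST without redoing finished packs.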
All of the above relates to copy. Backup actually works almost the same way, but local file IO (reading files on disk to make a snapshot) and the extra CPU for CDC come into play, and there is almost no concurrency on the IO operations (i.e. no multiplexing is required), so the difference could be very small.
Which performance region in terms of network speed are you talking about? 100 Mbit/s, 1 Gbit/s, or even 10 Gbit/s? There's probably only a difference for the latter two, whereas for the first it shouldn't matter much.
I didn't know there was a potential performance issue with restic's HTTP-over-stdin approach. I was just wondering why it is used, as it makes things a bit more complicated.
Actually, IMO it would be easy to change the restic code to use TCP to a locally running rclone, which would then improve performance. Does anyone know the actual reason why restic uses HTTP-over-stdin?
I suppose it's because using stdio (with HTTP/2) is simple and performant enough for trivial cases, and for security reasons: it avoids listening on TCP sockets (which are reachable by everyone on the same machine).
Just to clarify: I'm not proposing to change the default behaviour because of the points above, since my case is not common. I'm copying a number of snapshots from a local repository to a few cloud ones at a time, aiming to utilize a 1 Gbps internet connection, so I need multiplexing, while most users do not.
I'm pretty sure that a single HTTP/2 connection over stdin (which is also not encrypted) should be more than capable of saturating a 1 Gbps connection. Everything else sounds like a performance bug somewhere.