Estimating backend storage requirements

I’ve been using OpenDrive via rclone for my server backups. It’s been… OK. I certainly can’t complain about the price for the space, but the speed leaves much to be desired.

I’m assuming the primary limiter is rclone - that because restic can’t talk to this backend directly, there’s a significant level of inefficiency. If I’m wrong, please correct me.

But if I’m correct, that means one of the other storage providers that restic can “natively” speak to will be faster. However, storage from Google, Amazon, and Backblaze is charged both by total storage and by transactions. Since I use cron-based backups with varying schedules (some hourly, others daily, weekly, etc.) for different data areas, there can be a wide range of interaction with the storage.

Is there a way of estimating the transactions/bandwidth for upload/download that can be used with the various backend providers’ calculators to estimate storage costs?
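
What I have in mind is a back-of-the-envelope calculation along these lines (every number below is a made-up assumption, not something restic reports; the prices would come from the provider’s price sheet or calculator):

```python
# Rough cost sketch - every value here is an assumption, not something restic reports.
PACK_SIZE_MIB = 8              # assumed average size of a pack file uploaded by restic
DAILY_CHURN_GIB = 2            # assumed new/changed data uploaded per day, all jobs combined
RUNS_PER_DAY = 30              # assumed total backup runs per day (hourly + daily + weekly jobs)
CALLS_PER_RUN_OVERHEAD = 20    # assumed list/read/write calls per run (snapshots, index, locks)
STORED_GIB = 500               # assumed total repository size

PRICE_PER_GIB_MONTH = 0.005        # example storage price in USD per GiB-month
PRICE_PER_10K_WRITE_CALLS = 0.004  # example price in USD per 10,000 write/list transactions

pack_uploads_per_day = DAILY_CHURN_GIB * 1024 / PACK_SIZE_MIB
calls_per_month = (pack_uploads_per_day + RUNS_PER_DAY * CALLS_PER_RUN_OVERHEAD) * 30

print(f"~{calls_per_month:,.0f} transactions per month")
print(f"~{DAILY_CHURN_GIB * 30:.0f} GiB uploaded per month")
print(f"storage      ≈ ${STORED_GIB * PRICE_PER_GIB_MONTH:.2f}/month")
print(f"transactions ≈ ${calls_per_month / 10_000 * PRICE_PER_10K_WRITE_CALLS:.2f}/month")
```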

Using rclone as an intermediate layer between restic and the storage backend does not inevitably lead to bad performance.
I think it all depends on the company running the backend, the location of the servers you’re talking to and all kinds of other stuff.
Backblaze’s servers are in the U.S. (I know they opened a new DC in the Netherlands, but that requires setting up a new account and having it pointed there), so you get a lot of latency talking to the B2 backend if you’re a user from Eurasia.

I don’t know OpenDrive so I can’t really say what could be the culprit here.
What you can do is check the rclone documentation and see if there are flags to increase the speed.
For B2, for example, there are options to increase the number of concurrent connections, which yields better performance.

No. The delay caused by communication between two local programs is unnoticeable, and completely negligible compared to the network latency.

Not currently supported by restic (if possible at all).
One way to increase speed and reduce the number of transactions is to increase the pack size. This isn’t a currently supported feature either, but some users have experimented with tweaking the restic source code to use a larger pack size, with promising results.
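
To show why pack size matters for the transaction count, here’s a tiny illustration - the pack sizes and data volume are just numbers picked for the arithmetic, not restic defaults:

```python
# Number of pack uploads (i.e. write transactions) produced by a given amount of
# new data, for a few assumed pack sizes - purely illustrative figures.
new_data_gib = 50
for pack_mib in (4, 16, 64):
    uploads = new_data_gib * 1024 / pack_mib
    print(f"{pack_mib:>3} MiB packs -> ~{uploads:,.0f} uploads for {new_data_gib} GiB of new data")
```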

Relevant discussions and user experiments:

I believe OpenDrive is US-based - as am I - so I don’t think that’s the problem. I’ve heard that for “standard” cloud file storage OpenDrive is supposed to be quite fast, so again I’m just assuming rclone is the reason for the apparently slow backups.

My thinking, based on near-total ignorance, is that the issue isn’t communication between restic and rclone but rather between rclone and OpenDrive. In other words, I suspect rclone’s abstraction layer masks any optimizations that might be available between restic and OpenDrive - assuming any are possible at this time. Of course, if there’s no difference between how rclone talks to a remote and how restic would, then I’m wasting time thinking about it.

There’s no reason why rclone would make anything noticeably slower.

Yeah, I don’t think so either. It’s a valid question to ask, but as far as my experience goes, you shouldn’t worry about it :slight_smile:

Alternatives you can consider:

For years, I used restic with a local destination and then replicated that to Wasabi.com using the “aws s3 sync” command. It worked perfectly, and Wasabi charges approximately $5/month for up to 1 TB of storage.

Recently, I switched from Wasabi back to AWS (still using “aws s3 sync”) because of their new Glacier Deep Archive storage tier, which is a bit cheaper for me, although I will need to pay archive retrieval costs if a restore from the cloud is ever needed.

Then hopefully you never have to prune. :slight_smile:

Then hopefully you never have to prune.

You’ve misunderstood my approach. My backup automation performs a forget/prune weekly, and this has a negligible impact on my costs.

As described above, all my restic operations are performed against local storage, and the resulting repository is synced to AWS. Accordingly, prunes incur no retrieval costs from Glacier.

There is a very small early-deletion fee for files that are deleted before 90 days. The total cost of these events (frequency multiplied by cost) is negligible for me ($0.02 last month). For users with more dynamic data than mine, this might be different, although these charges can be mitigated by setting S3 lifecycle policies that delay the move to Glacier by x days, balancing early-deletion fees against the longer duration in more-expensive S3 storage. I’ve set up my S3 lifecycle rules to keep my data in S3 for 8 days before moving it to Glacier.
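
For reference, a rule like that can be set up with a few lines of boto3 - the bucket name and rule ID below are placeholders, and the S3 console or the AWS CLI work just as well:

```python
import boto3

s3 = boto3.client("s3")

# Keep new objects in regular S3 for 8 days, then transition them to Deep Archive,
# trading a little standard-storage cost against Glacier early-deletion fees.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-restic-repo-bucket",          # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "delay-deep-archive",  # placeholder rule ID
                "Filter": {"Prefix": ""},    # apply to the whole bucket
                "Status": "Enabled",
                "Transitions": [{"Days": 8, "StorageClass": "DEEP_ARCHIVE"}],
            }
        ]
    },
)
```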

My backup costs with this approach are about 60% lower than with my old Wasabi approach.
