Slow repo check with B2 backend

underhillian · September 29, 2017, 5:17pm

I’m evaluating restic as a replacement for CrashPlan. As one test, I created a 20 GB repository on B2. Creating the initial snapshot was as fast as could be expected (limited by my upload bandwidth) and subsequent snapshots are of course very fast. However, running “restic check” on the repo seems unreasonably slow (about 30 min., vs. 2 hours for the initial snapshot), and during the check there is very little resource usage on my machine and only a small fraction of the available bandwidth is used. I’ve been using b2.connections=6 in all cases.

Is slow check speed (relative to available bandwidth) something to be expected in general? Or is it due to some idiosyncrasy with the B2 implementation such that I’d see better performance with a different backend (e.g. S3)?

I realize that “check” isn’t something I’d run every day, so the speed of a check isn’t itself of particular concern…I’m more worried about what speed I’d see if I needed to do a substantial restore and whether there’s something built in to the B2 implementation that would limit this.

I’d do some further tests to answer these questions, but I’m in an environment with limited bandwidth and a fairly tight data cap, so my ability to run multi-GB experiments is limited. And I don’t want to put all my eggs in the B2 basket only to discover later that this was a mistake. Hoping someone has been down the same road before and can offer some advice.

Thanks in advance for any comments.

fd0 · September 29, 2017, 5:33pm

Hi, and welcome to the forum!

Which version of restic did you use for your tests (run “restic version”)? We just recently merged #1040, which adds a local metadata cache that speeds up most operations. By default (for safety) “restic check” is not one of them, although there is the --with-cache option. Did you give that a try?

For restore, the cache is used so it will be faster, but please be aware that the restore command is not yet as optimized as the backup command. We’ll get there.

What you can always do is download the whole repo to a local directory (e.g. using rclone) and then use that to restore. It’ll probably be much faster.

underhillian · September 29, 2017, 6:25pm

Hi,

In my original tests I tried both version 0.7.3 (no cache) and version v0.7.3-74-gcf80d295 (with cache) and saw no appreciable difference in check speed. Based on your suggestion (thanks!) I ran check again using the --with-cache option and it is indeed much faster (4 min. vs. 30). This is great, and certainly resolves the speed issue, although the paranoid part of me wonders whether running check using the cache really tests the repo integrity?

With the benefit of this result and your other comments I’d shift the emphasis of my question somewhat. I’m at the point where I’m trying to decide between backends and associated providers (mainly B2 and S3, but only because I’m somewhat familiar with these). Leaving aside storage costs, reliability, redundancy etc. (which will of course factor into the decision), is there any particular reason, purely from restic’s point of view and in particular regarding performance on common operations, to choose one over the other?

Thanks again.

fd0 · September 29, 2017, 7:23pm

If you want to be really, really safe here, don’t use the cache for check. But that will be slow.

The only difference between b2 and s3 (apart from the things you already mentioned) is that for non-American users b2 is really slow. From my home in Germany, there’s about 800ms of latency for each and every HTTP request, which is quite a lot. For s3, I can just select a bucket to be created in Frankfurt, so that’s less than 20ms.
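To put those latency figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a hypothetical operation that issues 1,000 requests strictly one after another; the request count and the function name are illustrative assumptions, not measurements of restic.

```python
def sequential_overhead_seconds(num_requests: int, latency_ms: float) -> float:
    """Time spent purely waiting on round trips when requests run one at a time."""
    return num_requests * latency_ms / 1000.0


# Hypothetical request count, chosen only for illustration.
REQUESTS = 1000

b2_wait = sequential_overhead_seconds(REQUESTS, 800)  # ~800 ms per request (figure from the post)
s3_wait = sequential_overhead_seconds(REQUESTS, 20)   # upper end of "less than 20ms" (from the post)

print(f"B2: {b2_wait:.0f} s, S3: {s3_wait:.0f} s")  # prints "B2: 800 s, S3: 20 s"
```

Under these assumptions the waiting time differs by a factor of 40, independent of how much data is actually transferred.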

underhillian · September 30, 2017, 8:27pm

Having been born with an extra skepticism gene, I wasn’t completely convinced by your comment so I tried a new experiment. I created two new repos (one on B2 and one on S3) and then created a smaller (~5 GB) identical snapshot on each. Interestingly, the time to create the initial snapshot was the same (~38 min) for both backends and was limited by my bandwidth, but the time required to run a check (without using the cache) was much longer for B2 (~5 min) than for S3 (~45 s).

Please note that I do not mean to suggest that this points to any issue with restic. I’m only observing that from the standpoint of a simpleminded user like myself trying to choose a storage provider a “check” on B2 seems to be much slower (and IMHO unreasonably slow) compared to an identical “check” on S3. When I have time and more room on my data cap, I will try comparing restore speeds as this (unlike the speed of a check) might become important to me at some point.

fd0 · October 1, 2017, 8:20am

Experimenting and testing things is always a good thing in my opinion, and nothing to apologize for.

Just yesterday evening (European time) we discovered a bug in the library we’re using to access B2: for most requests, HTTPS connections to the service were not reused; instead, a new connection was established each time. That not only slowed operations down a lot, it also caused failures for some users (see #1291).
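Why missing connection reuse hurts so much can be sketched with a little arithmetic. The model below assumes that opening a fresh HTTPS connection costs about three extra round trips (one for the TCP handshake, two for a full TLS 1.2 handshake) before the request itself can be sent. The function name, the round-trip time, and the handshake count are illustrative assumptions, not values taken from restic or the B2 library.

```python
def request_time_ms(rtt_ms: float, reuse_connection: bool,
                    handshake_round_trips: int = 3) -> float:
    """Approximate wall time of one small HTTPS request.

    A reused connection pays one round trip for the request itself.
    A new connection additionally pays `handshake_round_trips` round trips
    (assumed: 1 for TCP + 2 for a full TLS 1.2 handshake).
    """
    setup = 0.0 if reuse_connection else handshake_round_trips * rtt_ms
    return setup + rtt_ms


rtt = 150.0  # hypothetical round-trip time in milliseconds
fresh = request_time_ms(rtt, reuse_connection=False)
reused = request_time_ms(rtt, reuse_connection=True)

print(f"new connection per request: {fresh:.0f} ms, reused connection: {reused:.0f} ms")
```

In this simplified model, every small request takes four times as long when the connection is not reused, which is why the effect is most visible in operations that issue many small requests.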

I’ve worked this out with the author of the library and there has been an update, which is currently running through the integration tests for restic. Once those pass, I’ll merge it into the master branch.

Can you try again and see how long a check takes?

In general, B2 is not the fastest service out there.

fd0 · October 1, 2017, 8:35am

The library has been updated in the master branch, please pull, build restic and retry.

armhold · October 1, 2017, 12:16pm

Early report, but this is looking like a ~5x improvement in speed for the “building new index for repo” phase of prune (I know, not the same operation as “check”, but I assume it’s similarly affected).

underhillian · October 1, 2017, 2:35pm

I built restic at version v0.7.3-87-g3afd974d and saw little if any improvement in my B2 “check” speed.

Then I noticed in #1291 that @mrzv was initially using -o b2.connections=32. Without giving it much thought, I had set (per my original post) b2.connections=6. Since my bandwidth was saturated during an initial snapshot with this setting, I figured it was good enough and never looked back. But setting b2.connections=32 brought my B2 check to essentially the same speed as my S3 check, with a further modest improvement from setting b2.connections=64.
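One plausible way to see why 6 connections could saturate an upload yet leave a check slow is a toy model in which an operation is bounded either by bandwidth or by per-request latency, whichever is larger. Every number and name below is a hypothetical assumption chosen for illustration; this is not how restic actually schedules its requests.

```python
import math


def estimated_seconds(num_requests: int, bytes_per_request: int, latency_s: float,
                      connections: int, bandwidth_bytes_per_s: int) -> float:
    """Crude estimate: the larger of the latency-bound and bandwidth-bound times."""
    # Requests run in waves of `connections` parallel requests; each wave costs one latency.
    latency_bound = math.ceil(num_requests / connections) * latency_s
    # All bytes must still fit through the link, no matter how many connections are open.
    bandwidth_bound = num_requests * bytes_per_request / bandwidth_bytes_per_s
    return max(latency_bound, bandwidth_bound)


# Hypothetical backup: 1000 large uploads (4 MB each) over a 1 MB/s uplink.
for conns in (6, 32):
    t = estimated_seconds(1000, 4_000_000, 0.5, conns, 1_000_000)
    print(f"backup, {conns} connections: {t:.1f} s")

# Hypothetical check: 1000 small downloads (10 kB each) over a 10 MB/s downlink.
for conns in (6, 32, 64):
    t = estimated_seconds(1000, 10_000, 0.5, conns, 10_000_000)
    print(f"check, {conns} connections: {t:.1f} s")
```

In this model the backup is bandwidth-bound, so extra connections change nothing, while the check is latency-bound and speeds up as connections are added. Real-world gains would be expected to flatten out sooner than the model suggests.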

So between the B2 library update and using a more appropriate value for b2.connections, I’m now seeing similar performance from B2 and S3 and my original question is answered.

@fd0, thanks for a great program and for your help with this issue (as well as for the many other discussions from which I’ve benefited while lurking in the background!)