Very Large Number of B2 Transactions

Hi Everyone,

I had a bit of a shock when my B2 bill arrived. I have about 350GB stored and according to the dash there are about 85k files in my bucket.

I have apparently done 37,526,244 b2_download_file_by_name transactions in March (and downloaded 127GB). I have run restic check 3 times in March, but don't see how I could have done 37M transactions.

Any ideas?

I have 2 machines backing up to 2 buckets, one using 0.8.1 and the other using 0.8.3 (v0.8.3-1-g21f67a0a). I have only been running check on the 0.8.3 bucket.

Thanks

Dean

The checks have only been run from the 0.8.3 instance.

Hey,

I’m sorry that you had an unexpected B2 bill. Did you know that you can configure B2 to alert you once a specified number of (non-free) transactions have been executed?

Back to restic: In the 0.8.3 release we’ve merged several improvements for cloud-based backends, and for B2 especially. With the help of the author of the library we use to access B2, we were able to greatly reduce use of the API call b2_download_file_by_name. With restic < 0.8.3, this API call was also used during regular operations such as backup. Once we learned that, we were able to work around it. A large portion of the costly API requests were likely caused by this.

Since 0.8.0, restic has a local metadata cache, which speeds up operations and reduces backend API requests for metadata. If you created the repository with restic < 0.8.0, there may be data in the repo which isn’t cached and is therefore requested over and over. You can correct this by running restic prune. If you’ve already done that in the past, all should be fine.

The next thing that you may not be aware of: By default, restic check does not use a cache. In general, I think this is a good idea: It’ll notify you of files changed on the server, even if the unmodified file is still in your cache. The downside is that all data is fetched on demand from the repo, which may also cause many transactions because restic’s check process is rather dumb at the moment: Every tiny blob of data is requested separately, so if five blobs are needed from a single file in the repo, restic will send five separate requests.
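To make the "five blobs, five requests" point concrete, here is a minimal sketch in Python. It is not restic code: the blob and pack names and both helper functions are made up for illustration, and it only counts requests under the two strategies described above.

```python
# Hypothetical illustration, not restic's implementation: compare the number
# of backend requests when every blob is fetched on its own with the number
# needed if blobs were grouped by the repo file that contains them.

def requests_per_blob(blob_to_file):
    """One request per blob, as described for check at the moment."""
    return len(blob_to_file)


def requests_per_file(blob_to_file):
    """One request per distinct repo file, if blobs were grouped by file."""
    return len(set(blob_to_file.values()))


# Made-up mapping: five blobs live in one repo file, two in another.
blobs = {
    "blob-a": "pack-1",
    "blob-b": "pack-1",
    "blob-c": "pack-1",
    "blob-d": "pack-1",
    "blob-e": "pack-1",
    "blob-f": "pack-2",
    "blob-g": "pack-2",
}

print("requests, one per blob:", requests_per_blob(blobs))
print("requests, one per file:", requests_per_file(blobs))
```

With many small blobs per repo file, the per-blob count can be a large multiple of the number of files in the bucket, which is the effect described above.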

You could pass --with-cache to restic check, but probably that’s not a good idea because it’ll take the cache for granted and not re-download files. I’m thinking about using a temporary directory as the cache for restic check, but that hasn’t been implemented yet.


Hi Alexander,

Thanks for the update. I agree that check should never use the cache. I am going to upgrade both boxes to the latest and keep an eye on it.

I was just surprised by the 37 million number, as there are only about 80k files. That means it read every file roughly 460 times in a month. I back up 4 times a day, so that seems like a lot.
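For reference, a quick sketch of that arithmetic. The transaction count, the file count and the four backups a day are the figures from this thread; the 31-day month and the even spread over backups are simplifying assumptions, and check runs are not separated out.

```python
# Back-of-the-envelope estimate using the numbers quoted in this thread.
transactions = 37_526_244   # b2_download_file_by_name calls billed for March
files = 80_000              # approximate file count in the bucket
backups_per_day = 4
days_in_march = 31          # assumption: activity spread evenly over the month

# Average number of downloads per file over the whole month.
reads_per_file = transactions / files

# Same figure spread evenly over every backup run in the month.
reads_per_file_per_backup = reads_per_file / (backups_per_day * days_in_march)

print(f"reads per file in March: {reads_per_file:.0f}")
print(f"reads per file per backup run: {reads_per_file_per_backup:.1f}")
```

With 80k files this comes out at a little under 470 reads per file, and at around 440 if the 85k from the dashboard is used instead, so the rough 460 figure is in the right range either way.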

Thanks

Dean

By the way: Backblaze decreased the costs of downloading files by half. See the blogpost here https://www.backblaze.com/blog/backblaze-b2-drops-download-price-in-half/

Just wanted to note that @fd0 recently committed PR #1696, which adds a temporary cache to check by default. Guess you can expect this to be included in the next release.