I started to run the S3 pricing estimate calculator. I know how much data (in GB) I’m going to have. I’m planning to do nightly backups to S3 and need to understand how to estimate the number of PUT requests I should expect each night.
I couldn’t see any guidance on how restic sends data to S3, and I didn’t find anything in existing discussions on S3.
A lock file is created and periodically updated (I believe once every 5 minutes), so the backup duration divided by 5 minutes gives the number of lock PUTs.
Pack files are uploaded. Restic targets pack files to be around 8MB, and all of your data will go in packs. So for packs, take your total data set (absent deduplication – we’ll assume this is a first-time backup with minimal duplication) and divide it by 8MB. This is approximately the number of PUT requests for pack files.
Index files are uploaded. These indexes scale with the number of blobs (data chunks), so they approximately scale with the total size of your backup set, but they are also affected by how many individual files are in the repository. A small set of huge files means the indexes will scale more with the total amount of data backed up; a large set of tiny files means the indexes will scale more with the number of files. There is no formula I’m aware of to estimate this accurately.
A single snapshot file is created at the end of the process.
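Putting those steps together, a first-backup PUT estimate can be sketched like this. This is a back-of-the-envelope sketch, not an official formula: the 8 MB pack size and 5-minute lock refresh are the figures from the discussion above, and the index overhead (roughly one index per thousand packs, which happens to be close to the real-world numbers reported below) is a guess.

```python
import math

def estimate_put_requests(total_gb, backup_hours, pack_mb=8):
    """Rough estimate of S3 PUT requests for a first restic backup.

    Assumptions (see discussion above):
      - all data lands in pack files of roughly `pack_mb` MB
      - the lock file is refreshed about every 5 minutes
      - index uploads are roughly one per thousand packs (a guess)
      - one snapshot file at the end
    """
    pack_puts = math.ceil(total_gb * 1024 / pack_mb)  # pack files
    lock_puts = math.ceil(backup_hours * 60 / 5)      # lock refreshes
    index_puts = max(1, pack_puts // 1000)            # very rough guess
    snapshot_puts = 1                                 # one snapshot file
    return pack_puts + lock_puts + index_puts + snapshot_puts

# e.g. 200 GB uploaded over 4 hours:
print(estimate_put_requests(200, 4))  # -> 25674
```

As you can see, the pack uploads dominate; everything else is noise by comparison.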
My workstation, which should have a decent mix of file sizes (though it may lean toward the “lots of small files” side due to node_modules), has the following characteristics:
211GB contained in 44,980 pack files. (~4.8MB per pack)
49 indexes (after recreating them, to better illustrate a fresh upload).
The most recent snapshot contains 152GiB of data in 4.2 million files. With deduplication, this is reduced to 104GiB of data in 1.2 million blobs.
The real-life example of @cdhowie pretty much reflects what I am seeing. I would say that the pack files average around 4.7 MiB (there are open PRs which allow increasing the pack size for very large repositories).
Also, everything except pack file uploads can usually be neglected; those are at most a few dozen PUT requests, which should not contribute to your bill.
@nexar Regarding your question about how many PUTs you should expect per night: this depends on how much your data changes. More precisely, it depends on which new blobs restic needs to save in the repository.
For example, if you add completely new data, all of it needs to be saved in new pack files. If you just copy or move files around, restic detects that the file contents are already stored and only needs to save some new tree information, which is almost no data. If you did not change anything, all restic has to save is a single new snapshot file.
Thanks very much guys for the detailed information. It also helps me better understand what restic is doing behind the scenes. For the moment I’ve gone with Wasabi rather than Amazon S3, as Wasabi doesn’t factor PUT/GET calls or bandwidth into its pricing. I just started with this yesterday so I have no feel for how good or otherwise it is. I did have a small problem which their support responded to very quickly and resolved.
Truly appreciate the amount of effort put in to respond in detail.
I have been using Wasabi for two years with the regular S3 backend: 20 repositories, daily forgets, and daily prunes.
Never had any problems. I do checks and restores from time to time, and everything has worked as expected so far.
Thanks @doscott & @betatester77 for your input. I’ve currently run into a problem with Wasabi whereby my access keys are not allowing restic access to the bucket. Unfortunately it’s the weekend, so without shelling out $100 I’ll have to wait till Monday for a resolution. I tried creating a new key set and using that, but even that is not allowing access.
However it’s good to know that once things settle down Wasabi is a good choice.
@doscott I don’t understand your logic for the sequence of forget, check, prune. I would have thought you would want forget, prune, check, so that any problems from the prune can be caught.
I am currently planning a weekly check and a monthly prune as my data set is small and also doesn’t change much. I may however do a forget and a prune separately.
@doscott I understand what you are saying about the sequence now.
I am however completely lost about your ‘policy’. Is this set on Wasabi? I must admit I haven’t bothered looking at Wasabi’s offering in detail, as I was hoping to rely simply on restic doing ‘its thing’.
The policy is a set of rules for permissions on the bucket.
Log in to the wasabi console at https://console.wasabisys.com/#/login
In the bucket list, to the right of your restic bucket, press the button with the three vertical dots in the Actions column and click on Settings. Then select the “Policies” tab. You should be able to copy/paste the policy above into the tab and replace my-bucket-name with your bucket name. If you get the green check mark “Policy is valid” indication, you should be able to save it. After that, restic should be able to work with the bucket.
Re. why things worked and then didn’t, I can’t say. A policy was required with Amazon, so when I moved to Wasabi I created one from the console when I created the bucket.
Re. the edit, Wasabi billing has been precisely what they indicated it would be. I have never had any surprises. A recent invoice:
| Details | Unit Price | Quantity | Total |
|---|---|---|---|
| Timed Active Storage | $0.00016243 per GB per day | 32770.7 GB-day | $5.32 |
| Timed Deleted Storage (applicable for deleted storage < 90 days) | $0.00016243 per GB per day | 1829.25 GB-day | $0.30 |
| Data Transfer (in) (all regions) | $0 per GB | 4.19573 GB | $0 |
| Data Transfer (out) | $0 per GB | 25.4714 GB | $0 |
| API Requests | $0 per 1k Requests | 439.621 Requests | $0 |
| Minimum Active Storage (applicable if Timed Active Storage < 1 TB) | $4.99 per Billing Cycle | 0 | $0 |
| Support Charge | $0 per Day | 30 Days | $0 |
| Taxes (US State Sales Tax, VAT, or GST) | $0 | 1 | $0 |

Tax: $0 — Paid: $5.62 — Grand total: $5.62
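The storage line items above are easy to sanity-check: each charge is simply the per-GB-per-day rate multiplied by the GB-days billed. A quick check of the two storage lines from that invoice:

```python
# Rate and quantities taken from the invoice above.
rate = 0.00016243            # $ per GB per day
active = rate * 32770.7      # Timed Active Storage, GB-days
deleted = rate * 1829.25     # Timed Deleted Storage, GB-days

print(round(active, 2))      # -> 5.32
print(round(deleted, 2))     # -> 0.3
```

Both match the invoiced totals, so there is no hidden rounding or extra per-request math going on.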
Thanks very much @doscott. I’m going to ask the question about why things stopped working tomorrow with their support and will report back.
Yes that is about what I’m expecting in billing per month too. It’s a pity that Stanislas doesn’t mention what his grievance was. I have posted a question on the blog. If he replies I’ll report back here too.
Have to say again: after two years I am a happy Wasabi customer, and there have never been any surprises on my monthly bill.
Of course, their data retention policy (every byte stored is billed for at least 90 days, even if deleted earlier) is a disadvantage and could lead to bad surprises, but the fact that it exists is no secret. You find it a hundred times on their pages.
It’s still much cheaper than the others, and if I look at all the different fees at AWS, for example, there are many more factors that could lead to bad surprises on bills.
That is an interesting statement. In their FAQ they state:
“Wasabi’s free egress policy is designed for use cases where you store your data with Wasabi, you access this data at a reasonable rate, and your use case does not impose an unreasonable burden on our service. … If your monthly downloads (egress) are less than or equal to your active storage volume, then your storage use case is a good fit for Wasabi’s free egress policy. … If your storage use case exceeds the guidelines of our free egress policy on a regular basis, we reserve the right to limit or suspend your service.”
So if you download your entire repository more than once a month, you may get a notice.
@cdhowie Thanks for the comparison. As you say… pick your poison. I’ve just started with Wasabi, and for my volumes there probably isn’t much to pick between the two. I’m certainly not going to breach their minimum charge volumes.
I was, however, planning on running a check with --read-data-subset=1/5, 2/5, etc. each day, which would mean that the whole repository is read once a week, and therefore approximately four times a month. A question about this:
Am I right in my understanding that the whole repository will be read once a week and that would count towards egress?
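A sketch of that rotation, with a hypothetical scheduling helper (restic itself just takes whatever --read-data-subset value you pass it): picking the slice from the day of the year means five daily checks cover the whole repository, and reading the full repository weekly downloads roughly four times your stored volume per month, which is why it may run afoul of the egress guideline quoted above.

```python
import datetime

def subset_for(day: datetime.date, parts: int = 5) -> str:
    """Pick which --read-data-subset slice to check today.

    Hypothetical helper: rotating through slices 1..parts by
    day-of-year means the whole repository is read every `parts` days.
    """
    return f"{day.timetuple().tm_yday % parts + 1}/{parts}"

# Example: pass the result to `restic check --read-data-subset=...`
print(subset_for(datetime.date(2021, 1, 1)))  # -> "2/5"
print(subset_for(datetime.date(2021, 1, 5)))  # -> "1/5"
```

Spreading the slices over a longer window (say, 1/30 per day) reads the repository only once a month and keeps egress well under the active storage volume.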
I’ve started off the process; let’s see what Wasabi comes back with.
On an earlier note about the policy requirement: @doscott, I got a reply back from them asking me to delete the old set of access keys and just work with the new ones, and that seems to be working. So currently I don’t have any ‘policies’ set up. The bucket’s permissions are set to read/write for the owner only, which is the default and also what I need, so I don’t need to change anything.
Thanks again for everyone’s help with advice and suggestions.
There is a good discussion of --read-data-subset here:
In my opinion, all of the major cloud providers are reliable, and a lot of the smaller ones actually use the major ones. I think reading your data 4 times a month is excessive; even once a month is excessive. Personally I don’t bother, but I back up to two local NAS boxes plus the cloud. I occasionally download something from each and have never had any issues with restic recoveries.
If you really want to do this level of checking, you might be financially better off using redundant cloud suppliers, say Wasabi and B2: skip all of the checking, then when recovering data use Wasabi as first choice because of zero egress costs, and B2 in case of problems with Wasabi. Combined with your local backup, the odds of not being able to recover all of your data from the three sources are minimal. If you don’t have a local backup, you should consider having two cloud backups in different regions.