How to estimate PUTs and GETs for S3 pricing?

I started using the S3 pricing calculator. I know how much data (in GB) I’m going to have. I’m planning to do nightly backups to S3 and need to understand how to estimate the number of PUTs I should expect each night.

I couldn’t see any guidance on how restic sends data to S3, and I didn’t find anything in existing discussions on S3.

Please can anyone here help me?

Thanks in advance.

Four things get written when you run a backup:

  • A lock file is created and periodically refreshed (I think once every 5 minutes), so the backup duration divided by 5 minutes gives the approximate number of lock PUTs.
  • Pack files are uploaded. Restic targets pack files of around 8MB, and all of your data will go in packs. So for packs, take your total data set (ignoring deduplication; we’ll assume this is a first-time backup with minimal duplication) and divide it by 8MB. This is approximately the number of PUT requests for pack files.
  • Index files are uploaded. These indexes scale with the number of blobs (data chunks) so this approximately scales with the total size of your backup set, but is also affected by how many individual files are in the repository. A small set of huge files means the indexes will scale more with total amount of data backed up; a large set of tiny files means the indexes will scale more with the number of files. There is no formula I’m aware of to accurately estimate this.
  • A single snapshot file is created at the end of the process.
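
The four items above can be turned into a rough first-backup estimate. The following is only a sketch: the function name and the default of 50 index files are illustrative assumptions (there is no formula for the index count), and the 8MB pack target and 5-minute lock refresh are taken from the description above rather than verified against restic’s source.

```python
import math

def estimate_first_backup_puts(data_bytes, backup_seconds,
                               pack_bytes=8 * 1000**2,
                               lock_refresh_seconds=300,
                               index_files=50):
    """Rough PUT count for a first-time backup with little deduplication.

    index_files is a guess supplied by the caller; it is not derived
    from the data size because no reliable formula is known.
    """
    lock_puts = 1 + backup_seconds // lock_refresh_seconds  # create + refreshes
    pack_puts = math.ceil(data_bytes / pack_bytes)           # one PUT per pack
    snapshot_puts = 1                                        # written at the end
    total = lock_puts + pack_puts + index_files + snapshot_puts
    return {"lock": lock_puts, "pack": pack_puts,
            "index": index_files, "snapshot": snapshot_puts, "total": total}

# Example: 100 GB of data, backup takes 2 hours
print(estimate_first_backup_puts(100 * 1000**3, 2 * 3600))
```

In this example the pack uploads account for almost all of the requests, which is why the other three items can usually be treated as noise.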

My workstation, which should have a decent mix of different file sizes (though may lean more towards the “lots of small files” side due to node_modules) has the following characteristics:

  • 211GB contained in 44,980 pack files. (~4.8MB per pack)
  • 49 indexes (after recreating them, to better illustrate a fresh upload).
  • The most recent snapshot contains 152GiB of data in 4.2 million files. With deduplication, this is reduced to 104GiB of data in 1.2 million blobs.
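
As a quick check of these figures, the average pack size and the number of pack PUTs per unit of stored data can be derived directly. This sketch assumes the 211GB figure is in binary units (GiB), which is what makes the ~4.8MB average come out; the variable names are illustrative.

```python
GIB = 1024**3
MIB = 1024**2

repo_bytes = 211 * GIB   # assumption: the 211GB above is really GiB
pack_files = 44_980

# Average size of one pack file, i.e. data uploaded per pack PUT
avg_pack_mib = repo_bytes / pack_files / MIB

# Pack PUTs needed per GiB of deduplicated data in this repository
packs_per_gib = pack_files / 211

print(f"average pack size: {avg_pack_mib:.1f} MiB")
print(f"pack PUTs per GiB stored: {packs_per_gib:.0f}")
```

For a repository with a similar mix of files, multiplying the expected repository size in GiB by the second figure gives a more realistic PUT count than dividing by the nominal 8MB target.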

The real-life example from @cdhowie pretty much reflects what I am seeing. I would say that pack files average around 4.7 MiB (there are open PRs that would allow increasing the pack size for very large repositories).
Also, everything except pack file uploads can usually be neglected. The rest amounts to a few dozen PUT requests at most, which should not noticeably affect your bill.

@nexar Regarding your question about how many PUTs you should expect per night: this depends on how much your data changes. More precisely, it depends on which new blobs restic needs to save in the repository.
For example, if you add completely new data, all of it needs to be saved in new pack files. If you just copy or move files around, restic detects that the file contents are already stored and only needs to save some new tree information, which is almost no data. If you did not change anything, all restic has to save is a single new snapshot file.
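
The nightly case can be sketched the same way, driven by the amount of new, not-yet-stored data rather than the total data set. The function names are illustrative, the 4.7 MiB average pack size is the figure quoted above, the overhead of 24 requests is an upper-bound allowance for lock, index and snapshot files, and the price per 1,000 PUTs is a placeholder that should be replaced with the current value from the provider’s price list.

```python
import math

def estimate_nightly_puts(new_unique_bytes,
                          avg_pack_bytes=4.7 * 1024**2,
                          overhead_puts=24):
    """PUTs for one incremental backup.

    new_unique_bytes: data whose blobs are not already in the repository.
    overhead_puts: allowance for lock, index and snapshot files.
    """
    pack_puts = math.ceil(new_unique_bytes / avg_pack_bytes)
    return pack_puts + overhead_puts

def monthly_put_cost(puts_per_night, price_per_1000_puts, nights=30):
    """Monthly request cost in the same currency as the price."""
    return puts_per_night * nights * price_per_1000_puts / 1000

# Example: 2 GiB of genuinely new data per night,
# with an assumed price of $0.005 per 1,000 PUT requests
puts = estimate_nightly_puts(2 * 1024**3)
print(puts, monthly_put_cost(puts, 0.005))
```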

Thanks very much guys for the detailed information. It also helps me better understand what restic is doing behind the scenes. For the moment I’ve gone with Wasabi rather than Amazon S3, since Wasabi doesn’t factor PUT/GET calls or bandwidth into its pricing. I only started with it yesterday, so I have no feel yet for how good it is. I did have a small problem, which their Support responded to very quickly and resolved.

Truly appreciate the amount of effort put in to respond in detail.

I have used wasabi for some time with restic and have encountered very few problems. Some things that seem to have prevented most problems for me:

  • I switched to using the rclone backend (I did this before the next item, so I am not sure if it is still applicable).
  • I do not use the --prune option with forget. This caused me the most issues. I do a forget, followed by a check, followed by a prune.

I have been using wasabi for 2 years with the regular s3 backend: 20 repositories, daily forgets and daily prunes.
I have never had any problems. I run checks and restores from time to time, and everything has worked as expected so far.

Thanks @doscott & @betatester77 for your input. I’ve currently run into a problem with Wasabi whereby my Access Keys are not allowing restic access to the bucket. Unfortunately it’s the weekend, so without shelling out $100 I’ll have to wait till Monday for a resolution. I tried creating a new Key set and using that, but even that is not allowing access.

However it’s good to know that once things settle down Wasabi is a good choice.

@doscott I don’t understand your logic for the sequence of forget, check, prune. I would have thought you would want it to be forget, prune, check. That way any problems from prune can be caught.

I am currently planning a weekly check and a monthly prune as my data set is small and also doesn’t change much. I may however do a forget and a prune separately.

Thanks again for your inputs.

I actually forget, check, prune, check. Check is relatively quick and imposes a time delay to allow wasabi to settle down (best guess is problems are timing related).
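
For reference, that sequence could be scripted roughly as follows. This is a sketch, not the poster’s actual script: the repository URL and the `--keep-daily 7` retention rule are placeholders, and it assumes the restic password and S3 credentials are already provided through the environment.

```python
import subprocess

def maintenance_commands(repo, keep_daily=7):
    """Build the forget, check, prune, check sequence as argument lists."""
    base = ["restic", "-r", repo]
    return [
        base + ["forget", "--keep-daily", str(keep_daily)],  # no --prune here
        base + ["check"],
        base + ["prune"],
        base + ["check"],
    ]

def run_maintenance(repo, keep_daily=7, dry_run=True):
    for cmd in maintenance_commands(repo, keep_daily):
        if dry_run:
            print(" ".join(cmd))             # only show what would run
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step

# Placeholder repository URL; dry run prints the four commands
run_maintenance("s3:https://s3.wasabisys.com/my-bucket-name", dry_run=True)
```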

Following is the policy I use on my restic bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket-name/*"
    }
  ]
}
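
If you manage several buckets, the same policy can be generated per bucket instead of edited by hand. This small sketch simply mirrors the policy as posted above; the function name and the example bucket name are made up for illustration.

```python
import json

def restic_bucket_policy(bucket_name):
    """Return the policy shown above with the bucket name filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # Object operations apply to everything inside the bucket
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }

print(json.dumps(restic_bucket_policy("my-restic-backups"), indent=2))
```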

@doscott I understand what you are saying about the sequence now.

I am however completely lost about your ‘policy’. Is this set on Wasabi? I must admit I haven’t bothered looking at Wasabi’s offering in detail as I was hoping to rely simply on restic doing ‘its thing’.

The policy is a set of rules for permissions on the bucket.

Log in to the wasabi console at https://console.wasabisys.com/#/login
In the bucket list, to the right of your restic bucket press the button with the three vertical dots in the actions column and click on settings. Then select the “Policies” tab. You should be able to copy/paste the policy above into the tab and replace my-bucket-name with your bucket name. If you get the green check mark “Policy is valid” indication you should be able to save it. After that restic should be able to work with the bucket.

@doscott Thanks very much for your reply. I was able to create the repository AND backup to it yesterday without any specific policies and just my Access Keys.

So I’m still not understanding why we need the policy that you have set up.

I set up restic first on Amazon and then moved to wasabi because of cost. The guide I used:

which was very similar to the amazon s3 setup.

My memory is that the s3:DeleteObject permission is key, but it’s been a while and I won’t swear to it.

wasabi is compatible with most of the amazon s3 things (versioning being one area they don’t completely match), so looking at the restic Amazon S3 example may help:
https://restic.readthedocs.io/en/latest/080_examples.html

@doscott thanks for your reply and reference to the article. You probably haven’t seen the article recently but it has the following at the very beginning:

Edit (2020): I highly discourage using Wasabi. They have a very misleading pricing policy and you will end up with bad surprises on your invoices at the end of the month.

There is no further information about what the ‘bad surprises’ are. I take it you don’t share the same experience.
xxxxxxxxxxxxxxxxxxx

I’ve been reading up on Wasabi Policies and it would appear that I do require one. I’ll copy yours over tomorrow. However my question still remains.

On Friday I was able to create the bucket AND then create a repository with restic AND multiple snapshots without having any policies attached to the bucket. So what has changed?

Re. why things worked and then didn’t, I can’t say. A policy was required with Amazon, so when I moved to wasabi I created one from the console when I created the bucket there.

Re. the Edit, wasabi billing has been precisely what they indicated it would be. I have never had any surprises. A recent invoice:

Details                                                            Unit Price                   Quantity            Total
SERVICE CHARGES
 Timed Active Storage                                              $0.00016243 per GB per day   32770.7 GB-day      $5.32
 Timed Deleted Storage (applicable for deleted storage < 90 days)  $0.00016243 per GB per day   1829.25 GB-day      $0.30
 Data Transfer (in) (all regions)                                  $0 per GB                    4.19573 GB          $0
 Data Transfer (out)                                               $0 per GB                    25.4714 GB          $0
 API Requests                                                      $0 per 1k Requests           439.621 Requests    $0
 Minimum Active Storage (applicable if Timed Active Storage <1 TB) $4.99 per Billing Cycle      0                   $0
SUPPORT CHARGES
 Support Charge                                                    $0 per Day                   30 Days             $0
TAXES
 Taxes (US State Sales Tax, VAT, or GST)                           $0                           1                   $0

Subtotal     $5.62
Tax          $0
Paid         $5.62
GRAND TOTAL  $5.62

Wasabi Technologies, Inc., 29th Floor, 111 Huntington Avenue, Boston, MA 02199, United States of America, billing@wasabi.com
Wasabi Technologies B.V., c/o Vreewijk Management BV, Kingsfordweg 151, 1043 GR Amsterdam, The Netherlands, VAT ID: NL 8597.15.231.B01
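
The invoice totals can be reproduced from the per-day rate and the GB-day quantities shown. A short sketch using only the figures from the invoice (the variable names are illustrative, and the 30-day period is taken from the Support Charge line):

```python
RATE_PER_GB_DAY = 0.00016243   # unit price from the invoice

active_gb_days = 32770.7
deleted_gb_days = 1829.25
billing_days = 30

active = active_gb_days * RATE_PER_GB_DAY
deleted = deleted_gb_days * RATE_PER_GB_DAY
total = round(active, 2) + round(deleted, 2)

# Average amount stored over the billing period
avg_active_gb = active_gb_days / billing_days

print(f"active storage:  ${active:.2f}")
print(f"deleted storage: ${deleted:.2f}")
print(f"total:           ${total:.2f}")
print(f"average active storage: {avg_active_gb:.0f} GB")
```

So the whole bill is storage time: roughly 1.1 TB held for a month, plus a small charge for data that was deleted before reaching 90 days.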

Thanks very much @doscott. I’m going to ask the question about why things stopped working tomorrow with their support and will report back.

Yes that is about what I’m expecting in billing per month too. It’s a pity that Stanislas doesn’t mention what his grievance was. I have posted a question on the blog. If he replies I’ll report back here too.

I have to say again: I have been a happy wasabi customer for two years and there have never been any surprises on my monthly bill.
Of course, their minimum storage duration policy (every byte is billed for at least 90 days, even if it is deleted sooner) is a disadvantage and could lead to bad surprises, but its existence is no secret. You find it a hundred times on their pages.
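
To illustrate the effect of the 90-day minimum, here is a small sketch. It assumes the per-GB-day rate from the invoice earlier in this thread and a simple model in which deleted data is billed as if it had been kept until day 90; the function name is made up.

```python
def storage_charge(gb, days_stored, rate_per_gb_day=0.00016243,
                   minimum_days=90):
    """Charge for data kept for days_stored days under a minimum duration."""
    billable_days = max(days_stored, minimum_days)
    return gb * billable_days * rate_per_gb_day

# 100 GB deleted after 10 days is billed like 100 GB kept for 90 days
print(storage_charge(100, 10))
# The same data without any minimum duration, for comparison
print(storage_charge(100, 10, minimum_days=0))
```

This matters for restic mainly when forget and prune remove or repack recently uploaded data, since the removed pack files keep being billed until they reach the minimum age.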

It’s still much cheaper than the others, and if I look at all the different fees at AWS, for example, there are far more factors that could lead to bad surprises on a bill.

We use Backblaze B2 for backups. The timed storage cost is almost exactly the same. Other than that, both services have a few disadvantages:

  • B2 has egress fees ($0.01/GB); Wasabi does not. (B2 is still considerably cheaper than S3, which charges $0.09/GB for egress.)
  • (Related) Wasabi has an egress cap per month; B2 does not.
  • Wasabi has a minimum storage duration per object; B2 does not.

Pick your poison. :slight_smile:
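
To put the egress figures above into perspective, the cost of downloading a repository once can be compared directly. This sketch uses only the per-GB prices quoted in the list; they may have changed since, and the function name is illustrative.

```python
# Egress prices in USD per GB, as quoted in the list above
EGRESS_USD_PER_GB = {
    "Amazon S3": 0.09,
    "Backblaze B2": 0.01,
    "Wasabi": 0.0,
}

def restore_cost(gb):
    """Egress cost of downloading gb gigabytes from each provider."""
    return {name: round(gb * price, 2)
            for name, price in EGRESS_USD_PER_GB.items()}

# Example: a full restore of a 500 GB repository
for name, cost in restore_cost(500).items():
    print(f"{name}: ${cost:.2f}")
```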

That is an interesting statement. In their FAQ they state:
“Wasabi’s free egress policy is designed for use cases where you store your data with Wasabi, you access this data at a reasonable rate, and your use case does not impose an unreasonable burden on our service. … If your monthly downloads (egress) are less than or equal to your active storage volume, then your storage use case is a good fit for Wasabi’s free egress policy. … If your storage use case exceeds the guidelines of our free egress policy on a regular basis, we reserve the right to limit or suspend your service.”

So if you download your entire repository more than once a month, you may get a notice.

@cdhowie Thanks for the comparison. As you say…pick your poison. :slight_smile: I’ve just started with Wasabi and for my volumes there probably isn’t much to pick between the 2. I’m certainly not going to breach their minimum charge volumes.

I was however planning on running a restic check with --read-data-subset=1/5, 2/5, etc. each day, which would mean that the whole repository is read once a week and therefore approximately 4 times a month. A question about this:

Am I right in my understanding that the whole repository will be read once a week and that would count towards egress?
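
A rough way to compare such a schedule against the guideline in the Wasabi FAQ quoted above (monthly egress no larger than active storage). This is a sketch under the assumption that each run downloads about one subset’s share of the repository; the function names and the 100 GB repository size are illustrative.

```python
def monthly_check_egress(repo_gb, subsets=5, runs_per_month=30):
    """GB downloaded per month if each run reads 1/subsets of the data."""
    return repo_gb * runs_per_month / subsets

def within_free_egress_guideline(egress_gb, active_storage_gb):
    """Guideline from the quoted FAQ: monthly egress <= active storage."""
    return egress_gb <= active_storage_gb

repo_gb = 100  # illustrative repository size

daily = monthly_check_egress(repo_gb, subsets=5, runs_per_month=30)
print(daily, within_free_egress_guideline(daily, repo_gb))

# Spreading one full read over the month, e.g. 1/30 per day
spread = monthly_check_egress(repo_gb, subsets=30, runs_per_month=30)
print(spread, within_free_egress_guideline(spread, repo_gb))
```

Under these assumptions, reading a fifth of the repository every day downloads several times the stored volume each month, while a subset count close to the number of runs per month stays at about one full read.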

I’ve started off the process and let’s see what Wasabi come back with.

xxxxxxxxxxxxxxxxxxxx

On an earlier note about Policy requirement. @doscott I got a reply back from them asking me to delete the old set of Access Keys and just work with the new ones and that seems to be working. So currently I don’t have any ‘Policies’ set up. The bucket’s permissions are set to read/write for the owner only which is the default and also what I need, so I don’t need to change anything.

Thanks again for everyone’s help with advice and suggestions.

There is a good discussion of --read-data-subset here:

In my opinion, all of the major cloud providers are reliable, and a lot of the smaller ones actually use the major ones. I think reading your data 4 times a month is excessive; even once a month is excessive. Personally I don’t bother, but I back up to two local NAS boxes plus the cloud. I occasionally download something from each and have never had any issues with restic recoveries.

If you really want to do this level of checking, you might be financially better off using redundant cloud suppliers, say wasabi and B2, and skipping all of the checking. When recovering data, use wasabi as first choice because of its zero egress costs, and B2 in case of problems with wasabi. Combined with your local backup, the odds of not being able to recover all of your data from the three sources are minimal. If you don’t have a local backup, you should consider having two cloud backups in different regions.
