Best Way to Back Up to the Cloud?

I’ve been using restic for over a year now. At first I used a local disk as backup storage and everything worked fine. For the past few months, I’ve been backing up to Backblaze B2.

I estimated the price beforehand, but my estimate was off by a factor of three (at least) because I didn’t take Class B transactions (https://www.backblaze.com/b2/b2-transactions-price.html - download_file_by*) into account. I’m being charged twice as much for those as for the storage itself.

My usage of Backblaze so far looks like this, run by cron multiple times per day:

  1. back stuff up to the cloud
  2. run forget with --prune
  3. run check command

Does this workflow even make sense, or should I run forget with --prune and check less often? Which of these commands causes most of the download transactions?

And by the way, I’m also using a local cache directory with all these commands (if that even helps).

Both forget and check walk all packs in the repository. check ignores the local cache, and forget rebuilds the index anyway, so the local cache is of limited utility. To some degree, forget will also check the repository; it refuses to operate if certain errors are found.

You probably don’t need to run check more often than once a week or even once a month.

As for how often to run prune, that depends on several variables, so I can’t suggest an exact interval. It depends on how much data is in the repository, how many individual packs it contains, and how much data would be discarded over time. Think of it this way: prune reduces your storage bill but increases your requests and egress bills. There is an optimal moment to prune for your repository to save the most money, but nobody else knows what that is.
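To make that trade-off concrete, here is a rough sketch of how you could estimate it yourself. It is a toy model, not anything restic or Backblaze provides: it assumes unreferenced data accumulates at a steady rate between prunes and that each prune run costs a fixed amount in requests and egress (in practice that cost varies with the repository). The function names and every number in the example are hypothetical placeholders; plug in figures from your own bill.

```python
import math


def monthly_cost(interval_days, garbage_gb_per_day,
                 storage_price_per_gb_month, prune_cost_per_run):
    """Monthly cost attributable to a prune schedule (toy model)."""
    # Garbage grows linearly between prunes, so on average half of one
    # interval's worth of unreferenced data is sitting in the bucket.
    avg_garbage_gb = garbage_gb_per_day * interval_days / 2
    storage_cost = avg_garbage_gb * storage_price_per_gb_month
    # Assumption: every prune run costs the same in requests and egress.
    prune_cost = prune_cost_per_run * 30 / interval_days
    return storage_cost + prune_cost


def best_interval_days(garbage_gb_per_day, storage_price_per_gb_month,
                       prune_cost_per_run):
    """Interval that minimises monthly_cost (derivative set to zero)."""
    return math.sqrt(
        60 * prune_cost_per_run
        / (garbage_gb_per_day * storage_price_per_gb_month)
    )


if __name__ == "__main__":
    # Placeholder inputs, NOT real prices: 2 GB of garbage per day,
    # 0.01 per GB-month of storage, 0.50 per prune run.
    g, s, p = 2.0, 0.01, 0.50
    t = best_interval_days(g, s, p)
    print(f"prune roughly every {t:.0f} days, "
          f"costing about {monthly_cost(t, g, s, p):.2f} per month")
```

The point is only the shape of the curve: pruning too often is dominated by request and egress charges, pruning too rarely by storage for data you no longer need.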

Thanks!

Okay, as I understand it, it makes sense to keep backup separate from check and forget/prune. However, if I only run check once a month, isn’t there a risk of the repository being broken somehow without me knowing it? And knowing Murphy’s law, that would probably be the moment I actually need to restore something from the repo.

I have ~5 TB backed up right now. I read in some other topics on this forum that increasing the pack size reduced the cost of backing up to the cloud. Would that make any difference here, given that, as I understand it, downloading more separate files adds to the cost (at least for Backblaze B2)?
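To put rough numbers on that question, here is a small back-of-the-envelope sketch. It assumes the repository is made up entirely of full-size packs and that a command reading every pack issues at least one download request per pack; both are simplifications of mine, and the pack sizes listed are only illustrative.

```python
TB = 10**12   # decimal terabyte
MIB = 2**20   # binary mebibyte


def pack_count(repo_bytes, pack_size_bytes):
    """Number of packs needed to hold repo_bytes (ceiling division)."""
    return -(-repo_bytes // pack_size_bytes)


if __name__ == "__main__":
    repo_bytes = 5 * TB
    # Illustrative pack sizes in MiB.
    for size_mib in (16, 64, 128):
        packs = pack_count(repo_bytes, size_mib * MIB)
        print(f"{size_mib:>3} MiB packs -> about {packs} files")
```

If that assumption holds, the number of files (and so the number of per-file download requests) shrinks in proportion to the pack size, while the total bytes downloaded stay the same.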