Backup lots of b2_download_file_by_name API calls

Hi there,

I’m quite new to restic for backing up my data. I have a dedicated server and I’m using restic with Backblaze B2 cloud storage to back up my docker stack and my home directory.

I wrote a small script to backup my data automatically every night as below:

set -euo pipefail

. /opt/bash/bash_colors.sh

export B2_ACCOUNT_ID=<>
export B2_ACCOUNT_KEY=<>
export RESTIC_PASSWORD=<>

error() {
  >&2 clr_red "$@"
  exit 1
}

[ "${UID}" -ne 0 ] && error "This program needs to be run as root, exiting..."

clr_green "[+] Backing up docker volumes"
restic -r b2:nullbyte:/var/lib/docker/volumes --verbose backup /var/lib/docker/volumes/

clr_green "[+] Backing up home directory"
restic -r b2:nullbyte:/home/lionel --verbose backup /home/lionel

However, this seems to generate a lot of b2_download_file_by_name API calls, as if restic were rebuilding its cache every time:

Sep 12 05:23:07 xxxxxxxx backup.sh[26023]: Load(<index/2202c4ef98>, 0, 0) returned error, 
retrying after 9.997698328s: b2_download_file_by_name: 403: Cannot download file, download 
bandwidth or transaction (Class B) cap exceeded.
Sep 12 05:23:08 xxxxxxxx backup.sh[26023]: Load(<index/5d5107556e>, 0, 0) returned error, 
retrying after 6.222894868s: b2_download_file_by_name: 403: Cannot download file, download 
bandwidth or transaction (Class B) cap exceeded.

This causes the second backup (the one for the home directory) to never complete successfully, because Backblaze caps the number of download transactions.

What should I do to reduce these API calls?

Best regards.

Hi @nullbyte and welcome to the restic community :slight_smile:

Can you share with us how low you have set your cap on the b2 website for the class B API calls?

Hi @moritzdietz

Thanks for the help :slight_smile:

Right now, I’m using a free account on Backblaze because the amount of data to back up is < 10G. So I have a daily cap of 2500 API calls.

Here is a screenshot from my cap & alerts page on b2:

I was planning to keep a free account until the amount of data I have to back up exceeds 10G, but this issue is a real blocker and prevents me from doing a proper backup of my data :frowning:

Thanks again for your help :slight_smile:

For your information, I’m using the latest stable release of restic:
restic 0.9.5 compiled with go1.12.4 on linux/amd64

I am currently wondering whether the --verbose flag is causing the high number of b2_download_file_by_name API calls.
With that flag, restic prints the names of all the files it handles, which might explain why the count for this specific API call is so high, since your repo probably has a couple thousand files in it.

Could you try to run your restic command without the verbose flag and then check again tomorrow or when your cap is back to 0?

I don’t think that’s the case: the --verbose flag only controls what restic prints and does not influence the backend communication. The file names come from the local file system, and the metadata used for detecting modified files should come from the cache.

@nullbyte can you please ensure that the cache restic uses is kept between runs? You can either set it manually via --cache-dir or by setting XDG_CACHE_HOME. Does restic print something along the lines of created new cache in [...]? If that’s the case, the cache directory is not kept between runs.
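
A minimal sketch of how a script could pick a cache location that survives between runs. The lookup order mirrors the two variables restic mentions (XDG_CACHE_HOME, then HOME); the helper name `resolve_cache_dir`, the `restic` and `.cache/restic` subdirectories, and the `/var/cache/restic` fallback are assumptions for illustration, not something restic itself provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a cache directory that persists between runs.
# Order: $XDG_CACHE_HOME/restic, then $HOME/.cache/restic, then a fallback
# (first argument, defaulting to the assumed path /var/cache/restic).
resolve_cache_dir() {
  local fallback="${1:-/var/cache/restic}"
  if [ -n "${XDG_CACHE_HOME:-}" ]; then
    printf '%s\n' "${XDG_CACHE_HOME}/restic"
  elif [ -n "${HOME:-}" ]; then
    printf '%s\n' "${HOME}/.cache/restic"
  else
    printf '%s\n' "${fallback}"
  fi
}
```

The result can then be passed explicitly, for example `restic -r b2:nullbyte:/home/lionel --cache-dir "$(resolve_cache_dir)" backup /home/lionel`, so the backup no longer depends on the environment the service happens to start with.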

Good to know! I thought it was reading in the filenames from the repo also. Thanks for clarifying that.

Hi @fd0,

Oh yes indeed, good catch! I see from the log file that it is not able to open the cache directory :open_mouth:

Sep 12 03:00:02 xxxxxxxx systemd[1]: Started Backup of nullbyte.
Sep 12 03:00:02 xxxxxxxx backup.sh[26023]: [+] Backing up docker volumes
Sep 12 03:00:02 xxxxxxxx backup.sh[26023]: open repository
Sep 12 03:00:09 xxxxxxxx backup.sh[26023]: unable to open cache: unable to locate cache directory (XDG_CACHE_HOME and HOME unset)
Sep 12 03:00:09 xxxxxxxx backup.sh[26023]: lock repository
Sep 12 03:00:19 xxxxxxxx backup.sh[26023]: load index files

backup.sh[26023]: unable to open cache: unable to locate cache directory (XDG_CACHE_HOME and HOME unset)

It seems that restic cannot find the cache directory. That’s indeed likely the issue.

@nullbyte you might have a better time backing up locally and then using Backblaze’s b2 utility to sync the repo. In the future, when you want to prune the repo, this will save you a huge amount of bandwidth.

Hi @pwr,

I have an unlimited transfer bandwidth, so I don’t really care about saving bandwidth.
I already have a local backup, but to be honest, I prefer to have an off-site backup as well; you never know what could happen.

Cheers.

I think what he meant was that you can sync your local backup to B2 separately instead of having a separate repository with different backup times. Especially now that you let us know that you have a local backup also. But I don’t know your use case exactly. So ¯\_(ツ)_/¯

Yep, what @moritzdietz said.

Doing a prune directly on b2 requires downloading the entire repo, repacking, and re-uploading, and while you may have unlimited bandwidth from your ISP, Backblaze charges $0.01 per GB download.

By comparison, using restic for your local repo, then doing a b2 sync --delete on that repo to a b2 bucket is much faster, as it only uploads the changed blobs, deletes remotely, requires no downloading and uses far fewer API calls. You can then mount or restore from your remote repo in the same way you would with your current setup.

Hi @pwr

Oh, I see what you mean! That’s indeed a better idea. I’ll have a look in that direction then.
So: use restic to back up to a local repo, and then back that repo up using the b2 utility.

There’s more overhead in this approach, but it does seem more cost-effective.

Thanks for your help!

Cheers

You want to make sure that you’re not backing it up once again. This would result in another layer of encryption on top of already encrypted files. You just need to synchronize it to B2.
There is a terminological difference between syncing and backing up. As far as I know, the B2 client from Backblaze itself encrypts the files beforehand and then sends it to B2. As your data is a restic repository, it is already encrypted.

If I can make a suggestion: Use rclone instead. Rclone is a great and very popular piece of software.
In fact, restic relies on rclone components for some backends.

For example here is a command I use to synchronize my local backup repository to B2 using rclone:
rclone -P --bwlimit 4M --fast-list --transfers 15 --b2-hard-delete sync /mnt/ssd/Backup/ b2:backup

rclone sync

Make source and dest identical, modifying destination only.

Synopsis

Sync the source to the destination, changing the destination only. Doesn’t transfer unchanged files, testing by size and modification time or MD5SUM. Destination is updated to match source, including deleting files if necessary.

Important: Since this can cause data loss, test first with the --dry-run flag to see exactly what would be copied and deleted.

Note that files in the destination won’t be deleted if there were any errors at any point.

It is always the contents of the directory that is synced, not the directory so when source:path is a directory, it’s the contents of source:path that are copied, not the directory name and contents. See extended explanation in the copy command above if unsure.

If dest:path doesn’t exist, it is created and the source:path contents go there.

Note: Use the -P / --progress flag to view real-time transfer statistics
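
To make the --dry-run advice concrete, here is a rough sketch of a small wrapper that can preview a sync before running it for real. The function name `repo_sync` and the `RCLONE` override variable are made up for this example; the flags are the ones from the command above plus --dry-run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: sync a local restic repository to a remote.
# Pass "dry" as the third argument to preview without touching the remote.
# RCLONE is an assumed override for the binary name (defaults to rclone).
repo_sync() {
  local src="$1" dest="$2" mode="${3:-}"
  if [ "$mode" = "dry" ]; then
    "${RCLONE:-rclone}" sync --dry-run --fast-list --b2-hard-delete "$src" "$dest"
  else
    "${RCLONE:-rclone}" sync --fast-list --b2-hard-delete "$src" "$dest"
  fi
}
```

Running `repo_sync /mnt/ssd/Backup/ b2:backup dry` first and reading what it would copy and delete, then repeating the call without `dry`, keeps a mistyped source or destination from wiping the remote copy.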

I think it’s all clear now. Thanks for pointing out the subtle difference between backing up and synchronizing files.

I know rclone; I have already used it once and it’s indeed a great piece of software :slight_smile: That’s probably the solution I will choose.

Cheers.

Yep, that’s a common problem when running in a restricted environment. In this case, restic can’t find the cache path to use since neither $HOME nor $XDG_CACHE_HOME is set. Either set one of those, or use --cache-dir to specify the cache directory. And make sure it is kept between runs, otherwise restic will work in a degraded mode where each tiny bit of data is fetched directly from B2. That’s what caused your API calls. :slight_smile:
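
If you would rather have the script stop than run in that degraded mode and burn through the Class B cap, a guard along these lines could go before the first restic call. This is only a sketch: the function name `require_cache_location` and the wording of the message are invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail early when restic would have no cache location.
# Succeeds if XDG_CACHE_HOME or HOME is set, or if an explicit cache
# directory is passed as the first argument.
require_cache_location() {
  if [ -z "${XDG_CACHE_HOME:-}" ] && [ -z "${HOME:-}" ] && [ -z "${1:-}" ]; then
    echo "no cache location: set HOME or XDG_CACHE_HOME, or pass a cache directory" >&2
    return 1
  fi
}
```

With `set -e` active, as in the script above, a plain `require_cache_location "${1:-}"` near the top aborts the run before any request is sent to B2.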

I actually think this is not the case, i.e. b2 sync just syncs files as-is. I’ve used it to sync a folder of video masters, the files in which I could then share publicly, and I’ve sync’d my local restic repo, and then accessed the repo directly with restic, as if it were backed up directly with restic. I don’t think I’m doing anything special to avoid double encrypting.

(Unless you mean it just encrypts in transit?)

Hi,

Indeed, I adapted the script as below to take a cache dir as its first argument:

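# Cache directory: first argument, else XDG_CACHE_HOME.
# The inner ":-" keeps "set -u" from aborting when XDG_CACHE_HOME is unset.
CACHE_DIR=${1:-${XDG_CACHE_HOME:-}}

# If a cache directory is given but does not exist, create it.
[ -z "${CACHE_DIR}" ] || [ -d "${CACHE_DIR}" ] || mkdir -p "${CACHE_DIR}"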

[ "${UID}" -ne 0 ] && error "This program needs to be run as root, exiting..."

clr_green "[+] Backing up docker volumes"
restic -r b2:nullbyte:/var/lib/docker/volumes ${CACHE_DIR:+--cache-dir "${CACHE_DIR}"} --verbose backup /var/lib/docker/volumes/

clr_green "[+] Backing up home directory"
restic -r b2:nullbyte:/home/lionel ${CACHE_DIR:+--cache-dir "${CACHE_DIR}"} --verbose backup /home/lionel

And it’s indeed using far fewer API calls.

Thank you very much, this is a satisfying solution for now :slight_smile: I’ll probably refine my design later on :wink:

Regards,

Awesome, I’m glad you figured it out! I think it would be valuable to make this error a lot bigger, so people recognize that restic works in a “degraded mode” without the cache (which may be fine, but I think most of the time people would rather use a cache).

I learned some nifty shell magic from your post :smiley:

Thanks for your help @fd0, I would have spent days figuring it out without it :wink:

I learned some nifty shell magic from your post :smiley:

Haha yes! I’m far from being an expert in bash but I know a few tricks :blush:
This is called shell parameter expansion, which is a subset of shell expansion techniques. You can go really deep and do crazy / advanced stuff with this :grinning:
You can find more info here:

and there:
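
For anyone curious, here is a small standalone sketch of the two expansions from my script. The function name `show_args` is just something I made up for the demo; it prints how many words the expansion produces.

```shell
#!/usr/bin/env bash
# ${1:-default}  -> first argument, or "default" if it is unset or empty
# ${VAR:+words}  -> "words" if VAR is set and non-empty, otherwise nothing
show_args() {
  local cache_dir="${1:-}"
  # Expands to two words (--cache-dir <dir>) or to nothing at all.
  set -- ${cache_dir:+--cache-dir "${cache_dir}"}
  printf '%d args:' "$#"
  [ "$#" -eq 0 ] || printf ' [%s]' "$@"
  printf '\n'
}

show_args               # prints: 0 args:
show_args "/var/cache"  # prints: 2 args: [--cache-dir] [/var/cache]
```

The inner quotes around `${cache_dir}` matter: without them a path containing spaces would be split into several arguments.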
