Maximize backup speed to GCS

Hello,

I am backing up a few terabytes (mostly files >100 MB) to Google Cloud Storage using restic 0.17.3 with a 5 Gbps internet connection.

Google’s copy tool (gcloud storage rsync) reaches 300MB/s average throughput, which is reasonable.
Restic, on the other hand, maxes out at 100 MB/s regardless of the settings I tried.

I get 75 MB/s with the following base command line:
$ GOOGLE_PROJECT_ID="my_project" GOOGLE_APPLICATION_CREDENTIALS="gcs_credentials.json" restic -r gs:my_bucket:/ backup --files-from include.txt --exclude-file exclude.txt

I tried to:

  • Increase the number of connections to GCS (-o gs.connections=10): speed is 100 MB/s.
  • In addition, increase the read concurrency (--read-concurrency=10): speed is about 100 MB/s.
  • Increase pack size (--pack-size=128): speed is about 100 MB/s.
  • Further increase the number of GCS connections (to 20): speed drops to 65 MB/s.

In all cases, CPU is not a bottleneck: restic usage is around 200% on an 8-core system, and no restic thread uses more than 50% CPU.
Nor is the source storage (fast NVMe drive), nor the internet connection.

Any idea what could be tweaked in restic to increase the backup speed, since the hardware does not seem to be the bottleneck?
I have read the "Tuning Backup Parameters" section of the restic 0.18.0-dev documentation, but it does not seem there is anything else to try there.

Should I try the rclone backend?

Further info:

$ restic version
restic 0.17.3 compiled with go1.24rc1 on linux/amd64

The system has an AMD Ryzen 7 PRO 7840U CPU (8 physical cores), a fast NVMe drive, 64 GB RAM, and a 5 Gbps internet connection.

Thanks,

Raphaël

Just as a test, maybe try disabling compression?

Thanks, I now reach 140 MB/s with compression disabled:

-o gs.connections=10 --read-concurrency=10 --pack-size=128 --compression=off

hello @raphael
If you have such a fast network connection, I'd suggest also trying to REDUCE the number of backend connections, as recommended for (similarly fast) local storage:

Backend Connections

Restic uses a global limit for the number of concurrent connections to a backend. This limit can be configured using -o <backend-name>.connections=5, for example for the REST backend the parameter would be -o rest.connections=5. By default restic uses 5 connections for each backend, except for the local backend which uses a limit of 2. The defaults should work well in most cases. For high-latency backends it can be beneficial to increase the number of connections. Please be aware that this increases the resource consumption of restic and that a too high connection count will degrade performance.

TMPFS
Since you are on Linux, I'd also suggest putting the restic CACHE and/or TMPDIR folders in RAM. You can do this with tmpfs mounts in /etc/fstab. Tune the sizes to restic's usage and your available memory, for example:

# ramdisk for restic, apt and logfiles.
#myramdisk                              /tmp/ramdisk                            tmpfs   defaults,size=32G       0 0
restic_tmp                              /tmp/restic                             tmpfs   defaults,size=32G       0 0
restic_cache                            /tmp/.cache/restic                      tmpfs   defaults,size=6G        0 0

and then in your backup/script:

# first set some variables to use the RAMDISK
export TMPDIR=/tmp/restic
export RESTIC_CACHE_DIR=/tmp/.cache/restic

A side benefit of using RAM is that it also reduces wear on your local storage.

Thanks for the suggestions.

No improvement, and some degradation with a low number of connections (-o gs.connections=1 or 2).

Done (and checked that the on-RAM tmpfs was used), but unfortunately, it does not improve the throughput.

I dug much deeper, and I think I found the (or at least, one) issue:

  • A test Go program using the GCS Go API maxes out at about 140 MB/s, regardless of the parallelism. This shows that the performance issue is not specific to restic.
  • A test Rust program using OpenDAL reaches 500 MB/s (with a parallelism of 5).

The culprit therefore seemed to be the GCS Go API, or the way it's commonly used. And indeed, it seems that by default the HTTP client reuses a single connection across GCS clients, which creates a huge bottleneck (see here). When disabling keepalive, which is the solution in that PR from another project, I now reach 500 MB/s with the GCS Go API!

I'll check if the gains transfer to restic's code and at some point send a PR.
