Moving a local repository to B2

I have about 100 GB of restic backups from my cloud servers, which I’m thinking of moving to B2, partly to cut costs and partly to get some geographic redundancy. The backups are currently on a virtual drive attached to one of the cloud servers, which the backup clients access via SFTP.

The plan is to copy the existing repo into a new bucket using rclone and then update the configuration for the backup clients to use B2. I have tried this with a small test repo and everything looks OK.
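For anyone following along, the plan above could be sketched roughly like this. The repository path, the rclone remote name (b2remote) and the bucket name are placeholders, not values from this thread, and the script only prints the commands unless DRY_RUN=0 is set. restic would additionally need B2_ACCOUNT_ID, B2_ACCOUNT_KEY and the repository password in its environment.

```shell
#!/bin/sh
# Sketch only: all names below are hypothetical placeholders.
REPO_DIR="${REPO_DIR:-/srv/restic-repo}"        # local repository path (assumed)
REMOTE="${REMOTE:-b2remote:my-restic-bucket}"   # rclone remote:bucket (assumed)
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands, 0 = run them

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

migrate() {
    bucket="${REMOTE#*:}"   # bucket name without the rclone remote prefix
    # 1. Copy the repository files unchanged into the bucket.
    run rclone copy "$REPO_DIR" "$REMOTE" || return 1
    # 2. Compare source and destination.
    run rclone check "$REPO_DIR" "$REMOTE" || return 1
    # 3. Verify the repository through restic's B2 backend.
    run restic -r "b2:${bucket}:/" check
}
```

Calling migrate with the default DRY_RUN=1 shows the three commands without touching anything, which is a cheap way to review them before the real run.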

Has anyone else done a migration like this? Any pitfalls to watch out for?

That worked great for me. I regularly copy my local repo up to B2 and it is just fine. The repositories are portable. I have about 400 GB of data backed up in this instance. Pretty silky smooth. I just used rclone.

I have also done this, and have had success exactly as @matt did, using rclone also (in fact, I regularly do this - back up to rest-server, push that via rclone to b2 for redundancy).

By the way, I also use the ‘local repo + rclone to B2’ setup. Any ideas on how to optimize the rclone part? It takes a long time just to work out the list of changed files, and rclone now uses more ‘class C’ transactions than B2’s free allowance covers. It’s not very expensive for me (about $1 on the monthly bill).

But I think that under certain assumptions (nobody uses b2 directly except rclone) it should be possible to optimize this.

rclone has a ‘cache’ remote. Does anyone have experience using it?

Have you tried using --fast-list?

Sure. I use the following command:

rclone -v -v sync --log-file=/srv/backup/restserver/rclone.log --fast-list --size-only --exclude rclone.log --exclude private_key --exclude .htpasswd --delete-after

Have you tried the --old-sync flag? I’ve heard it reduces the number of Class C transactions.

I’ve looked into it. According to https://forum.rclone.org/t/new-sync-method-in-v1-36-and-backblaze-b2-class-c-transaction-cap/1441 --fast-list should work exactly the same way as --old-sync.

By the way, rclone has a handy option, --dump bodies, that shows every transaction. In my case (with the --fast-list option) most of the time is spent on /b2api/v1/b2_list_file_names, with something like this in the body:

{"bucketId":"xxx","startFileName":"data/00/00xx ","maxFileCount":1000}

So it actually lists all the files in the bucket, 1000 files per transaction, and 1000 is the maximum that can be requested (according to the B2 docs). My repo has a bit more than 300K files, so I need at least 300 transactions just to get the file list. Currently rclone is triggered by incron once the last lock file is removed from the locks subdirectory.
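The arithmetic behind that estimate can be written as a small helper. This is just a sketch: the function name is made up, and the page size of 1000 is the maxFileCount seen in the request body above.

```shell
# Number of b2_list_file_names calls needed to list a given number of files,
# rounding up, with 1000 files per call unless a second argument says otherwise.
list_calls() {
    files=$1
    page=${2:-1000}
    echo $(( (files + page - 1) / page ))
}

list_calls 300000   # prints 300
```

To get the file count for a local repository, something like find /path/to/repo -type f | wc -l would do.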

Here is how I think this could be optimized:

  1. Watch for changes in snapshots/ instead of locks in incrond
  2. Use rclone copy instead of rclone sync and give it the names of the files to upload (something like find -newerXY ${LATEST_SYNCED_SNAPSHOT})
  3. Run a full sync weekly, after cleanup+prune.
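Step 2 of the list above might look something like the following. It is only a sketch under a few assumptions: instead of find -newerXY against the latest synced snapshot, it uses plain find -newer against a marker file whose timestamp records the previous upload, the marker must be given as an absolute path, and the function names are invented for illustration. The --files-from option is rclone's standard way to pass an explicit file list.

```shell
# Print repository files modified after the marker file,
# as paths relative to the repository root.
changed_files() {
    repo=$1
    marker=$2   # absolute path to the marker file
    ( cd "$repo" && find . -type f -newer "$marker" | sed 's|^\./||' )
}

# Upload only the changed files, then advance the marker.
sync_new_files() {
    repo=$1
    remote=$2
    marker=$3
    list=$(mktemp)
    next=$(mktemp)   # timestamp taken before listing, so files written
                     # during the upload are picked up on the next run
    changed_files "$repo" "$marker" > "$list"
    if rclone copy --files-from "$list" "$repo" "$remote"; then
        touch -r "$next" "$marker"
    fi
    rm -f "$list" "$next"
}

# Example (placeholder names):
#   sync_new_files /srv/restic-repo b2remote:my-restic-bucket /var/lib/restic-sync/marker
```

Since copy never deletes anything on the remote, the weekly full sync from step 3 is still needed after forget/prune.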

PS. One more thing to consider: use one B2 bucket per restic repo, and never ‘share’ a B2 bucket between multiple synced repositories or with unrelated stuff.

Interesting - why is this, can you explain?

Sure. As far as I understand, rclone --fast-list gets the whole list of files in the bucket. If you have unrelated stuff in it, you’ll probably also ‘pay’ for listing that.

But this needs to be checked.

Really? I would be surprised if --fast-list didn’t limit the listing to a subfolder (or technically, a prefix).

Another potential reason: if you ever need to make a snapshot and download it, the snapshot can only be taken at the per-bucket level. Additionally, once B2 fully implements the ACL APIs, you can easily give different servers separate permissions for different buckets, etc.

If you are copying to B2, you may want to set the bucket’s lifecycle settings to keep only the last version of each file (since restic keeps the ‘history’ in its own files).

If you choose to do that, be aware that even if you have versions turned off in B2, syncing your backup to B2 with rclone still creates versions unless you use --b2-hard-delete.

You can use:

rclone -q --b2-versions ls remote:bucket

to see them - look for -vYYYY-MM-DD-HHMMSS-### at the end of the filenames.

You can use rclone cleanup remote:bucket to clean them up.
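If the listing is long, a small filter makes the old versions easier to spot. This is a sketch that assumes the suffix format described above (-vYYYY-MM-DD-HHMMSS-###); the function name is made up.

```shell
# Keep only listing lines whose file name carries a version suffix
# such as -v2018-03-01-120000-000.
old_versions() {
    grep -E -- '-v[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{6}-[0-9]{3}'
}

# Usage (needs a configured rclone remote):
#   rclone -q --b2-versions ls remote:bucket | old_versions
```

If this prints nothing after a sync run with --b2-hard-delete, no hidden versions are piling up.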

See https://rclone.org/b2/ for more info.


Good point, yes, both these things I do too and recommend others do the same, generally.

I was finally able to implement a sync script that uses rclone copy instead of sync and gives it the list of files to upload. I assume that the repository is usually append-only (existing files are never changed). forget/prune requires a full sync, but they are not run very often.

Still testing it, but I think that it already works as expected.

Unfortunately, rclone performs b2_list_file_names for the copy command anyway. When called without --fast-list, it requests a non-recursive listing of the parent directory for every uploaded file. The good thing is that the number of b2_list_file_names transactions can be estimated by counting the number of changed directories.

So right now I compare the number of changed directories with [total_files_in_repo]/1000 and choose whichever sync method is cheaper (copy vs sync). In my case copy requires ~30% fewer transactions if the B2 sync is triggered after every backup. That is already pretty cool, since I’m now under the free B2 limit again.
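The comparison could be sketched like this. It is not the actual script: the function names are invented, and the cost model is the simple one described here (one listing call per changed directory for copy, one call per 1000 files for a full sync).

```shell
# Count distinct parent directories in a list of file paths read from stdin.
count_dirs() {
    while IFS= read -r f; do
        dirname "$f"
    done | sort -u | wc -l | tr -d ' '
}

# Print "copy" if per-directory listings are cheaper than a full listing,
# otherwise "sync".
cheaper_method() {
    changed_dirs=$1
    total_files=$2
    full_list_calls=$(( (total_files + 999) / 1000 ))
    if [ "$changed_dirs" -lt "$full_list_calls" ]; then
        echo copy
    else
        echo sync
    fi
}

cheaper_method 200 300000   # prints copy
cheaper_method 400 300000   # prints sync
```

The first argument would come from piping the list of changed files through count_dirs.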


Hi there.
Can you share the script please?