Backup a subset of restic data to remote

catleeball · April 10, 2021, 5:48am

I’m planning automated backups of my home network machines to a local restic database, then rclone the restic DB to offsite storage.

To save space (and money), I’m trying to figure out how to:

- For all hosts in the DB, send only a subset of backed-up directories to offsite storage
- Maintain only the most recent snapshot of all hosts on the offsite storage

I’d like to maintain a larger set of directories and more snapshots on local storage since I have plenty of space there, but some things (binaries, build directories, containers, VMs) aren’t worth spending money for extra offsite backup space.

Ideally, the local restic DB would have some way to only send the diffs from the most current local snapshot subset compared to what’s on the offsite snapshot. I was considering using Backblaze or similar and wanted to cut down on transfer and access costs, so if the diff could be calculated locally, that would be super ideal.

Also if I could do this all inside the local database to maximize deduplication (e.g. not having a separate staging DB with a copy of the snapshot at head) that would save a lot of local space.

In my head, I imagine some way of tagging “current-offsite” in the snapshot chain, diff against the most current local snapshot, then uploading the diff and updating the local tag. I have no idea if restic would support this workflow, especially exporting only a set of directories from those snapshots.

I’m not sure of the best approach to accomplish this, or whether there’s any practical way with restic in its present state. Any tips are appreciated! Thanks in advance!

alexweiss · April 10, 2021, 5:19pm

@catleeball Welcome to the restic forum!

About the terminology, we don’t call it “restic database” but “restic repository” or simply “repo”.

Transferring only data that the target repo does not already contain is always the case when you back up into a repo or use restic copy to copy a snapshot from one repo to another.

Copying only some directories of a snapshot to another repo is not implemented. In fact, most commands work on the snapshot level.

I would recommend that you:

- Set up a local repo and a remote repo with identical chunker parameters (use init --copy-chunker-params).
- Make several snapshots with different paths, e.g. one with the paths you want in every backup and another with the paths you only want to back up locally, or split them even more finely. Deduplication always applies within the same repo, whether the identical data sits within a file, a directory, a snapshot, or across several snapshots.
- Use the copy command to copy the snapshots you want to the remote repo. Using tags is an easy way to achieve this.
- Alternatively, run two backups (one to the local and one to the remote repo). This is effectively the same (if the paths to back up don’t change in between) but might be a bit faster, as the copy command is not yet fully speed optimized.
- Regularly run forget and prune on both the local and the remote repo. The forget parameters can be chosen individually for each repo to suit your needs.
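
The steps above could be sketched roughly as the following shell script. This is only an illustration under stated assumptions, not a tested setup: the repo locations, the paths, the tag names (offsite, local-only) and the retention numbers are all hypothetical. It uses the --from-repo option, which restic 0.14 and later accept for init and copy; older versions used --repo2 with different source/destination semantics, so check restic help copy for your version. Repo passwords are assumed to be supplied separately (for example via RESTIC_PASSWORD_FILE and RESTIC_FROM_PASSWORD_FILE). By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only. All repo locations, paths, tags and retention values
# below are hypothetical placeholders.
LOCAL_REPO="/srv/restic/local"
REMOTE_REPO="b2:example-bucket:restic"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = really run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run once: create the remote repo with the local repo's chunker
# parameters so that copied data deduplicates properly.
init_remote() {
    run restic -r "$REMOTE_REPO" init --from-repo "$LOCAL_REPO" --copy-chunker-params
}

# Snapshot of the paths that should also go offsite.
backup_offsite_paths() {
    run restic -r "$LOCAL_REPO" backup --tag offsite /home/alice/Documents /home/alice/Pictures
}

# Snapshot of bulky paths that stay in the local repo only.
backup_local_only_paths() {
    run restic -r "$LOCAL_REPO" backup --tag local-only /home/alice/build /home/alice/vms
}

# Copy only the snapshots tagged "offsite" to the remote repo.
copy_offsite() {
    run restic -r "$REMOTE_REPO" copy --from-repo "$LOCAL_REPO" --tag offsite
}

# Different retention per repo: more history locally, one snapshot remotely.
apply_retention() {
    run restic -r "$LOCAL_REPO" forget --keep-daily 14 --keep-weekly 8 --prune
    run restic -r "$REMOTE_REPO" forget --tag offsite --keep-last 1 --prune
}

backup_offsite_paths && backup_local_only_paths && copy_offsite && apply_retention
```

Since forget groups snapshots by host and paths by default, --keep-last 1 on the remote repo should keep the newest snapshot per host rather than a single snapshot overall. Running with DRY_RUN=0 executes the commands instead of printing them.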

catleeball · April 10, 2021, 7:02pm

Oh, thanks for the nice explanation!

Each machine backing up to the remote repo independently is probably the most direct solution, I don’t know why that hadn’t occurred to me. Tagging snapshots of different subsets of directories is a very elegant approach too though!

Maybe I could even combine the two approaches, so that if I need to restore from the remote, I can selectively restore snapshots of just the subsets of things I need.
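
For what it's worth, a selective restore from the remote might look something like the sketch below. The repo location, host name, tag, and paths are made-up placeholders; the restore options used (--host, --tag, --include, --target) are standard restic flags, but I haven't run this against a real repo. It only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. Repo location, host, tag and paths are hypothetical.
REMOTE_REPO="b2:example-bucket:restic"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Restore one directory from the newest "offsite" snapshot of a host.
# $1 = host name, $2 = path inside the snapshot, $3 = restore target
restore_subset() {
    run restic -r "$REMOTE_REPO" restore latest --host "$1" --tag offsite --include "$2" --target "$3"
}

restore_subset laptop /home/alice/Documents /tmp/restore
```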

Once things are set up, I’ll put the scripts/configs/etc. into a GitHub repo and share it in this thread for any folks who come across this via web searches.

Thanks again!