First of all, please note that you can use the --dry-run option to the restic forget command to test forget policies without actually forgetting any snapshots. This is probably a good idea to do now that you’re trying to establish your policies and thereby the corresponding commands to use.
So, if you want to apply different policies to different tags, you probably need to use the --tag option with the tag you want to apply a policy to, so that you only apply the policy and forget snapshots that have this tag on them. Then you'd run one forget command for each tag.
Once you have that baseline, it should just be a matter of adding the proper --keep-* options for each of the tags you want. E.g. to keep one snapshot per week for the last four weeks you add the --keep-weekly 4 option, to keep one snapshot per month for the last three months you add --keep-monthly 3, and so on. The --dry-run will let you see which snapshots restic would forget and which ones it would keep. Just try it and see.
So in summary, for each of your repositories, you would run one forget command (with appropriate options) for each of the tags. So that would be 2 (repositories) * 4 (tags) = 8 forget commands. Note that after that you only need to run one prune command per repository though.
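The per-repository, per-tag loop could be sketched like this. The repository paths, tag names, and --keep-* values below are made-up placeholders; the commands are only echoed so you can review them before running anything:

```shell
#!/bin/sh
# Generate the 2 (repositories) x 4 (tags) = 8 forget commands.
# Repo paths and tag names are hypothetical examples.
# Remove the "echo" to actually run them; add --dry-run first to preview.
for repo in /srv/restic-repo rclone:remote:restic; do
  for tag in home etc var srv; do
    echo restic -r "$repo" forget --tag "$tag" \
      --keep-daily 7 --keep-weekly 4 --keep-monthly 3
  done
done
```

After the loop, a single `restic -r <repo> prune` per repository reclaims the space.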
So, every time a restic -r /srv/restic-repo backup run finishes, I'm considering my options for mirroring it off-site.
rclone sync
restic copy
Since both the local and remote repositories will have different snapshots, I guess rclone sync is not an option. So, I read this, which confirms my theory.
What’s the most efficient way of using restic copy in this scenario?
Thoughts
Would restic copy be more efficient if the last few snapshots are exactly the same? Moreover, there are more snapshots in the destination (off-site/remote) repository than in the local one.
If I go with restic copy:
If both local and remote repos have the same encryption key, could I avoid decryption and re-encryption?
How could I avoid restic copy crawling through everything?
Could I just do restic -r /srv/restic-repo copy --repo2 rclone:remote after every backup run?
There is no option to avoid re-encryption. However, most somewhat modern CPUs have hardware acceleration for AES, which means that restic is able to re-encrypt a few gigabytes per second. Thus this is not a bottleneck.
restic copy will only copy snapshots and data chunks which are not present in the destination repository.
Running copy after each backup run should work well. If the last few snapshots are identical, then copy will just copy the snapshot but nothing else.
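A minimal wrapper for the backup-then-copy sequence could look like this, using the same --repo2 syntax as in the question. The paths, remote name, and backup target are placeholders, and the commands are echoed so the sequence can be inspected before use:

```shell
#!/bin/sh
# Back up, then mirror the repository off-site in one go.
# SRC, DST and the /home backup path are hypothetical examples;
# drop the "echo" prefixes to actually execute the commands.
SRC=/srv/restic-repo
DST=rclone:remote
echo restic -r "$SRC" backup /home
echo restic -r "$SRC" copy --repo2 "$DST"
```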
copy adds a mark to snapshots when copying them, which allows it to recognize the already copied snapshots in later runs. It will have to read each file in the snapshots folder of the repository once to do that, but that should be reasonably fast, especially when cached. For the data chunks in the repository, restic uses an index which lists which chunks exist in a repository. Then copy just has to check whether a chunk exists in the target repository and copy it if it’s not the case.
The mark is part of the snapshot created in the destination repository.
restic by default caches the index and some other metadata for a repository. The check of which data has to be copied is done solely based on that metadata. That is, with the cache enabled, restic should most of the time be able to avoid downloading data from the destination repository (it will have to download a few small files, but not much).