Backing up round robin to initially cloned repositories <-> local restic cache

cdhowie · September 29, 2019, 4:48pm

I’m not sure if the cache is totally rebuilt as in downloading all index files (@fd0 may be able to answer that) but it would have to at least download new index files and remove local copies of missing ones or very basic functionality would break in any situation where multiple restic clients share access to the same repository.

My point is just that what is being described here would look exactly the same to restic as though the same repository was just modified by another client. Either both would have to work or neither would.

On my system, ~/.cache/restic contains directories that are named after the repository ID, so I do believe that they would share a cache, unless you use --cache-dir when using one of the repositories to maintain a separate cache.

The following basic script is used. I’ll explain each line.

#!/bin/sh

cp -aln /var/restic/front/data /var/restic/back/ && \
cp -aln /var/restic/front/snapshots /var/restic/back/ && \
/usr/local/sbin/restic-front forget --prune --group-by host,tags --keep-within 7d && \
/usr/local/sbin/restic-back rebuild-index

Lines 1 and 2 hard-link all absent data and snapshots from the front repository to the back. Running as two separate commands with the packs processed first ensures the repository remains consistent (otherwise there is a small window of time where snapshots are added but the requisite data is not).

The cp options are as follows:

-a: copy recursively, preserving basically everything (ownership and timestamps are what I care about there)
-l: instead of copying contents, hard-link copied files (this means no additional disk space is used, and it’s safe since repository files are never changed)
-n: do not copy files that already exist in the destination

Line 3 is a standard forget+prune line to remove all snapshots except those created in the last 7 days from the front repository. Note that a significant amount of duplicate data is removed here, and this is expected since we’re copying all pack files from an “outside” perspective; we don’t (and can’t) avoid copying in duplicate blobs. (Edit: My bad here, this applies when pruning the back repository, not the front one. Sorry for the confusion.)

Line 4 rebuilds the index for the back repository, which is required for future actions on the back repository to function correctly; we added new data to the repository but it’s absent from the index.

Note that this script should not be run while any exclusive lock is held on the back repository, especially if that locking operation is prune; throwing new data in the middle of a prune could easily result in some of the new data being incorrectly removed. This can’t be enforced by the script since restic has no way to acquire a lock for use by commands outside of restic.

Once this feature is implemented, we will alter our scripts to use it instead:

github.com/restic/restic

Add command to copy all data to another repository

opened 10:33AM - 25 Oct 15 UTC

closed 08:18AM - 30 Aug 20 UTC

fd0

type: feature suggestion state: work in progress

During the discussion in #320 we discovered that functionality may be helpful to… copy all data (data blobs, tree blobs, snapshots) from a repository to a new one, recreating pack files and indexes on the fly. This allows creating a new repository in a different location (e.g. moving from a local repository to an sftp-server) and using that from now on without losing any history and old snapshots. This issues tracks the implementation of this feature and can be closed when it is implemented.