Git-annex to manage a restic repo?

I’ve been reading up on restic over the last week or so and am thinking of switching. I’m wondering if anyone has tried using restic this way, or if it doesn’t make sense at all…

I would like to find a way to offload most of my repo to S3 Glacier Deep Archive for the cost savings. Reading the forums, it seems like this does not work well at the moment. I also would not mind having multiple copies of the repo.

Would it work to create a local restic repo and commit it into git-annex in order to push it to Glacier (and maybe other places, e.g. a NAS)? I could then even drop some of the data objects locally (assuming I have another copy, e.g. on the NAS) and get the speed of local incremental/deduplicated backups with the ability to offload to Glacier. Or is this a terrible idea?

I am not familiar with git-annex, but if the homepage gave me the right idea, this could be a bit hard with restic repositories. What you’ll see in the repository itself is hashed filenames with encrypted contents (e.g. data/de/def36db6e32633ff7ec9fb42f12258fb58d41f540a4f1b33f468762e9da6bb35), so it looks like your only choice is to make an annex commit after each backup, to know which files belong to which backup. Even then, you won’t be able to differentiate completely, since restic does deduplication:

  • You back up something; restic puts some data files in the repo.
  • You back up something else, which deduplicates against the first backup, so restic writes new data, but the new snapshot also needs some old files under data/, which is not visible to you since those files have not changed (see the sketch after this list).
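A minimal sketch of that invisibility, with hypothetical paths (and RESTIC_PASSWORD assumed to be set):

restic -r /tmp/repo init
restic -r /tmp/repo backup ~/docs             # first backup writes packs under data/
find /tmp/repo/data -type f | sort > before.txt
restic -r /tmp/repo backup ~/docs ~/music     # second backup dedups against the first
find /tmp/repo/data -type f | sort > after.txt
diff before.txt after.txt   # lists only the *new* packs; the new snapshot's
                            # reuse of old packs is invisible at the file level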

(I might not be understanding the plan or git-annex well, so consider my answer theoretical mumbling.)

So I tested this out and can confirm that it ‘works’, at least at the surface level. I also see some references to people using this workflow with borg, but they are sparse on details, e.g. Backing up with borg and git-annex | Blog of Julian Andres Klode and suggest remote backup storage options? · Issue #2177 · borgbackup/borg · GitHub. Oddly, git-annex has official support for the opposite approach (committing files to git-annex and then storing the repo in borg) rather than the other way around, which is what I am discussing.

Here is the workflow, and what happens. Note that the restic repo still ‘looks’ normal:

mkdir my_repo
cd my_repo
restic -r . init

git init
git annex init
git annex add .
git commit -m "initial commit"

Now, the repo looks like a normal restic repo. The only difference is that every file is a symlink to another file under .git/annex/objects. But basic stuff seems to work:

restic stats
restic backup /some/files

git annex add .
git commit -m update
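To make the git-annex indirection concrete, a symlink looks roughly like this (illustrative; the actual key names and sizes will differ):

ls -l config
# config -> .git/annex/objects/<d1>/<d2>/SHA256E-s<size>--<sha256>/SHA256E-s<size>--<sha256>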

Now the fun part: create a second annex repo to act as the backup, following the git-annex walkthrough.
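A sketch of that setup, assuming the second repo lives on a USB drive (the remote name usbdrive and all paths are illustrative):

git clone /path/to/my_repo /media/usbdrive/my_repo       # clone the annex onto the drive
(cd /media/usbdrive/my_repo && git annex init usbdrive)   # initialize/describe the new repo
git remote add usbdrive /media/usbdrive/my_repo           # register it as a remote locally
git annex sync usbdrive                                   # exchange git metadata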

# if you run git annex move . --to usbdrive, things break, since the repo's config file is no longer present locally
git annex move data/ --to usbdrive

restic check # still works!

# Everything still seems to work... confirmed incremental deduping works too
restic stats
restic backup /some/files
git annex add .
git commit -m update

After the git annex move data/ --to usbdrive, the data/ folder still contains symlinks, but they are broken (until the data is restored at a later point). My initial testing indicates, though, that this data is not needed for restic to actually work, do backups, etc.; I assume it would only be needed to restore from the repo or to check its integrity (see the sketch below).
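Presumably, then, a restore or a full integrity check would first need the pack data fetched back from a remote that has it, something like this (untested sketch, run inside the repo as above):

git annex get data/              # re-fetch pack contents; the symlinks resolve again
restic check --read-data         # full integrity check is possible again
restic restore latest --target /tmp/restore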

1 Like

I genuinely have no idea what the goal is behind this approach, or what the perceived benefit will be. Restic’s repository structure is actually very close to git itself, so this seems a bit like using git to version a git repository. What is the point?

2 Likes

The point would be to encrypt/back up once and distribute that single backup to potentially multiple offline destinations, e.g. use a fast local drive or NAS for the backup repo, and then replicate that to Glacier, which restic (as I understand it) does not handle well. Git-annex provides a bit more formality than e.g. rclone/rsync for copying/sharing a single restic repo around (versus backing up to two separate restic repos, where the backups are not identical, since they encrypt or even chunk differently if care is not taken; see the sketch below).
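For the record, that ‘care’ can be taken on restic versions newer than the one discussed here, via copy-chunker-params and the copy command (a sketch with hypothetical paths; even then the two repos are not byte-identical, since they use different keys, which is what makes replicating a single repo attractive):

restic -r /nas/repo2 init --from-repo /local/repo1 --copy-chunker-params   # same chunking parameters
restic -r /nas/repo2 copy --from-repo /local/repo1                         # copy all snapshots across
# repository passwords omitted for brevity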

This very well could be a Rube Goldberg machine, but I saw some buzz around this approach with borg, so I thought it might be worth exploring.

It seems like git-annex is just an intermediary way to shuffle files around, which doesn’t solve the underlying problems with using cold storage tiers with restic. If the indexes become damaged, for example, you need to run rebuild-index, and then you need all pack files accessible, which will incur costly restore fees. You can also pretty much never prune, as the amount of money you save is going to be minuscule in comparison to the restore/transfer charges required to figure out what data is used and repack used objects into new packs.

Rclone directly to S3 with the various Glacier tiers makes a lot more sense than introducing git-annex, unless there is something git-annex is doing that’s going over my head. But the fundamental problem remains: sometimes restic needs to be able to access all pack files on demand, and cold storage is fundamentally incompatible with this.
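For illustration, such direct replication could be as simple as this (bucket name and rclone remote hypothetical):

rclone sync /path/to/my_repo s3:my-bucket/restic-repo --s3-storage-class DEEP_ARCHIVE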

2 Likes

Note that with restic 0.12.0 this is no longer true: rebuild-index only needs to access pack files which are not fully covered by the present index (unless you specify --read-all-packs). So if your index is lost entirely, all pack files will be accessed; but if your index is merely damaged and missing only a few pack files, those are the only ones to be read.
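Concretely:

restic rebuild-index                    # reads only packs not fully covered by the index
restic rebuild-index --read-all-packs   # forces reading every pack (expensive on cold storage)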

About cold storage discussions, I also opened the following issue:

2 Likes

This is a really interesting idea. restic and git-annex are my two primary data management tools. I use git-annex extensively and really love it for its stability, its sparseness (i.e. dropping unneeded files), and for tracking data redundancy and consistency across many different types of backends.

That said, I have a hard time wrapping my head around the possible benefits here. My intuition tells me the restic repo is better left to restic to handle, but I definitely applaud the creativity! :)

Edit: I think the main problem with this approach is keeping track of which files in the repo to drop and which to get. It’s definitely possible, but it feels like a lot of bookkeeping (see the sketch below).
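That bookkeeping would presumably build on the usual git-annex primitives, roughly:

git annex numcopies 2          # refuse to drop content below 2 known copies
git annex drop data/ --auto    # drop local packs that are safely stored elsewhere
git annex whereis data/        # audit where each pack file currently lives
git annex get data/            # pull everything back before a restore or full check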

1 Like