Data-less / lightweight repository?

Hello there!

I was thinking about a use case and was wondering if restic could handle it: since restic keeps an index, I'd like to know whether it is possible to back up using only the index.

Here’s an example flow, with my computer that I want to back up and a backup machine:

Initial steps

  • init a repo, do a first backup, send the whole repo to the backup machine
  • remove everything in $REPO/data/*

On each backup

  • Do a standard backup
  • send everything to the backup machine: only new blobs/trees/snapshot objects will actually be sent
  • remove everything under $REPO/data/ (a rough sketch of this flow follows the list)
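For concreteness, here is a minimal sketch of that flow. The paths, hostname and the rsync transport are assumptions on my part; any other way of shipping files (USB drive, nncp, …) would do, and whether the backup step still works once data/ has been emptied is exactly the open question.

```sh
# Paths and hostname are made up; any transport that ships files to the
# backup machine works the same way.
REPO=~/restic-repo
REMOTE=backup-machine:/srv/restic-repo

# 1. Standard local backup (works fully offline).
restic -r "$REPO" backup ~/documents

# 2. When a connection (or a USB drive) is available, push only the
#    files the remote does not already have.
rsync -av --ignore-existing "$REPO/" "$REMOTE/"

# 3. Drop the pack files locally to reclaim space; index, keys and
#    snapshot files stay behind.
rm -rf "$REPO"/data/*
```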

Why would I want to do that? Because I don’t have the space to keep all the backups locally, or I don’t trust the local machine enough to keep them. The backup machine serves this purpose.

Of course I wouldn’t be able to restore locally without the data files, but that’s not the purpose. I can always fetch them from the backup machine before restoring.

My quick tests show that it should be possible, but I haven’t tested it thoroughly. Is that a workable flow? Is anybody doing something similar? Does that even make sense? :slight_smile:

What is your issue with just using a remote repository on your “backup machine”? All performance-critical files will be stored in the local cache anyway…

I’d like to reduce my use of the network as much as possible. A new backup doesn’t need connectivity; I understand it’s easier to assume connectivity nowadays, but I want to be able to live untethered.

How do you “send everything to the backup machine” without a network connection?

Two things:

  • my goal is to live untethered as much as possible, not necessarily 100% of the time. It is OK to periodically connect to transfer files, for instance when I’m on public wifi, so that I don’t need to use cellphone tethering or a landline

  • it is also possible to use a sneakernet: a plain copy, git-annex, nncp, or anything else that puts files on a USB key or drive that is then plugged into the backup machine

I think you are pushing boundaries here :) You have a very special requirement, and I doubt anybody gave much thought to a restic “offline” mode when creating this software. It can still work, but by chance rather than by design, and things can change at any time. You risk that you (or restic) mess something up and all your backups will be lost.

In your situation I would use restic (or whatever else) to keep one backup on an external disk: you can get 1 TB thumb-sized SSD drives nowadays. The second backup would go to a remote server that I back up to when online. This way you have two backups: one always with you, one off-site.

This is actually what I do myself when travelling, sometimes to places without internet. My first line of defence is the local file system, where I take hourly snapshots. Then every evening I connect a tiny external disk and it triggers a backup. When back online I do a remote backup.

Yep, I want to push boundaries to see what is possible :slight_smile:

Those requirements aren’t really special; being offline and buying as little new hardware as possible is a path many are taking to reduce our environmental impact. The less we consume, the better. My question is meant to open a discussion about whether restic can meet the need of preserving the shelf life of computers, extending their usefulness and reducing outside dependencies.

I was more interested in whether someone has already done something like that, or put some thought into it.

With almost any backup tool it’s good practice to have >= 2x the data size in repository space; this all but guarantees you won’t run out of space during a backup (there are corner cases, obviously).

So if you have 2x the space, here’s another suggestion: why not keep just some snapshots in your local repository, push them to the remote repository one day and then forget them locally, e.g. with a policy that keeps just one weekly backup or something similar? This would dramatically reduce the amount of data stored locally while staying within restic’s design, so the risk of ending up with corrupted data is close to zero. It obviously depends on your data volume and change rate, but it’s possible to play a bit with forget policies :wink:
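As an illustration (the repository path and policy values below are just placeholders), such a local policy could look like this:

```sh
# Keep only a handful of recent snapshots in the local repository and
# reclaim the space; tune the policy to your data size and change rate.
restic -r ~/restic-repo forget --keep-daily 2 --keep-weekly 1 --prune
```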

If not, there is probably no way to make this offline and reliable at the same time across all the operations (backup, restore, maintenance). But I’m not a restic maintainer, just someone who read its design document; the maintainers can probably say for sure how it would fail in that case :slight_smile:

@rakoo: If you copy the newly added data (from the last backup run) to your “backup machine”, that is exactly the same traffic as running a backup against a remote repository located on the “backup machine”. Moreover, the time a backup takes is usually limited by the upload rate to the repository.

So I would propose using the “backup machine” as a remote repository and, whenever you intend to have a connection, just running the backup command by hand. Even if your connection is lousy and gets interrupted, a follow-up backup run will not need to restart from scratch but will reuse the data from the aborted run. (Run rebuild-index after an interruption to salvage even the last couple of seconds.)
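A minimal sketch of that setup; the hostname, path and the sftp backend are assumptions, and any other backend (rest-server, etc.) works the same way:

```sh
# Point restic at the repository on the backup machine.
export RESTIC_REPOSITORY=sftp:user@backup-machine:/srv/restic-repo

# Run this by hand whenever you have connectivity.
restic backup ~/documents

# If the connection dropped mid-run, rebuild the index so the next run
# can reuse the packs that were already uploaded.
restic rebuild-index
restic backup ~/documents
```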

About the setup you are trying to build: it relies heavily on the fact that your local cache keeps all the needed data. Try removing the cache and see what happens :wink:
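For example (assuming the default cache location on Linux; adjust if you use --cache-dir):

```sh
# Wipe the local cache, then try a backup against the data-less repo.
rm -rf ~/.cache/restic
restic -r ~/restic-repo backup ~/documents
```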

Create two repos: one locally and another on the backup machine. Then (see the sketch below):

  • back up to the local repo;
  • run restic copy whenever you want to copy snapshots to the backup machine;
  • run restic forget --keep-last 1 --prune locally;
  • run restic forget --prune id-of-last-snapshot to drop even the last snapshot once it has been copied.
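A hedged sketch of that workflow; the paths and the snapshot ID are placeholders, and the copy syntax below is the one used by restic 0.14 and later (--from-repo):

```sh
LOCAL=~/restic-repo
REMOTE=sftp:user@backup-machine:/srv/restic-repo

# 1. Back up locally, fully offline.
restic -r "$LOCAL" backup ~/documents

# 2. When the backup machine is reachable, copy the new snapshots over
#    (restic will need the passwords of both repositories).
restic -r "$REMOTE" copy --from-repo "$LOCAL"

# 3. Trim the local repository, keeping only the latest snapshot ...
restic -r "$LOCAL" forget --keep-last 1 --prune

# 4. ... or drop a specific snapshot by ID once it has been copied.
restic -r "$LOCAL" forget --prune <snapshot-id>
```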