(not) Understanding snapshots

jeff · June 23, 2021, 7:08am

Hi,
afaict, to make a backup (snapshot), I always have to call it like this
restic -r <repo> backup <folder>

I don’t quite understand why (under regular circumstances). Why shouldn’t restic store info about the folder and simply create a new snapshot when called without the parameter?

IMO there’s not much use in storing different folders in a repo, because the repo is organized by snapshots, not folders. So I would need to search the snapshots for a given folder instead of just using the latest one (or any other, by time).

Once, I thought about adding single files from different locations - but if each file has its own snapshots that’s not useful.

So what am I missing?

Thanks
Jeff

doscott · June 23, 2021, 10:02am

Personally I have 2 repos: local NAS and remote cloud.

Most people will run the backup from a script, so that line you don’t like to type over and over will be typed for you.

torfason · June 23, 2021, 10:48am

You could set the $RESTIC_REPOSITORY environment variable to <repo>. That way you could just type:

restic backup <folder>

This could be convenient if what you want to do is just an ad hoc restic backup ~ every now and then but run restic backup ~/src more frequently when working inside that directory, which would work fine (and files in both backup sets would be correctly deduplicated).
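
A minimal sketch of that setup, assuming a hypothetical repository location (/srv/restic-repo) and that the repository password is supplied separately, for instance through RESTIC_PASSWORD_FILE. The function names and the RESTIC variable are made up for illustration; RESTIC is only there so the commands can be previewed with RESTIC=echo.

```shell
# Hypothetical repository location; replace with your own.
export RESTIC_REPOSITORY="${RESTIC_REPOSITORY:-/srv/restic-repo}"

# Set RESTIC=echo to print the commands instead of running them.
RESTIC="${RESTIC:-restic}"

# Occasional backup of the whole home directory.
backup_home() {
    "$RESTIC" backup "$HOME"
}

# More frequent backup of the source tree only.
backup_src() {
    "$RESTIC" backup "$HOME/src"
}
```

Both functions write to the same repository, so the two path sets end up as separate snapshot series that share deduplicated data.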

But @doscott is right, in most cases, you’d write a script around this whole thing and then you don’t have to type the parameters anyway.

rawtaz · June 23, 2021, 11:10am

You gave an example of a command with two parameters, so it’s unclear which parameter you are referring to in your question. I’m assuming you mean the <folder> parameter, since the use and need for the <repo> parameter (or its equivalents, such as the environment variable RESTIC_REPOSITORY) is obvious.

It’s simply a matter of different use cases. If you don’t feel a need for being able to back up separate folders into the same repo, that’s fine. But a lot of other people back up different paths into their repositories.

Restic is designed to match snapshots and paths when it backs up and forgets (unless you change this in your forget command), so this is simply the basic functionality that is present here.

When you back up using restic, it tries to find the latest snapshot that was taken with the same paths you are backing up this time, in order to compare metadata and know which files it doesn’t have to scan again. I guess technically it could be changed to look up the latest snapshot when you don’t give it a path, and try to use the metadata in that, but there simply hasn’t been much of a use case for that. I mean, if you give it a path the first time, then why would it be a problem to give it the same path the next time(s) you run the backup? It’s simply a non-issue.
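
To see which snapshot restic would have available for a given path, the snapshots command can be filtered by path, and the backup command accepts a --parent option for naming the parent snapshot explicitly. The sketch below is only an illustration: the function names are invented, the snapshot ID and path are whatever you pass in, and RESTIC=echo previews the commands without running them.

```shell
# Set RESTIC=echo to print the commands instead of running them.
RESTIC="${RESTIC:-restic}"

# Show the most recent snapshot recorded for one path.
latest_for_path() {
    "$RESTIC" snapshots --path "$1" --latest 1
}

# Back up a path while naming the parent snapshot explicitly.
# $1 is a snapshot ID, $2 is the path to back up.
backup_with_parent() {
    "$RESTIC" backup --parent "$1" "$2"
}
```

In normal use the second function should rarely be needed, since restic picks a matching parent on its own when the paths are the same as before.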

If you mean that you want to be able to run the backup without paths the first time, then the answer is simply “because not everyone wants to back up their entire system” and possibly “because most people don’t want to back up their entire system, but instead only the things that matter, e.g. their home directory or similar”.

gurkan · June 23, 2021, 11:28am

Btw, it shouldn’t be hard to write an alias/wrapper with something like shell or Python, which will trigger whatever command you’d like to shorten (even choosing the repo/target depending on which directory you’re in etc.).
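
A rough sketch of such a wrapper in plain shell. Everything specific in it is an assumption made for illustration: the two repository locations under /mnt/nas, the rule that ~/src gets its own repository, and the function names.

```shell
# Set RESTIC=echo to print the command instead of running it.
RESTIC="${RESTIC:-restic}"

# Map a directory to a repository. Both locations are hypothetical.
repo_for_dir() {
    case "$1" in
        "$HOME"/src|"$HOME"/src/*) echo "/mnt/nas/restic-src" ;;
        *)                         echo "/mnt/nas/restic-home" ;;
    esac
}

# Back up the current directory into the repository chosen for it.
backup_here() {
    dir=$(pwd)
    "$RESTIC" -r "$(repo_for_dir "$dir")" backup "$dir"
}
```

Calling backup_here from inside ~/src/project would then target the first repository, and from anywhere else the second.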

In general, maybe config file support would help people with simple needs. I wouldn’t support such a decision, but it’s understandable.

rawtaz · June 23, 2021, 11:53am

Indeed. Wrapping commands in a simple script, even if it’s just one line, is extremely common, and more the norm than the exception.

jeff · July 2, 2021, 5:33pm

Thanks for all your answers! Much appreciated.

The point, however, wasn’t how I might call restic in a simple way. I’m not afraid of scripting at all.

It’s just that I thought I didn’t understand the basic concept: how could it be useful to put different paths into a single repo? I mean, there’s a difference between backing up “/home” entirely on the one hand and backing up “/home/user1”, “/home/user2”, “/home/user3” separately on the other. Or whether there should be only one repo for “/home” and “/var”, or two.

If you say this isn’t uncommon (as I read your answer, @rawtaz), then I’m fine with it, of course.

Thanks

rawtaz · July 3, 2021, 2:07am

There’s nothing wrong with using multiple different “sets” of paths in your repository. In your case it might help to not think of snapshots as the first-class citizen, but to think of the path sets as the first-class citizen, with snapshots being just the points along a timeline for each of those path sets.
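
One way to look at a repository from that angle is to have the snapshot listing grouped by host and paths, so each path set shows up with its own timeline. A small sketch; the function name is invented, and RESTIC=echo previews the command without running it.

```shell
# Set RESTIC=echo to print the command instead of running it.
RESTIC="${RESTIC:-restic}"

# List snapshots as one group per host/path set.
list_by_path_set() {
    "$RESTIC" snapshots --group-by host,paths
}
```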

What is right or applicable for you depends on your use cases. Personally I back up two different computers to the same repository, so in my repo there are two different path sets.

As you know, restic takes paths into account when you run the forget command to remove snapshots, such that the forget policy you specify applies individually to each group of paths/hosts/tags, depending on your --group-by setting.
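
As an illustration of that grouping, here is a sketch of a forget call with the grouping spelled out. The retention numbers are arbitrary examples, the function name is invented, and --dry-run is included so that nothing is removed while trying it out; RESTIC=echo previews the command itself.

```shell
# Set RESTIC=echo to print the command instead of running it.
RESTIC="${RESTIC:-restic}"

# Apply the same retention policy separately to each host/path set.
# --dry-run only reports what would be forgotten.
forget_per_path_set() {
    "$RESTIC" forget --group-by host,paths --keep-daily 7 --keep-weekly 4 --dry-run
}
```

With this grouping, a snapshot series for “/home” and another for “/var” in the same repository each keep their own 7 daily and 4 weekly snapshots.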

donisewell · July 3, 2021, 6:30am

Hopefully I understand you correctly so that this makes sense.

Personally, I try to stuff everything I can into a single repo. I figure the more I do that, the more I benefit from dedupe - assuming this is all on the same storage.