Implications of storing by paths

I’m new here. When I look at a mounted repo I see that it’s organised like hosts/HOSTNAME/absolute/path/file.

Sometimes my data moves around a bit, e.g. the host changes, and the path too. I was pretty happy to see the effect of deduplication on this. I created the first backup in the repo from my existing backup server (a simple nightly mirror copy), then ran a backup from the production machine. Even though the paths and hosts were different (/backup/server1/var/www on my backup machine, and /var/www on the prod machine), it ran through quickly because restic recognised the data as the same.

However, it leaves me with a path (as shown when mounted) like hosts/oldbackupserver/backup/server1/... that I’m not sure how to delete. I know it’s not taking up much space but it could be confusing later on.

Is there a way to specify a storage path, like you can in rsync (-R from memory) so that two hosts that have the data in two locations could store it in the same place? Feel free to tell me this is not a good idea :slight_smile:

And is there a way to delete unwanted hosts or paths?

Thanks.

Not sure if I understood you correctly but you could just use the forget and prune functions: https://restic.readthedocs.io/en/stable/060_forget.html . That way your initial snapshot would be gone and the files would still exist in all the other snapshots.
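
A minimal sketch of that forget-and-prune route, assuming the unwanted snapshots are exactly the ones recorded under the old host name. The function forget_host and the host name oldbackupserver are hypothetical. The snapshot IDs are pulled out of restic snapshots --json with grep and sed rather than a proper JSON parser such as jq, which assumes restic prints compact JSON.

```shell
# Hypothetical helper: forget every snapshot recorded for one host, then prune.
# Assumes the repository and password are already set in the environment
# (e.g. RESTIC_REPOSITORY and RESTIC_PASSWORD).
forget_host() (
    host=$1
    # Extract the "id" fields from restic's JSON snapshot list.
    # grep/sed is a crude stand-in for a real JSON parser such as jq.
    ids=$(restic snapshots --host "$host" --json |
        grep -o '"id":"[0-9a-f]*"' |
        sed 's/"id":"\([0-9a-f]*\)"/\1/')
    if [ -z "$ids" ]; then
        echo "no snapshots found for host $host" >&2
        exit 1
    fi
    # Word splitting of $ids is intentional: one argument per snapshot ID.
    restic forget --prune $ids
)
```

Usage would be e.g. forget_host oldbackupserver; listing the snapshots first with restic snapshots --host oldbackupserver is a sensible check before deleting anything.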

There’s currently no way to “modify” the path that’s saved for the items you back up. You’d have to cd into an appropriate base directory and run the backup from there, so that the path is relative and the same on all the hosts backing up this data.

If you really want to get rid of it, delete the snapshots. That said I don’t really see the problem.

just use the forget and prune functions

yeah right, because each snapshot only belongs to one host… Makes sense, thanks.

Ah, I must have missed the relative paths feature, I’ll go RTM again! Thanks.

It’s not really a feature, it’s just how you position yourself in the filesystem when you run restic and give it paths.

Let’s say you have a directory mysite that on server 1 is placed in the /var/www/ dir and on server 2 is placed in the /data/web/htdocs/ dir.

If you back those up by running e.g. restic backup /var/www/mysite on server 1 and restic backup /data/web/htdocs/mysite on server 2, you will obviously have snapshots with different paths in them, which you don’t want.

But if, on each server, you first cd into the directory where mysite is located and then run e.g. restic backup mysite, all snapshots will have just mysite as the path.
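
A small sketch of that approach; backup_relative is a hypothetical helper name. Later posts in this thread report that newer restic versions still record the full absolute path in the snapshot, so it is worth checking the output of restic snapshots after a first run.

```shell
# Hypothetical helper: back up a directory by its name only, from its parent.
# backup_relative /var/www/mysite  runs  restic backup mysite  inside /var/www
backup_relative() (
    dir=$1
    shift                       # any remaining arguments are passed to restic
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    # The cd only affects this subshell; the caller's directory is unchanged.
    cd "$parent" || exit 1
    restic backup "$name" "$@"
)
```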

That’s really helpful, thanks.

Sorry to resurrect this five-year-old thread, but I am a little confused and am seeking clarification.

The scenario is that I have a very large and old rsnapshot setup which I would like to decommission, importing its most recent daily.0 snapshot into restic on another host, accessed through rest-server.

(I have the rest-server working well I believe and restic on my rsnapshot machine can talk to it, query the repo etc.)

After I have imported the latest rsnapshot run, I would begin supplementing it with restic runs on the backup clients.

So, the filesystem layout on the rsnapshot server is like this:

/srv/rsnapshot/daily.0
.
├── foo.example.com
│   ├── etc
│   │   ├── blah
│   │   .
│   │   .
│   ├── home
│   │   ├── andy
│   │   .
├── bar.example.com
│   ├── etc
│   │   ├── blah
│   │   .
│   │   .
│   ├── home
│   │   ├── andy
.   .   .

I was expecting that I could [write a script to] cd into each host directory above and do the equivalent of:

restic backup \
    etc/ \
    home/ \
    srv/ \
    --one-file-system \
    --tag from_rsnapshot \
    --host foo.example.com

for each one to populate my restic repo with initial rsnapshot data.
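
A sketch of what such a script might look like, assuming each subdirectory of the interval directory is named after its host (import_interval is a hypothetical name). As the next paragraph reports, run this way restic still records the full paths from the rsnapshot machine.

```shell
# Hypothetical helper: import one rsnapshot interval directory into restic,
# one snapshot per host, running restic from inside each host's directory.
# Assumes each subdirectory of the interval is named after the host it holds.
import_interval() (
    interval_dir=$1                       # e.g. /srv/rsnapshot/daily.0
    for host_dir in "$interval_dir"/*/; do
        [ -d "$host_dir" ] || continue    # the glob matched nothing
        host=$(basename "$host_dir")
        (
            # The cd is confined to this subshell, so each host starts clean.
            cd "$host_dir" || exit 1
            # Only pass the top-level directories this host actually has.
            set --
            for d in etc home srv; do
                if [ -d "$d" ]; then set -- "$@" "$d/"; fi
            done
            [ "$#" -gt 0 ] || exit 0
            restic backup "$@" \
                --one-file-system \
                --tag from_rsnapshot \
                --host "$host"
        ) || echo "backup of $host failed" >&2
    done
)
```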

However when I list the snapshots afterwards, the paths shown are the full paths from the rsnapshot machine, e.g. /srv/rsnapshot/daily.0/foo.example.com/home/andy/.bash_profile.

From reading this thread and others like it, I had thought that I was doing the right thing by changing to the root of the backups and specifying relative directories.

Won’t this cause me issues later when I come to do the real restic backups from host foo.example.com for example, like:

cd / && restic backup etc/ home/ srv/ --one-file-system

I assume that deduplication will still match between the data from rsnapshot and the “new” data, but the file paths will all be different, so determining changed files, the paths that appear when restoring, etc. will all be out of whack, won’t they?

Is there something I should be doing differently here?

Thanks,
Andy

As far as I know, restic uses the full path internally always, so that behaviour is expected.

The data chunks will still deduplicate, but the first backup of the “real” system will be slower as restic will need to scan everything to make sure it has all the data already stored. Subsequent backups of the real system will be significantly faster, because the full scan won’t be necessary.

I’m sorry, I don’t really understand what you mean by this. When you perform a restore, the contents of the snapshot (or the parts of the snapshot you specify) will be restored. So if you restored /srv/rsnapshot/daily.0/foo.example.com/home/andy/.bash_profile from a snapshot, that file and its path would be recreated inside the target directory you specified. The contents and metadata of the restored file would be as they were when the snapshot was created.
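
To make the “inside the target directory” part concrete, a small sketch; restore_one is a hypothetical wrapper around restic restore with its --target and --include options, and it simply prints where the file should end up.

```shell
# Hypothetical wrapper: restore one file from a snapshot into a target directory
# and print where it lands. restic recreates the file's full snapshot path
# beneath the target, so the result is <target><path as stored in the snapshot>.
restore_one() (
    snapshot=$1           # snapshot ID, or "latest"
    path=$2               # e.g. /srv/rsnapshot/daily.0/foo.example.com/home/andy/.bash_profile
    target=${3%/}         # drop a trailing slash to avoid "//" in the printed path
    restic restore "$snapshot" --target "$target" --include "$path" || exit 1
    echo "$target$path"
)
```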

I think a chroot could get you the path structure you want, though it would involve some setup. There’s a bit of discussion about it in Backfilling snapshots from non-restic archives - #2 by cdhowie, some reports of using proot to accomplish the same in Backup option to remove a leading path prefix · Issue #2092 · restic/restic · GitHub, and some other solutions too that sound promising.
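
A rough, hypothetical sketch of the chroot idea, assuming Linux, root privileges, a statically linked restic binary, RESTIC_REPOSITORY and RESTIC_PASSWORD exported, and a repository reachable from inside the chroot without DNS lookups or CA files (the chroot’s /etc is the backed-up host’s). backup_in_chroot and the /restic-bin mount point are made-up names. It bind-mounts each top-level directory of the host tree into an empty directory, so inside the chroot they appear as /etc, /home and /srv.

```shell
# Hypothetical sketch, Linux only, run as root. Bind-mounts one rsnapshot host
# tree into an empty directory and runs restic chrooted there, so the snapshot
# records /etc, /home, /srv instead of /srv/rsnapshot/daily.0/<host>/etc, ...
backup_in_chroot() (
    src=$1                                  # e.g. /srv/rsnapshot/daily.0/foo.example.com
    host=$2                                 # e.g. foo.example.com
    restic_bin=$(command -v restic) || exit 1
    root=$(mktemp -d) || exit 1
    mkdir "$root/tmp"                       # restic writes temporary files here
    set --                                  # paths to back up, as seen inside the chroot
    for d in etc home srv; do
        [ -d "$src/$d" ] || continue
        mkdir "$root/$d"
        if mount --bind "$src/$d" "$root/$d"; then set -- "$@" "/$d"; fi
    done
    mkdir "$root/restic-bin"
    : > "$root/restic-bin/restic"           # mount point for the restic binary
    status=1
    if mount --bind "$restic_bin" "$root/restic-bin/restic"; then
        # --no-cache avoids restic creating a cache directory inside the chroot.
        TMPDIR=/tmp chroot "$root" /restic-bin/restic backup "$@" \
            --host "$host" --tag from_rsnapshot --no-cache
        status=$?
        umount "$root/restic-bin/restic"
    fi
    for p in "$@"; do umount "$root$p"; done
    # Non-recursive cleanup only: rm -f and rmdir fail on anything still
    # mounted, so a failed umount cannot lead to source data being deleted.
    rm -f "$root/restic-bin/restic"
    rmdir "$root"/* "$root" 2>/dev/null
    exit "$status"
)
```

Usage would be along the lines of backup_in_chroot /srv/rsnapshot/daily.0/foo.example.com foo.example.com. Binding only the three directories, rather than chrooting straight into the host tree, keeps the restic binary and temporary files out of the rsnapshot data.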

Using the example file I gave as a way to explain what I mean:

As part of the first backup I did, one of the paths I asked to be backed up was home/ so that would include home/andy/.bash_profile.

Despite the fact that this came from an old rsnapshot store, it was done with --host foo.example.com. After the backup finished, rewrite was used to set the time to when rsnapshot originally did its work. So, intuitively, a later backup from the real host will have a matching host and a later timestamp, and so will be logically related.
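
For what it’s worth, restic backup also accepts a --time option, so the rsnapshot time could be set during the import rather than afterwards. A minimal sketch, assuming GNU date and that the interval directory’s modification time reflects when rsnapshot ran; snapshot_time is a hypothetical helper.

```shell
# Hypothetical helper: print a directory's modification time in the
# "YYYY-MM-DD HH:MM:SS" local-time format that restic backup --time accepts.
# Relies on GNU date's -r option (modification time of the given file).
snapshot_time() {
    date -r "$1" '+%Y-%m-%d %H:%M:%S'
}
```

Used e.g. as restic backup etc/ home/ srv/ --host foo.example.com --time "$(snapshot_time /srv/rsnapshot/daily.0)".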

Because this was actually done from inside my old rsnapshot hierarchy though, restic has stored its path as /srv/rsnapshot/daily.0/foo.example.com/home/andy/.bash_profile.

Later when I come to run a backup from the real host foo.example.com I’ll specify the relative path home/ (while running from /) and that same file will appear in that snapshot as /home/andy/.bash_profile.

Isn’t it the case that every part of restic that matches on file paths will treat these two paths as completely different files, so there will be no way to fully follow a history of changes here?

For example from my testing and understanding of restic find --path:

--path path   only consider snapshots including this (absolute) path (can be specified multiple times, snapshots must include all specified paths)

there would be no way to return both the original snapshot and some later one, because the path “must be absolute” and “snapshots must include all specified paths”.

There are probably other issues I haven’t learned about yet.

So as far as I can see while it might be handy to have those historical backups in restic I’ll still need to be treating it as a slightly separate pool of data, albeit still helping with deduplication.

It looks like a lot of this was discussed in this issue. I see too that rustic has some notion of relative paths and the ability to do a path remapping at backup time but I realise that is off-topic here.

Thanks,
Andy

Yes, I suppose that’s one way to think about it. I’m not sure that a “history of change” is really meaningful though, except in terms of a forget policy. And you can use a --group-by option to have restic group the snapshots together regardless. But perhaps I’m still missing something key here :slight_smile:
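
A sketch of a forget run using --group-by, assuming grouping by host alone is what’s wanted (restic’s default grouping is host,paths, which would keep the imported and native snapshots in separate groups). The keep-* values are arbitrary examples and forget_by_host is a hypothetical name.

```shell
# Hypothetical retention wrapper: group snapshots by host only, so imported
# rsnapshot snapshots and later native backups of the same host share one
# policy. Extra arguments (e.g. --dry-run, --prune) are passed through.
forget_by_host() {
    restic forget --group-by host \
        --keep-daily 7 --keep-weekly 5 --keep-monthly 12 "$@"
}
```

Running forget_by_host --dry-run first shows what would be removed without deleting anything.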

The --path option is used to filter the list of snapshots being queried, not to specify the path of the file being searched for. So you could pass it the path of the directory that was backed up, e.g. /srv/rsnapshot/daily.0/foo.example.com/, or / to limit the number of snapshots searched by find.

If you want to search all the snapshots for a file that exists in more than one location across multiple snapshots, I’d suggest using a wildcard. E.g. restic find '*home/andy/.bash_profile' (quoted so the shell doesn’t expand the asterisk itself) would return information about both /home/andy/.bash_profile and /srv/rsnapshot/daily.0/foo.example.com/home/andy/.bash_profile from their respective snapshots.
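
A tiny wrapper showing the quoting, with find_everywhere as a hypothetical name; it relies on the wildcard matching described above and on restic find’s --host filter to limit the search to one host’s snapshots.

```shell
# Hypothetical helper: search one host's snapshots for a file wherever it was
# stored. The pattern is quoted so the shell hands the asterisk to restic
# instead of expanding it against the local filesystem.
find_everywhere() {
    restic find --host "$1" "*$2"
}
```

E.g. find_everywhere foo.example.com home/andy/.bash_profile.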

Does that help at all?

Just to follow up: I had already written a script to help with the one-off import of the rsnapshot directories, setting the correct host, time and so on, so it wasn’t that hard to also make it run in a chroot as you suggested. Thanks!

That does take about 2 hours for each rsnapshot/interval.x/ directory tree and there are 50 of them to do, so it’s still ongoing but it’s looking promising so far. :grinning_face:
