Restic forget directory grouping

I am trying to better understand how restic handles the same directory being backed up by different commands.
The situation is as follows, running restic snapshots gives me:

ID        Time                 Host        Tags        Paths
------------------------------------------------------------
fa575c2d  2020-01-02 12:38:31  dispenser               /home

1839dc19  2020-01-25 01:50:10  dispenser               /home

d89b7470  2020-02-07 20:54:14  dispenser               /home

47965192  2020-02-12 17:42:15  dispenser               /home

1e7f191a  2020-02-13 22:48:26  dispenser               /etc
                                                       /home

8129d134  2020-02-14 22:29:52  dispenser               /etc
                                                       /home

e367c717  2020-02-15 20:36:48  dispenser               /etc
                                                       /home

dc35865b  2020-02-16 16:40:40  dispenser               /etc
                                                       /home

4ef3c951  2020-02-26 23:57:06  dispenser               /etc
                                                       /home

35ce3ebc  2020-02-29 00:56:58  dispenser               /etc
                                                       /home

09006ec3  2020-05-02 18:08:33  dispenser               /etc
                                                       /home

a619023b  2020-05-21 17:57:49  dispenser               /etc
                                                       /home

739a9cfc  2020-07-11 19:16:04  dispenser               /etc
                                                       /home

a2213d75  2020-07-12 13:55:01  dispenser               /etc
                                                       /home

a15eff04  2020-10-07 00:15:04  dispenser               /etc
                                                       /home

984db7cd  2020-10-24 00:11:59  dispenser               /etc
                                                       /home
------------------------------------------------------------
16 snapshots

Running restic forget --keep-weekly 6 --prune --dry-run produces this output:

Applying Policy: keep the last 6 weekly snapshots
snapshots for (host [dispenser], paths [/etc, /home]):
keep 6 snapshots:
ID        Time                 Host        Tags        Reasons          Paths
-----------------------------------------------------------------------------
dc35865b  2020-02-16 16:40:40  dispenser               weekly snapshot  /etc
                                                                        /home

35ce3ebc  2020-02-29 00:56:58  dispenser               weekly snapshot  /etc
                                                                        /home

09006ec3  2020-05-02 18:08:33  dispenser               weekly snapshot  /etc
                                                                        /home

a619023b  2020-05-21 17:57:49  dispenser               weekly snapshot  /etc
                                                                        /home

a2213d75  2020-07-12 13:55:01  dispenser               weekly snapshot  /etc
                                                                        /home

a15eff04  2020-10-07 00:15:04  dispenser               weekly snapshot  /etc
                                                                        /home
-----------------------------------------------------------------------------
6 snapshots

remove 5 snapshots:
ID        Time                 Host        Tags        Paths
------------------------------------------------------------
1e7f191a  2020-02-13 22:48:26  dispenser               /etc
                                                       /home

8129d134  2020-02-14 22:29:52  dispenser               /etc
                                                       /home

e367c717  2020-02-15 20:36:48  dispenser               /etc
                                                       /home

4ef3c951  2020-02-26 23:57:06  dispenser               /etc
                                                       /home

739a9cfc  2020-07-11 19:16:04  dispenser               /etc
                                                       /home
------------------------------------------------------------
5 snapshots

snapshots for (host [dispenser], paths [/home]):
keep 4 snapshots:
ID        Time                 Host        Tags        Reasons          Paths
-----------------------------------------------------------------------------
fa575c2d  2020-01-02 12:38:31  dispenser               weekly snapshot  /home
1839dc19  2020-01-25 01:50:10  dispenser               weekly snapshot  /home
d89b7470  2020-02-07 20:54:14  dispenser               weekly snapshot  /home
47965192  2020-02-12 17:42:15  dispenser               weekly snapshot  /home
-----------------------------------------------------------------------------
4 snapshots

5 snapshots have been removed, running prune

My expectation from reading the docs (not the code) was that the 4 snapshots in the last group wouldn’t be kept, since 6 snapshots containing /home already exist; instead it seems I’m effectively keeping 10 snapshots for that directory.
Why is this happening?
If restic treats restic backup /data1 /data2 as a single group of directories, it would probably be better for me to issue two separate commands (one per directory) so as to retain only X snapshots per directory, correct?

By default, forget applies to each group of snapshots based on host and path, and you have two path groups on the host:
/home
/home /etc

To have forget treat these two path groups as a single group, add the flag
--group-by host
to your forget command.

This will then apply your snapshot policy to the entire group.

However, if you want the policy applied per path, so that these are treated as one group:
/home and the /home part of /home /etc
that cannot be done.
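
As a rough sketch of the --group-by host variant mentioned above: the two wrapper functions and the RESTIC variable are made up for illustration and are not part of restic, while forget, snapshots, --group-by, --keep-weekly and --dry-run are restic's own. The first function only lists snapshots grouped by host and paths, which can help to see how many groups a repository currently has.

```shell
# RESTIC is a hypothetical override so a different binary can be swapped in.
RESTIC="${RESTIC:-restic}"

# List snapshots grouped by host and paths (the default grouping used by forget).
show_groups() {
  "$RESTIC" snapshots --group-by host,paths
}

# Apply a weekly policy to all snapshots of a host as one group,
# regardless of which paths each snapshot contains.
# --dry-run only reports what would be kept or removed.
forget_weekly_by_host() {
  "$RESTIC" forget --group-by host --keep-weekly "$1" --dry-run
}
```

Calling forget_weekly_by_host 6 would then treat the /home snapshots and the /home /etc snapshots as one group. Once the dry-run output looks right, --dry-run can be dropped and --prune added, as in the original command.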

That’s what I gathered, thank you.
It’s a bit unfortunate though, it would be a nice addition to consider each path in the group as independent.

What’s the use case for this? What problem are you trying to solve?

No problem in particular.
I’m generally using restic to either back up my home folder or some media.
In the case of media this can be a large directory, as an example:

$ du -sh /media/photos
24G	/media/photos
$ restic backup /media/photos

The way it works now, if I understand correctly, is that the command above will produce a snapshot for that particular folder, but if I then run:

$ restic backup /media/photos /media/music

and then apply a forget policy to the repository, the earlier /media/photos snapshot will be left intact until I delete it manually, which doesn’t seem optimal.
Maybe this is something solvable with the --parent option?

I’m not sure if you’re varying the paths you back up a lot or if that was mostly for example purposes. In general I’d say, don’t vary them too much if you’re backing up mostly the same stuff every time. Also, why not just back up all the paths you want, every time? Why do one at a time? Restic will only upload the data that has changed since last time, so it’s not like you lose much by doing everything in one go.

That was something I actually did, as the /media/music directories didn’t exist when I first started backing up.
If I lose the ability to apply policies when backing up multiple directories in the same command, then a loop would be more appropriate for my use case:

for dir in '/media/photos' '/media/music'
do
  restic backup "${dir}"
done

It seems the only way to have policies apply to each directory individually.

You never lose the ability to apply forget policies. But they are applied to a combination of hosts, tags and paths. If you want a policy per path for paths that are part of a path group, then yes, you have to back those paths up separately.
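
A minimal sketch of that separate-backup approach, assuming the default grouping by host and paths: each directory gets its own snapshots, so a single forget run applies the policy to each directory's group independently. The function names and the RESTIC variable are hypothetical helpers, not restic features.

```shell
# RESTIC is a hypothetical override so a different binary can be swapped in.
RESTIC="${RESTIC:-restic}"

# Back up each directory in its own snapshot, so each path forms its own group.
backup_each() {
  for dir in "$@"; do
    "$RESTIC" backup "$dir"
  done
}

# With the default grouping, the weekly policy is applied per path group.
# --dry-run only reports what would be kept or removed.
forget_weekly() {
  "$RESTIC" forget --keep-weekly "$1" --dry-run
}
```

For example, backup_each /media/photos /media/music followed by forget_weekly 6 would keep up to 6 weekly snapshots for each of the two directories.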

I’m still not sure what the actual problem is. Just back up all your stuff at the same time and apply forget policies to that.

I wasn’t clear. What I meant was that policies aren’t applied as I was expecting: restic currently treats a specific list of paths as a single entity when grouping by paths, and I think this isn’t intuitive or flexible.

I may not have all my stuff available at all times, I may add new things to back up over time, or I may simply re-organize existing things into a new folder structure, for example from the first layout below to the second:

$ tree /media
/media/
└── videos
    ├── Movies
    ├── 'TV Shows'
    └── personal
$ tree /media
/media/
├── videos (only containing personal stuff)
├── Movies
└── 'TV Shows'

I don’t think this is an issue or a problem, and there is an easy workaround to achieve the result I want.
However, I believe this is something that could be considered for improvement.

Imagine if restic applied policies to each individual path and you had something like:

/home/foo
/etc
/mnt/stuff
/srv/db

And a bunch of other paths now and then. You’d have to do a lot of snapshot maintenance for each path. I think it makes more sense that restic doesn’t try to guess your backup sets and just forgets on the sets you gave it. But a workaround for you might be to back up each path individually instead of some of them together.

A better approach is to use tags to give snapshots a semantic name. Then you can use --group-by host,tags when applying forget policies. For example, we take two snapshots for each server: one for the files and one for database dumps. These are tagged system and database, respectively. If the paths backed up in the system snapshot are changed, it doesn’t matter to the forget command since there is a consistent tag.
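
The tag-based setup described above could look roughly like this. The helper functions and the RESTIC variable are assumptions for illustration; --tag, --group-by host,tags, --keep-weekly and --dry-run are the restic options involved, and the tag name is whatever you pass in.

```shell
# RESTIC is a hypothetical override so a different binary can be swapped in.
RESTIC="${RESTIC:-restic}"

# Tag a backup set with a stable name; the list of paths may change over time.
# Usage: backup_tagged TAG PATH [PATH...]
backup_tagged() {
  tag=$1
  shift
  "$RESTIC" backup --tag "$tag" "$@"
}

# Group by host and tags, so a changed path list does not start a new group.
# --dry-run only reports what would be kept or removed.
forget_weekly_by_tag() {
  "$RESTIC" forget --group-by host,tags --keep-weekly "$1" --dry-run
}
```

With this, backup_tagged system /etc /home and a later backup_tagged system /etc /home /srv would land in the same group when forget_weekly_by_tag 6 is run, because only the host and the tag are compared.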
