Help understanding forget policy

I have hourly jobs that run backup, followed by a call to forget with the options

--prune --keep-within 7d --keep-within-daily 3m --keep-within-weekly 1y --keep-within-monthly 100y

I have not changed anything about my jobs, except that on Jan 02 I added a new path to back up. However, the number of snapshots confuses me: I did not expect so many snapshots to remain for the days before Jan 02. Is there an explanation for why these are not getting pruned?

Date: Snapshots
Jan 15: 4
Jan 14: 4
Jan 13: 11
Jan 12: 11
Jan 11: 11
Jan 10: 11
Jan 09: 11
Jan 08: 1
Jan 07: 1
Jan 06: 1
Jan 05: 1
Jan 04: 1
Jan 03: 1
Jan 02: 2
Jan 01: 5
Dec 31: 9
Dec 30: 10
Dec 29: 11
Dec 28: 12
Dec 27: 11
Dec 26: 11
Dec 25: 1
Dec 24: 1
Dec 23: 0
Dec 22: 1

I noticed in the logs that many of these snapshots are kept for the reason “within 7d”. For example, here’s a snapshot from Dec 31:

a15f2264  2022-12-31 11:01:52  lenny                   within 7d            <path1>
                                                                            <path2>
                                                                            <path3>
                                                                            <path4>
                                                                            <path5>
                                                                            <path6>

and here’s an entry from Jan 14 (with the new path that was added on Jan 2):

661fd748  2023-01-14 11:00:40  lenny                   within 7d            <path1>
                                                                            <path2>
                                                                            <path3>
                                                                            <path4>
                                                                            <path5>
                                                                            <path7>
                                                                            <path6>

Paths 1-6 have been included in every backup for the past year. I am a little confused about how both these snapshots are being kept for the reason “within 7d”, when they occurred 14 days apart.

This is because restic groups snapshots by host and by the set of paths in each snapshot. Since you have the same hostname in those snapshots but two different sets of paths, two groups of snapshots are created, and the same forget policy is then applied to each group individually.

Please read “Removing backup snapshots” in the restic 0.15.0 documentation for more information. In short, if you want to treat all snapshots for the same hostname as equals and apply the forget policy to all of them together, you can use --group-by host. See also restic help forget.

Do you have any other hosts backing up to that repository? Does the hostname ever change or have you configured your backup command to set it explicitly (or is it just always the same)? If the hostname differs at some point, there will of course be a different group for that as well.
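
To illustrate the grouping described above, here is a rough Python sketch. It is not restic's code, only a simplified model of a single --keep-within rule, and it assumes the window is measured back from the newest snapshot in each group. The function keep_within, the dictionary layout, and the snapshots with IDs starting with "hypothetical-" are made up for illustration; only a15f2264 and 661fd748 come from the listing above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

OLD_PATHS = ("<path1>", "<path2>", "<path3>", "<path4>", "<path5>", "<path6>")
NEW_PATHS = OLD_PATHS + ("<path7>",)

SNAPSHOTS = [
    # Hypothetical older snapshot, before the path change.
    {"id": "hypothetical-old-early", "host": "lenny", "paths": OLD_PATHS,
     "time": datetime(2022, 12, 20, 11, 0, 0)},
    # From the thread.
    {"id": "a15f2264", "host": "lenny", "paths": OLD_PATHS,
     "time": datetime(2022, 12, 31, 11, 1, 52)},
    # Hypothetical: assumed last snapshot taken with the old path set.
    {"id": "hypothetical-old-last", "host": "lenny", "paths": OLD_PATHS,
     "time": datetime(2023, 1, 2, 10, 0, 0)},
    # From the thread.
    {"id": "661fd748", "host": "lenny", "paths": NEW_PATHS,
     "time": datetime(2023, 1, 14, 11, 0, 40)},
    # Hypothetical newest snapshot with the new path set.
    {"id": "hypothetical-new-last", "host": "lenny", "paths": NEW_PATHS,
     "time": datetime(2023, 1, 15, 11, 0, 0)},
]


def keep_within(snapshots, window, group_by=("host", "paths")):
    """Return the IDs a simplified keep-within rule would retain.

    Snapshots are split into groups by the fields in group_by, and the
    window is applied to each group separately, counted back from that
    group's newest snapshot (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for snap in snapshots:
        key = tuple(snap[field] for field in group_by)
        groups[key].append(snap)

    kept = set()
    for members in groups.values():
        latest = max(s["time"] for s in members)
        cutoff = latest - window
        kept.update(s["id"] for s in members if s["time"] > cutoff)
    return kept


if __name__ == "__main__":
    week = timedelta(days=7)
    print("grouped by host and paths:", sorted(keep_within(SNAPSHOTS, week)))
    print("grouped by host only:", sorted(keep_within(SNAPSHOTS, week, ("host",))))
```

With the default grouping, the old path set forms its own group whose newest snapshot is from Jan 02, so a15f2264 from Dec 31 still counts as within 7d. When grouping by host only, all snapshots share one group, and only those within 7 days of the newest snapshot overall are kept by this rule.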


Thanks, that makes sense. The hostname has remained the same, and there are no other hosts backing up to that repository. For some reason I assumed that since the new paths form a superset of the old paths the behavior would be different, but I understand how it works now.

I understand how it works now

:+1: