How often should I run "forget"?

I’ve been running it daily for a while now, and I think I just realized my policy is wrong. In this post I’m trying to understand how it’s wrong, and if I should even bother running forget very often at all.

My scenario: I take daily backups of my stuff, my wife’s stuff, and some business stuff, tagged “husband”, “wife”, and “business”, respectively. I was running this daily for our personal files:

restic forget --tag husband --tag wife \
    --keep-within-daily 1m \
    --keep-within-weekly 3m \
    --keep-within-monthly 1y \
    --keep-within-yearly 75y

First - this doesn’t ever forget anything, does it? Because it’s only considering snapshots that have both tags, not one or the other. I think I can solve this by adding a “user” tag to all snapshots with “husband” or “wife” tags, and then running it with just --tag user, right? The grouping feature should ensure the two sets of paths are considered separately, right? Or should I run it twice, once for each user/tag?

Assuming I fix that, I just need to check my understanding that this will not actually collect many snapshots. Let’s say I’m taking 1 snapshot per day, and running the above every day after the snapshot is taken.

  • Days 0-31: No snapshots forgotten.
  • Day 32:
    • The 31 most recent snapshots will be kept due to --keep-within-daily 1m
    • The oldest (32nd most recent) snapshot will be kept due to --keep-within-weekly 3m
  • Day 33:
    • The 31 most recent snapshots will be kept due to --keep-within-daily 1m
    • The 32nd most recent snapshot will be kept due to --keep-within-weekly 3m
    • The oldest (33rd most recent) snapshot will be forgotten

And this pattern will continue, and I’ll never get more than 32 snapshots kept?

Only if that snapshot is the only one in its week.

Nope. At the latest on day 40 (probably much earlier, I haven’t thought it through completely) another snapshot will be kept by --keep-within-weekly. --keep-within-weekly keeps the latest snapshot of each week alive, as long as that snapshot is no more than 3 months older than the latest overall snapshot.

A snapshot is kept alive when any of the keep rules wants to keep it. So with your rules it keeps one snapshot for each of the last 31 days, one for each of the last roughly 13 weeks, one for each of the last 12 months, and one for each of the last 75 years. A snapshot can be kept by multiple rules at the same time.

So with your rules the following will happen:

  • Days 0-31: no snapshots forgotten
  • Day 32:
    • the 31 most recent snapshots are kept.
    • the 32nd snapshot is kept if it is the last/only one in its week
  • Day 33:
    • the 31 most recent snapshots are kept.
    • the 32nd is kept if it still exists
    • the 33rd will be forgotten
  • Day ?: the oldest snapshot that was kept by --keep-within-daily drops out of the daily window, and it is from a different week than the next newer snapshot
    • that snapshot is now kept by the weekly rule.
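
To get a feeling for how the count develops with one snapshot per day, here is a rough Python simulation of the rules from the question. It is only a sketch of the "latest snapshot per day/week/month/year within a window" idea, not restic's actual code. The assumptions are mine: 1m/3m/1y/75y are approximated as 30/90/365/75*365 days, weeks are ISO weeks, and the start date and the 21:00 snapshot time are made up.

```python
from datetime import datetime, timedelta

# (window, bucket function) per rule. The day counts are assumed
# approximations of 1m, 3m, 1y and 75y, not restic's exact arithmetic.
RULES = [
    (timedelta(days=30), lambda t: t.date()),                    # daily
    (timedelta(days=90), lambda t: tuple(t.isocalendar())[:2]),  # weekly (ISO year, week)
    (timedelta(days=365), lambda t: (t.year, t.month)),          # monthly
    (timedelta(days=75 * 365), lambda t: t.year),                # yearly
]


def forget(repo):
    """Return the snapshot times that at least one rule wants to keep."""
    latest = max(repo)
    keep = set()
    for window, bucket in RULES:
        seen = set()
        # Walk from newest to oldest, keep the newest snapshot per bucket.
        for ts in sorted(repo, reverse=True):
            if ts <= latest - window:
                break  # outside this rule's window
            if bucket(ts) not in seen:
                seen.add(bucket(ts))
                keep.add(ts)
    return keep


# One snapshot per day for 120 days, running forget after each one.
repo = set()
start = datetime(2022, 1, 1, 21, 0)  # hypothetical start date
counts = []
for day in range(120):
    repo.add(start + timedelta(days=day))
    repo = forget(repo)
    counts.append(len(repo))

print("snapshots kept after 120 days:", counts[-1])
```

In this model the count does not stop at 32: once a snapshot leaves the daily window it survives if it is the last one of its week or month, so weekly and monthly snapshots slowly accumulate on top of the dailies.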

You might want to try the --dry-run option provided by forget to get a feeling for what it does.

For example, with --keep-within-daily 7d --keep-within-weekly 1m --keep-within-monthly 1y you could end up with the following scenario. If you add new daily snapshots on 2022-08-26 and 2022-08-27, then 6ba41893 and 70baeba3 will be deleted. If you add another snapshot on 2022-08-28, then f2dd23e4 is still kept, as it is still selected as a weekly snapshot.

4539a326  2022-07-31 21:02:50  host                    weekly snapshot  /
                                                       monthly snapshot
03a5e85c  2022-08-07 21:09:51  host                    weekly snapshot  /
5bdfe63c  2022-08-14 21:02:49  host                    weekly snapshot  /
6ba41893  2022-08-19 21:01:18  host                    daily snapshot   /
70baeba3  2022-08-20 21:09:16  host                    daily snapshot   /
f2dd23e4  2022-08-21 21:01:48  host                    daily snapshot   /
                                                       weekly snapshot
9737ea65  2022-08-22 21:05:56  host                    daily snapshot   /
d0c8a9b2  2022-08-23 21:04:11  host                    daily snapshot   /
eaa14d90  2022-08-24 22:00:58  host                    daily snapshot   /
25c59cda  2022-08-25 22:06:33  host                    daily snapshot   /
                                                       weekly snapshot
                                                       monthly snapshot

It might also help to think of snapshot expiry in terms of a calendar rather than “x days old snapshots”. In fact, using a calendar it’s pretty simple to manually apply the --keep-* rules. Just mark the latest snapshot in each day/week/month and throw away everything without a mark.
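
The calendar approach can be written down in a few lines of Python. This is a simplified model for illustration, not restic's implementation: the function name apply_keep_within is made up, the windows are given in days (7d, and 30/365 days as stand-ins for 1m/1y), and weeks are taken to be ISO weeks. The snapshot IDs and times are the ones from the listing above; the timestamps for 2022-08-26 and 2022-08-27 are invented.

```python
from datetime import datetime, timedelta


def apply_keep_within(snapshots, daily_days, weekly_days, monthly_days):
    """Mark the latest snapshot per day/week/month inside each window.

    snapshots: dict mapping snapshot id -> datetime.
    Returns a dict mapping kept snapshot id -> list of reasons.
    """
    latest = max(snapshots.values())
    rules = [
        ("daily", timedelta(days=daily_days), lambda t: t.date()),
        ("weekly", timedelta(days=weekly_days), lambda t: tuple(t.isocalendar())[:2]),
        ("monthly", timedelta(days=monthly_days), lambda t: (t.year, t.month)),
    ]
    newest_first = sorted(snapshots.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    for name, window, bucket in rules:
        seen = set()
        for sid, ts in newest_first:
            if ts <= latest - window:
                break  # everything older is outside this rule's window
            if bucket(ts) not in seen:
                seen.add(bucket(ts))
                kept.setdefault(sid, []).append(name)
    return kept


def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


snaps = {
    "4539a326": ts("2022-07-31 21:02:50"),
    "03a5e85c": ts("2022-08-07 21:09:51"),
    "5bdfe63c": ts("2022-08-14 21:02:49"),
    "6ba41893": ts("2022-08-19 21:01:18"),
    "70baeba3": ts("2022-08-20 21:09:16"),
    "f2dd23e4": ts("2022-08-21 21:01:48"),
    "9737ea65": ts("2022-08-22 21:05:56"),
    "d0c8a9b2": ts("2022-08-23 21:04:11"),
    "eaa14d90": ts("2022-08-24 22:00:58"),
    "25c59cda": ts("2022-08-25 22:06:33"),
}
before = apply_keep_within(snaps, 7, 30, 365)

# Two more daily snapshots with made-up IDs and times.
snaps["new-26"] = ts("2022-08-26 22:00:00")
snaps["new-27"] = ts("2022-08-27 22:00:00")
after = apply_keep_within(snaps, 7, 30, 365)
forgotten = sorted(set(snaps) - set(after))

for sid, reasons in sorted(after.items(), key=lambda kv: snaps[kv[0]]):
    print(sid, snaps[sid], ", ".join(reasons))
print("forgotten:", forgotten)
```

Playing with the dates here is a cheap way to build intuition, but for a real repository the --dry-run output of forget is the thing to trust.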