Forget policy

I do a single backup every day. I want to keep one backup one day old and one backup one week old. Running forget with -w2 doesn’t work because:

All the --keep-* options above only count hours/days/weeks/months/years which have a snapshot, so those without a snapshot are ignored. (here)

That means, if I start daily backups, forgetting with -w2, there is one backup in the past week and none in the week before that, so that earlier week is ignored. I’ll be left with only the most recent (daily) backup, and I’ll never get the week-old backup.

The only thing I can see to do is collect 8 daily backups, and then start the forget/prune procedure. But this seems nuts. If, for example, I wanted to run monthly backups and keep one yearly and one monthly, I’d have to keep monthly backups for 13 months until running a policy of -y2 would work. This can’t be how the policy is expected to be applied, can it?

btrbk’s retention policy makes sense to me. Unless I’m entirely misunderstanding, restic’s seems broken.

Welcome back!

For the first couple of days, restic will forget the backup from the previous day, so you have only the most recent backup. Once you have the first backup made on a Monday (which starts a week for restic), restic will keep Sunday’s backup as well as Monday’s backup. Afterwards restic keeps two snapshots: the most recent one and the one from the previous Sunday.

Here’s an explanation in an older issue, maybe it helps you: “forget: Document that time spans (week, month) are absolute” (Issue #2747, restic/restic on GitHub). There’s also a script included that you can use to simulate different retention policies.
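To make the Monday boundary concrete, here is a minimal Python sketch of the weekly rule as described in this thread. It is not restic's code: `keep_weekly` and `simulate` are made-up names, and it assumes exactly one backup per day, ISO weeks starting on Monday, and a forget run after every backup.

```python
from datetime import date, timedelta

def keep_weekly(snapshots, n):
    """Keep the newest snapshot in each of the n most recent ISO weeks that contain one."""
    kept, seen = [], set()
    for day in sorted(snapshots, reverse=True):  # newest first
        week = day.isocalendar()[:2]  # (ISO year, ISO week); ISO weeks start on Monday
        if week not in seen and len(seen) < n:
            seen.add(week)
            kept.append(day)
    return sorted(kept)

def simulate(start, days, n=2):
    """Back up once per day and apply the weekly rule after every backup."""
    repo, history = [], []
    for offset in range(days):
        repo.append(start + timedelta(days=offset))
        repo = keep_weekly(repo, n)
        history.append(list(repo))
    return history

history = simulate(date(2024, 1, 1), 8)  # 2024-01-01 is a Monday
for kept in history:
    print([d.strftime("%a %d") for d in kept])
```

With these assumptions, the first seven runs each leave a single snapshot, and the eighth run (the second Monday) leaves Sunday's and Monday's.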

The main issue is that this exact policy that you want is not supported by restic. I’m also not sure how such a policy could work. Let’s make sure I understand correctly: for any given day, you want the latest backup as well as the backup exactly one week before. So, let’s say it’s Thursday and the daily backup was already run, so the repository should contain two snapshots: From today and from last Thursday. Is that correct?

How about the next day, Friday, and the daily backup has run. Which snapshots should exist at that point?

I’m not familiar with btrbk, but what I read from the manpage sounds very similar to what restic implements.

Here’s the policy for daily, similar for hourly, weekly, monthly, etc.:

daily
Defines how many days back daily backups should be preserved. The first backup of a day (starting at preserve_hour_of_day) is considered a daily backup.

So if I keep one daily and one weekly: when the first backup of the week is considered the weekly backup, then once you have two dailies, you end up keeping the daily from today and the first daily from the present week (so both). When you get the third daily, the one in the middle is deleted. Once you get to one of those dailies being in the previous week, that becomes the first daily of the previous week, and you end up keeping the last pair of dailies from the present week. Repeat.

So you’re not keeping one from exactly a week ago. You end up keeping one in the window between 8 & 14 days, and the earliest in the 1 to 7 day window slowly migrates its way toward the previous week.

It similarly works nicely if you want to keep say one hourly and one daily. Or one weekly and one monthly.

Okay, that still sounds similar to what restic does. The main difference seems to be that restic keeps the newest (most recent) snapshot in a time slot (hour, month, week, day) instead of the oldest one, so you need to add one. This was a design decision way back, and in order to not break backwards compatibility we won’t change it.
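The difference between the two conventions can be sketched in a few lines of Python. This is an illustration only, not restic's or btrbk's implementation: `keep_per_week` and `simulate` are hypothetical names, and the "oldest" variant additionally keeps the latest snapshot, which is an assumption made so that both variants always retain today's backup.

```python
from datetime import date, timedelta

def keep_per_week(snapshots, n, pick):
    """Keep one snapshot from each of the n most recent ISO weeks that contain one.

    pick is max (newest in the week) or min (oldest in the week).
    """
    by_week = {}
    for day in snapshots:
        by_week.setdefault(day.isocalendar()[:2], []).append(day)
    recent_weeks = sorted(by_week, reverse=True)[:n]
    return sorted(pick(by_week[week]) for week in recent_weeks)

def simulate(start, days, n, pick, keep_latest=False):
    """Back up once per day and apply the rule after every backup."""
    repo, history = [], []
    for offset in range(days):
        repo.append(start + timedelta(days=offset))
        kept = set(keep_per_week(repo, n, pick))
        if keep_latest:  # assumption: the most recent snapshot is never removed
            kept.add(max(repo))
        repo = sorted(kept)
        history.append(list(repo))
    return history

start = date(2024, 1, 3)  # a Wednesday
newest = simulate(start, 7, 2, max)
oldest = simulate(start, 7, 2, min, keep_latest=True)
for a, b in zip(newest, oldest):
    print([d.day for d in a], [d.day for d in b])
```

Under these assumptions the "oldest" variant holds two snapshots from the second day onward, while the "newest" variant holds one until the first Monday.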

--keep-weekly 2 should roughly do what you want.

We’ve established that restic works a bit differently, and I hope you now understand how you can get the behavior you want. Any other questions?

After the first backup on a Monday, restic will see two weekly backups: The one from Sunday is the last week, and the one from Monday is in the current week. [From the link you offered.]

It appears the idea is that I have to run the backup/forget cycle daily for up to a week before I get two backups. That’s certainly uncomfortable, as I would rather have one 5 days old and one one day old before I’ve gotten to a week. If instead of weeks, I was using months or years, I would have to wait up to a month or up to a year to get to the month or year boundary and end up with two backups. Really?

I’ll add that while btrbk’s policy comes across to me as immediately transparent, even with careful perusal of the docs, I can’t make any sense of restic’s. I went to IRC to ask, and discussed with someone, apparently an old timer, who couldn’t get it clear either. Regardless, if what I’m seeing above is correct, I just don’t think this is really workable.

Not at all. Restic’s retention policy implementation is deterministic (modulo bugs of course), so you’ll get the same (end) result no matter when you run it. You can do a daily backup for a week and then run forget, or you can run forget daily: the end result will be the same.

I suggest that you play around with the script that I linked in the issue above.

Another option would be to write a script that parses the output of restic snapshots --json and passes the snapshot IDs you’d like to remove to restic forget. That way, you can implement a retention policy yourself, not limited by what restic offers. :)
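A rough Python sketch of such a script follows. It assumes the JSON is an array of objects with at least an "id" and an RFC 3339 "time" field; the policy shown (`keep_oldest_per_week`) is a hypothetical example, the sample IDs are made up, and the script only builds the command without running it.

```python
import json
import re
from datetime import datetime

def parse_time(text):
    """Parse an RFC 3339 timestamp, normalising the fraction to microseconds."""
    base, frac, zone = re.fullmatch(
        r"(.*?)(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})", text).groups()
    micro = (frac or "").ljust(6, "0")[:6]  # pad or truncate to 6 digits
    zone = "+00:00" if zone == "Z" else zone
    return datetime.fromisoformat(f"{base}.{micro}{zone}")

def keep_oldest_per_week(snaps, weeks=2):
    """Hypothetical policy: the newest snapshot, plus the oldest snapshot
    in each of the most recent `weeks` ISO weeks that contain one."""
    by_week = {}
    for when, snap_id in snaps:  # snaps is sorted oldest first
        by_week.setdefault(when.isocalendar()[:2], snap_id)  # first seen = oldest
    recent = sorted(by_week, reverse=True)[:weeks]
    return {by_week[week] for week in recent} | {snaps[-1][1]}

def forget_command(snapshots_json, policy=keep_oldest_per_week):
    """Return the restic command that removes everything the policy does not keep."""
    snaps = sorted((parse_time(s["time"]), s["id"]) for s in json.loads(snapshots_json))
    if not snaps:
        return []
    keep = policy(snaps)
    doomed = [snap_id for _, snap_id in snaps if snap_id not in keep]
    return ["restic", "forget", *doomed] if doomed else []

# Made-up sample data standing in for the output of `restic snapshots --json`.
sample = json.dumps([
    {"id": "aaa111", "time": "2024-01-03T02:00:00.123456789Z"},
    {"id": "bbb222", "time": "2024-01-04T02:00:00.5Z"},
    {"id": "ccc333", "time": "2024-01-05T02:00:00Z"},
])
print(forget_command(sample))
```

Printing the command first makes it easy to review what would be removed; once it looks right, it could be handed to `subprocess.run`.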

I’m sorry that the docs do not bring the point across. Do you maybe have an idea on how to improve them?

After the first backup on a Monday, restic will see two weekly backups: The one from Sunday is the last week, and the one from Monday is in the current week. [From the link you offered.]

It appears the idea is that I have to run the backup/forget cycle daily for up to a week before I get two backups.

Not at all. Restic’s retention policy implementation is deterministic (modulo bugs of course), so you’ll get the same (end) result no matter when you run it. You can do a daily backup for a week and then run forget, or you can run forget daily: the end result will be the same.

Ok, I’m not clear how this might clarify what you quote above means. If I run daily backups for 7 days starting on Monday, running forget -w2 each day, then on each of those days, I will have only the most recent daily left, correct? (That’s what happened to me when I ran it.) I gather it’s only when I have run the daily for 8 days, running forget -w2 every day, that on the 8th day I will have one weekly and one daily (and these will be on adjacent days). Is that the meaning of your quote?

Ah, yes, you’re right. What I meant was that the set of snapshots still present in the repository is the same on the 8th day whether you run forget daily or just once on the 8th day. Sorry for the confusion.
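That equivalence can be checked for this particular scenario with a small Python sketch. It is a simplified model rather than restic itself: `keep_weekly` is a made-up helper that keeps the newest snapshot in each of the two most recent ISO weeks, and one backup per day starting on a Monday is assumed.

```python
from datetime import date, timedelta

def keep_weekly(snapshots, n):
    """Keep the newest snapshot in each of the n most recent ISO weeks that contain one."""
    newest_in_week = {}
    for day in sorted(snapshots):
        newest_in_week[day.isocalendar()[:2]] = day  # later days overwrite earlier ones
    recent_weeks = sorted(newest_in_week, reverse=True)[:n]
    return sorted(newest_in_week[week] for week in recent_weeks)

start = date(2024, 1, 1)  # a Monday
days = [start + timedelta(days=i) for i in range(8)]

# Variant 1: run forget after every daily backup.
daily_repo = []
for day in days:
    daily_repo = keep_weekly(daily_repo + [day], 2)

# Variant 2: collect eight backups, then run forget once.
once_repo = keep_weekly(days, 2)

print(daily_repo == once_repo, [d.isoformat() for d in once_repo])
```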

When writing the retention policy part of restic, what I had in mind was that users can specify a policy which will gradually thin out the snapshots. Personally, I use something along the lines of --keep-yearly 10 --keep-monthly 24 --keep-daily 14, so restic will always keep the last 14 daily snapshots.

Ok, now we can go back to my question. If I want to take a snapshot every day, keeping one (roughly) a week old and one a day old, -w2 is the only option, but I don’t get anything more than one daily snapshot until I’ve crossed a week boundary. If I wanted to do this with -m2 or -y2, I have to wait until I cross a month boundary or a year boundary before I get more than my single snapshot.

Even in your case, when you begin your daily snapshots, you get 14 of them and then don’t get anything older than 14 days until you cross a month boundary. It’s the same problem, though in your case it is not as stark because you are keeping half a month of dailies.

I’m sorry, but this is broken. btrbk’s approach is the correct one.

It’s true that what @diagon wants restic to do is not possible, but it does not immediately follow that restic is broken. The fact is that restic snapshots are extremely cheap in all usual circumstances, which is why there are no special features to make it simple to keep exactly two snapshots.

For example, with -d7 -w2 you would ensure you get the two snapshots you desire, but you would get perhaps 5-6 extra snapshots. If snapshots were expensive, that would be a big drawback, but they are not, so why not benefit from the extra safety?
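To get a feel for how many snapshots -d7 -w2 leaves around, here is a small Python simulation. It is only a model of the rules as discussed in this thread (newest snapshot per day and per ISO week, union of both rules), with made-up helper names and an assumed schedule of one backup per day followed by a forget run.

```python
from datetime import date, timedelta

def keep_newest_per_bucket(snapshots, n, bucket):
    """Keep the newest snapshot in each of the n most recent buckets that contain one."""
    kept, seen = set(), set()
    for day in sorted(snapshots, reverse=True):  # newest first
        key = bucket(day)
        if key not in seen and len(seen) < n:
            seen.add(key)
            kept.add(day)
    return kept

def forget(snapshots, daily, weekly):
    """A snapshot survives if either the daily or the weekly rule keeps it."""
    kept = keep_newest_per_bucket(snapshots, daily, lambda d: d)
    kept |= keep_newest_per_bucket(snapshots, weekly, lambda d: d.isocalendar()[:2])
    return sorted(kept)

repo, counts = [], []
start = date(2024, 1, 1)  # a Monday
for offset in range(28):
    repo = forget(repo + [start + timedelta(days=offset)], daily=7, weekly=2)
    counts.append(len(repo))

print(counts)
print([d.isoformat() for d in repo])
```

In this model the count settles at seven snapshots on most days and eight on Sundays, which is in line with the "5-6 extra" estimate.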

My personal policy is -H30 -d90 -w9999 -m9999 -y9999 and it looks to me like it is costing me an extra 10%, maximum, in repo size, compared to something like -d2 -w2 -m2 -y2.

This has been: “How I learned to stop worrying and love having a big bunch of snapshots around” :)

Since I’m writing, I actually have two related thoughts/questions:

First, are there any ballpark figures for how many snapshots are too many? I’ve seen some discussions about forget/prune taking a long time, but on my 40 GB backup it has always seemed to go very smoothly.

Second, there is one small problem with how the forget policy is implemented, which applies to my use case of irregular backups (I’m backing up a laptop, so it’s not always on). This means that my -H30 snapshots sometimes span a bit over a day, but sometimes they span a lot more than that. What I want is to keep all hourly snapshots younger than a specific age (relative to the newest snapshot).

So I’ve been looking into implementing a --keep-hourly-within switch, allowing me to specify, for example: --keep-hourly-within=2d (any hourly backups younger than 2 days would be kept, regardless of how many hourly backups have been made in that period). To me this feels like a more natural way to specify the policy.

I’m hoping that this would be seen as a useful addition to restic. Any thoughts on whether this would be a good feature? Would a different syntax be preferable? I’d start by submitting an issue on GitHub before submitting the pull request, but it seems the forum is a good place to float early-stage ideas.

Wouldn’t --keep-within 2d do something very similar?

Yes, exactly, except that --keep-hourly-within would be aware of the hourly logic.

The relationship between the current --keep-hourly and a new --keep-hourly-within would be the same as the relationship between --keep-last and --keep-within. One keeps last n snapshots of a given type, the other keeps any snapshots within a given time period (relative to the latest snapshot) of a given type.

Putting this in a table, we would have the following (with italic denoting the parameters that are not currently implemented):

| Type | Keep last n snapshots of a given type | Keep snapshots within a given time period of a given type |
|---|---|---|
| any | --keep-last | --keep-within |
| hourly | --keep-hourly | *--keep-hourly-within* |
| daily | --keep-daily | *--keep-daily-within* |
| weekly | --keep-weekly | *--keep-weekly-within* |
| monthly | --keep-monthly | *--keep-monthly-within* |
| yearly | --keep-yearly | *--keep-yearly-within* |
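As a sketch of the intended semantics for the hourly case, in Python: keep the newest snapshot of every hour, but only for hours that fall within a given duration of the newest snapshot. The function name mirrors the proposed switch and is purely hypothetical, the timestamps are made up, and treating the cutoff as exclusive is an assumption.

```python
from datetime import datetime, timedelta

def keep_hourly_within(snapshots, window):
    """Keep the newest snapshot of each hour within `window` of the newest snapshot."""
    if not snapshots:
        return []
    cutoff = max(snapshots) - window
    newest_in_hour = {}
    for taken in sorted(snapshots):
        if taken > cutoff:  # assumption: a snapshot exactly at the cutoff is dropped
            hour = taken.replace(minute=0, second=0, microsecond=0)
            newest_in_hour[hour] = taken  # later snapshots overwrite earlier ones
    return sorted(newest_in_hour.values())

# Irregular laptop backups (made-up timestamps).
snapshots = [
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 3, 14, 5),
    datetime(2024, 1, 3, 14, 50),
    datetime(2024, 1, 4, 8, 30),
    datetime(2024, 1, 4, 20, 15),
]
kept = keep_hourly_within(snapshots, timedelta(days=2))
print([t.isoformat(sep=" ", timespec="minutes") for t in kept])
```

Here the snapshot from 1 January falls outside the two-day window, and of the two snapshots taken in the 14:00 hour on 3 January only the later one is kept.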

Does that make sense?

The fact is that restic snapshots are extremely cheap in all usual circumstances

Backups are expensive for me as this is a laptop and my sdcard space is limited. -d30 -m2 would be even more expensive. Backups to my server also have limited space and I’m dealing with limited bandwidth. I doubt these are unusual conditions.

The answer you both seem to want to give to the obvious problem is, “just keep lots more.” I think instead, the problem itself should be solved.

It may not have been clear that when I said snapshots are cheap I was not suggesting that disk space is not expensive. I was saying that an additional snapshot typically requires very little extra disk space. This is the philosophy of restic, and restic achieves this by using pretty sophisticated methods to avoid duplicating any data in the repository.

I encourage you to try it out by setting up a backup to your sd card and checking the size of the repository after each backup until you have the weekly backups that you want. You could do:

export RESTIC_REPOSITORY=<your-repo>
restic backup <your-working-directory>
restic forget -d7 -w2
restic prune                                              # This is what actually removes unused data
du -hs "$RESTIC_REPOSITORY" >> ~/restic-repo-size.log     # Append the current repo size to a log (local repos only)

You can then open restic-repo-size.log in your home directory to see how the size of the repository changes each day. I expect that after the first day only a very small amount of disk space will be added. I am also backing up my laptop and was worried about space, which is why I looked into this.

Do let us know how it goes!

Apparently we disagree here: I don’t think restic’s retention policy settings are broken. It just uses a different default (keep the most recent snapshot in each time span, instead of the least recent like btrbk).

In your case, I’d either try what restic supports out of the box (e.g. --keep-daily 7 --keep-weekly 2) and test if it works for you or implement your own retention policy (parse restic snapshots --json and use restic forget to remove snapshots).

I’d be interested in your results if you want to share them. Good luck!

It may not have been clear that when I said snapshots are cheap I was not suggesting that disk space is not expensive. I was saying that an additional snapshot typically requires very little extra disk space. This is the philosophy of restic, and restic achieves this by using pretty sophisticated methods to avoid duplicating any data in the repository.

I think I have a pretty good idea what deduplication is. And no, even if -d7 -w2 is able to work, and becomes a moot point after this week of discussion, -d30 -m2 is not going to.

It just uses a different default (keep the most recent snapshot in each time span, instead of the least recent like btrbk).

I know you’ve had this discussion before in the link you offered earlier, and I suspect you knew what I was talking about from the beginning. I think you’re stuck in a legacy system and don’t want to change; but if you did want to fix this, you could add a switch for the “least recent” policy.

Why do you want to retain something that existed 7 days ago and yesterday, but not 6, 5, 4, 3 or 2 days ago? With your desired snapshot retentions, anything you added 6 days ago, accidentally deleted 5, 4, 3, or 2 days ago is permanently gone.

You can achieve what you want if you sync daily to a daily backup location using rsync, and then sync that daily backup location weekly to a weekly backup location using rsync. Restoring individual files from these copies will be simpler than restoring them from btrfs snapshots.

I’m probably missing something, but I really don’t see what actual practical problem is being discussed here. I looked in one of my personal repos and most snapshot files take 388 bytes on disk (some take 932 bytes). That’s nothing, and it’s especially not something that’s problematic to keep on disk. So, presumably disk space isn’t the issue here.

The example where one wants to keep the previous day’s snapshot and then one snapshot that is a week old, where using -w2 wasn’t satisfactory, is one case where I don’t understand what the actual problem to solve is. Like someone else here suggested, why not simply use --keep-daily=14 to keep one snapshot for every day for the last two weeks? Why would you want to keep less than that (for example)?

Restic’s retention policy might not work the exact same way that you want it to and that another program you use does, but it still solves the problem of keeping stuff around, and it is flexible enough such that you can specify that you e.g. only want to keep one snapshot per week if for some reason you don’t want more frequent snapshots than that around. Since snapshots take almost no disk space or processing at all, I fail to see how this is a problem.

If I had a dime for every time I had to adapt to some software that’s designed in a different way than I would like it to be (at least in parts), I’d be rich by now :) I’m not, but it’s still the right choice (in my cases at least) to live with the slight discrepancy between how they do work and how I’d like them to work, because that lets me use great software that solves problems, even if it’s in a slightly different way than I originally wanted. It still adds a lot of value, and it’s hard to find something that 100% matches what you want. Gotta be pragmatic about it.
