I’m trying to implement a backup strategy at my office, but the higher-ups specifically don’t want to retain anything longer than two years. I know I can run restic forget --keep-within 2y, but it will retain the last snapshot per host. I understand why there might be hesitation about adding a flag for this, but with 100+ hosts it would be a pain to go through and manually remove the last snapshot once the snapshots get old enough (we get a lot of people coming and going here, and once they leave the backup isn’t needed).
My thoughts are, if there were a --forget-last-snapshot switch that made it obvious what it would do, it would solve the problem, and also not make it the default behavior.
Another idea I had for the meantime was to tag ALL my snapshots with “cleanup” or something, then do `restic forget --keep-within 2y --group-by tags` but then I can’t use tags for other purposes (right now I have the usernames as the tags, and so unless I removed all the tags, restic would still keep the last copy of each username).
Is there any chance at all of getting a --forget-last-snapshot with A BIG WARNING BESIDE IT IN THE --HELP SWITCH perhaps?
The reason restic keeps the last snapshot for each host is that grouping of snapshots is by default done on hosts and paths. If you configure restic to only group on paths, then it should hopefully do what you want. Just add --group-by paths to your forget command.
Hmm that doesn’t seem to work, since most of the paths are unique. So if I have an employee named Bob, and the machine name is BOB123 and the tag is bobusername and the path is C:\users\bob - doesn’t matter if I group by the machine name, tag, or path - there’s still always going to be one snapshot leftover, even if Bob is long gone and doesn’t need his backup anymore - unless I go in and manually prune every user that leaves.
I basically just need a switch that will forget anything past a certain date, no matter what. With all the warnings of perils that lie ahead if necessary - but still, the option for power users would be nice. Otherwise I’m going to have to quarterly go through, try to figure out who has left (there’s lots of users, and lots of techs - I won’t always hear about it), and manually prune each one. That doesn’t sound fun.
I hear you, but I think the insistence on not allowing any policy to remove the final available snapshot in a group is pretty well reasoned and will probably be maintained.
But for advanced usage (backing up 100+ hosts seems to qualify), it seems like implementing a script to run restic snapshots --json and then using that to locate snapshots older than 2y for removal would not be too difficult.
I threw together a quick script in R to do this. I’m sure it would not be too hard to do something similar in Python or another scripting language of choice.
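# Packages
library(tidyverse)
library(jsonlite)
library(lubridate)
# Prepare the data (snapshots.json comes from: restic snapshots --json > snapshots.json)
d <- fromJSON("snapshots.json") %>%
as_tibble() %>% select(time, id) %>%
mutate( time = as_datetime(time) ) %>%
# Refer to the column directly; d does not exist yet inside its own pipeline
mutate( old_snapshot_2y = (time + dyears(2)) < now() )
# Keep only the snapshots older than two years
d.drop <- d %>% filter(old_snapshot_2y)
# Use the list of snapshots to execute forget (review d.drop before running this)
if (nrow(d.drop) > 0) system2("restic", c("forget", d.drop$id))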
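For anyone who would rather not use R, here is a rough Python sketch of the same idea. It assumes the input is the list produced by restic snapshots --json, where each entry has an "id" and an RFC 3339 "time" field. The function names, the 730-day default and the snapshots.json file name are my own choices for illustration, not anything restic defines.

```python
import json
import re
from datetime import datetime, timedelta, timezone


def parse_restic_time(value):
    """Parse an RFC 3339 timestamp into an aware datetime.

    The fraction is normalised to exactly 6 digits because older Python
    versions cannot parse nanosecond precision or a trailing "Z".
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    value = re.sub(
        r"\.(\d{1,6})\d*",
        lambda m: "." + m.group(1).ljust(6, "0"),
        value,
    )
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: treat a timestamp without an offset as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def old_snapshot_ids(snapshots, max_age_days=730, now=None):
    """Return the IDs of snapshots older than max_age_days, relative to now."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [s["id"] for s in snapshots if parse_restic_time(s["time"]) < cutoff]


def build_forget_command(path="snapshots.json"):
    """Read a saved snapshot list and return a forget command, or None."""
    with open(path) as fh:
        ids = old_snapshot_ids(json.load(fh))
    return ["restic", "forget", *ids] if ids else None
```

The sketch only builds the command and never runs it, so you can print and review the list first. Passing the result to subprocess.run, and pruning afterwards, is left as a deliberate manual step.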
I think you need to go back a step in your design and reconsider how you tag your snapshots and what hostnames you have. For starters, how do you tag your snapshots?
If you just want to back up multiple systems and then keep only the last X snapshots, completely regardless of whether or not one of the systems actually has any snapshots in that series of X snapshots to keep, then just setting the same tag on all snapshots (or even better, no tag at all) and then running the forget with e.g. --keep-within 2y should do what you want.
If you then reason that “but I need to tag snapshots with each individual user/system’s name” then the question becomes why do you need to do that? If you really want to separate the users’ snapshots, then use the hostname for identification.
I think a broader overview of the design is needed to verify it’s sane.
The policy doesn’t sound unreasonable to me. For the general retention policy, it makes sense to group by host, path, and user (as a tag). The problem arises with the special (but also reasonable) two-year policy because --keep-within 2y is never relative to <now> but relative to <last_snapshot_in_group>.
So stale users will never be cleaned out, because all the policies are done relative to their last snapshot. Creating a dummy tag to force all the backups into one group makes sense, but makes tags unusable for other things.
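To make the difference concrete, here is a small Python model of the two behaviours. It is a simplified sketch of the rule as described above, not restic's actual code, and the group names and dates are made up.

```python
from datetime import datetime, timedelta


def keep_within_per_group(snapshots, window):
    """Keep snapshots newer than (latest snapshot in the same group - window).

    snapshots is a list of (group, time) pairs. The newest snapshot of
    every group always survives, however old it is.
    """
    latest = {}
    for group, time in snapshots:
        if group not in latest or time > latest[group]:
            latest[group] = time
    return [(g, t) for g, t in snapshots if t > latest[g] - window]


def keep_within_now(snapshots, window, now):
    """Keep snapshots newer than (now - window), ignoring groups entirely."""
    return [(g, t) for g, t in snapshots if t > now - window]


snapshots = [
    ("bob", datetime(2019, 3, 1)),    # Bob left the company in 2020
    ("bob", datetime(2020, 1, 15)),
    ("alice", datetime(2021, 6, 1)),
    ("alice", datetime(2023, 12, 1)),
]
window = timedelta(days=730)
now = datetime(2024, 1, 1)

grouped = keep_within_per_group(snapshots, window)
absolute = keep_within_now(snapshots, window, now)
```

With these dates, grouped still contains both of Bob's snapshots, because each is within two years of his own latest one, while absolute contains only Alice's December 2023 snapshot.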
The concern is legitimate, but the use case is nevertheless not very compatible with the philosophy behind the restic retention policy. I can think of a few features that could address this, but none of them are currently implemented, and I’m not sure if they would get accepted:
Add switch to run forget policy relative to current time. That would clean out anything older than 2 years. I’ve seen this discussed on the forum and decided against.
A --forget-last-snapshot switch would not work without the above feature, because the last snapshot looks brand-new to the forget command (relative to the newest snapshot in that group).
Allow --group-by=none to disable all groupings. This would clean out anything more than 2 years older than the most recent backup, which is probably just a few hours old. Not seen it discussed, but it does seem a bit dangerous.
Allow some more fine-grained control over grouping by tags. Selecting just specific tags for grouping. Seems somewhat complicated and somewhat dangerous.
FWIW, the original concern was that the latest snapshot for each “group” is preserved, it wasn’t about how the forget times are calculated and what they relate to.
This means that the concern is about how the grouping is done, e.g. that if Bob and his computer have been gone from the company for three years, his latest snapshot is still kept.
One concern I have with that concern is that if you want to deviate from the grouping, then you are also deviating from the safety measure of making sure that each system (“group”) has at least one snapshot. The way you want it to work, you could theoretically end up with 100 systems that you think are backed up, where a lot of them have had failing backups for various reasons and no longer have a single snapshot left in the repository, while the rest of the systems happily have a bunch of snapshots. That’s a situation that the current design of the forget command prevents.
All in all I would say that your use case is better met by using separate repositories for each user (that’s more normal I’d say), or if you don’t want to do that (e.g. because you rely on deduplication between the users), by simply scripting this. It’s just a matter of getting the list of snapshots and removing all those that are older than a certain date. As already shown, it’s just a few lines of code.
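If you do script it, you lose the "at least one snapshot per group" safety net, so it may be worth having the script report which hosts would end up with nothing. A minimal Python sketch, assuming each entry from restic snapshots --json carries "id" and "hostname" fields; the function name is hypothetical.

```python
def hosts_left_without_snapshots(snapshots, ids_to_forget):
    """Return hostnames that would have no snapshots left after the forget."""
    doomed = set(ids_to_forget)
    all_hosts = {s["hostname"] for s in snapshots}
    surviving = {s["hostname"] for s in snapshots if s["id"] not in doomed}
    return sorted(all_hosts - surviving)


# Made-up example data: BOB123 only has old snapshots, ALICE7 has a recent one.
example = [
    {"id": "a1", "hostname": "BOB123"},
    {"id": "a2", "hostname": "BOB123"},
    {"id": "b1", "hostname": "ALICE7"},
    {"id": "b2", "hostname": "ALICE7"},
]
orphaned = hosts_left_without_snapshots(example, ["a1", "a2", "b1"])
```

Here orphaned names only BOB123. A host showing up in that list is either a departed user, which is what you want, or a machine whose backups silently stopped, which you would want to know about before deleting anything.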
I don’t think the suggested option would be implemented in restic, when there are better ways to solve this.
Host is just that - many users have two computers so this is important. Tags are just usernames, and occasionally notes for myself (“before_reimage, before_upgrade” etc). Also some machines are shared by multiple people, so the user matters. Some backups are manual, and include C:\users, others are C:\users\aspecificuser. Just sort of depends. And I like being able to say “restic snapshots --tag aspecificuser” and seeing every machine I have a backup from, for them.
I would use a repo per user - and might HAVE to do this eventually, depending on how large the repo grows - but a LOT of users from the same labs have duplicate data. I may just make a repo per-lab, but then that wouldn’t solve my dilemma here.
I’ll have to play around with this. Haven’t used R yet. But thanks!
I still think it wouldn’t be unreasonable to have a flag that is well-described with WARNINGS for power users that would do this, though. But eh, I get it, too. Just frustrating from a power user’s perspective. If I want to nuke my repo, I should be able to nuke my repo haha