I’m looking for some advice about how people are using tags, or whether there is a better approach to forgetting old snapshots than what I am doing.
restic has been a fantastic set-it-and-forget-it solution for me, with the one exception that I can’t quite figure out how to automatically forget
old content. I have a set of scripts across various machines that do backups at their own convenience/schedule, in some cases using a --files-from
list that is generated automatically.
My goal is to do something like this:
restic forget --tag userdata --keep-last 3 --keep-daily 7
I was using tags based on the type of content (configuration, logs, userdata, vm-images, etc.) with the intention of being able to prune by content type.
Where this falls down is when a user adds a folder to their list (as provided by --files-from or a wildcard): it causes the prior snapshot to be retained forever, as restic sees its set of paths as unique.
For example, I have this snapshot that I wanted deleted years ago:
dfae38ff 2019-10-21 11:04:22 yar.example.com userdata /mnt/backups/important/bob /mnt/backups/important/joe
But it will be retained forever, because the subsequent snapshot added a new path, which is seen as unique:
dfae38ff 2021-10-21 11:05:55 yar.example.com userdata /mnt/backups/important/bob /mnt/backups/important/joe /mnt/backups/important/sue
In reality there is nothing special about dfae38ff that would make me want to retain it. I understand that when a path is removed, the situation could be different in some cases, although for me it is all the same.
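To make the grouping problem concrete, here is a rough sketch that only imitates the behaviour described above and does not call restic at all. It takes the two snapshot lines from this post and counts how many groups they land in when grouped by host plus paths (which, as far as I understand, is what forget does by default) versus host plus tags. The function name count_groups and the awk key format are made up for illustration.

```shell
# Two snapshot lines copied from this post: ID, date, time, host, tag, paths.
snapshots='dfae38ff 2019-10-21 11:04:22 yar.example.com userdata /mnt/backups/important/bob /mnt/backups/important/joe
dfae38ff 2021-10-21 11:05:55 yar.example.com userdata /mnt/backups/important/bob /mnt/backups/important/joe /mnt/backups/important/sue'

# count_groups MODE: print how many distinct groups the snapshots fall into.
# MODE "paths" keys on host + full path list; anything else keys on host + tag.
# (Hypothetical helper; it only mimics grouping, it is not restic's code.)
count_groups() {
    printf '%s\n' "$snapshots" | awk -v mode="$1" '
        {
            paths = ""
            for (i = 6; i <= NF; i++) paths = paths " " $i
            key = (mode == "paths") ? $4 "|" paths : $4 "|" $5
            if (!(key in seen)) { seen[key] = 1; n++ }
        }
        END { print n }'
}

count_groups paths   # grouped by host + paths
count_groups tags    # grouped by host + tags
```

With host plus paths, the 2019 snapshot sits alone in its own group, so any --keep-last rule will always keep it. With host plus tags, both snapshots share one group and the older one becomes eligible for removal.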
I’m guessing the best way to handle this would be to use unique tags for each “job”, and then in forget use --group-by "host,tags"? Is there a better way to handle this? Or is there a better way to use restic as a whole?
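For reference, the idea could look roughly like the sketch below. It assumes each job has its own tag and that the repository and password are already set in the environment (e.g. RESTIC_REPOSITORY). The wrapper name forget_by_tag and the RESTIC override variable are made up for illustration; --dry-run is included so nothing is actually removed while testing.

```shell
# RESTIC can be overridden (e.g. RESTIC="echo restic") to preview the command
# without touching a repository. Hypothetical convenience variable.
RESTIC="${RESTIC:-restic}"

# forget_by_tag TAG: apply the retention policy to snapshots carrying TAG,
# grouping by host and tags instead of by host and paths.
# (Hypothetical wrapper; remove --dry-run once the output looks right.)
forget_by_tag() {
    tag="$1"
    $RESTIC forget --tag "$tag" --group-by host,tags \
        --keep-last 3 --keep-daily 7 --dry-run
}
```

Calling forget_by_tag userdata would then treat all userdata snapshots from one host as a single group, regardless of which paths each snapshot contains.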