Changing Restic backup policies

jl_678 · January 7, 2020, 2:16pm

Hi,

I am pondering changing my Restic settings and am looking for guidance on the best way to do it. Here is the current folder configuration:

/foo
/foo/a
/foo/bar
/foo/c
/foo/d
etc…

When I originally configured the system, I wanted different retention for /foo/bar vs /foo/ and so I configured Restic as follows:

Restic backup - /foo/* excluding /foo/bar
Restic backup - /foo/bar

Then I run forget twice with two separate retention policies. When originally configured, the /foo/bar retention was shorter than /foo:

Restic forget /foo --> Longer retention
Restic forget /foo/bar --> Shorter retention
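
For concreteness, the two-job setup described above might look roughly like the sketch below. It assumes the backup target is /foo with an exclude (rather than a shell-expanded /foo/*), that the repository and password come from RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE, and that forget is filtered with --path. The --keep-* values are placeholders, since the post does not state the real policy. Commands are only printed unless the script is run with RESTIC=restic.

```shell
#!/bin/sh
# Sketch of the current two-job setup (assumed, not the poster's actual script).
# By default each command is printed; run with RESTIC=restic to execute.
RESTIC="${RESTIC:-echo restic}"

backup_jobs() {
    $RESTIC backup /foo --exclude /foo/bar   # everything under /foo except /foo/bar
    $RESTIC backup /foo/bar                  # /foo/bar as its own snapshot series
}

forget_jobs() {
    # Placeholder retention values: longer for /foo, shorter for /foo/bar.
    $RESTIC forget --path /foo --keep-daily 30 --keep-monthly 12
    $RESTIC forget --path /foo/bar --keep-daily 7
}

backup_jobs
forget_jobs
```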

Now, I have decided that I don’t need different retention for /foo/bar, and I think that it would be easier to have one policy. I am not sure of the best way to make this change, and a couple of questions came up as I thought it through.

1. If I remove “restic forget /foo/bar”, will restic apply the /foo retention policy to /foo/bar? Asked another way, does “restic forget /foo” apply to the separate /foo/bar backup?
2. If I remove the exclude from /foo/*, what happens to the retention of the previous backups of /foo/bar? Will Restic realize that the old /foo/bar is now covered in the total /foo backup? My guess is no, and so how do I expire the historic /foo/bar backups?

Thank you for answering these questions. Is there anything else that I should consider before making this change? Can you suggest any best practices to achieve my objectives? Ideally, one backup and one forget job.

ProactiveServices · January 7, 2020, 4:48pm

By default, forget groups all snapshots by host and by paths and runs the forget policy against each of these groups; the grouping can be changed using --group-by. In your case the policy will treat /foo and /foo/bar as separate entities and apply the given policy to each. N.B. if you use only --keep-last then it will keep the last snapshot(s) that are members of all (specified) groups.
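
To make the grouping behaviour concrete, here is a small sketch of the three groupings that come up in this thread. --group-by and --dry-run are existing forget flags; the --keep-daily 30 policy is a placeholder, and the commands are only printed unless RESTIC=restic is set.

```shell
#!/bin/sh
# Compare how forget would group snapshots, without deleting anything.
# By default each command is printed; run with RESTIC=restic to execute.
RESTIC="${RESTIC:-echo restic}"
KEEP="--keep-daily 30"   # placeholder policy, not taken from the thread

forget_grouped() {
    # $1 is the grouping to try, e.g. "host,paths"
    $RESTIC forget --dry-run --group-by "$1" $KEEP
}

forget_grouped host,paths   # the default: one group per host and path set
forget_grouped paths        # ignore the host: old and new hosts share a group
forget_grouped host         # ignore paths: /foo and /foo/bar share a group
```

Comparing the dry-run output of the three variants shows which snapshots end up in the same group before any policy is applied for real.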

How forget applies policy depends on what your policies are, in particular what --group-by you are using, so I don’t think I can answer 1) yet.

As for 2), if your (new) policy excludes grouping by paths then the path prefixes will be ignored when deciding which snapshots will be forgotten.

I strongly advise you to a) take a copy of the repo, if practical, and b) run a forget with the --dry-run switch, then examine the output to see what would happen. Ideally, first list the snapshots, manually figure out what you expect to be removed, and see if restic agrees with you. This will help confirm that you’ve wrapped your head around it!

Best practices - do a check --read-data before repo maintenance
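
Put together, the checks suggested in this post might run in the order sketched below. The paths come from the question, the --keep-daily 30 policy is a placeholder, and copying the repository is left out because it depends on where the repository lives. Commands are only printed unless RESTIC=restic is set.

```shell
#!/bin/sh
# Pre-maintenance routine: verify, inspect, then simulate.
# By default each command is printed; run with RESTIC=restic to execute.
RESTIC="${RESTIC:-echo restic}"

preflight() {
    $RESTIC check --read-data                  # read and verify all pack data (slow)
    $RESTIC snapshots --path /foo              # list the /foo snapshot series
    $RESTIC snapshots --path /foo/bar          # list the /foo/bar snapshot series
    $RESTIC forget --dry-run --keep-daily 30   # placeholder policy; nothing is removed
}

preflight
```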

moritzdietz · January 7, 2020, 4:57pm

If you have a backend that is remote like B2 or S3, keep in mind that this operation can get expensive depending on the amount of data and it will take a lot of time.

jl_678 · January 7, 2020, 5:00pm

Thank you. A couple of clarifications.

The repository is stored locally on a NAS and so bandwidth and storage are relatively inexpensive.
I just changed backup servers and so am running --group-by paths

That is a very insightful post. I will spend more time on it shortly and come back with questions.

moritzdietz · January 7, 2020, 5:03pm

If your setup allows it, you can try to see if you can run the restic operation locally on the NAS instead of a different host. Just a thought.
Since restic is written in Go, you will most likely be able to find a compiled version for your NAS.

jl_678 · January 7, 2020, 5:06pm

Good idea, but the NAS is old, and I am not terribly confident in running Restic on it. I run Restic on an x86 single board computer which mounts the production volume and the backup NAS. It works well, and I feel safer about this than running Restic on the same hardware as the backup storage.

moritzdietz · January 7, 2020, 5:07pm

Gotcha! Makes sense.

jl_678 · January 7, 2020, 5:21pm

Hi,

So I am thinking about your answer in more detail. As mentioned in a follow-up, I currently use group by path with “restic forget”. This is a temporary setting as I migrated to a new host, and I can turn this off when all the data from the old host expires. (It will be a few months.)

One solution appears to be to wait until that expiration occurs. When it does, I could switch to group by host. This would treat all data backed up by my current host with the same retention. I guess at that point, I would switch to one backup job of /foo and then run forget with group by host.

Is that right?

I am also interested in your perspectives on question #1.

Thank you!

ProactiveServices · January 7, 2020, 5:43pm

Since you don’t seem to be in a hurry to free up the storage, waiting for the natural expiry sounds like a good idea. With regard to question 1, my answer depends on a correct understanding of how you’ve run restic backup and how you intend to use forget, but I believe that since --path works on an absolute path, the different path prefixes between your two backup sets will mean your longer retention policy will also apply to /foo/bar. That is to say: it should work as you expect. --dry-run is your friend here, since the coffee is not working as well as I hoped and it feels like the middle of the night already…

jl_678 · January 7, 2020, 6:03pm

I just checked and it appears that /foo and /foo/bar have separate IDs and so retention must be applied to each backup set separately (e.g. running forget on /foo does not expire data that was created by the /foo/bar backup job).

What do you think of this proposed solution?

Drop the exclude from the /foo backup <-- /foo backup will now pick up /foo/bar
Stop running the dedicated /foo/bar backup and dedicated /foo/bar expire <-- The old /foo/bar repository will never grow and never expire
Run forget normally on /foo

Wait…

Eventually, I will have enough copies of the directory /foo/bar in the new /foo repository that I can manually expire the old /foo/bar repository.
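
The plan above could be sketched as follows. This assumes the backup target is /foo, that forget is filtered with --path, and that the old /foo/bar series is a set of snapshots recorded with the path /foo/bar. The retention value is a placeholder, and the snapshot IDs must be taken from the real listing. Commands are only printed unless RESTIC=restic is set.

```shell
#!/bin/sh
# Sketch of the proposed migration to one backup job and one forget job.
# By default each command is printed; run with RESTIC=restic to execute.
RESTIC="${RESTIC:-echo restic}"

# From now on: a single backup and a single forget.
nightly() {
    $RESTIC backup /foo                                  # no exclude, so /foo/bar is included
    $RESTIC forget --path /foo --keep-daily 30 --prune   # placeholder retention
}

# Later, once the /foo snapshots hold enough history of /foo/bar:
# list the old series, then forget it by snapshot ID and prune.
retire_old_series() {
    $RESTIC snapshots --path /foo/bar
    if [ "$#" -gt 0 ]; then
        $RESTIC forget "$@"   # arguments are snapshot IDs from the listing
        $RESTIC prune
    fi
}

nightly
retire_old_series   # with no IDs given, this only lists the old series
```

Calling retire_old_series with the IDs shown by the listing removes exactly those snapshots; adding --dry-run to the forget line first is a cheap way to confirm the selection.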

As an added benefit, the strategy above should not use much additional disk space as the data in the new /foo/bar backup will dedupe against the old /foo/bar job. (Is that correct?) As a side note, /foo/bar is not rapidly changing and so there is very little new data being added.
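
One way to check the deduplication expectation is to compare the repository's unique data size before and after the first backup without the exclude. The sketch below uses restic stats in raw-data mode for that; it assumes the backup target is /foo, and commands are only printed unless RESTIC=restic is set.

```shell
#!/bin/sh
# Measure how much unique data the first combined /foo backup adds.
# By default each command is printed; run with RESTIC=restic to execute.
RESTIC="${RESTIC:-echo restic}"

measure() {
    $RESTIC stats --mode raw-data   # size of the unique data stored in the repository
}

measure               # note the total before the change
$RESTIC backup /foo   # first run without the exclude
measure               # compare with the earlier total
```

If the /foo/bar data is already in the repository, the second total should be only slightly larger than the first.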

Does that make sense?