RFC: script for simpler restic backups and maintenance

underdpt · December 16, 2021, 8:17am

Hello all,

I’m developing a script to simplify restic configuration and make it easy to schedule backups and maintenance. My goal is to have one file with all the configuration (so you can save it for easy recovery), and the main features would be:

Compatible with Windows and Linux (wip)
Restic download and setup, including gsudo on Windows, and cron/scheduled task setup (working)
Backup schedule in cron format (working)
Automated repository maintenance (pruning and health check) (wip)
Easy recovery (navigate through snapshots and folders to select what to recover) (working)
Pre- and post-backup scripts (wip)
Email reports (working)
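The single-file idea could be sketched roughly like this, assuming a JSON file. The key names (`repository`, `password`, `paths`, `schedule`, `email`) and the example values are hypothetical and not a restic format. Only `RESTIC_REPOSITORY` and `RESTIC_PASSWORD` are real environment variables that restic reads.

```python
import json

# Hypothetical key names for a single-file configuration; not a restic format.
REQUIRED_KEYS = ("repository", "password", "paths", "schedule")


def load_config(text: str) -> dict:
    """Parse the JSON config and fail early if a required key is missing."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {', '.join(missing)}")
    return config


def restic_env(config: dict) -> dict:
    """Map the config onto the environment variables restic reads."""
    return {
        "RESTIC_REPOSITORY": config["repository"],
        "RESTIC_PASSWORD": config["password"],
    }


# Placeholder values; "schedule" is a cron expression (every 3 hours).
EXAMPLE = """
{
  "repository": "s3:https://s3.example.com/example-bucket",
  "password": "change-me",
  "paths": ["/home/user/Documents"],
  "schedule": "0 */3 * * *",
  "email": "admin@example.com"
}
"""

if __name__ == "__main__":
    cfg = load_config(EXAMPLE)
    print(restic_env(cfg)["RESTIC_REPOSITORY"])
```

Keeping the password in the same file makes recovery simple but means that file must be protected. restic also accepts `RESTIC_PASSWORD_FILE` if the secret should live elsewhere.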

I intend to open-source the package once it’s stable enough, or sooner if anyone is interested or willing to help (it’s a PHP script, by the way; not ideal, but it’s what I’m familiar with).

I’m having some “is-this-a-good-practice?” doubts I would like to share to make it as bullet-proof as I can:

The script does a backup, then prunes the backup repository. I think this is easier than configuring a separate schedule for pruning, but I would like to know if there are any shortcomings with this approach.
Currently I’m developing a post-prune check with a data subset of 1%, so we get an early warning if something is wrong with the repository. Do you think this is a good idea?
When testing on user computers, I find that sometimes the computer is shut down during a backup or prune, so the repository stays locked. We have no way to know beforehand whether a repository is locked, so I’m running an unlock before the backup and before the prune. What happens if a backup is running and we try to unlock the repository?
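The sequence described in these questions (unlock, backup, unlock, prune, partial check) can be sketched as a list of restic invocations run one after another. The subcommands and flags used (`unlock`, `backup`, `forget --keep-within ... --prune`, `check --read-data-subset`) exist in restic, but the 90-day retention and chaining everything into one run are assumptions taken from this thread, not a recommended policy. It also assumes the repository and password are supplied through restic’s environment variables.

```python
import subprocess
from typing import Callable, List, Sequence


def maintenance_plan(paths: Sequence[str], keep_within: str = "90d",
                     check_subset: str = "1%") -> List[List[str]]:
    """Return the restic invocations for one scheduled run, in order."""
    return [
        ["restic", "unlock"],                # removes stale locks only
        ["restic", "backup", *paths],
        ["restic", "unlock"],
        # forget snapshots older than keep_within (relative to the newest
        # snapshot) and prune the data they no longer reference
        ["restic", "forget", "--keep-within", keep_within, "--prune"],
        # read a random subset of pack files as an early warning
        ["restic", "check", f"--read-data-subset={check_subset}"],
    ]


def _run(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_plan(plan: Sequence[Sequence[str]],
             runner: Callable[[Sequence[str]], int] = _run) -> int:
    """Run each command in order and stop at the first non-zero exit code."""
    for argv in plan:
        code = runner(argv)
        if code != 0:
            return code
    return 0
```

Stopping at the first failure is a conservative choice: a failed backup never triggers a prune in the same run.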

Thanks!

NobbZ · December 16, 2021, 8:46am

Pruning on remote repositories can become quite expensive due to repacking, as all packs that should be repacked have to be downloaded and then uploaded again. For most remote storage, traffic is the annoying cost factor because it’s hard to predict, so you try to avoid it. I’ve seen a lot of people who prune their remote storage only once a week or once a month.
I think 1% is very little. I check my local repository once a week fully. I’m not yet sure about how I’d set it up for a remote repository.
Only stale locks are removed. So far I never had any problems with overlapping backups and prunes. If the repository is locked due to an incoming backup from one of my 3 hosts, the prune will fail due to active locks, and systemd will retry a couple of minutes later. If there is an ongoing prune and a backup comes in, it will fail because the repo is locked. Systemd will try the backup again a couple of minutes later.
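NobbZ relies on systemd to retry a failed run a few minutes later. Where no such scheduler is available (for example a plain cron entry), a similar retry can be done inside the script. A minimal sketch: the attempt count and the 5-minute delay are arbitrary, and it retries on any non-zero exit code because it makes no attempt to tell a lock conflict apart from other failures.

```python
import subprocess
import time
from typing import Callable, Sequence


def _run(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_with_retry(argv: Sequence[str], attempts: int = 3,
                   delay_seconds: float = 300.0,
                   runner: Callable[[Sequence[str]], int] = _run,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Run a command, waiting and retrying while it exits non-zero."""
    code = 1
    for attempt in range(1, attempts + 1):
        code = runner(argv)
        if code == 0:
            return 0
        if attempt < attempts:
            # e.g. another host may still hold a lock on the repository
            sleep(delay_seconds)
    return code
```

The `runner` and `sleep` parameters exist only so the logic can be exercised without calling restic or actually waiting.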

underdpt · December 16, 2021, 9:16am

Thanks @NobbZ for your help!

Pruning on remote repositories can become quite expensive due to repacking, as all packs that should be repacked have to be downloaded and then uploaded again. For most remote storage, traffic is the annoying cost factor because it’s hard to predict, so you try to avoid it. I’ve seen a lot of people who prune their remote storage only once a week or once a month.

OK, I missed that. I’m working with Wasabi, which has no ingress or egress costs; the catch is that files have to be kept for at least 90 days (if they’re deleted earlier, you are billed for the remaining days). In this scenario, and for user backups (not server backups), I think it makes sense to prune after the backup, as we don’t know when the computer is going to be powered on or off.

So the best would be to have an option to prune after backup, or schedule the prune.

I think 1% is very little. I check my local repository once a week fully. I’m not yet sure about how I’d set it up for a remote repository.

This is because on remote repositories you might have ingress or egress costs, and a full check would also be very time-consuming, so checking all the data at once might not be desirable.

This would need some configurable options like “check after every backup”, “check on schedule” and “how much data to check”.
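One way to keep each check small while still reading the whole repository over time is restic’s `--read-data-subset=n/t` form, which checks the n-th of t groups of pack files. A sketch that rotates n with a run counter; how that counter is stored is left to the script and is a hypothetical detail here.

```python
def rotating_subset(run_number: int, parts: int = 100) -> str:
    """Return a --read-data-subset argument cycling through 1/parts .. parts/parts.

    For a repository that does not change in between, `parts` consecutive
    runs together read every pack file once.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    n = run_number % parts + 1  # restic numbers the groups from 1 to t
    return f"--read-data-subset={n}/{parts}"
```

With `parts=100` each run reads about the same 1% as a fixed `--read-data-subset=1%`, but instead of drawing a random sample every time, it walks through all groups in turn.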

Only stale locks are removed. So far I never had any problems with overlapping backups and prunes. If the repository is locked due to an incoming backup from one of my 3 hosts, the prune will fail due to active locks, and systemd will retry a couple of minutes later. If there is an ongoing prune and a backup comes in, it will fail because the repo is locked. Systemd will try the backup again a couple of minutes later.

Yes, I understand that. The question was more like “What happens if I’m doing a backup/prune and then, on the same host, I try to unlock the repository?” I will do some tests, but having some insights from restic experts would be good.

MichaelEischer · December 27, 2021, 2:28pm

Which backup frequency do you intend to use? For hourly backups, running prune every time is excessive. As prune currently has to rewrite the full repository index, this usually uploads more new data than a few individual hourly backups.

For restic versions after 0.10.0, running unlock should be relatively safe. If you run backup, unlock and prune sequentially on the same host, everything should be fine. If different hosts use the repository in parallel, there is currently a possible race condition if one of the hosts enters standby while running a backup; see “Strict repository lock handling” by MichaelEischer, Pull Request #3569, restic/restic on GitHub.

underdpt · January 5, 2022, 9:02am

This is configured per user; I’m currently testing with daily backups and with backups every 3 hours. The pruning strategy is also configured per user. As I’m testing with Wasabi, I’m pruning only backups older than 90 days (Wasabi charges you for files removed before 90 days).

I’m putting this on my TODO list. I implemented a missed-backup strategy that I can apply to pruning, so that pruning is done weekly, for example, but I must make sure that backup and pruning don’t overlap.
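Applying the missed-backup idea to pruning could look like this sketch: prune when the last successful prune is older than the interval, and call the check only after the backup in the same run has finished, so the two never overlap on that host. The weekly interval and where the timestamp is stored are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def prune_due(last_prune: Optional[datetime], now: datetime,
              interval: timedelta = timedelta(days=7)) -> bool:
    """True if no prune has run yet or the last one is at least `interval` old.

    A computer that was off on the scheduled day catches up on its next run,
    because the check compares against the last prune, not a fixed date.
    """
    return last_prune is None or now - last_prune >= interval
```

In the main script this would be evaluated right after a successful backup, and the stored timestamp updated only after the prune itself succeeds.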

Thanks for the link! During testing I’m only using one repository per client, so this should be no problem. I hope that PR gets merged and then there should be no issue even when multiple clients use the same repository.

MichaelEischer · January 6, 2022, 8:32pm

For the index rewrite step of prune it does not matter at all how old the removed snapshots are. Whenever prune removes data from the repository, it will delete the whole index folder and upload an updated version of it (actually it first uploads the new index, then deletes the old index). That is, whenever you run prune to remove data, it will delete the old index files no matter how old or new they are.

underdpt · January 10, 2022, 7:35am

OK, I see. I didn’t think about that, because my index is only about 300 MB, and even pruning several times per day it’s not having a noticeable impact on my bill.
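A rough way to put the two previous points together: if every prune uploads a complete new index and each deleted index version is still billed until it reaches the 90-day minimum described above, the billed index storage at steady state is about index size × prunes per day × 90. This is a back-of-the-envelope sketch under those assumptions, not a statement of any provider’s exact billing.

```python
def billed_index_gb(index_gb: float, prunes_per_day: float,
                    minimum_days: int = 90) -> float:
    """Approximate index storage billed at steady state.

    Assumes every prune uploads a complete new index and every deleted
    version is billed for at least `minimum_days`. If prunes are rarer
    than once per `minimum_days`, only the current index counts.
    """
    versions_billed = max(1.0, prunes_per_day * minimum_days)
    return index_gb * versions_billed
```

For example, a 0.3 GB index pruned 3 times a day gives about 0.3 × 270 = 81 GB billed, while a weekly prune gives about 0.3 × 90 / 7 ≈ 3.9 GB.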

Anyway, I’ll change that strategy to prune every x backups and do it right after the backup, to be sure the computer is online when pruning. I also don’t want to wait too long between prunes, because that would result in longer pruning times, which could increase the chance of the computer shutting down while pruning, or even of a backup overlapping with pruning.
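“Prune every x backups, right after a backup” needs only a counter that survives between runs. A sketch assuming a small JSON state file (the file name and the `backups_since_prune` key are hypothetical); the counter is reset only after a successful prune, so a failed or interrupted prune is retried after the next backup.

```python
import json
from pathlib import Path


def _load(state_file: Path) -> dict:
    """Read the state file, or start empty if it does not exist yet."""
    return json.loads(state_file.read_text()) if state_file.exists() else {}


def bump_counter(state_file: Path) -> int:
    """Increment and persist the number of backups since the last prune."""
    state = _load(state_file)
    count = state.get("backups_since_prune", 0) + 1
    state["backups_since_prune"] = count
    state_file.write_text(json.dumps(state))
    return count


def reset_counter(state_file: Path) -> None:
    """Call only after prune succeeded."""
    state = _load(state_file)
    state["backups_since_prune"] = 0
    state_file.write_text(json.dumps(state))


def should_prune(count: int, every: int = 10) -> bool:
    """True once at least `every` backups have run since the last prune."""
    return count >= every
```

Usage after each successful backup: `if should_prune(bump_counter(state)): run the prune, then reset_counter(state)`.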

mrodent · July 15, 2022, 7:53pm

I rather like this idea of yours.

At the moment I have Python scripts which run the “bare metal” processes of restic using cron or Task Scheduler (+1 for acknowledging that people do actually use OSs other than Linux, even though they may hate M$ and all its works).

These scripts of mine work OK, but I’m very conscious that so much more infrastructure needs to be built to have an application which would, for example, automatically and thoroughly check the health of repositories at a configured frequency, and allow you to access functions and change settings using a much friendlier GUI.

And maybe do things like issue a “health OK”, by email, once a week, or an email stating that a possible problem has been detected…

Your initial post was 8 months ago… how’s it coming along?

rawtaz · July 16, 2022, 10:18pm

@underdpt Have you seen this one already? GitHub - creativeprojects/resticprofile: Configuration profiles manager and scheduler for restic backup

Seems like a bit of overlap, perhaps just join efforts?

underdpt · July 17, 2022, 7:04am

Hello,

So, it’s been pretty stalled, but I’m using it on about 40 computers in varying environments (servers, workstations, laptops, Windows, Linux…) and I’m very satisfied with it.

I’m on vacation for the next few weeks and intend to work on it in August so I have something I can share on GitHub. I’ve been stalled mainly because the recovery process was very rough, but I recently found a way to make it friendlier (a CLI-based menu). I’ll post here once it’s ready to be shown on GitHub.

underdpt · July 17, 2022, 7:26am

Wow

That’s pretty awesome. I hadn’t noticed that project. It already does almost everything I wanted (I’m missing the notification email after backup) and a lot more. And it has the feature I wanted most: the backup configuration in one file, so with that file I can recover the backup anywhere.

I’m not sure I could join that project (I know nothing of golang!). I intend to finish my project and publish it, and will have an eye on resticprofile, maybe I can ditch my configuration system and use theirs.

mrodent · July 17, 2022, 11:35am

Yes, I’m also stumped by the Go aspect of this (although it does look very good), so please do publish yours (what language?) in due course.

underdpt · July 17, 2022, 2:32pm

The language used is PHP.