RFC: script for simpler restic backups and maintenance

Hello all,

I’m developing a script to simplify restic configuration and to schedule backups and maintenance in an easy way. My goal is to have a single file holding all the configuration (so you can save it for easy recovery), and the main features would be:

  • Compatible with Windows and Linux (wip)
  • Restic download and setup, including gsudo on Windows, and cron/scheduled task setup (working)
  • Backup schedule in cron format (working)
  • Automated repository maintenance (pruning and health check) (wip)
  • Easy recovery (navigation through snapshots and folders to select what to recover) (working)
  • Pre- and post-backup scripts (wip)
  • Email reports (working)
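A cron-format backup schedule needs some way to decide whether an expression is due at a given minute. The script itself is PHP, so the following is only a minimal Python sketch of that logic, covering `*`, lists, ranges and `/` steps. The names `expand_field` and `cron_matches` are hypothetical, and month/day names or `7` for Sunday are not handled.

```python
from __future__ import annotations

from datetime import datetime

# Allowed values for the five cron fields, in order:
# minute, hour, day of month, month, day of week (0 = Sunday; 7 not supported).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field such as '*', '*/15', '1-5', '1-10/2' or '0,30'."""
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the 5-field cron expression fires at the minute `when`."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {expr!r}")
    minute, hour, dom, month, dow = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
    dom_ok = when.day in dom
    dow_ok = cron_dow in dow
    # Traditional cron rule: when both day fields are restricted (neither
    # starts with '*'), a match on either one is enough.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return (when.minute in minute and when.hour in hour
            and when.month in month and day_ok)
```

For example, a backup every 3rd hour could be configured as `0 */3 * * *`, and the scheduler would start a backup whenever `cron_matches` returns True for the current minute.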

I intend to open-source the package once it’s stable enough, or sooner if anyone is interested or willing to help (it’s a PHP script, by the way; not ideal, but it’s what I’m familiar with).

I have a few “is this good practice?” doubts I’d like to share, to make it as bullet-proof as I can:

  1. The script does a backup, then prunes the repository. I think this is easier than configuring a separate schedule for pruning, but I’d like to know if there are any shortcomings to this approach.
  2. I’m currently developing a post-prune check that reads a 1% subset of the data, so we get an early warning if something is wrong with the repository. Do you think this is a good idea?
  3. When testing on user computers, I find that the computer is sometimes shut down during a backup or prune, which leaves the repository locked. We have no way of knowing beforehand whether a repository is locked, so I run an unlock before the backup and before the prune. What happens if a backup is running and we try to unlock the repository?
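The flow behind these three questions (unlock, backup, unlock, prune, then a partial check) can be written down as an ordered list of restic invocations. This Python sketch only builds the argument lists and runs them in order, stopping at the first failure. `restic unlock`, `backup`, `forget --prune` and `check --read-data-subset` are real restic commands; the function names, paths and retention flags are placeholders, and the percentage form of `--read-data-subset` needs a reasonably recent restic release.

```python
from __future__ import annotations

import subprocess


def build_run(paths: list[str], keep_args: list[str],
              check_subset: str | None = "1%") -> list[list[str]]:
    """Return the restic commands for one scheduled run, in execution order.

    The repository and password are assumed to come from the environment
    (RESTIC_REPOSITORY / RESTIC_PASSWORD), so they are not passed here.
    """
    commands = [
        ["restic", "unlock"],                          # clears stale locks only
        ["restic", "backup", *paths],
        ["restic", "unlock"],
        ["restic", "forget", *keep_args, "--prune"],   # retention policy + prune
    ]
    if check_subset:
        commands.append(["restic", "check", f"--read-data-subset={check_subset}"])
    return commands


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn; a non-zero exit raises CalledProcessError."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_run(["/home/alice"], ["--keep-within", "90d"]))` would back up one directory and keep every snapshot from the last 90 days; `--keep-within 90d` is only an illustrative policy.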

Thanks!

  1. Pruning remote repositories can become quite expensive due to repacking: every pack that has to be repacked is downloaded and then uploaded again. For most remote storage, traffic is the annoying cost factor because it’s hard to predict, so you try to avoid it. I’ve seen many people prune their remote repositories only once a week or once a month.
  2. I think 1% is very little. I fully check my local repository once a week. I’m not yet sure how I’d set it up for a remote repository.
  3. Only stale locks are removed. So far I have never had problems with overlapping backups and prunes. If the repository is locked by an incoming backup from one of my 3 hosts, the prune fails because of the active lock, and systemd retries it a couple of minutes later. If a prune is running and a backup comes in, the backup fails because the repository is locked, and systemd retries it a couple of minutes later.
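On hosts without systemd (for example Windows with scheduled tasks), the retry-a-few-minutes-later behavior in point 3 can be approximated inside the script itself. This is a minimal sketch that treats any non-zero exit code as retryable, which is a simplifying assumption: a real script would probably inspect restic’s error output to tell a lock conflict apart from other failures. `run_with_retry` is a hypothetical name, and `runner` and `sleep` are injectable only to make it testable.

```python
from __future__ import annotations

import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(cmd: Sequence[str], attempts: int = 3, delay: float = 120.0,
                   runner: Callable[[Sequence[str]], int] | None = None,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Run cmd until it exits 0 or attempts run out; return the last exit code."""
    if runner is None:
        def runner(c: Sequence[str]) -> int:
            return subprocess.run(list(c)).returncode
    code = 1
    for attempt in range(1, attempts + 1):
        code = runner(cmd)
        if code == 0:
            break
        if attempt < attempts:
            sleep(delay)  # give another host's backup or prune time to finish
    return code
```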

Thanks @NoobZ for your help!

  1. Pruning remote repositories can become quite expensive due to repacking: every pack that has to be repacked is downloaded and then uploaded again. For most remote storage, traffic is the annoying cost factor because it’s hard to predict, so you try to avoid it. I’ve seen many people prune their remote repositories only once a week or once a month.

Ok, I did miss that. I’m working with Wasabi, which doesn’t charge for ingress or egress; the catch is that files have to be kept for at least 90 days (if they’re deleted earlier, you’re billed for the remaining days). In this scenario, and for user backups (not server backups), I think it makes sense to prune after each backup, since we don’t know when the computer will be powered on or off.

So the best approach would be to offer an option to either prune after each backup or run the prune on its own schedule.
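The two options could be modelled as a small decision function. This is a hypothetical sketch: the mode names `after_backup` and `scheduled` are made up, and `last_prune` would come from wherever the script stores its state. Comparing elapsed time instead of matching an exact schedule means a prune missed while the computer was off simply runs at the next opportunity.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def should_prune(mode: str, now: datetime, last_prune: datetime | None,
                 interval: timedelta, backup_just_ran: bool) -> bool:
    """Decide whether to prune now.

    mode "after_backup": prune whenever a backup has just finished.
    mode "scheduled":    prune once `interval` has passed since the last prune
                         (or if there has never been one).
    """
    if mode == "after_backup":
        return backup_just_ran
    if mode == "scheduled":
        return last_prune is None or now - last_prune >= interval
    raise ValueError(f"unknown prune mode: {mode!r}")
```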

  2. I think 1% is very little. I fully check my local repository once a week. I’m not yet sure how I’d set it up for a remote repository.

This is because remote repositories may have ingress/egress costs, and checking all the data at once would also be very time-consuming, so it might not be desirable.

This would need some configurable options, like “check after every backup”, “check on schedule” and “how much data to check”.
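One way to keep each check small and still eventually verify everything is restic’s `--read-data-subset=n/t` form, which reads subset n of t; cycling n from 1 to t over successive check runs reads every pack file once per t checks. The sketch below derives the flag from a run counter; how that counter is stored, and the default of 10 groups, are assumptions.

```python
def next_check_subset(run_number: int, total_groups: int = 10) -> str:
    """Return the --read-data-subset flag for this check run (run_number >= 0).

    restic's "n/t" form selects subset n of t (1-based), so cycling n over
    1..t covers every pack file once every t check runs.
    """
    if total_groups < 1:
        raise ValueError("total_groups must be >= 1")
    n = run_number % total_groups + 1
    return f"--read-data-subset={n}/{total_groups}"
```

With `total_groups=10`, each run reads about a tenth of the data, and ten consecutive runs add up to a full check.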

  3. Only stale locks are removed. So far I have never had problems with overlapping backups and prunes. If the repository is locked by an incoming backup from one of my 3 hosts, the prune fails because of the active lock, and systemd retries it a couple of minutes later. If a prune is running and a backup comes in, the backup fails because the repository is locked, and systemd retries it a couple of minutes later.

Yes, I understand that. My question was more: “What happens if a backup or prune is running and then, on the same host, I try to unlock the repository?” I will do some tests, but some insight from restic experts would be helpful.

Which backup frequency do you intend to use? For hourly backups, running prune every time is excessive: prune currently has to rewrite the full repository index, which usually uploads more new data than a few hourly backups.

For restic versions after 0.10.0 it should be relatively safe. If you run backup, unlock and prune sequentially on the same host, everything should be fine. If different hosts use the repository in parallel, there is currently a possible race condition if one of the hosts enters standby while running a backup; see restic/restic pull request #3569, “Strict repository lock handling” by MichaelEischer.
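To make the “only stale locks are removed” point concrete, here is a simplified model of the staleness rule as I understand restic’s behavior; treat it as an approximation, not the actual implementation. A lock counts as stale if it has not been refreshed within a timeout (assumed here to be 30 minutes; a running restic refreshes its own locks periodically), or if it was created on this same host by a process that no longer exists. Under that rule, an `unlock` on the same host leaves a running backup’s lock alone, while a host that sleeps mid-backup stops refreshing its lock, so other hosts may eventually treat it as stale, which is the race the pull request addresses. The function name and the `pid_alive` callback are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Callable

# Assumed staleness timeout; a parameter of this model, not a restic guarantee.
STALE_AFTER = timedelta(minutes=30)


def lock_is_stale(lock_time: datetime, lock_host: str, lock_pid: int,
                  now: datetime, this_host: str,
                  pid_alive: Callable[[int], bool],
                  stale_after: timedelta = STALE_AFTER) -> bool:
    """Model of whether `restic unlock` (without --remove-all) removes a lock."""
    if now - lock_time > stale_after:
        return True   # not refreshed recently: owner presumed dead or asleep
    if lock_host == this_host:
        return not pid_alive(lock_pid)  # same host: ask the OS about the PID
    return False      # fresh lock from another host: leave it alone
```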

This is configured per user; I’m currently testing with daily backups and backups every 3rd hour. The pruning strategy is also configured per user: since I’m testing with Wasabi, I only prune backups older than 90 days (Wasabi charges for files removed before 90 days).

I’m putting this on my TODO list. I implemented a catch-up strategy for missed backups that I can also apply to pruning, so pruning could run weekly, for example, but I must make sure that backup and pruning don’t overlap.
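For the overlap concern on a single computer, one approach is a local lock file that the script takes before starting either a backup or a prune. This sketch relies on `os.O_CREAT | os.O_EXCL`, which atomically fails if the file already exists on both Linux and Windows; the lock path and the name `single_instance` are hypothetical.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def single_instance(lock_path: str) -> Iterator[None]:
    """Hold an exclusive local lock file while a backup or prune runs."""
    try:
        # O_EXCL makes creation fail if another run already holds the file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another backup/prune run holds {lock_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record owner for diagnostics
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A shutdown mid-run leaves the file behind, which is the same stale-lock problem restic has, so a real implementation would also need to check the stored PID or the file’s age before refusing to run.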

Thanks for the link! During testing I’m only using one repository per client, so this shouldn’t be a problem. I hope that PR gets merged; then there should be no issue even when multiple clients share the same repository.

For the index rewrite step of prune, it does not matter at all how old the removed snapshots are. Whenever prune removes data from the repository, it replaces the whole index folder with an updated version (it first uploads the new index, then deletes the old one). That is, whenever you run prune to remove data, it deletes the old index files no matter how old or new they are.

Ok, I see. I didn’t think about that, because my index is only about 300 MB and even pruning several times per day hasn’t had a noticeable impact on my bill.
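Combining the index rewrite with Wasabi’s 90-day minimum gives a rough way to estimate the cost. Assuming every prune uploads a fresh copy of the whole index and deletes the previous one, and that each deleted copy is still billed until it is 90 days old, about `prunes_per_day × 90` index copies are paid for at any time, and never fewer than the one live copy. The function below is a hypothetical back-of-the-envelope helper, not a billing calculator.

```python
def billed_index_gb(index_gb: float, prunes_per_day: float,
                    min_days: float = 90) -> float:
    """Approximate steady-state index storage billed under a minimum
    retention period, assuming each prune replaces the whole index."""
    # A new copy every 1/prunes_per_day days, each billed for min_days,
    # means roughly prunes_per_day * min_days copies are billed at once.
    copies_billed = max(1.0, prunes_per_day * min_days)
    return index_gb * copies_billed
```

With a 300 MB index and, say, 4 prunes per day, that is roughly 0.3 × 4 × 90 ≈ 108 GB of billed index storage; pruning weekly instead would bring it down to about 0.3 × 90 / 7 ≈ 3.9 GB.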

Anyway, I’ll change the strategy to prune every x backups, right after the backup, to be sure the computer is online while pruning. I also don’t want to go too long between prunes, because that would make each prune take longer and increase the chances of the computer shutting down mid-prune, or even of a backup overlapping with the prune.
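Pruning every x backups needs a counter that survives reboots and power-offs, so it has to live on disk. A minimal sketch using a small JSON state file; the file layout, the key name and the function names are hypothetical.

```python
import json
from pathlib import Path

STATE_KEY = "backups_since_prune"  # hypothetical state-file field


def _load_count(state_file: Path) -> int:
    """Read the counter; a missing or corrupt state file counts as zero."""
    try:
        return int(json.loads(state_file.read_text())[STATE_KEY])
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return 0


def _save_count(state_file: Path, count: int) -> None:
    state_file.write_text(json.dumps({STATE_KEY: count}))


def prune_due_after_backup(state_file: Path, prune_every: int) -> bool:
    """Record one successful backup; return True if a prune is now due."""
    count = _load_count(state_file) + 1
    _save_count(state_file, count)
    return count >= prune_every


def mark_pruned(state_file: Path) -> None:
    """Reset the counter; call only after the prune has succeeded."""
    _save_count(state_file, 0)
```

Comparing with `>=` rather than `==` means that if a prune fails or is interrupted by a shutdown, it stays due and is attempted again after the next backup.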