Running unlock --remove-all before pruning

Hi,

We have a case where a user might run several hundred backup schedules from multiple hosts into one restic repo. A backup schedule can run as frequently as every 30 minutes, so in the worst case a backup might be running at any given time.

There is a separate dedicated server that runs forget and prune.

It’s hard to find a correct time to run forget and prune, which need an exclusive lock. So we run restic unlock --remove-all every time before forget and prune (or at least when we get a lock error). We use --remove-all because sometimes locks aren’t removed even though they are quite old and no restic process is running anymore (not sure if that’s because we operate restic on several different hosts at the same time).

Is this bad practice, or does it cause any problems? We want to avoid human intervention as much as possible at a fairly large scale.

Thanks

Yes, this is a bad practice and can lead to repository corruption.

If backups are running most of the time then the odds of running prune concurrently with a backup are near 100%. Here’s what is likely to happen:

  • A backup process uploads a pack file.
  • Prune starts and loads the snapshot list.
  • Prune notices there is no snapshot that refers to any data in that pack file, concludes that it is unused, and deletes it.
  • The backup process finishes and adds the snapshot to the repository, which now depends on data that is missing.

If you have been doing this a long time, a simple restic check should highlight dozens of missing repository objects, and many snapshots probably cannot be restored anymore.


Yes, this is a bad practice and can lead to repository corruption.

Thanks for the confirmation.

If you have been doing this a long time

We only started doing this recently and luckily haven’t pushed it to production yet.

So I reckon we still need a window to do cleanup every day, and a simple unlock should suffice. During the window, all backups should stop, and forget/prune should retry every 5 minutes or so if there is still an active lock.
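A minimal sketch of that retry loop, assuming a POSIX shell on the cleanup server (the retry count, interval, and the forget policy in the comment are illustrative values, not anything restic prescribes):

```shell
#!/bin/sh
# retry: run a command, retrying up to a maximum number of attempts
# with a fixed pause between tries. Returns the command's success,
# or failure once the attempts are exhausted.
# Usage: retry <max_attempts> <pause_seconds> <command...>
retry() {
    max="$1"; pause="$2"; shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$pause"
    done
    return 0
}

# During the nightly cleanup window (all backups stopped), drop
# stale locks once with a plain unlock, then retry forget/prune
# every 5 minutes for up to an hour while short-lived locks clear:
#   restic unlock
#   retry 12 300 restic forget --keep-daily 7 --prune
```

The plain `restic unlock` (without --remove-all) only removes locks that restic itself considers stale, so it stays safe even if a backup is unexpectedly still running.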

To keep the “cleanup window” small, I would recommend trying out the newest betas, where prune speed is much improved (try playing around with the --max-unused option).
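For reference, the trade-off that option controls looks roughly like this (the 10% here is just an illustrative value; pick what suits your storage budget):

```shell
# Allow up to 10% of the repo to remain as unused data; a higher
# limit means less repacking and therefore a faster prune run.
restic prune --max-unused 10%

# Or skip repacking partially-used pack files entirely, trading
# unused space for the fastest possible prune:
restic prune --max-unused unlimited
```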

Also, you could think about having a second repo with the same chunker parameters (e.g. a copy) which you could redirect your backups to during that “cleanup window”. After cleanup is finished, you can then copy the newly generated snapshots to your original repo. I think @gurkan came up with that idea.
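A sketch of that two-repo workflow, assuming restic 0.12-era flags (the repo paths are placeholders; on newer restic versions the source repo is given with --from-repo instead of --repo2):

```shell
# Create a secondary repo that chunks data identically to the
# primary, so copied snapshots deduplicate against existing data:
restic init -r /srv/restic-secondary \
    --repo2 /srv/restic-primary --copy-chunker-params

# During the cleanup window, point backup jobs at /srv/restic-secondary.
# Afterwards, copy the newly created snapshots back into the primary
# (--repo2 is the source, -r the destination):
restic -r /srv/restic-primary copy --repo2 /srv/restic-secondary
```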

Thanks for the suggestion. We will play around with it first. We want to stick with released versions as much as possible, but if the performance gap is huge, it might be worth trying.

I think I read about that somewhere on this forum, but IMHO it’s not something that is easily maintained, especially if you have tons of repositories to manage.