We’re developing a small tool, ENACrestic, a simple Qt app that automates running restic backup every x minutes and restic forget every y backups. It runs on Ubuntu.
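The "forget every y backups" part of that policy can be modelled as a simple counter. This is a minimal sketch, not ENACrestic's code: `BackupScheduler` and `after_backup` are hypothetical names, and the x-minute timer itself (for example a Qt `QTimer` in the real app) is left out.

```python
from dataclasses import dataclass


@dataclass
class BackupScheduler:
    """Hypothetical model: run a forget after every `forget_every` successful backups."""
    forget_every: int
    backups_since_forget: int = 0

    def after_backup(self) -> bool:
        """Record one successful backup; return True if a forget should run now."""
        self.backups_since_forget += 1
        if self.backups_since_forget >= self.forget_every:
            self.backups_since_forget = 0  # start counting towards the next forget
            return True
        return False
```

With `forget_every=3`, seven consecutive backups trigger a forget after the third and the sixth.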
One concern is that, because everything is automated behind the scenes, the user may simply shut down their computer (or suspend it, or …) while a backup or a forget is running, leaving a lock behind.
We’re thinking about ways to solve this transparently …
One option would be to run restic unlock whenever we hit this issue and then try again.
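A minimal sketch of that unlock-and-retry approach, assuming a lock conflict can be recognised by the substring "repository is already locked" in restic's stderr. That wording is an assumption; check it against your restic version, and check whether your version reports lock failures with a dedicated exit code. `run_with_unlock_retry`, `LOCK_MARKER` and the injectable `runner` are hypothetical names.

```python
import subprocess
from typing import Callable, Sequence

# ASSUMPTION: substring restic prints on stderr when the repository is locked.
LOCK_MARKER = "repository is already locked"

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_restic(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run `restic <args>` and capture its output as text."""
    return subprocess.run(["restic", *args], capture_output=True, text=True)


def is_lock_error(result: subprocess.CompletedProcess) -> bool:
    """Heuristic: non-zero exit plus the lock message on stderr."""
    return result.returncode != 0 and LOCK_MARKER in (result.stderr or "")


def run_with_unlock_retry(args: Sequence[str], runner: Runner = run_restic):
    """Run a restic command; on a lock conflict, run plain `unlock` and retry once."""
    result = runner(args)
    if is_lock_error(result):
        # Plain `unlock` only removes stale locks; never use --remove-all here.
        runner(["unlock"])
        result = runner(args)
    return result
```

Retrying only once keeps a genuinely active lock (another running restic) from turning into a busy loop; the next scheduled run simply tries again.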
unlock removes locks that are older than 30 minutes or, for locks created by the local host, locks whose corresponding restic process is no longer running. Thus in most cases calling unlock should do everything you need. Just make sure that you use restic >= 0.10.0. For older restic versions, the timestamp in the lock file is not properly refreshed.
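The staleness rule described above can be modelled like this. It is a Python sketch of the rule as stated, not restic's Go implementation; `lock_is_stale`, `process_exists` and the injectable parameters are hypothetical names added so the logic can be tested.

```python
import os
import socket
from datetime import datetime, timedelta, timezone

STALE_TIMEOUT = timedelta(minutes=30)


def process_exists(pid: int) -> bool:
    """Check whether a local PID exists (signal 0 sends nothing, it only checks)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_is_stale(lock_time, lock_host, lock_pid, now=None,
                  local_host=None, pid_alive=process_exists) -> bool:
    """Model of the rule: older than 30 min, or local with a dead process."""
    if now is None:
        now = datetime.now(timezone.utc)
    if local_host is None:
        local_host = socket.gethostname()
    if now - lock_time > STALE_TIMEOUT:
        return True
    if lock_host != local_host:
        return False  # processes on other hosts cannot be inspected
    return not pid_alive(lock_pid)
```

Note the asymmetry: a recent lock from another host is never considered stale, because there is no way to check whether its process is still alive.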
unlock --remove-all is somewhat dangerous, unless you can 100% guarantee that no other restic processes are accessing that repository.
You mention the timestamp being refreshed.
Does that mean that, for restic >= 0.10.0, the timestamp in the lock represents the time of the last internal restic operation?
So when the timestamp is older than 30 minutes, does that mean more than 30 minutes of inactivity by the process that created the lock?
For example, for a restic forget that runs actively for 52 minutes, would the timestamp always be only a few seconds old?
As long as restic runs, it refreshes the lock file every five minutes. If a restic process fails, then the latest lock is left behind. When a lock is older than 30 minutes, then either the corresponding process has failed or the host’s clock is wrong.
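A quick simulation answers the 52-minute question: with a refresh every five minutes, the lock timestamp is up to about five minutes old, not a few seconds, but it never gets anywhere near the 30-minute threshold. This assumes the refresh fires exactly on schedule (in practice it can be slightly delayed); `max_lock_age` is a hypothetical helper.

```python
from datetime import timedelta

REFRESH_INTERVAL = timedelta(minutes=5)
STALE_TIMEOUT = timedelta(minutes=30)


def max_lock_age(run_duration: timedelta,
                 refresh: timedelta = REFRESH_INTERVAL,
                 step: timedelta = timedelta(seconds=1)) -> timedelta:
    """Simulate a running process that rewrites its lock every `refresh`
    and return the oldest lock age observed during the run."""
    last_refresh = timedelta(0)
    t = timedelta(0)
    worst = timedelta(0)
    while t <= run_duration:
        if t - last_refresh >= refresh:
            last_refresh = t  # the lock file is rewritten with the current time
        worst = max(worst, t - last_refresh)
        t += step
    return worst
```

For a 52-minute run this gives 4 minutes 59 seconds with a one-second step, well below the 30-minute stale timeout.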
The lock is absolutely essential to prevent prune and backup from running at the same time. If both commands are active at the same time, this will most likely damage the snapshot created by backup and also affect further snapshots. Thus the design choice was to play it safe and just bail if there is any other lock left.
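To make the backup/prune conflict concrete, here is a generic shared/exclusive lock check. It assumes, as I understand restic's design, that backup holds a non-exclusive lock while prune needs an exclusive one; `Lock` and `can_acquire` are illustrative names, not restic code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Lock:
    exclusive: bool


def can_acquire(want_exclusive: bool, existing: List[Lock]) -> bool:
    """Shared/exclusive rule: an exclusive lock needs no other locks at all;
    a shared lock only conflicts with an existing exclusive lock."""
    if want_exclusive:
        return not existing
    return not any(lock.exclusive for lock in existing)
```

Under this rule, several backups can run side by side, but a prune bails as soon as any lock is present, including a stale one left behind by a killed process.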
In addition, until a few days ago there were some situations in which one could end up with a lock file older than 30 minutes while the restic process was still running (see “Strict repository lock handling” by MichaelEischer, restic/restic pull request #3569 on GitHub; this is really only a 99% fix, and I’m not sure 100% is even possible). restic < 0.10.0 also had a refresh bug. Now the only remaining problem is wrong clocks, and there is little restic can do about that (except maybe checking timestamps in the backend?).
Unfortunately, it is far too easy to end up with clocks that are off by a few hours. Ideally, restic doesn’t break a repository in that case. I plan to eventually add some sort of automatic unlock, but probably with a much longer timeout than 30 minutes.
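A small worked example shows why a skewed clock defeats a 30-minute timeout. A lock is stamped with the writer's clock, so if that clock runs two hours behind, a lock refreshed two minutes ago looks two hours and two minutes old to a reader with a correct clock. The 24-hour timeout below is a hypothetical stand-in for "a much longer timeout", not a value restic uses; both helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def apparent_lock_age(true_write_time: datetime, writer_clock_offset: timedelta,
                      reader_now: datetime) -> timedelta:
    """Age of a lock as seen by a reader whose clock is correct.

    The writer stamps the lock with its own clock (true time + offset), so a
    writer whose clock runs behind (negative offset) makes the lock look older.
    """
    stamped = true_write_time + writer_clock_offset
    return reader_now - stamped


def looks_stale(age: timedelta, timeout: timedelta) -> bool:
    return age > timeout
```

With `writer_clock_offset=timedelta(hours=-2)` and a lock written two minutes before `reader_now`, the apparent age is 2 h 2 min: stale under a 30-minute timeout, but not under a hypothetical 24-hour one. Any automatic unlock timeout has to exceed the largest clock error you expect to see.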
Dual-booting is an easy way to run into this: Windows defaults to storing the BIOS time in the local timezone while Linux defaults to storing UTC in the BIOS, and less technical users often change the time to look “correct” without even noticing that the timezone is wrong.
Even without that, the clock can easily be off by some arbitrary, unknown amount.
It might be worth considering Google’s Roughtime protocol to get the “real” time, store it in the lock file, and then use Roughtime to decide whether an automatic unlock is an option. You could still fall back to the local time if Roughtime is not available, so that restic does not need network access just to run.
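The fallback idea can be sketched without committing to any Roughtime client library. `fetch_network_time` is a hypothetical callable standing in for a Roughtime query (returning `None` or raising `OSError` when unavailable); `Timestamp`, `current_time` and `may_auto_unlock` are illustrative names. The sketch makes one conservative design choice of its own: restic still runs on local time, but automatic unlock is only allowed when both the lock and the current time came from the network source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class Timestamp:
    when: datetime
    trusted: bool  # True if obtained from the network time source


def current_time(fetch_network_time: Callable[[], Optional[datetime]]) -> Timestamp:
    """Prefer network time; fall back to the local clock so restic still runs."""
    try:
        network_time = fetch_network_time()
    except OSError:
        network_time = None
    if network_time is not None:
        return Timestamp(network_time, trusted=True)
    return Timestamp(datetime.now(timezone.utc), trusted=False)


def may_auto_unlock(lock_ts: Timestamp, now_ts: Timestamp, timeout: timedelta) -> bool:
    """Only auto-unlock when both timestamps are trusted and the lock is old enough."""
    return lock_ts.trusted and now_ts.trusted and now_ts.when - lock_ts.when > timeout
```

When either timestamp comes from the local clock, the sketch falls back to today's behaviour: bail and leave the decision to a manual unlock.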
It depends on how frequently this actually comes up, and whether there is a common trigger.