Detecting stale locks

I am open to accepting your arguments but find gaps in your approach and programming logic. You claim, for example, that nothing you’re doing here ensures that the jobs will have a unique pid.

How did you arrive at this conclusion? Did you take a look at the code above and make your decision then and there? Or did you experiment with the code under different scenarios? For example, once with the restic command in the foreground and once in the background?

This is based on what you said earlier:

Based on this script and your own words, two restic processes are not running at the same time. The body of this loop starts restic in the background and then immediately waits for it to finish before proceeding, which is equivalent to simply not sending restic to the background in the first place.

If two restic commands are not running at the same time, then nothing prevents a subsequent restic invocation from reusing the same pid (though, as I explained, this is highly unlikely given how Linux assigns pids).

The claim I’m making is simple: command & wait $! is exactly the same as command, including how pids are assigned. If there were something else in this script that allowed the restic commands to run in parallel, then that would be a mechanism by which you could guarantee that their pids are not the same. However, you’ve already indicated that you don’t want two restic commands running in parallel. In that case, there is nothing you can do to prevent pid reuse short of forking and not reaping the child. However, the wait command will reap the child, freeing the pid.
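
To make the reaping point concrete, here is a tiny demo you can run yourself (the false command merely stands in for restic; the loop and echo are just illustration):

#!/usr/bin/env bash
# Each iteration backgrounds a command, records its pid, and waits on it.
# Once wait returns, the child has been reaped, so nothing reserves that
# pid any more; the kernel is free to hand it out again after the pid
# counter wraps around /proc/sys/kernel/pid_max.
for i in 1 2 3; do
    false &
    pid=$!
    wait "$pid"
    echo "run $i: pid=$pid exit=$? (child reaped, pid free for reuse)"
done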

This is based on analysis of the code you’ve posted with probably close to 20 years of experience writing shell scripts on Linux.

I’m open to any evidence you have to the contrary, preferably with example scripts that I can run myself.

I was curious about the

command &
wait $!

stuff, and looking at restic scripts I saw the same here, with a small note that explains why.

It’s related to trap execution; it’s easy to reproduce by running a script with a trap and trying to kill it.
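
A minimal reproduction, assuming any long-running command (sleep here) stands in for restic:

#!/usr/bin/env bash
# Run this script, then send it SIGTERM from another terminal.
# bash defers trap execution while a foreground command is running, but the
# wait builtin returns as soon as a trapped signal arrives, so the
# background-plus-wait form lets the trap fire immediately.
on_term() {
    echo "trap fired at $(date +%T)"
    exit 1
}
trap on_term INT TERM

echo "pid $$ started at $(date +%T)"

# Foreground variant: the trap would only run after sleep finishes.
#sleep 300

# Background + wait variant: the trap runs as soon as the signal arrives.
sleep 300 &
wait $!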

In the script linked above, the trap kills the (restic) background processes and then calls restic unlock in order to remove stale locks.
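
A minimal sketch of that cleanup pattern (not the linked script itself; it assumes the repository location and password are already set in the environment, and /path/to/data is a placeholder):

#!/usr/bin/env bash
set -u

restic_pid=""

cleanup() {
    # Stop the backgrounded restic, then remove the stale lock it leaves behind.
    if [ -n "$restic_pid" ]; then
        kill "$restic_pid" 2>/dev/null
        wait "$restic_pid" 2>/dev/null
    fi
    restic unlock
    exit 1
}
trap cleanup INT TERM

restic backup /path/to/data &
restic_pid=$!
wait "$restic_pid"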

I just tried some locks and unlocks here and there; these are my conclusions:

1- When a ‘restic backup’ is running, it creates a lock
2- If you try to ‘restic unlock’ while that ‘restic backup’ is still running, restic won’t delete the lock, because the backup still holds it
3- If you start a new ‘restic backup’, restic will create a second backup instance and a second lock
4- If you ‘kill -9’ the first ‘restic backup’, the process is killed abruptly (simulating a power loss) and the lock remains in the repository
5- Running a new ‘restic backup’ won’t remove the previous lock; it will create a new lock file
6- If you run ‘restic unlock’, restic will remove the lock that went stale

Based on these (somewhat obvious) observations, it’s possible to create a good shell script to manage restic lock files.

If you’re thinking of automatically unlocking (to remove stale locks) before backup/maintenance runs, I believe this is an unsupported scenario, so it may be best to avoid doing so.

What are the alternatives? Could you share your thoughts? @ProactiveServices

Most operations will continue despite locks, as they do not remove/rewrite data. For forget/prune I now check for the existence of locks (restic --no-lock list locks) and if any exist, abort and flag it for manual checks.

That implies forget/prune never happen automatically. That’s … not really an alternative.

Check for existence of locks -> If locks, wait and retry, or abort and report. If no locks, begin check/forget/prune.
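
A sketch of that flow; the retry count, wait interval, and forget policy below are placeholders of my own, and the repository credentials are assumed to be set in the environment:

#!/usr/bin/env bash
set -u

for attempt in 1 2 3; do
    # Any output from 'restic --no-lock list locks' means at least one lock exists.
    if [ -z "$(restic --no-lock list locks)" ]; then
        # No locks: safe to begin check/forget/prune.
        restic check || exit 1
        restic forget --keep-daily 7 --prune
        exit $?
    fi
    echo "attempt $attempt: repository is locked, retrying in 60s" >&2
    sleep 60
done

echo "repository still locked, aborting and flagging for manual checks" >&2
exit 1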

I got it, @ProactiveServices. My post wasn’t saying anything different; I was just writing down some facts (some obvious and redundant) for people who want to write a script for automation.
