Detecting stale locks

Go to your repo and look in the “locks” folder. Anything in there is a lock. :slight_smile:

There isn’t a restic locks command or anything… but ls /path/to/repo/locks will basically give you that.

Use restic list locks - thanks @cdhowie lol

For a bash script, you can use an if statement to check whether the repository’s locks directory contains anything. For example:

if [[ -n $(ls -A /path/to/repository/locks) ]]; then
  restic unlock
fi

If the directory is not empty, it will execute restic unlock. For other backends, such as sftp, the command changes: for sftp you need to replace the ls with ssh user@ip_addr ls -A /path/to/repository/locks.
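A minimal sketch of that sftp variant, assuming key-based SSH access (user@ip_addr and the repository path are placeholders):

if [[ -n $(ssh user@ip_addr ls -A /path/to/repository/locks) ]]; then
  restic unlock
fi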

For a Windows batch file, in case someone else finds this thread and wants one (works with an SMB share):

:START
IF EXIST \\path\to\repository\locks\* (
	ECHO Unlocking database, please wait...
	ECHO.
	restic unlock
	TIMEOUT 300
	GOTO START
)
EXIT
EXIT

This one will loop every 5 minutes and quit once nothing exists in the locks folder.

You mean like restic list locks? :wink:

Yeah, but when I use this command it always displays a lock. For example:

# Using ls on the locks directory
~$ ls ~/Downloads/restic-test/locks
# Outputs nothing, but using list locks...
~$ restic -r ~/Downloads/restic-test list locks
7cb54323938375130092e0a705b950a1e85998d96a5c6138da260883c8d116b4

After that I ran ls again and the locks directory is empty. I think when you use list locks, restic creates a lock and therefore has a “lock” to list. Is restic supposed to create a lock for the list command? This is only a testing repository, using restic 0.9.5, and no other machine is backing up to this repository.

Whoops! I stand corrected lol. I actually went looking for a restic show locks and list never crossed my mind. That’s what I get for just skimming the list of commands haha

And actually I’m seeing something similar to what @sulfuror just mentioned. Out of morbid curiosity, I started a test backup. I then ran restic unlock while it was running. Everything in /repo/locks was cleared out. Everything. Stale locks, and also a lock that had a modified date within ~4m of the current time (my test backup). It was at this point I saw your reply, ran restic list locks, and it actually listed a lock, even though /repo/locks was empty. I waited a few minutes, and something popped into the locks folder. I ran restic list locks again and there were TWO locks still listed.

EDIT: Moved the “restic removed an active lock” bit to my own thread.

Is this normal behavior or a bug? Does restic list locks use a lock of its own or something? :thinking:

I have just remembered the --no-lock option, and when I used restic list locks --no-lock it doesn’t display anything. Still, it is kind of weird that list creates a lock (IMHO). I was wondering the same thing, whether it was a bug, but with the --no-lock option it makes more sense.
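A quick demonstration, reusing the test repository path from above:

# list acquires a lock of its own, so it always shows at least one
~$ restic -r ~/Downloads/restic-test list locks
# --no-lock skips creating that lock, so nothing is listed
~$ restic -r ~/Downloads/restic-test list locks --no-lock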

Ahhhh okay, it uses a lock of its own. Makes sense. Kind of lol

That seems to be the case. This is how I tested.

In one terminal I ran watch -n .1 ls -lh on the locks directory.
In a second terminal I ran sleep 10; restic list locks.
I went back to the first terminal and briefly saw a lock appear in the locks directory when the restic command executed.
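A sketch of that test, assuming the test repository path used earlier in the thread:

# terminal 1: refresh a listing of the locks directory ten times per second
watch -n .1 ls -lh ~/Downloads/restic-test/locks

# terminal 2: allow time to switch back to terminal 1, then list locks
sleep 10; restic -r ~/Downloads/restic-test list locks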

Thanks to you guys, I now feel a bit wiser!

Looking for a way to avoid the stale-lock problem that occurs when a backup fails in the middle and then weeks or months go by without any backups.

Found this thread, and it just trailed off. What’s the solution? Scanning the above, I don’t think the basic use case was addressed, which is that a neglected lock prevents backups from happening and a user has to go look for the problem (not by using restic list locks but by running restic snapshots and noticing there’s nothing recently saved).

The locks left behind by a failed backup run won’t block other backups from working. That only happens for locks created by check / prune.

which is that a neglected lock prevents backups from happening and a user has to go look for the problem

You really should check the exit code of the backup command (exit code 1 == no backup was created at all) if you care about your backup. There’s not much that can be done about that as lots of possible errors just can’t be resolved automatically by restic.

I use a bash script to run backups on a daily basis. My script is a collection of functions which initially verifies the repository is accessible and then ensures each restic command finishes successfully.

To find out if the repository is accessible, the script runs

restic list keys

In case the repository is already locked, the script logs a message, sends me an alert and terminates.
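A sketch of that accessibility check (assuming the repository and its password are configured via environment variables; <recipient> is a placeholder, as in the snippets below):

if ! restic list keys > /dev/null 2>&1; then
    logger "Error: restic repository is locked or unreachable"
    echo "Error: restic repository is locked or unreachable" | mail -s "restic backup issue" <recipient>
    exit 1
fi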

Here is a part of the backup function, modified for the sake of readability

restic backup <single_source> &
wait $!
if [ "$?" -eq 0 ]
then
    logger "Info: Backed up <source>"
else
    logger "Warning: Unable to backup <source>"
    echo "Warning: Unable to backup <source>" | mail -s "restic backup
    issue" <recipient>
fi

What’s the purpose of doing this:

restic backup <single_source> &
wait $!

Versus doing this?

restic backup <single_source>

Is there more going on in the original script that makes this beneficial for some reason?

I should have explained more!

The script backs up multiple sources to a single repository. The sources are saved in a bash array.

for source in "${source_list[@]}"
do
    restic backup "$source" &
    wait $!
    if [ "$?" -eq 0 ]
    then
        logger "Info: Backed up <source>"
    else
        logger "Warning: Unable to backup <source>"
        echo "Warning: Unable to backup <source>" | mail -s "restic backup issue" <recipient>
    fi
done

Without the wait command, the script initiates the first restic process to back up the first source and then immediately initiates a second process to back up the second source. There are two issues here.

The second process is bound to fail because the first process is most likely still going, i.e. the repository is locked.

Also, you will have no idea whether the first process failed or successfully finished.

With the wait command, the for loop pauses so that a single restic process can run its course and return an exit code back to the script.

Yes, but you can accomplish the same thing by omitting the trailing &. This directive explicitly puts the job in the background, and then you wait for it. This is redundant. Just don’t put the job in the background.

This pattern:

command &
wait $!

Seems like a longer way to say:

command

Starting backup jobs in the background ensures that each one has a unique PID. The reason for doing this is left as an exercise for the reader!

That is only guaranteed to be true if the jobs are actually running at the same time, which you’ve ensured won’t happen with the wait call. No, nothing you’re doing here ensures that the jobs will have a unique pid. command & wait $! is for all intents and purposes identical to command even when it comes to pid assignment.

However, Linux assigns pids sequentially. So unless somewhere around PID_MAX_DEFAULT - RUNNING_PROCESS_COUNT processes get started between jobs, they will have different pids.
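A quick way to check this on your own machine (a sketch; any short-lived command stands in for restic):

for i in 1 2 3
do
    sleep 1 &
    echo "job $i got pid $!"
    wait $!
done

On a typical Linux system the three pids come out distinct and roughly sequential, which illustrates the heuristic: reuse is possible once each job is reaped, just very unlikely.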

I am open to accepting your arguments but find gaps in your approach and programming logic. You claim, for example, that nothing you’re doing here ensures that the jobs will have a unique pid.

How did you arrive at this conclusion? Did you take a look at the code above and make your decision then and there? Or did you experiment with the code under different scenarios? For example, once with the restic command in the foreground and once in the background?

This is based on what you said earlier:

Based on this script and your own words, two restic processes are not running at the same time. The body of this loop starts restic in the background and then immediately waits for it to finish before proceeding, which is equivalent to simply not sending restic to the background in the first place.

If two restic commands are not running at the same time, then nothing prevents a subsequent restic invocation from reusing the same pid (though, as I explained, this is highly unlikely given how Linux assigns pids).

The claim I’m making is simple: command & wait $! is exactly the same as command including how pids are assigned. If there is something else in this script that does allow the restic commands to run in parallel then that would be a mechanism by which you can guarantee that their pids are not the same. However, you’ve already indicated that you don’t want two restic commands running in parallel. In that case, there is nothing you can do to prevent pid reuse short of forking and not reaping the child. However, the wait command will reap the child, freeing the pid.

This is based on analysis of the code you’ve posted with probably close to 20 years of experience writing shell scripts on Linux.

I’m open to any evidence you have to the contrary, preferably with example scripts that I can run myself.

I was curious about the

command &
wait $!

stuff, and looking at restic scripts I saw the same pattern here, with a small note that explains why.

It’s related to trap execution; it’s easy to reproduce by running a script with a trap and trying to kill it.

In the script linked above, the trap kills the (restic) background processes and then calls restic unlock in order to remove stale locks.
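A minimal sketch of that mechanism (the repository and source paths are placeholders). Bash defers trap handlers while a foreground command runs, but the wait builtin returns immediately when a signal arrives, so backgrounding restic lets the trap kill it and unlock the repository right away:

#!/bin/bash

cleanup() {
    # stop the background restic process, then remove the lock it left behind
    kill "$pid" 2> /dev/null
    restic -r /path/to/repository unlock
    exit 1
}
trap cleanup INT TERM

restic -r /path/to/repository backup /path/to/source &
pid=$!
wait "$pid"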