Locks being created and not cleared

beewoolie · March 7, 2022, 4:04am

Greetings.

I am performing backups from several hosts and, for the most part, it works as expected. One of these hosts is a laptop that sometimes goes offline while a backup is in progress, so I believe a lock is being created and not cleared as it normally would. I know I can use unlock to clear this lock. But, I’d like to automate the process of removing a stale lock.

It might be fine to just have the lock clear automatically when the host that left the stale lock reruns the backup, but this doesn’t appear to happen. Instead, even that host fails to back up when the lock isn’t cleared.

It would be most helpful if I could:

list locks in the repository with metadata about the host, time, and directories being backed up
have a guarantee from restic that locks are ‘touched’ every X minutes during a procedure that depends on them, so that after, say, X*2 minutes we can be confident that the lock is no longer valid

Is this possible with restic as it is? I haven’t seen a command to do this. And, since I would need to decrypt the repo to see the contents of the locks, I think I need a restic command to support this function.

Thoughts?

MichaelEischer · March 7, 2022, 9:33pm

unlock only removes locks it considers stale (not changed for half an hour). restic updates the lock file every 5 minutes, so the lock of a running restic process won’t be deleted by unlock (assuming the hosts have somewhat accurate clocks). For this to work properly you must use restic version >= 0.10.0.
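A minimal sketch of the staleness rule described above, in case it helps to reason about timings in a wrapper script. The function name `lock_is_stale` and the variable `STALE_AFTER` are hypothetical; this only mirrors the half-hour rule as stated here and is not restic's own implementation.

```shell
# Hypothetical helper, not part of restic: models the rule that a lock
# which has not changed for half an hour is considered stale.
STALE_AFTER=1800   # 30 minutes, in seconds

# lock_is_stale LAST_CHANGE_EPOCH NOW_EPOCH
# Exit status 0 if the lock has gone unchanged for longer than STALE_AFTER.
lock_is_stale() {
    age=$(( $2 - $1 ))
    [ "$age" -gt "$STALE_AFTER" ]
}
```

For example, `lock_is_stale 0 7200` succeeds (two hours without a change), while `lock_is_stale 0 240` fails, because a running process would have updated its lock within the last 5 minutes.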

Depending on the storage backend there may be some further corner cases (see restic/restic pull request #3569, “Strict repository lock handling”, for more details).

That even the host that left the lock fails to back up is somewhat curious: backup cannot create locks that prevent other backup runs from working.

beewoolie · March 7, 2022, 10:46pm

OK. This is a fine start. I can simply unlock before the nightly prune and it can be considered safe.

As for why backup is adding a lock, here is the command (edited for IDs):

restic backup -r sftp://restic-sftp@HOST:PORT/repo --password-file DIR/key --exclude-file /var/folders/jn/dlfttsps4v524bcgrs1p4r940000gn/T/__reback.0iBU5vXE --exclude-caches --one-file-system --host ID /Users/USER

Perhaps it would help if there was a way to browse locks and show some meta-data for each one. I know that the lock in question was created on the computer running this backup. I will check again to see if there is another cron job being started that could cause problems.

MichaelEischer · March 10, 2022, 7:58pm

You could run restic list locks --no-lock to get a list of currently existing locks and then run restic cat lock <ID> (ID is one of the lines listed previously) to show the raw information of the lock file.
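The two commands above could be combined into a small helper along these lines. This is only a sketch: the function name `show_locks` is made up, and it assumes the repository and password are supplied through the environment (for example `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE`). Passing `--no-lock` to `cat` as well is a choice made here so that inspecting locks does not itself require taking one.

```shell
# show_locks: print every lock ID followed by the raw contents of that lock.
# Assumes repository and password come from the environment.
show_locks() {
    restic list locks --no-lock | while read -r id; do
        # Skip blank lines in the listing.
        [ -n "$id" ] || continue
        echo "== lock $id"
        # </dev/null keeps the inner command from consuming the ID list on stdin.
        restic cat lock "$id" --no-lock </dev/null
    done
}
```

Calling `show_locks` then prints one header line per lock followed by the raw lock file.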

beewoolie · March 13, 2022, 4:33pm

The lock that is being generated is a non-exclusive one. The meta-data doesn’t include the /reason/ for the lock, though I suspect it comes from the backup. This time, it isn’t blocking a backup when it runs again. I’ll keep watching to see if I can reproduce the previous failure.

beewoolie · March 13, 2022, 4:38pm

I missed an opportunity here. The host that made the lock was able to back up even though there was a non-exclusive lock in the repo. However, a different host was denied access for its ‘forget’ operation. I’ll wait for this situation to occur again to make sure that is what is happening.

beewoolie · March 14, 2022, 3:49am

I found which command is failing:

restic forget -r sftp://USER@HOST:PORT/repo --password-file PATH --keep-last 8 --keep-hourly 8 --keep-daily 7 --keep-weekly 8 --keep-monthly 6 --keep-yearly 5 --host LOCALHOST
repository 03f32f9f opened successfully, password is correct
unable to create lock in backend: repository is already locked by PID 34633 on AnotherHost by USER (UID 501, GID 0)
lock was created at 2022-03-13 18:45:03 (2h0m25.713684402s ago)
storage ID 796d5d65
the `unlock` command can be used to remove stale locks

So, it isn’t the backup, but the forget command which is being blocked.

Is this expected behavior?

It isn’t really a problem, though, because I can now make sure to unlock before the forget is executed.
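A sketch of that workaround as a wrapper function. The name `forget_with_unlock` is hypothetical, the retention flags are the ones from the command above, and the repository and password are assumed to come from the environment or from extra arguments. Since unlock only removes locks it considers stale, a lock held by a backup that is still running stays in place and forget will still be refused in that case.

```shell
# forget_with_unlock: remove stale locks, then apply the retention policy.
# Extra arguments (e.g. --host NAME) are passed through to `restic forget`.
forget_with_unlock() {
    # Stop if unlock itself fails (e.g. the repository is unreachable).
    restic unlock || return 1
    restic forget --keep-last 8 --keep-hourly 8 --keep-daily 7 \
        --keep-weekly 8 --keep-monthly 6 --keep-yearly 5 "$@"
}
```

It could be called from cron as, for instance, `forget_with_unlock --host LOCALHOST`.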

moritzdietz · March 14, 2022, 8:19am

Backing up uses non-exclusive locks so that you can run several backup commands against the same repository. The forget command, on the other hand, potentially removes data, so it creates an exclusive lock, meaning that only it and no other process can alter the repo at that time, whether adding or removing data.
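As a simplified model of that rule (not restic's actual code): a new lock can be taken only if neither it nor any existing lock is exclusive. The function name `can_acquire` and the labels `shared` and `exclusive` are made up for illustration.

```shell
# can_acquire WANTED [EXISTING...]
# WANTED and each EXISTING value is "exclusive" or "shared" (non-exclusive).
# Exit status 0 if the wanted lock is compatible with all existing locks.
can_acquire() {
    wanted=$1
    shift
    for held in "$@"; do
        # An exclusive lock conflicts with any other lock;
        # a shared lock conflicts only with an exclusive one.
        if [ "$wanted" = exclusive ] || [ "$held" = exclusive ]; then
            return 1
        fi
    done
    return 0
}
```

Under this model `can_acquire shared shared` succeeds (a second backup next to a leftover backup lock), while `can_acquire exclusive shared` fails, which matches the forget run being refused above.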