Very old lock and concerns about restic unlock

The server is normally in append-only mode. When running restic forget (over a non-append-only connection), I got:

Fatal: unable to create lock in backend: repository is already locked by PID 19815 on backup by username (UID 1000, GID 1000)
lock was created at 2020-04-13 23:00:02 (1438h10m34.797376502s ago)

The backup system does hourly backups with cron, some of them may have been interrupted by sleep/shutdowns.

Is there any way to prevent this from happening? Just running restic unlock sounds like bypassing the locking (and potentially creating a catastrophic situation) when forget runs from cron.


I’ve struggled with restic locks myself. It seems like the lock needs to work like a DHCP lease. By that I mean the lock has a lifetime and expires if not renewed. Other info about the lock, like creation time, expiration time, and renewal count, could be useful. Maybe the lock lifetime could be programmable. IDK.

My first idea was something like restic unlock --max-age 1d (if backups run hourly, then after 1d something probably didn’t go well).
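restic unlock has no --max-age option today, but the idea can be approximated in a wrapper script using commands that do exist: restic list locks and restic cat lock (which prints a lock’s JSON, including its creation time). A minimal sketch, assuming GNU date and jq are installed; the repository and password file paths are placeholders, not from this thread:

#!/bin/sh
# Sketch only: run unlock if every lock in the repository is older than MAX_AGE.
# REPO and PASSFILE are placeholders; jq and GNU date are assumed to be available.

set -eu

REPO=/path/to/repo
PASSFILE=/path/to/passwordfile
MAX_AGE=$((24 * 60 * 60))   # one day, in seconds
now=$(date +%s)

old_only=yes
for id in $(restic -p "$PASSFILE" -r "$REPO" list locks); do
    # "restic cat lock <ID>" prints the lock as JSON; .time is its creation time
    created=$(restic -p "$PASSFILE" -r "$REPO" cat lock "$id" | jq -r .time)
    if [ $(( now - $(date -d "$created" +%s) )) -lt "$MAX_AGE" ]; then
        old_only=no
    fi
done

# Only unlock when no lock is younger than MAX_AGE. A running restic process
# refreshes its lock every few minutes, so a day-old lock is almost certainly stale.
if [ "$old_only" = yes ]; then
    restic -p "$PASSFILE" -r "$REPO" unlock
fi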

DHCP lease-like behaviour sounds like a reasonable idea. I have nothing else to really say, other than:
Related: #2214 (not with locks, but wait for lock to be released)
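As an interim workaround until something like #2214 exists, the cron job itself can wait and retry when the repository is locked. A rough sketch, with arbitrary retry count, sleep time, retention policy, and placeholder paths:

#!/bin/sh
# Sketch: retry forget a few times instead of failing immediately on a held lock.
# Note that a non-zero exit can have other causes besides a lock.

for attempt in 1 2 3 4 5 6; do
    if restic -p /path/to/passwordfile -r /path/to/repo forget --keep-hourly 24 --prune; then
        exit 0
    fi
    sleep 600   # repository was busy (or something else failed); wait 10 minutes
done

echo "giving up after 6 attempts" >&2
exit 1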

I have a similar issue here backing up my Raspberry Pi 3B+ running Raspberry Pi OS Stable to an NFS folder on my OpenIndiana server. I have a script that backs up the Pi and then prunes snapshots from the repo. The script runs daily at 0330, yet somehow I often get the same Fatal: unable to create lock in backend: repository is already locked by PID error for either the backup or prune phase.

For now I’ve decided to get around this by placing an unlock command before both the backup and prune commands:

#!/bin/sh

set -u

# Unlock repo
/usr/bin/restic -p /root/ResticMatrix -r /mnt/DellOptiPlex390MT/rpool1/Restic unlock

# Back up root filesystem while ignoring mounts
/usr/bin/restic -p /root/ResticMatrix -r /mnt/DellOptiPlex390MT/rpool1/Restic backup / --exclude-file=/home/pi/Sync/Settings/Restic/RaspberryPiOS/Excludes.txt

# Unlock repo again before pruning
/usr/bin/restic -p /root/ResticMatrix -r /mnt/DellOptiPlex390MT/rpool1/Restic unlock

# Prune repo, keeping only the last 7 snapshots for this host
/usr/bin/restic -p /root/ResticMatrix -r /mnt/DellOptiPlex390MT/rpool1/Restic forget --host RaspberryPi3ModelBPlus --keep-last 7 --prune

This should allow the prune lock approximately a day to “die of natural causes” before the backup job force unlocks the repo.

Per what I’ve read on here (or on GitHub, I don’t recall which), you don’t have to worry about concurrent backups harming your repository. All it means is that restic’s deduplication might be less efficient (it may miss some files already in the repo because they changed while the backup was running).

It also appears restic does not count interrupted backups as snapshots, so you don’t have to worry about incomplete backups.

Restic only adds data when it’s backing up, it doesn’t modify or remove any. So it’s fine if there are existing locks when you start backing up, and multiple backups can run at the same time.

If you start a prune, it creates an exclusive lock so that other parties can’t back up to the repository (since the prune will modify/remove data). If the connection is then severed and the prune is cancelled, a lock will obviously be left behind.

In this case, you may have a situation where you have to remove the lock, but you should only do so when you know that there’s no current operation going on with the repository, so you don’t e.g. interfere with a prune that’s running.
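For example, before removing a leftover lock by hand, you can inspect what created it. This is just a sketch using the repository and password paths from the script above; <ID> stands for whichever lock ID the first command prints:

# List the locks currently in the repository
/usr/bin/restic -p /root/ResticMatrix -r /mnt/DellOptiPlex390MT/rpool1/Restic list locks

# Show who created a given lock (host, PID, user, creation time)
/usr/bin/restic -p /root/ResticMatrix -r /mnt/DellOptiPlex390MT/rpool1/Restic cat lock <ID>

# If that host/PID is clearly no longer running anything, remove stale locks
/usr/bin/restic -p /root/ResticMatrix -r /mnt/DellOptiPlex390MT/rpool1/Restic unlock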

Make sure you’re using the latest version of restic, which is 0.12.0 as of writing this.


Ah, thanks for the info! My Pi 3B+ keeps running VERY low on RAM during prunes, so I’d been considering just running the prune operation server-side instead. Now I see that may be a very bad idea. Time to get a Pi 4B+.

As long as you only prune in one place (and the exclusive lock will prevent you from doing anything else), it should be fine to prune on the server instead.
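For example, a server-side crontab entry along these lines could take over the pruning; the schedule, the local repository path on the server, and the password file location are placeholders, not details from this thread:

# Hypothetical server-side crontab entry; adjust paths, schedule and retention.
30 3 * * * /usr/bin/restic -p /root/ResticMatrix -r /rpool1/Restic forget --host RaspberryPi3ModelBPlus --keep-last 7 --prune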


OK. My main concern is restic crontab jobs failing because a previous restic job ran out of RAM and was killed, which in turn left the repo locked for the next job. As you can see above, my script currently has an unlock line, but if the prune runs on a different machine I can’t guarantee a prune isn’t in progress on that repo without spacing out the prune and backup jobs more than I’d like. 12 hours should probably be enough, though.

If you are having trouble with restic running out of memory, then you should indeed fix that first of all. Then the rest shouldn’t be an issue - if there’s already a lock when you try to prune, that’s something unexpected and should be investigated.