Stale lock on s3

i0x915 · November 25, 2023, 2:27pm

Hey,

I have a bucket on Wasabi that held 1.1 TB of data before the upload below. I ran the command below and it seemed to finish successfully. However, when I ran the ‘check’ command afterwards, I got a message about a stale lock. I have checked and the PID is definitely gone. Any ideas what would have caused it to leave a stale lock?

Locked output

using temporary cache in /tmp/restic-check-cache-2217122378
repository 73c83c13 opened (version 2, compression level auto)
created new cache in /tmp/restic-check-cache-2217122378
create exclusive lock for repository
repo already locked, waiting up to 0s for the lock
unable to create lock in backend: repository is already locked by PID 2036190 on MXX41 by root (UID 0, GID 0)
lock was created at 2023-11-25 03:13:29 (11h8m26.074161799s ago)

Original command

root@MXX41:/mnt# restic -r s3:xxxxxxxxxxxxxxx backup pub ssd
repository 73c83c13 opened (version 2, compression level auto)
no parent snapshot found, will read all files
[0:00] 100.00%  21 / 21 index files loaded

Files:       2760863 new,     0 changed,     0 unmodified
Dirs:        444328 new,     0 changed,     0 unmodified
Added to the repository: 560.354 GiB (549.279 GiB stored)

processed 2760863 files, 1.399 TiB in 3:56:47
snapshot ff6c1704 saved

rawtaz · November 28, 2023, 11:07am

I can’t answer your question in detail, but what I can say is that I often have stale locks in my repositories. However, they are usually from clients that start a backup and then e.g. close the lid, disconnect and go home before the backup process completes, leaving a lock behind. That’s probably not it in your case.

Is that backup run the only thing that you ran? Did you check what type of lock the stale one is (I presume it’s not an exclusive one)? Did you already remove it or can you check if it is still there (just for kicks, to see if it might be a case of Wasabi showing you a ghost file that was removed but their systems keep showing it for a short while)?
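
One way to answer the "what type of lock" question is to list the lock files and print each one. The following is a minimal sketch, not an official procedure: `show_locks` is a hypothetical helper name, and it assumes `RESTIC_REPOSITORY` and the credentials are already exported. It relies on `restic list locks` and `restic cat lock <ID>`; the JSON that `cat lock` prints should include an `exclusive` field along with the hostname, PID and creation time.

```shell
# show_locks: print the ID and JSON body of every lock in the repository.
# Assumes RESTIC_REPOSITORY and credentials are already exported in the environment.
show_locks() {
    # --no-lock avoids creating yet another lock while inspecting existing ones.
    restic --quiet --no-lock list locks | while read -r id; do
        echo "== lock $id"
        # stdin is redirected so restic cannot consume the remaining IDs.
        restic --quiet --no-lock cat lock "$id" </dev/null
    done
}
```

Running `show_locks` a few minutes apart would also show whether a lock that was removed is still being listed by the storage provider.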

i0x915 · November 28, 2023, 5:53pm

I just ran another large 2 TB backup, to Cloudflare R2 this time, and the same thing happened. It was as you said: the backup left a lock file. I am able to run more backups but cannot run check or forget operations, as those complain about the lock file.

After running unlock, everything works fine again. I suspect there may be a race condition somewhere that causes the lock to remain in place for larger repos.
I have not seen this issue with smaller repos, only with large uploads that span many hours.
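
The "check the PID is gone, then unlock" routine described above can be sketched as a small shell helper. This is an illustration under stated assumptions: `pid_alive` and `safe_unlock` are hypothetical names, the script runs on the same host that created the lock, and it runs as root or as the user that owned the process (otherwise `kill -0` can fail even for a live process). `restic unlock` itself is documented to remove only stale locks unless `--remove-all` is passed.

```shell
# pid_alive PID: succeeds if a process with that PID exists on this host.
# Signal 0 performs the existence check without delivering a signal.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# safe_unlock PID: run `restic unlock` only when the lock holder is gone.
# Assumes RESTIC_REPOSITORY and credentials are already exported.
safe_unlock() {
    if pid_alive "$1"; then
        echo "PID $1 is still running, leaving locks in place" >&2
        return 1
    fi
    restic unlock
}
```

With the lock from the first post, the call would be `safe_unlock 2036190`.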

Do you know if there are any debug flags I can use to narrow this down?

i0x915 · November 29, 2023, 2:22pm

I tried a few more fresh 2 TB uploads with the debug log and some extra logging in the code, and wasn't able to replicate it again.
Will keep you posted.

kapitainsky · November 29, 2023, 4:06pm

It can be as simple as your Internet connection quality: if it goes down sometimes (some nightly ISP maintenance :)), it can leave stale locks behind.

Log as much detail as possible from all your restic runs; it should reveal what is failing.
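
A minimal sketch of such logging, assuming a plain shell wrapper is acceptable: `run_logged` is a hypothetical helper that appends a UTC timestamp, the command's combined output and its exit status to a log file, so a run that ended abnormally stands out afterwards. The log path in the usage note is also only an example.

```shell
# run_logged LOGFILE CMD [ARGS...]: run CMD and append start/end timestamps,
# its stdout and stderr, and its exit status to LOGFILE.
run_logged() {
    log=$1
    shift
    status=0
    {
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) START $*"
        # Capture the exit status without aborting under `set -e`.
        "$@" 2>&1 || status=$?
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) END exit=$status"
    } >>"$log"
    return "$status"
}
```

For the backup from the first post this would be `run_logged /var/log/restic.log restic -r s3:xxxxxxxxxxxxxxx backup pub ssd`. A START line without a matching END line, or a non-zero exit status, points at the run that left the lock behind. If the restic binary was built with the debug tag, setting the `DEBUG_LOG` environment variable to a file path should additionally produce restic's own detailed debug log.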