Repository is already locked


#1

We are trying to backup the our DB around 1TB it is throwing below error.

{"changed": true, 
"cmd": "/opt/bin/perform-backup.sh", "delta": "0:34:35.872370", "end": "2018-12-29 23:34:48.515967",
"msg": "non-zero return code", "rc": 1,
"start": "2018-12-29 23:00:12.643597",
"stderr": "Fatal: unable to create lock in backend: repository is already locked by PID 14956 
on postgre-barman-host by restic (UID 51011, GID 51011)
lock was created at 2018-12-26 23:00:14 (72h34m33.577288791s ago)
storage ID 0c8ab355",

"stderr_lines": ["Fatal: unable to create lock in backend: repository is already locked by PID 14956 on 
postgre-barman-host by restic (UID 51011, GID 51011)",
"lock was created at 2018-12-26 23:00:14 (72h34m33.577288791s ago)", "storage ID 0c8ab355"], "stdout": "
Files:       17678 new,     5 changed, 18197 unmodified
Dirs:            0 new,     1 changed,     0 unmodified
Added to the repo: 76.334 GiB
processed 35880 files, 708.865 GiB in 34:29
snapshot ccc2c8bf saved
Backup is successfull
using temporary cache in /tmp/restic-check-cache-280462639
create exclusive lock for repository
Repository check has failed",
"stdout_lines": ["", "Files:       17678 new,     5 changed, 18197 unmodified",
"Dirs:            0 new,     1 changed,     0 unmodified", "Added to the repo: 76.334 GiB", "",
"processed 35880 files, 708.865 GiB in 34:29", "snapshot ccc2c8bf saved", "
Backup is successfull", "using temporary cache in /tmp/restic-check-cache-280462639",
"create exclusive lock for repository", "
Repository check has failed"]}

Unable to create lock in backend
#2

@Bharathkumarraju It looks like the backup was successful, but the script also tries to perform some other maintenance (check or prune) and that is what is failing. Have you tried removing the lock?

It looks like a restic process from the 26th of the month exited unexpectedly. You would have to dig into your logs to see why that happened. A common cause is the kernel’s OOM killer killing restic because the machine is out of memory.


#3

@cdhowie Appreciate the quick reply.
I am using AWS s3 as my restic repo, I am backing almost all my services(PROD) all are passing except this postgres(~1TB) let me have a quick look at restic logs and update you.


#4

Actually I don’t have any logs just running below script using restic binary, I have unlocked the repo and running the script now, is there anyway can i check for the progress?

#! /bin/bash

backup_bin_dir="/opt/bin"
backup_dir="/opt/barman/"

# Obtain Temporary credentials via aws sts assume-role
aws_credentials=$(sudo -u restic aws sts assume-role --role-arn arn:aws:iam::2244458972345:role/restic-backup --role-session-name run-backup )
export AWS_ACCESS_KEY=$(echo ${aws_credentials} | jq -c -r .Credentials.AccessKeyId )
export AWS_SECRET_ACCESS_KEY=$(echo ${aws_credentials} | jq -c -r .Credentials.SecretAccessKey )
export AWS_SESSION_TOKEN=$(echo ${aws_credentials} | jq -c -r .Credentials.SessionToken )
# source restic env config
RESTIC_ENV=$(sudo -u restic cat /home/restic/.restic-env)
$RESTIC_ENV

# function to call restic as sudo while preserving env
run_restic() { sudo -E -u restic -- /usr/bin/restic $* ;}

restic_status=$(run_restic snapshots)
if [[ $(echo $?) == 0 ]]
then
  run_restic backup ${backup_dir} --tag ${today}
  if [[ $(echo $?) == 0 ]]
  then
        echo "\nBackup is successfull"
  else
        echo "\nBackup has failed"
        exit 1
  fi
  run_restic check
  if [[ $(echo $?) == 0 ]]
  then
        echo "\nRepository check is successfull"
  else
        echo "\nRepository check has failed"
        exit 1
  fi

else
  run_restic init
  run_restic backup ${backup_dir}  --tag ${today}
  if [[ $(echo $?) == 0 ]]
  then
        echo ""
        echo "\nBackup is successfull"
  else
        echo "\nBackup has failed"
        exit 1
  fi
  sudo -u restic bash -c "source /home/restic/.restic-env ; /usr/bin/restic check"
  if [[ $(echo $?) == 0 ]]
  then
        echo ""
        echo "\nRepository check is successfull"
  else
        echo "\nRepository check has failed"
        exit 1
  fi
fi

#5

Ahh I got it now. thank you looks like my restic check was failing if I am not wrong.

@cdhowie
Now my job got successful but i got different message now. Looks strange :frowning:

{"changed": true, "cmd": "/opt/bin/perform-backup.sh", "delta": "0:05:24.158933",
"end": "2018-12-31 09:07:00.483026", "rc": 0, "start": "2018-12-31 09:01:36.324093",
"stderr": "error: Open: open /opt/barman: permission denied", "stderr_lines": ["error: Open: open /opt/barman: permission denied"],
"stdout": "
Files:           0 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     1 unmodified
Added to the repo: 0 B 
processed 0 files, 0 B in 0:09
snapshot f68983f2 saved
Backup is successfull
using temporary cache in /tmp/restic-check-cache-123078597
created new cache in /tmp/restic-check-cache-123078597
create exclusive lock for repository
load indexes
check all packs
pack 2d641c28: not referenced in any index
pack baccaef4: not referenced in any index
pack 4078accd: not referenced in any index
pack 4541dd12: not referenced in any index
pack 81aac085: not referenced in any index
pack 0608fd9c: not referenced in any index
pack 8388f2c8: not referenced in any index
pack 4e516d21: not referenced in any index
pack c4ff2538: not referenced in any index
pack d25443ca: not referenced in any index
pack d31751fc: not referenced in any index
4025 additional files were found in the repo, which likely contain duplicate data.
You can run `restic prune` to correct this.
check snapshots, trees and blobs
no errors were found
Repository check is successfull",
"stdout_lines": ["", "Files: 0 new,     0 changed,     0 unmodified", "Dirs:  0 new,     0 changed,     1 unmodified",
"Added to the repo: 0 B  ", "", "processed 0 files, 0 B in 0:09", "snapshot f68983f2 saved", "
Backup is successfull", "using temporary cache in /tmp/restic-check-cache-123078597",
"created new cache in /tmp/restic-check-cache-123078597",
"create exclusive lock for repository", "load indexes",
"check all packs", "pack 2d641c28: not referenced in any index",
"pack baccaef4: not referenced in any index",
"pack 4078accd: not referenced in any index",
"pack 4541dd12: not referenced in any index",
"pack 81aac085: not referenced in any index",
"pack 547b302d: not referenced in any index",
"pack d31751fc: not referenced in any index",
"4025 additional files were found in the repo, which likely contain duplicate data.",
"You can run `restic prune` to correct this.",
"check snapshots, trees and blobs", "no errors were found", "
Repository check is successfull"]}

#6

What looks strange about it? Did you read the full output?


#7

yeah seen permission issue, but restic user has capability as root let me give permission and check.

Thank you @cdhowie i have fixed it by running, setcap cap_dac_read_search=+ep /usr/bin/restic


#8

Glad you got it working.

Note that the “pack not referenced any index” errors are caused by an interrupted backup. This doesn’t have a harmful effect on the repository other than that some unneeded data is stored. A restic prune will discard this unneeded data.


#9

@cdhowie ah thank you so much. But now i am automated it in my CI server the build jobs every mid night and our database getting backed up in restic, I am getting below error.

Fatal: unable to create lock in backend: repository is already locked exclusively by PID 8828 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)

I checked in my aws environment, I don’t have any server like above, The full error as below. And interestingly when i run restic unlock and run the job manually from my CI server UI it is working fine :frowning: looks odd to me :frowning:

task path: /opt/bamboo_home/xml-data/build-dir/BTA-BDBBARMANPROD-JOB1/playbooks/perform-backup.yml:11
02-Jan-2019 23:07:57	fatal: [172.20.32.103]: FAILED! => {"changed": true, "cmd": "/opt/bin/perform-backup.sh", "delta": "0:00:05.798650", "end": "2019-01-02 23:07:57.289420", 
"msg": "non-zero return code", "rc": 1, "start": "2019-01-02 23:07:51.490770", 
"stderr": "Fatal: unable to create lock in backend: repository is already locked exclusively by PID 8828 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)\n
lock was created at 2019-01-02 12:04:29 (10h3m25.269015084s ago)\nstorage ID adbd8a4a\nFatal: create key in repository at s3:s3.amazonaws.com/rbmh-mit-backup-ffm-prod-ss-db-barman failed: repository master key and config already initialized\n\n
Fatal: unable to create lock in backend: repository is already locked exclusively by PID 8828 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)\n
lock was created at 2019-01-02 12:04:29 (10h3m27.751047126s ago)\nstorage ID adbd8a4a", 
"stderr_lines": ["Fatal: unable to create lock in backend: repository is already locked exclusively by PID 8828 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)", 
"lock was created at 2019-01-02 12:04:29 (10h3m25.269015084s ago)", "storage ID adbd8a4a", 
"Fatal: create key in repository at s3:s3.amazonaws.com/rbmh-mit-backup-ffm-prod-ss-db-barman failed: repository master key and config already initialized", "", 
"Fatal: unable to create lock in backend: repository is already locked exclusively by PID 8828 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)", 
"lock was created at 2019-01-02 12:04:29 (10h3m27.751047126s ago)", "storage ID adbd8a4a"], 
"stdout": "\\nBackup has failed",
"stdout_lines": ["\\nBackup has failed"]}`

#10

Note the IP 172.31.254.13 is a private IP, not a public IP. Make sure that’s what you’re looking for in the EC2 console. (It may also be temporary server that has since terminated, such as a spot instance.)

This error usually happens when a restic process is terminated for some reason. The EC2 unexpectedly terminating could certainly cause that, though there’s not enough information to say that that’s what happened here.


#11

@cdhowie,
Yes , I have checked with private IP as well. More importantly it is saying lock was created at 15 hours before, But in real nobody created any server in AWS before 10hours.

[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic list locks
repository 3c742af6 opened successfully, password is correct
Fatal: unable to create lock in backend: repository is already locked exclusively by PID 8828 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)
lock was created at 2019-01-02 12:04:29 (15h18m35.960870395s ago)
storage ID adbd8a4a
[admin@prod-db-barman001 ~]$

is there anyway can i check using restic command? who locked and why?


#12

I’m pretty sure that all of the information about the lock is shown in the output you’ve pasted, so no, I don’t believe there is any other command you can use to get more information.

There must be some EC2 with access that you don’t know about, or the private IP address changed, or something like that…


#13

My script is as below, Do i need to include unlock command before restic snapshots and restic backup? One more quick info do restic snapshots command creates exclusive locks?

#! /bin/bash

backup_bin_dir="/opt/bin"
backup_dir="/opt/barman/"

# Obtain Temporary credentials via aws sts assume-role
aws_credentials=$(sudo -u restic aws sts assume-role --role-arn arn:aws:iam::2244458972345:role/restic-backup --role-session-name run-backup )
export AWS_ACCESS_KEY=$(echo ${aws_credentials} | jq -c -r .Credentials.AccessKeyId )
export AWS_SECRET_ACCESS_KEY=$(echo ${aws_credentials} | jq -c -r .Credentials.SecretAccessKey )
export AWS_SESSION_TOKEN=$(echo ${aws_credentials} | jq -c -r .Credentials.SessionToken )
# source restic env config
RESTIC_ENV=$(sudo -u restic cat /home/restic/.restic-env)
$RESTIC_ENV

# function to call restic as sudo while preserving env
run_restic() { sudo -E -u restic -- /usr/bin/restic $* ;}

restic_status=$(run_restic snapshots)
if [[ $(echo $?) == 0 ]]
then
  run_restic backup ${backup_dir} --tag ${today}
  if [[ $(echo $?) == 0 ]]
  then
        echo "\nBackup is successfull"
  else
        echo "\nBackup has failed"
        exit 1
  fi
  run_restic check
  if [[ $(echo $?) == 0 ]]
  then
        echo "\nRepository check is successfull"
  else
        echo "\nRepository check has failed"
        exit 1
  fi

else
  run_restic init
  run_restic backup ${backup_dir}  --tag ${today}
  if [[ $(echo $?) == 0 ]]
  then
        echo ""
        echo "\nBackup is successfull"
  else
        echo "\nBackup has failed"
        exit 1
  fi
  sudo -u restic bash -c "source /home/restic/.restic-env ; /usr/bin/restic check"
  if [[ $(echo $?) == 0 ]]
  then
        echo ""
        echo "\nRepository check is successfull"
  else
        echo "\nRepository check has failed"
        exit 1
  fi
fi

#14

If you have to as a matter of course, then something is broken. A dangling lock means a restic process was interrupted while it was in the middle of doing something. You need to find that problem and solve it. restic unlock is only a workaround.

No. Most commands take out a shared lock. An exclusive lock is only required for commands that require no other processes access the repository, such as prune, check, and rebuild-index.

@fd0 Idea: Have locks contain the command-line arguments to restic. This would be very helpful when trying to figure out which restic process is leaving locks behind.


#15

Hmm, the timezone may be wrong (or at least shown wrong), can you try dumping the raw lock?

First, find the long ID for the lock: restic list locks | grep adbd8a4a

Then print it to stdout: restic cat lock <ID>

Let’s see what’s in there exactly.


#16

yes, thanks for your hints.

[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic snapshots
repository 3c742af6 opened successfully, password is correct
Fatal: unable to create lock in backend: repository is already locked exclusively by PID 8828 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)
lock was created at 2019-01-02 12:04:29 (19h27m16.011422881s ago)
storage ID adbd8a4a
[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic list locks | grep adbd8a4a
Fatal: unable to create lock in backend: repository is already locked exclusively by PID 8828 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)
lock was created at 2019-01-02 12:04:29 (19h27m50.486136278s ago)
storage ID adbd8a4a
[admin@prod-db-barman001 ~]$


[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic cat lock adbd8a4a
repository 3c742af6 opened successfully, password is correct
Fatal: unable to create lock in backend: repository is already locked exclusively by PID 8828 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)
lock was created at 2019-01-02 12:04:29 (19h29m29.812616461s ago)
storage ID adbd8a4a
[admin@prod-db-barman001 ~]$

#17

Since I am unable to do any operation without unlock. first i did unlock and then tried cat the lock previously created

[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic unlock
repository 3c742af6 opened successfully, password is correct
successfully removed locks
[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic list locks
repository 3c742af6 opened successfully, password is correct
949754ddaa03d59cd6eecac89fb6853bdff121c9cebb66b592aa195af7c7ad5d
[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic cat lock 949754ddaa03d59cd6eecac89fb6853bdff121c9cebb66b592aa195af7c7ad5d
repository 3c742af6 opened successfully, password is correct
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 717.76837ms: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 378.350063ms: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 1.666120869s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 2.272522944s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 2.584166063s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 5.173128938s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 5.655236044s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 9.052271848s: The specified key does not exist.


[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic cat lock 949754ddaa03d59cd6eecac89fb6853bdff121c9cebb66b592aa195af7c7ad5d
repository 3c742af6 opened successfully, password is correct
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 717.76837ms: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 378.350063ms: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 1.666120869s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 2.272522944s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 2.584166063s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 5.173128938s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 5.655236044s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 9.052271848s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 15.709834665s: The specified key does not exist.
Load(<lock/949754ddaa>, 0, 0) returned error, retrying after 20.539216426s: The specified key does not exist.
The specified key does not exist.
[admin@prod-db-barman001 ~]$

#18

Next time you see this problem, use --no-lock with restic list locks and restic cat lock to look at the lock.


#19

from that day onwards backup got successfull all days…crazy :wink:


#20

Now i see it got failed again. @cdhowie @fd0 --no-lock works only with list locks but not with cat lock as shown below.

fatal: [172.20.32.103]: FAILED! => {"changed": true, "cmd": "/opt/bin/perform-backup.sh", 
"delta": "0:00:06.635726", "end": "2019-01-07 23:00:19.653406", "msg": "non-zero return code", "rc": 1, 
"start": "2019-01-07 23:00:13.017680", 
"stderr": "Fatal: unable to create lock in backend: repository is already locked exclusively by PID 19309 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)\n
lock was created at 2019-01-07 12:03:44 (9h56m31.993275421s ago)\n
storage ID eeb53d35\n
Fatal: create key in repository at s3:s3.amazonaws.com/rbmh-mit-backup-ffm-prod-ss-db-barman failed: repository master key and config already initialized\n\n
Fatal: unable to create lock in backend: repository is already locked exclusively by PID 19309 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)\n
lock was created at 2019-01-07 12:03:44 (9h56m34.703089764s ago)\n
storage ID eeb53d35", 
"stderr_lines": ["Fatal: unable to create lock in backend: repository is already locked exclusively by PID 19309 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)",
"lock was created at 2019-01-07 12:03:44 (9h56m31.993275421s ago)",
"storage ID eeb53d35",
"Fatal: create key in repository at s3:s3.amazonaws.com/rbmh-mit-backup-ffm-prod-ss-db-barman failed: repository master key and config already initialized", "", 
"Fatal: unable to create lock in backend: repository is already locked exclusively by PID 19309 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)", 
"lock was created at 2019-01-07 12:03:44 (9h56m34.703089764s ago)", 
"storage ID eeb53d35"], "stdout": "\\n
Backup has failed", 
"stdout_lines": ["\\nBackup has failed"]}

[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic list locks --no-lock
repository 3c742af6 opened successfully, password is correct
eeb53d35a1530ddbec1705d9c0b75eb12ab329f9cb11c46fd9f34008a1245ae3


[admin@prod-db-barman001 ~]$ sudo -E -u restic -- /usr/bin/restic cat lock eeb53d35a1530ddbec1705d9c0b75eb12ab329f9cb11c46fd9f34008a1245ae3 --no-lock
repository 3c742af6 opened successfully, password is correct
Fatal: unable to create lock in backend: repository is already locked exclusively by PID 19309 on ip-172-31-254-13.eu-central-1.compute.internal by centos (UID 1000, GID 1000)
lock was created at 2019-01-07 12:03:44 (16h9m51.202306499s ago)
storage ID eeb53d35
[admin@prod-db-barman001 ~]$