Dear all,
We have been using restic in production for about half a year now and are very satisfied so far. It’s convenient, easy to use, and fast.
We have about 1,500 restic clients, all of which use the same S3 storage for their snapshots. Each client has its own restic repository on the S3 storage. We wrote small wrappers for backup/restore/etc. that source a configuration file located on the client itself, which defines variables like these:
export RESTIC_PASSWORD=*********************************************
export RESTIC_REPOSITORY_BASE=s3:https://<FQDN_of_S3_server>/<S3_BUCKET>/rr
export RESTIC_REPOSITORY=${RESTIC_REPOSITORY_BASE}-<client_hostname>
export RESTIC_REPOSITORY_SHORTNAME=rr-<client_hostname>
export AWS_ACCESS_KEY_ID=***********************************************
export AWS_SECRET_ACCESS_KEY=***********************************************
export AWS_ENDPOINT_URL=https://<FQDN_of_S3_server>
export AWS_BUCKET=s3://<S3_BUCKET>/rr-<client>
export TENANT=<tenant_name>
export SERVICE=<service_name>
export BACKUP_DIR=/var/lib/jenkins/master
export EXCLUDE_FILES="--exclude /var/lib/jenkins/master/caches --exclude /var/lib/jenkins/master/workspace"
export RETENTION_DAYS=10
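do_backup(){
    logger "backing up tenant ${TENANT} / service ${SERVICE}"
    # EXCLUDE_FILES is intentionally unquoted so it splits into separate options.
    # Redirect stdout to the log first, then stderr, so errors are logged too.
    restic backup \
        ${EXCLUDE_FILES} \
        --verbose \
        --host "${host}" \
        --tag "${TENANT}" "${BACKUP_DIR}" >> "${logFile}" 2>&1
    rcRestic=$?
}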
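The order of redirections matters when logging restic's output, because the shell processes them left to right. With `2>&1 >> file`, stderr is pointed at the original stdout before stdout is moved to the file, so error messages never reach the log. A minimal sketch of the difference, using a hypothetical `emit` function as a stand-in for restic and temporary files as stand-ins for the log:

```shell
#!/bin/sh
# Hypothetical stand-in for restic: one line on stdout, one on stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

log_wrong=$(mktemp)
log_right=$(mktemp)

# stderr is duplicated from the current stdout first, then stdout is
# redirected to the file: the error line does not end up in the log.
emit 2>&1 >> "$log_wrong"

# stdout goes to the file first, then stderr follows it: both are logged.
emit >> "$log_right" 2>&1

# Count the lines that actually reached each log file.
echo "2>&1 >> file : $(grep -c '' "$log_wrong") line(s) logged"
echo ">> file 2>&1 : $(grep -c '' "$log_right") line(s) logged"
```

With the first form only the stdout line is written to the file, which would explain a log that shows a failure but not the reason for it.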
I now have the strange situation that the backup fails, and the log does not make clear what happened:
2024-01-30 04:51:39] dot-restic-backup.sh: backup completed: there was a fatal error with backup (no snapshot created)
I then did a manual backup like this:
restic backup --verbose --host <client_hostname> --tag <client_tag> /var/lib/jenkins/master
which produced the following output:
open repository
repository bfd50002 opened (version 2, compression level auto)
lock repository
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 264.548424ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 504.084248ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 789.516851ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 1.208177748s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 3.187347333s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 5.085535343s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 4.82923451s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 4.53371898s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 15.720188005s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 17.830241804s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 544.303901ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 1.096046483s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 1.268239083s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 2.193189615s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 1.617410165s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 3.797922313s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 6.918690894s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 6.321951436s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 6.654076796s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 13.952480804s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 593.510552ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 402.192182ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 1.607434279s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 1.18157902s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 3.740422932s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 3.789983103s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 7.432039314s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 5.650148688s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 7.436425516s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 22.812015608s: The specified key does not exist.
unable to create lock in backend: The specified key does not exist.
That sounds like restic is having problems creating the lock on the repo.
I then tried other machines and other commands. All the errors have one thing in common: they occur when restic tries to create a lock on the repository:
root@<client_hostname>:~# restic cat config
repository bfd50002 opened (version 2, compression level auto)
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 369.025955ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 524.253019ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 1.633300767s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 2.004477789s: The specified key does not exist.
[...]
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 10.490272322s: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 22.586281609s: The specified key does not exist.
unable to create lock in backend: The specified key does not exist.
or like this:
root@<client_hostname>:~# restic list locks
repository bfd50002 opened (version 2, compression level auto)
1ffe828352b54929462ccd788d7cd3279f56496b252bd48ce11b016020c47e65
root@<client_hostname>:~# restic unlock
repository bfd50002 opened (version 2, compression level auto)
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 651.162656ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 648.236312ms: The specified key does not exist.
Load(<lock/1ffe828352>, 0, 0) returned error, retrying after 802.909126ms: The specified key does not exist.
It seems that the restic unlock command no longer works on some of my backup clients. I tried to find the error message "The specified key does not exist" in the restic code, but it is not in there.
My questions are the following:
- Does anybody have an idea why the locking mechanism stopped working? (This breaks many of my wrappers, because some of them rely on the locking mechanism.)
- Where does the error message come from?
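Regarding the second question: the wording matches the message of the standard S3 `NoSuchKey` error, so it presumably originates from the S3 server rather than from restic. One way to check this outside of restic would be to ask the server directly for the lock object that `restic list locks` reports. A rough sketch, assuming the default repository layout (`<prefix>/locks/<full lock ID>`) and an installed AWS CLI; `S3_BUCKET` and both function names are hypothetical and not part of our wrappers:

```shell
#!/bin/sh
# Build the object key of a restic lock file, assuming the default
# repository layout: <repository prefix>/locks/<full lock ID>.
lock_object_key() {
    # $1: repository prefix, e.g. rr-<client_hostname>
    # $2: full lock ID as printed by "restic list locks"
    printf '%s/locks/%s\n' "$1" "$2"
}

# Ask the S3 server whether the lock object exists.
# S3_BUCKET is a hypothetical variable holding the bucket name;
# AWS_ENDPOINT_URL is taken from the client configuration.
check_lock_object() {
    aws s3api head-object \
        --endpoint-url "${AWS_ENDPOINT_URL}" \
        --bucket "${S3_BUCKET}" \
        --key "$(lock_object_key "$1" "$2")"
}
```

Calling `check_lock_object "${RESTIC_REPOSITORY_SHORTNAME}" <full lock ID>` for a lock that is listed but cannot be loaded should show whether the server itself reports the object as missing.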
It seems to help to use restic unlock --remove-all. Example from another machine:
root@<client_hostname>:~# restic unlock --remove-all
repository 6a87315d opened (version 2, compression level auto)
successfully removed 1 locks
Afterwards the backup ran fine again, and listing snapshots worked again as well:
root@<client_hostname>:~# restic snapshots
repository 6a87315d opened (version 2, compression level auto)
ID       Time                Host              Tags  Paths
-----------------------------------------------------------------------------------------------------------
9bf3bb82 2023-12-13 05:11:03 <client_hostname> dhcp1 /<client_hostname>.sql
5d38f5b1 2023-12-14 05:11:02 <client_hostname> dhcp1 /<client_hostname>.sql
2b7d16a9 2023-12-15 05:11:02 <client_hostname> dhcp1 /<client_hostname>.sql
36fe2fff 2023-12-16 05:11:02 <client_hostname> dhcp1 /<client_hostname>.sql
819da5f0 2023-12-17 05:11:02 <client_hostname> dhcp1 /<client_hostname>.sql
a272f54e 2023-12-18 05:11:03 <client_hostname> dhcp1 /<client_hostname>.sql
e6f214ae 2023-12-19 05:11:02 <client_hostname> dhcp1 /<client_hostname>.sql
de05329c 2023-12-20 05:11:02 <client_hostname> dhcp1 /<client_hostname>.sql
61d3d2f2 2023-12-21 05:11:02 <client_hostname> dhcp1 /<client_hostname>.sql
6d73ee43 2023-12-22 05:11:03 <client_hostname> dhcp1 /<client_hostname>.sql
-----------------------------------------------------------------------------------------------------------
10 snapshots
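Since every client has its own repository, one possible workaround for the wrappers is to fall back to `restic unlock --remove-all` only when a plain `restic unlock` fails and no other restic process is running on the client. A minimal sketch, assuming `pgrep` is available; the function name `clear_stale_locks` is hypothetical. Note that `--remove-all` also removes locks that are not stale, which is why the process check comes first:

```shell
#!/bin/sh
# Hypothetical helper: clear repository locks before a backup run.
clear_stale_locks() {
    # A plain unlock only removes stale locks; try that first.
    if restic unlock >/dev/null 2>&1; then
        return 0
    fi
    # Never force-remove locks while another restic process runs on this client.
    if pgrep -x restic >/dev/null 2>&1; then
        echo "restic is still running, not removing locks" >&2
        return 1
    fi
    # Last resort: removes all locks, including non-stale ones.
    restic unlock --remove-all
}
```

This only guards against concurrent runs on the same machine; it would not be safe if several hosts shared one repository.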
Many thanks in advance and best greetings,
Joachim.