Backup script fails and lock remains

Hello,

I have a backup script running daily through a cron job, and I send the output (stdout and stderr) by email. I noticed that I do not always receive an email so I investigated. The days I do not receive an mail, I go to run the script manually and the restic commands tell me “Fatal: unable to create lock in backend: repository is already locked exclusively by PID …”. I could do “restic unlock” in my script but I want to understand why the lock is not removed. Any suggestion to further troubleshoot? Also, I do not understand why I do not receive any email, could the cron job stop unexpectedly? I do not see anything special in the sys logs.

cron job:
15 3 * * * /home/pi/backup/backup_email_wrapper.sh

backup_email_wrapper.sh:

#!/bin/bash
(echo -e "Subject: Backup\n"; /home/pi/backup/backup.sh 2>&1) | msmtp -a gmail  xxx@xxx.com

backup.sh (without the config, echo and logging code):

...
# Backup
restic backup --verbose $source_dir
# Differences with the previous backup
restic diff $(restic snapshots --json | jq -r '.[-2:][].id')
# Delete old backups
restic forget --prune --keep-last 2 --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 3
# Check repository status
restic check
# Repository stats
restic stats
# List the locks (debug, there should be none)
restic list locks --no-lock
...

It would indicate that sometimes your script does not finish gracefully leaving lock behind. It is impossible to tell why without some data.

Save every restic command output in logfile, e.g.:

echo "Restic script starts `date`" >> /path/to/restic.log`
...
restic backup ... | tee /path/to/restic.log
...
restic forget ... | tee /path/to/restic.log
...
echo "Restic script ends `date`" >> /path/to/restic.log
echo  >> /path/to/restic.log

Then you will have some info captured to investigate.

The locks is either left behind by restic forget --prune or restic check. Please also check dmesg for warnings.

Ok I logged everything to a file and the output of the restic forget --prune is the last thing present in the file. This command ran at 03:20:52 AM and the remaining lock was created at 03:20:53 AM so it was created by this command. The log file was last written at 04:00:22 AM. At 04:00:35 AM, the journal indicates that the CRON job finished.

I still do not understand why the script does not continue up to the end, and even if it somehow fails why I do not get the email (with the pipe to msmtp).

Last data in the log file:

--> Wed 15 Nov 2023 03:20:52 AM CET - Delete old backups
Applying Policy: keep the last 2 snapshots, 7 daily, 5 weekly, 12 monthly, 3 yearly snapshots
snapshots for (host [pi4], paths [/mnt/data/mifrey]):

keep 9 snapshots:
ID        Time                 Host        Tags        Reasons           Paths
-----------------------------------------------------------------------------------------
2e2f2509  2023-10-31 16:27:30  pi4                     monthly snapshot  /mnt/data/mifrey
8065ee06  2023-11-05 04:15:01  pi4                     weekly snapshot   /mnt/data/mifrey
c7606b3c  2023-11-08 04:15:02  pi4                     daily snapshot    /mnt/data/mifrey
c07e07b4  2023-11-09 18:32:46  pi4                     daily snapshot    /mnt/data/mifrey
c1036bdc  2023-11-10 04:15:02  pi4                     daily snapshot    /mnt/data/mifrey
d4a4187e  2023-11-12 22:10:46  pi4                     daily snapshot    /mnt/data/mifrey
                                                       weekly snapshot
bd47f7cc  2023-11-13 21:22:54  pi4                     daily snapshot    /mnt/data/mifrey
1f7b94ae  2023-11-14 04:15:02  pi4                     last snapshot     /mnt/data/mifrey
                                                       daily snapshot
28e25b23  2023-11-15 03:15:02  pi4                     last snapshot     /mnt/data/mifrey
                                                       daily snapshot
                                                       weekly snapshot
                                                       monthly snapshot
                                                       yearly snapshot
-----------------------------------------------------------------------------------------
9 snapshots

remove 1 snapshots:
ID        Time                 Host        Tags        Paths
-----------------------------------------------------------------------
3d898f3d  2023-11-07 04:15:05  pi4                     /mnt/data/mifrey
-----------------------------------------------------------------------
1 snapshots

1 snapshots have been removed, running prune
counting files in repo
building new index for repo
[39:21] 100.00%  12757 / 12757 packs

repository contains 12757 packs (101952 blobs) with 61.125 GiB
processed 101952 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots

Info on the lock:

pi@pi4:~/backup $ restic list locks --no-lock
repository aea81237 opened successfully, password is correct
9396cff4d2db1a25556a8b2511248bf47c3005fe440ee68456c0b5a774f7e363
pi@pi4:~/backup $ restic cat lock 9396cff4d2db1a25556a8b2511248bf47c3005fe440ee68456c0b5a774f7e363 --no-lock
repository aea81237 opened successfully, password is correct
Fatal: unable to create lock in backend: repository is already locked exclusively by PID 27369 on pi4 by pi (UID 1000, GID 1000)
lock was created at 2023-11-15 03:20:53 (10h54m30.92362501s ago)
storage ID 9396cff4

Data in the journal related to the backup:

Nov 15 03:15:01 pi4 CRON[23991]: pam_unix(cron:session): session opened for user pi by (uid=0)
Nov 15 03:15:01 pi4 CRON[23993]: (pi) CMD (/home/pi/backup/backup_email_wrapper.sh )
Nov 15 04:00:35 pi4 CRON[23991]: (CRON) info (No MTA installed, discarding output)
Nov 15 04:00:35 pi4 CRON[23991]: pam_unix(cron:session): session closed for user pi

You’re using a very old restic version. Please upgrade to a newer restic version, which will be much faster for prune and use less memory.

How much free memory does your raspberry pi have? My guess would be that the system runs out of memory and restic gets killed. Are there any corresponding entries in dmesg?

There’s some problem with your mail setup on the host.

Indeed it is 0.9.4 that came from the raspian repository. I have now installed the latest binary (0.16.2).

Free memory is ~200MB but no message in demsg or other logs related to memory.

The mail is not set up for CRON, I use a pipe to msmtp to send an email.

But since I installed the latest version of restic two days ago I have had any issue. The pruning is indeed much faster, it takes few seconds vs 39 minutes with the old version.

Also I had /tmp mounted as a tpmfs file system (to avoid writr on the SD card of the Raspberry Pi) of 100MB and the backup step was still crashing. All is ok now after increasing the size to 200MB.