Restic Exit Code in a bash script appears to be wrong

I have the following bash shell script that I’m running periodically from cron on Ubuntu 16.04. I’m redirecting the cron job’s output to cronlog.txt and its errors to cronerrlog.txt. I’m writing the output of the backup command itself to separate .log files so as not to clutter my cron output, and I plan to keep each backup’s .log output separately so that I can refer to it if necessary.

#!/bin/bash


now=$(date +"%Y%m%d%H%M")
today=$(date +"%Y%m%d")
rbkup="/home/nexargi/GI/bkup/"
rrepo="/home/nexargi/GI/bkup/restic-repofile.txt"
rhome="/home/nexargi/GI/restic/"
result="UNKNOWN"

echo '------ BACKUP START ------'  $now
${rhome}restic backup -v -v --files-from ${rbkup}restic-includefiles.txt  --repository-file $rrepo  --password-file ${rbkup}rchuparustom >>./logs/bkup_$now.log &2>>./logs/bkuperr_$now.log
result=$?
echo $result
if [[ $result == "0" ]]; 
then
	echo 'Backup Success'
	/usr/sbin/sendmail purvez@nexar.free-online.co.uk < successemail.txt
fi
if [[ $result == "1" ]];
then
	echo 'Backup Fail'
	/usr/sbin/sendmail purvez@nexar.free-online.co.uk < failureemail.txt
fi
if [[ $result == "3" ]];
then
	echo 'Backup Partial'
	/usr/sbin/sendmail purvez@nexar.free-online.co.uk < partialsuccessemail.txt
fi
if [[ $result == "UNKNOWN" ]];
then
	echo 'Unknown backup error'
	/usr/sbin/sendmail purvez@nexar.free-online.co.uk < unknownproblem.txt
fi
echo '------ BACKUP END ------'  $now

When everything is running smoothly it appears to work. I wanted to test the script, so I created a fatal error by changing the name of the repository in restic-repofile.txt. In the bkup_datetime.log file I got a single line:

open repository

The bkuperr_datetime.log file is empty.

My cronerrlog.txt shows the following lines:

Fatal: unable to open config file: Stat: stat /home/nexargi/GI/bkup/restic-repox/config: no such file or directory
Is there a repository at the following location?
/home/nexargi/GI/bkup/restic-repox

and my cronlog.txt shows:

------ BACKUP START ------ 202104201133
0
Backup Success
------ BACKUP END ------ 202104201133

I was expecting the Fatal error to be reported in the bkuperr_datetime.log file and the exit code to be 1, so that I could catch the fatal error and send myself an email.

My questions are:

  1. Am I trying to be too clever here?

  2. Why is restic not reporting the Fatal error as part of the backup command in stderr?

  3. Why is the Exit code always 0?

  4. Is this more of a bash/shell script problem than a restic one? If it is, I’ll happily search for the answer elsewhere, but getting a ‘zero’ exit code immediately after a backup command that had a fatal error is what’s confusing me.

Thanks in advance for all your help.

I would encourage you to:

  • Tell us which version of restic that is.
  • If it isn’t the latest version of restic (0.12.0), upgrade restic and verify that the script uses the latest version.
  • If the problem then persists, run the same restic command that the script runs, but manually (without the script) and see if you can then reproduce and verify the problem.

If you can reproduce the problem with the latest restic version running manually, let’s dig deeper :slight_smile:

@rawtaz thanks for your prompt response. Yes, I downloaded the binary yesterday from your GitHub page and it is 0.12.0.

Funnily enough, I did try the command at the command line. I first set up all the variables in the shell, then ran the command. Here are the command and its output:


~/GI/bkup$ ${rhome}restic backup -v -v --files-from ${rbkup}restic-includefiles.txt  --repository-file $rrepo  --password-file ${rbkup}rchuparustom >>./logs/bkup_$now.log &2>>./logs/bkuperr_$now.log
[1] 5610
nexargi@server-02:~/GI/bkup$ Fatal: unable to open config file: Stat: stat /home/nexargi/GI/bkup/restic-repox/config: no such file or directory
Is there a repository at the following location?
/home/nexargi/GI/bkup/restic-repox
^C
[1]+  Exit 1                  ${rhome}restic backup -v -v --files-from ${rbkup}restic-includefiles.txt --repository-file $rrepo --password-file ${rbkup}rchuparustom >> ./logs/bkup_$now.log

What was strange was that, instead of ending immediately with an error code, it ‘hung’ on the line just before the ^C near the end.

Only after I pressed ^C was the last line printed. I don’t know whether this has any significance.

I also don’t know what the [1] 5610 immediately after the command line is.

This is the output from running restic version:

restic 0.12.0 compiled with go1.15.8 on linux/amd64
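For reference: the [1] 5610 line is bash’s job number and the PID of a background job, and the apparent hang is just the next prompt being printed before restic’s error messages arrived. Both come from the stray & in the command. Bash parses "... >>bkup.log &2>>bkuperr.log" as the backup command run in the background, followed by "2>>bkuperr.log" as a separate command consisting only of a redirection. That second command succeeds, so $? is 0; it also merely creates an empty bkuperr log, while restic’s stderr goes wherever the script’s stderr goes (cronerrlog.txt under cron). Bash’s own job report ("[1]+ Exit 1") shows the real status. The sketch below reproduces this with false standing in for a failing restic run; demo_background_status is a hypothetical name, and wait "$!" is used to fetch the background job’s real exit status.

```shell
#!/usr/bin/env bash
# Reproduces the "&2>>" mistake; `false` stands in for a failing restic run.

demo_background_status() {
    local dir status
    dir=$(mktemp -d)

    # Buggy form: bash sees "false >>out.log &" (a background job) followed by
    # "2>>err.log" (a command with only a redirection, which exits 0).
    false >>"$dir/out.log" &2>>"$dir/err.log"
    echo "status after buggy line: $?"

    # $! is the PID of the background job; wait returns that job's status.
    status=0
    wait "$!" || status=$?
    echo "status from wait: $status"

    # Fixed form: no '&', so the command runs in the foreground and
    # $? is its own exit status.
    status=0
    false >>"$dir/out.log" 2>>"$dir/err.log" || status=$?
    echo "status after fixed line: $status"

    rm -rf "$dir"
}
```

Calling demo_background_status prints a status of 0 for the buggy line, while both wait and the fixed line report 1.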

OK, I figured out why errors from the restic backup command were not going into bkuperr_datetime.log: I had a mistake in my shell script, where there was an & before the 2>>.

So that solves one half of the mystery. I now get the error showing correctly in the proper file.
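With the & removed, the script’s exit-code checks can work as intended. Below is a minimal sketch of the corrected backup step, reusing the paths, address and email templates from the script at the top; classify_exit_code and run_backup are hypothetical names. The codes mapped are the ones the script already checks, which correspond to restic’s documented backup exit codes (0 success, 1 fatal error with no snapshot, 3 incomplete snapshot). The catch-all branch replaces the original == "UNKNOWN" test, which could never match because result had already been overwritten with $?.

```shell
#!/usr/bin/env bash
# Sketch: the backup step without the stray '&', plus a single case
# statement on restic's exit code. Both function names are hypothetical.

# Map a restic backup exit code to one of the script's email templates.
classify_exit_code() {
    case "$1" in
        0) echo "successemail.txt" ;;         # snapshot saved
        1) echo "failureemail.txt" ;;         # fatal error, no snapshot
        3) echo "partialsuccessemail.txt" ;;  # incomplete snapshot
        *) echo "unknownproblem.txt" ;;       # any other exit code
    esac
}

run_backup() {
    local now rbkup rrepo rhome rc=0
    now=$(date +"%Y%m%d%H%M")
    rbkup="/home/nexargi/GI/bkup/"
    rrepo="/home/nexargi/GI/bkup/restic-repofile.txt"
    rhome="/home/nexargi/GI/restic/"

    echo "------ BACKUP START ------ $now"
    # No '&': restic runs in the foreground, stdout and stderr go to
    # separate logs, and rc receives restic's own exit code.
    "${rhome}restic" backup -v -v \
        --files-from "${rbkup}restic-includefiles.txt" \
        --repository-file "$rrepo" \
        --password-file "${rbkup}rchuparustom" \
        >>"./logs/bkup_$now.log" 2>>"./logs/bkuperr_$now.log" || rc=$?
    echo "restic exit code: $rc"
    /usr/sbin/sendmail purvez@nexar.free-online.co.uk <"$(classify_exit_code "$rc")"
    echo "------ BACKUP END ------ $now"
}

# Usage: call run_backup from the cron job's script.
```

Using "|| rc=$?" instead of a separate "rc=$?" line keeps the status correct even if the script is later run with set -e.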

I tried to create a partial backup situation by including a non-existent directory in my restic-includefiles.txt file. I got the message: pattern “/home/nexargi/GI/abc/*” does not match any files, skipping

However, the exit code was still ‘zero’ rather than 3.

If there is anything further I should be testing then please let me know. Thanks

I also tried it as a manual command and got the same output and errors, but the exit code was ‘zero’, not 3.

It is entirely possible that resolving this will take some time, and that’s fine with me. I’ll revisit it when ready. In the meantime, however, I’m planning an alternative reporting strategy:

I still write the stdout and stderr from the restic backup command to individual files. If everything is fine, stderr will have zero length, which I can check for.

If stderr has content, I can grep it for ‘Fatal’ and thereby decide which email to send.

If stderr has content and grep finds ‘skipping’, that would mean a partial backup.

If stderr has content and neither of the above tests matches, the error is unknown.
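The reporting strategy above might look roughly like this in bash. It is a sketch under the assumptions stated in the post (a clean run leaves stderr empty, and ‘Fatal’ and ‘skipping’ identify the two error cases); classify_stderr is a hypothetical helper. Because it matches message text, it can break if restic’s wording changes in a later release. ‘Fatal’ is checked before ‘skipping’ because a run can print skipping warnings and still end with a fatal error.

```shell
#!/usr/bin/env bash
# Sketch of the grep-based reporting idea. classify_stderr is a
# hypothetical helper and depends on restic's message wording.

classify_stderr() {
    local errlog=$1
    if [[ ! -e "$errlog" ]]; then
        echo "unknown"    # no stderr log at all: something else went wrong
    elif [[ ! -s "$errlog" ]]; then
        echo "success"    # stderr log exists but is empty
    elif grep -q 'Fatal' "$errlog"; then
        echo "failure"    # checked first: a failing run may also print 'skipping'
    elif grep -q 'skipping' "$errlog"; then
        echo "partial"
    else
        echo "unknown"
    fi
}
```

The result (success, failure, partial or unknown) can then select one of the four email templates from the original script.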

My question therefore is:

Are ALL Fatal errors marked with the word Fatal?

Would one of the devs please confirm this or suggest an alternative strategy?

Many thanks to ALL here who do such a GREAT job in keeping this project going so well.

Curious. I’ve tried to reproduce your problem, but cannot. I don’t have an Ubuntu system to test on. Could you please provide:

  1. the current version of your script.
  2. the result of the following commands, all issued one right after the other in the same shell:

bash --version
restic backup foo (can replace “foo” with something else nonsensical as long as it doesn’t exist)
echo $?
type $PROMPT_COMMAND

On my system I get:

mcp:~$ bash --version
GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ restic version
restic 0.12.0 compiled with go1.15.8 on linux/amd64
$ restic backup foo
foo does not exist, skipping
Fatal: all target directories/files do not exist
$ echo $?
1

$ restic version
restic 0.12.0 compiled with go1.15.8 on linux/amd64
$ echo $?
0

So I do get a non-zero return code on errors. “type $PROMPT_COMMAND” may return nothing, and that’s fine.

Regards,
-Jason

@jdwhite thanks very much for your help. Here is the output you requested:

nexargi@server-02:~$ bash --version
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

xxxxxxxxxxx

nexargi@server-02:~/GI/restic$ ./restic backup --files-from ~/GI/bkup/restic-includefiles.txt -r ~/GI/bkup/restic-repo --password-file ~/GI/bkup/rchuparustom
pattern "x/data/www/*" does not match any files, skipping
pattern "x/home/nexargi/GI/mysql/*" does not match any files, skipping
Fatal: nothing to backup, please specify target files/dirs
nexargi@server-02:~/GI/restic$ echo $?
1

xxxxxxxxxxxxxxxxxxxxxxxxx

nexargi@server-02:~/GI/restic$ ./restic backup --files-from ~/GI/bkup/restic-includefiles.txt -r ~/GI/bkup/restic-repo --password-file ~/GI/bkup/rchuparustom
pattern “x/data/www/*” does not match any files, skipping
repository adcf3d00 opened successfully, password is correct
no parent snapshot found, will read all files

Files: 32 new, 0 changed, 0 unmodified
Dirs: 4 new, 0 changed, 0 unmodified
Added to the repo: 16.310 KiB

processed 32 files, 86.253 MiB in 0:01
snapshot 012d4ad0 saved
nexargi@server-02:~/GI/restic$ echo $?
0


nexargi@server-02:~/GI/restic$ type $PROMPT_COMMAND
nexargi@server-02:~/GI/restic$

You will note that I ran the backup command twice. The first time, neither of the paths existed and restic correctly exited with code 1 (Fatal). However, when I ran it again with only ‘SOME’ of the paths existing, it did a partial backup and should have exited with code 3. Instead, it exited with code 0.

My ‘Fatal’ test had been to give it a non-existent repository, and there, although it output a Fatal error, it exited with code 0.

I haven’t included the latest version of my script because, as I said in my previous post, I have changed strategy: I am now grepping for the word ‘Fatal’ or ‘skipping’ and therefore no longer checking exit codes in the script. My revised strategy appears to be working… so far. Ideally, I would like someone to confirm that all ‘Fatal’ errors ALWAYS include the word Fatal in the message. If they do, then I’m getting what I need for now, although it’s a bit long-winded.

Please let me know if there is anything else you would like me to try, because going back to checking exit codes would make the script simpler, faster and easier to understand.

Thanks again.

I was able to reproduce this with restic 0.12.0 and later. It’s probably not a bug, though, but rather unexpected behavior.

Exit code 3 is returned for files that restic’s scanner found but that cannot be backed up when restic tries to back them up (a separate step after the scanning). That’s a different thing from restic reading paths from an “include file” and telling you that some of the patterns in that file don’t match anything.

So, in other words, what you are seeing here is restic telling you “hey, this pattern you asked me to back up isn’t valid”, and that is not something it will yield exit code 3 for.

What it would yield exit code 3 for is a path or pattern that does exist when you start restic, and when restic scans the disk for it, but that then disappears or becomes unreadable after the scan and before the actual backup takes place (restic first scans for files to back up, then backs them up once the scan is done).
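To provoke exit code 3 deliberately, the description above suggests giving restic a file that the scan can find but that cannot be read during the backup. One way that may work, assuming restic runs as a non-root user (root can still read a mode-000 file), is a file with all permissions removed. This has not been verified here against 0.12.0. make_partial_fixture is a hypothetical helper, and the restic invocation is left as a comment because it needs a real repository; /path/to/repo is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a source tree with one readable and one
# unreadable file, to try to provoke a partial (exit code 3) backup.

make_partial_fixture() {
    local dir
    dir=$(mktemp -d)
    echo "readable" >"$dir/ok.txt"
    echo "secret" >"$dir/unreadable.txt"
    # The scanner can still stat this file, but opening it for reading
    # fails for a non-root user.
    chmod 000 "$dir/unreadable.txt"
    echo "$dir"
}

# Example (needs restic and an existing repository; not run here):
#   src=$(make_partial_fixture)
#   restic -r /path/to/repo backup "$src"; echo "exit code: $?"
#   chmod 600 "$src/unreadable.txt" && rm -rf "$src"
```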

So at this point, you should simply make sure that you don’t list non-existent patterns in your “include file”.
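That advice can be automated with a pre-flight check before the backup runs. A minimal sketch: check_include_patterns (a hypothetical helper) reads the include file, skips blank lines and lines starting with #, and uses bash’s compgen -G to test whether each pattern matches anything. It assumes bash globbing behaves like restic’s pattern matching for simple patterns such as /home/nexargi/GI/abc/*, which may not hold for every pattern.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report include-file patterns that match
# nothing. Returns 1 if any pattern is unmatched, 0 otherwise.

check_include_patterns() {
    local listfile=$1 pattern missing=0
    # "|| [[ -n ... ]]" also handles a last line without a trailing newline.
    while IFS= read -r pattern || [[ -n "$pattern" ]]; do
        # Skip blank lines and comment lines.
        [[ -z "$pattern" || "$pattern" == \#* ]] && continue
        # compgen -G succeeds only if the glob expands to at least one path.
        if ! compgen -G "$pattern" >/dev/null; then
            echo "no match: $pattern"
            missing=1
        fi
    done <"$listfile"
    return "$missing"
}
```

A script could run this first and send the failure email (or skip the backup) when it returns non-zero, instead of relying on restic’s exit code for missing paths.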

@rawtaz thanks for your help and input. My intention was to test my script by giving restic some things it could not back up, and hence produce a partial backup.

Although I now understand what restic is doing, in my opinion, if ANY of the patterns in a single snapshot can’t be backed up, restic should exit with code 3.

In any case, I believe my revised strategy of grepping the actual error output is fine for now. I would, however, feel more comfortable if one of the devs would confirm that all fatal errors include the word Fatal in the error message.

Thanks again.

I also tried supplying a non-existent directory on the command line, so the following example has a non-existent path both in the list.txt “include file” and on the command line. Neither yields exit code 3:

$ cat list.txt 
foo
foo2
foo3

$ ./restic -r apa backup --files-from list.txt bar
pattern "foo3" does not match any files, skipping
bar does not exist, skipping
enter password for repository: 
repository ba330420 opened successfully, password is correct
using parent snapshot 9af99ec5

Files:           0 new,     0 changed,     3 unmodified
Dirs:            0 new,     0 changed,     6 unmodified
Added to the repo: 0 B  

processed 3 files, 494 B in 0:00
snapshot e1f007b5 saved

$ echo $?
0

I understand what you’re saying: if some of the things you ask restic to back up can’t be backed up, you want to know about it. Doing so would be erring on the side of caution, which is generally what restic tries to do. At the same time, one can wonder why on earth you would tell it to back up something that doesn’t exist. But there are of course use cases for that as well, e.g. a path that is only mounted now and then but is listed in your backup scripts.

However, the question then becomes: do you really want a non-zero exit code just because, say, that disk isn’t mounted at the time? Some people would say yes, while others would say no, because they don’t want a potential error code that they have to look into on every backup run.

In summary, there’s no right or wrong here; each approach has pros and cons, proponents and opponents. It’s an interesting question, though, and perhaps worth reconsidering. It would probably be safer to exit with code 3 when an invalid path/pattern is supplied to restic, and to require that those who don’t want exit code 3 returned avoid including such paths.

I am not in a position to confirm that; maybe someone else can. FWIW, you shouldn’t depend on the textual output. Is there any way you could make use of the output from --json instead? You can pipe the output through jq to access specific parts of it (jq is insanely powerful for processing JSON).
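As a rough illustration of the --json suggestion: with --json, restic backup writes one JSON object per line to stdout. The sketch assumes the final object has "message_type": "summary" and a "snapshot_id" field; check that against the JSON output documentation for your restic version. summary_snapshot_id is a hypothetical helper that uses jq to extract the snapshot ID. A missing ID, together with a non-zero exit code, can then be treated as a failed backup without matching any human-readable messages.

```shell
#!/usr/bin/env bash
# Sketch: extract the snapshot ID from restic's --json backup output.
# Assumes a line with "message_type": "summary" carrying "snapshot_id";
# requires jq.

summary_snapshot_id() {
    local jsonlog=$1
    # -R reads each line as a raw string; "fromjson?" silently drops lines
    # that aren't valid JSON, and "objects" skips non-object values.
    jq -r -R 'fromjson? | objects | select(.message_type == "summary") | .snapshot_id' "$jsonlog"
}

# Usage (not run here; remaining options as in the script above):
#   rc=0
#   restic backup --json --files-from ... >>"bkup_$now.json" 2>>"bkuperr_$now.log" || rc=$?
#   id=$(summary_snapshot_id "bkup_$now.json")
#   [[ $rc -eq 0 && -n "$id" ]] && echo "snapshot $id saved"
```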

@rawtaz Thanks very much for your continued assistance with this. I wholeheartedly agree with you that there is no ‘right or wrong’ answer here. That’s why I decided to fall back on the stderr output. However, your comment that I shouldn’t rely on it now worries me.

I haven’t looked at the --json flag yet, so I’ll do that tomorrow. I would need to learn some new stuff, which, while necessary, is not something I particularly want to do.

It would have been nice to just rely on exit codes, but… one can’t always have what one would like. :slight_smile:

The reason not to rely on textual output is that you never know whether someone will, at some point, decide to change the output. But sure, if you keep an eye on that, you can do it. It’s just not the best way.

Let’s rewind, though: why do the paths you want to back up not exist? What is your actual use case for giving restic paths to back up when those paths don’t exist?

@rawtaz in this particular case I was simply trying to create a ‘partial backup’ situation. Normally the paths set up WOULD exist. However, I would still maintain that if, for whatever reason, a path given to restic cannot be backed up, restic should not only report it, which it does, but also exit with an ‘unsuccessful’ code.

I think this point has now been made, and it is up to the devs, who have a far better understanding of the software, to decide what the outcome should be.

I am very grateful for all that they and others here do to keep restic relevant, and if it means I have to bend a bit to get what I need, then I am very happy to do so.