I’d like to monitor the freshness of my backups, i.e. make Icinga go red if host X has not created a backup in the last 24h.
I could use restic snapshots --json and get the information from there, but I’ve been wondering if anyone has a ready-to-use integration already?
Since this question came up several times in the last couple of days, what would be the requirements for such a thing? For restic, multiple clients can save backups into a single repo, and there may even be different backups (e.g. different directories) for a single client. What would be the semantics and the output of such a command?
So obnam has a nagios-last-backup-age option, which checks the age of the backup and produces Nagios-compatible output saying whether the backup is OK, WARNING or CRITICAL (the option accepts two parameters that define when a backup is old enough for a warning, and when it’s critical).
For restic, I could imagine something like restic snapshot-age [--host=HOSTNAME] [--critical-age=AGE] [--warning-age=AGE] DIRECTORY. The output (for me) would be what Nagios/Icinga expects (basically a human-readable status string plus the matching exit code of 0/1/2/3).
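To make the proposed semantics concrete, here is a minimal Python sketch of the threshold logic only. The name `check_backup_age` and its parameters are made up for illustration (restic has no `snapshot-age` command), and the exit codes follow the usual Nagios plugin convention of 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

```python
from datetime import datetime, timedelta, timezone

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_backup_age(latest, now, warning_age, critical_age):
    """Return (exit_code, status_line) for the newest matching snapshot.

    `latest` is a timezone-aware datetime, or None if no snapshot matched.
    `warning_age` and `critical_age` are timedelta thresholds.
    """
    if latest is None:
        return UNKNOWN, "UNKNOWN - no matching snapshot found"
    age = now - latest
    hours = age.total_seconds() / 3600
    if age >= critical_age:
        return CRITICAL, f"CRITICAL - last backup is {hours:.1f}h old"
    if age >= warning_age:
        return WARNING, f"WARNING - last backup is {hours:.1f}h old"
    return OK, f"OK - last backup is {hours:.1f}h old"


# Example with fixed times so the result does not depend on the clock.
now = datetime(2019, 3, 9, 20, 0, tzinfo=timezone.utc)
latest = datetime(2019, 3, 8, 18, 0, tzinfo=timezone.utc)
code, message = check_backup_age(
    latest, now, warning_age=timedelta(hours=24), critical_age=timedelta(hours=48)
)
print(code, message)
```

A real plugin would print the status line and then call `sys.exit(code)`, since Nagios/Icinga reads the state from the exit code.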
However, I am not even really sure whether this belongs in restic core or whether it should live as a Perl/whatever script in a contrib area.
Whatever features are implemented should IMO be implemented in a neutral way, not tied to one particular external application. As you described it here, it’s very specific to Nagios. Better to make such a feature generic, with output formatted in a way that makes sense for, and is consistent with, restic overall.
I’ve not used Icinga, so this may not be useful, but I’ve handled this by backing up with a shell script that then makes a curl request to my Sensu API. One of the things Sensu has is a TTL for the check, so I get an alert when the TTL expires (i.e. because cron did not run my backup, or my backup failed).
# restic snapshots --json --path /srv/data | jq -r '.[-1].time|fromdate'
jq: error (at <stdin>:1): date "2019-03-08T18:00:09.792955797Z" does not match format "%Y-%m-%dT%H:%M:%SZ"
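The error message shows the cause: the timestamp carries nanosecond fractional seconds, while the format being matched only allows whole seconds. As a workaround outside jq, here is a small Python sketch that trims the fraction to microseconds before parsing. The helper name `parse_snapshot_time` is hypothetical, and the sketch assumes the timestamp ends in either `Z` or a numeric offset such as `+01:00`.

```python
import re
from datetime import datetime, timezone

_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"  # date and time to the second
    r"(?:\.(\d+))?"                             # optional fractional seconds
    r"(Z|[+-]\d{2}:\d{2})$"                     # UTC marker or numeric offset
)


def parse_snapshot_time(value):
    """Parse an RFC 3339 timestamp, dropping digits beyond microseconds."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unrecognised timestamp: {value!r}")
    base, fraction, zone = match.groups()
    # strptime's %f accepts at most six digits, so cut nanoseconds down.
    micros = (fraction or "")[:6].ljust(6, "0")
    offset = "+0000" if zone == "Z" else zone.replace(":", "")
    return datetime.strptime(f"{base}.{micros}{offset}", "%Y-%m-%dT%H:%M:%S.%f%z")


parsed = parse_snapshot_time("2019-03-08T18:00:09.792955797Z")
print(parsed.isoformat())
print(int(parsed.timestamp()))  # seconds since the epoch, like jq's fromdate
```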
What is considered the best-practice approach for monitoring restic backups?
Calling the snapshot list on the CLI and post-processing the returned JSON response?
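For reference, post-processing the CLI output could look roughly like the Python sketch below. It only uses the `--json`, `--host` and `--path` flags of `restic snapshots`; the function names are made up, and it assumes the repository and password are supplied through the environment (e.g. `RESTIC_REPOSITORY` and `RESTIC_PASSWORD_FILE`).

```python
import json
import subprocess


def build_snapshots_command(host=None, path=None):
    """Build the argv for listing snapshots as JSON, optionally filtered."""
    command = ["restic", "snapshots", "--json"]
    if host:
        command += ["--host", host]
    if path:
        command += ["--path", path]
    return command


def fetch_snapshots(host=None, path=None):
    """Run restic and return the decoded snapshot list (possibly empty)."""
    result = subprocess.run(
        build_snapshots_command(host, path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if restic exits non-zero
    )
    # Treat a JSON null the same as an empty list.
    return json.loads(result.stdout) or []
```

Calling `fetch_snapshots(host="X", path="/srv/data")` returns a list of dictionaries, each with a `time` field that can then be compared against the current time.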
My backups are initiated via cron, consisting of a backup and listing of changes over the last two snapshots, and the output is mailed to me. Normally I don’t have a lot of daily changes, but the listing of changes gives me a chance to see if something was added that shouldn’t have been, and more importantly if something was deleted that shouldn’t have been.
I did a mix of @rawtaz’s solution and yours: getting the snapshot list from my cron job, tailing it to the last 4 lines and emailing it, so I can see whether cron has run and restic created the snapshot, or whether an error occurred.
One of my customers can’t afford any monitoring, so from time to time I manually check all systems. I wrote a little script that simply checks all repos and lists whether the last backup run (“yesterday”) created a backup:
Without any option, it should output the date of the last backup - disregarding the host, etc. (Or maybe it should even refuse to do something as this might be a dangerous default.)
There should be a group-by option: e.g. if you group by host, it should output the date of the last backup for each host.
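The two requirements above can be sketched in Python over the output of `restic snapshots --json`. This is only an illustration of the grouping semantics: `latest_by` and `_parse` are hypothetical names, the sample data is invented, and the small parser assumes UTC timestamps ending in `Z`.

```python
from datetime import datetime


def _parse(value):
    """Parse a UTC timestamp ending in 'Z', trimming nanoseconds."""
    head, _, fraction = value.rstrip("Z").partition(".")
    # %f accepts at most six fractional digits.
    return datetime.strptime(
        f"{head}.{fraction[:6] or '0'}+0000", "%Y-%m-%dT%H:%M:%S.%f%z"
    )


def latest_by(snapshots, fields=("hostname",)):
    """Map each group key (a tuple of field values) to its newest snapshot time."""
    latest = {}
    for snap in snapshots:
        # Lists such as "paths" are turned into tuples so they can be dict keys.
        key = tuple(
            tuple(snap[f]) if isinstance(snap[f], list) else snap[f] for f in fields
        )
        when = _parse(snap["time"])
        if key not in latest or when > latest[key]:
            latest[key] = when
    return latest


# Invented sample data in the shape of `restic snapshots --json` output.
snapshots = [
    {"time": "2019-03-07T18:00:05.1Z", "hostname": "alpha", "paths": ["/srv/data"]},
    {"time": "2019-03-08T18:00:09.792955797Z", "hostname": "alpha", "paths": ["/srv/data"]},
    {"time": "2019-03-08T02:30:00Z", "hostname": "beta", "paths": ["/etc"]},
]

for (host,), when in sorted(latest_by(snapshots).items()):
    print(host, when.isoformat())
```

Passing `fields=()` puts every snapshot into one group, which corresponds to the "no option" case; `fields=("hostname", "paths")` separates different backup sets on the same client.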
There’s a problem with this though - you’re relying on the client-side information to determine whether your backups ran or not. I check on the server side only.
The forget command is not one that is expected to run after every backup run, nor is it meant to give you snapshot information. The things you describe are what the snapshots command is for.
Now I’m confused: doesn’t the client reflect the status on the remote server? I mean, when executing the snapshots command, doesn’t this provide information retrieved from the repository on the server?
Ah, maybe a misunderstanding: I’m executing the snapshots command against the remote server via restic, whereas you were talking about scanning a local directory, right?
I was just saying that if you want to inspect when snapshots were last made, the best source of truth would be the server where you run rest-server: look at when the files in the repository’s snapshots/ directory were created.
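A server-side check along those lines could be sketched in Python as follows. It assumes direct filesystem access to the repository directory on the server; `newest_snapshot_file_time` is a made-up name, and the modification time is used as a stand-in for creation time, which is not portably available.

```python
import os
from datetime import datetime, timezone


def newest_snapshot_file_time(repo_dir):
    """Return the newest mtime in <repo_dir>/snapshots as a UTC datetime.

    Returns None if the directory contains no files.
    """
    snapshots_dir = os.path.join(repo_dir, "snapshots")
    with os.scandir(snapshots_dir) as entries:
        mtimes = [entry.stat().st_mtime for entry in entries if entry.is_file()]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)
```

Note that this only shows when the most recent snapshot of any kind was written; the snapshot files are encrypted, so telling hosts or paths apart still requires running restic with the repository password.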