Monitoring the freshness of backups


#1

Hey,

I’d like to monitor the freshness of my backups, i.e. turn Icinga red if host X has not created a backup in the last 24 hours.
I could use restic snapshots --json and get the information from there, but I’ve been wondering if anyone has a ready-to-use integration already?

Evgeni


#2

No, we don’t have such a feature (yet).

Since this question came up several times in the last couple of days, what would be the requirements for such a thing? For restic, multiple clients can save backups into a single repo, and there may even be different backups (e.g. different directories) for a single client. What would be the semantics and the output of such a command?


#3

So obnam has a nagios-last-backup-age option, which checks the age of the backup and produces Nagios-compatible output saying whether the backup is OK, WARNING or CRITICAL (the option accepts two parameters that define when a backup is old enough for a warning, and when it’s critical).

For restic, I could imagine something like restic snapshot-age [--host=HOSTNAME] [--critical-age=AGE] [--warning-age=AGE] DIRECTORY. The output (for me) would be what Nagios/Icinga expects: basically a human-readable status string plus the correct exit code (0/1/2/3).
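To make the proposed semantics concrete, here is a minimal sketch of the threshold logic in plain shell. It is not a restic feature: check_backup_age is a hypothetical helper, it takes the age and both thresholds in seconds, and the status strings are only an illustration of the usual OK/WARNING/CRITICAL/UNKNOWN convention with exit codes 0/1/2/3.

```shell
#!/bin/sh
# Hypothetical helper (not part of restic): map a backup age to a
# Nagios/Icinga style status line and exit code.
# Usage: check_backup_age AGE_SECONDS WARN_SECONDS CRIT_SECONDS
check_backup_age() {
    age=$1
    warn=$2
    crit=$3

    # Anything that is not a plain non-negative integer counts as UNKNOWN.
    case "$age" in
        ''|*[!0-9]*)
            echo "UNKNOWN - could not determine backup age"
            return 3
            ;;
    esac

    if [ "$age" -ge "$crit" ]; then
        echo "CRITICAL - last backup is ${age}s old"
        return 2
    elif [ "$age" -ge "$warn" ]; then
        echo "WARNING - last backup is ${age}s old"
        return 1
    fi

    echo "OK - last backup is ${age}s old"
    return 0
}
```

For example, check_backup_age 90000 43200 86400 would report CRITICAL, since 90000 seconds is more than the 24 hour critical threshold.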

However, I am not even really sure whether this belongs in restic core or whether it should live as a Perl/whatever script in a contrib area.


#4

Whatever features are implemented should IMO be implemented in a neutral way, not tied to one specific external application. As you described it here, it’s very specific to Nagios. It would be better to make such a feature generic, with output formatted in a way that makes sense for, and is coherent with, restic overall.


#5

I’ve not used Icinga, so this may not be useful, but I’ve handled this by backing up with a shell script that then makes a curl request to my Sensu API. One of the things Sensu has is a TTL for the check, so I get an alert when the TTL expires (e.g. because cron did not run my backup, or because the backup failed).
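A rough sketch of that wrapper pattern, for anyone who wants to try it: run the backup, and only ping the monitoring endpoint when it succeeded, so a missing ping lets the TTL expire. The function name backup_and_ping and the PING_URL value are placeholders made up for this example; the real URL and any request payload depend on your monitoring system and are not shown here.

```shell
#!/bin/sh
# PING_URL is a placeholder; point it at the endpoint your monitoring
# system provides for this check.
PING_URL=${PING_URL:-https://monitoring.example.org/ping/backup}

# Hypothetical wrapper: run the command given as arguments and report
# success to the monitoring system only if it exits with status 0.
backup_and_ping() {
    if "$@"; then
        # -f: fail on HTTP errors, -sS: quiet but show errors,
        # -m 10: give up after 10 seconds, --retry 3: retry transient failures
        curl -fsS -m 10 --retry 3 -o /dev/null "$PING_URL"
    else
        status=$?
        echo "backup failed with exit code $status, not pinging" >&2
        return "$status"
    fi
}

# Example call from cron (commented out so sourcing this file has no side effects):
# backup_and_ping restic backup /srv/data
```

The point of the design is that the monitoring side does the alerting: no ping arrives if cron never ran, if the script crashed, or if the backup command failed.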


#6

I think the neutral way is already covered by restic snapshots. This is also what restic-tools uses for its monitor command: https://github.com/binarybucks/restic-tools/blob/master/bin/backup#L80.


#7

I sat down and wrote a simple check using Perl and Monitoring::Plugin: https://github.com/evgeni/check_restic/blob/master/check_restic.pl


#8

Sorry for bumping this old thread :wink:

For a monitoring shell script I tried the following approach using the popular jq tool:

# restic snapshots --json --path /srv/data | jq -r '.[-1]'
{
  "time": "2019-03-08T18:00:09.792955797Z",
  "parent": "ee047edc6dbfc50dd9b179d471f48079d4a4f0dd31bf60658766b6f9bed9e012",
  "tree": "8653335ec23446702c2f18531e9fbdb4d720b40378c0f88f7a995facb59814f9",
  "paths": [
    "/srv/data",
    "/srv/data2"
  ],
  "hostname": "data01",
  "id": "09a799501eb2aa3407c0c85055756275508f9c087222178e33168ea9a05f5b07",
  "short_id": "09a79950"
}

One pitfall is the fractional seconds in the ISO 8601 timestamps, which jq still cannot parse (issue):

# restic snapshots --json --path /srv/data | jq -r '.[-1].time|fromdate'
jq: error (at <stdin>:1): date "2019-03-08T18:00:09.792955797Z" does not match format "%Y-%m-%dT%H:%M:%SZ"

The following seems to work:

# restic snapshots --json --path /srv/data | jq -r '.[-1].time|strptime("%Y-%m-%dT%H:%M:%S.%Z")|mktime'
1552068009

One can now use the timestamp for further checks in their shell script.
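As a sketch of such a further check, the age of the newest snapshot can be computed directly from the time field. This variant assumes GNU date, which accepts ISO 8601 timestamps with fractional seconds, so no strptime workaround is needed; snapshot_age is a hypothetical helper name, and the optional second argument (the current time as an epoch value) exists only to make the function easy to test.

```shell
#!/bin/sh
# Hypothetical helper: print the age in seconds of an ISO 8601 timestamp.
# Assumes GNU date (date -d); BSD/macOS date uses different options.
# Usage: snapshot_age TIMESTAMP [NOW_EPOCH]
snapshot_age() {
    # An empty timestamp means no snapshot was found.
    [ -n "$1" ] || return 3

    # Convert the snapshot time to seconds since the epoch.
    ts=$(date -u -d "$1" +%s) || return 3

    # Use the supplied "now" if given, otherwise the current time.
    now=${2:-$(date -u +%s)}

    echo $((now - ts))
}

# Example (commented out, needs a restic repository and jq):
# age=$(snapshot_age "$(restic snapshots --json --path /srv/data | jq -r '.[-1].time')")
# [ "$age" -lt 86400 ] || echo "last backup is older than 24h"
```

Returning 3 on a missing or unparseable timestamp keeps the function in line with the UNKNOWN exit code that Nagios/Icinga plugins use.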


#9

I really like https://healthchecks.io/