Monitoring the freshness of backups

Hey,

I’d like to monitor the freshness of my backups, i.e. make Icinga turn red if host X has not created a backup in the last 24h.
I could use restic snapshots --json and get the information from there, but I’ve been wondering if anyone has a ready-to-use integration already?

Evgeni

No, we don’t have such a feature (yet).

Since this question came up several times in the last couple of days, what would be the requirements for such a thing? For restic, multiple clients can save backups into a single repo, and there may even be different backups (e.g. different directories) for a single client. What would be the semantics and the output of such a command?

So obnam has a nagios-last-backup-age option, which checks the age of the backup and produces Nagios-compatible output saying whether the backup is OK, WARNING or CRITICAL (the option accepts two parameters that define when a backup is old enough for a warning, and when it’s critical).

For restic, I could imagine something like restic snapshot-age [--host=HOSTNAME] [--critical-age=AGE] [--warning-age=AGE] DIRECTORY. The output (for me) would be as expected by Nagios/Icinga (basically a string with human data plus a correct exit code of 0/1/2/3).
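
To make the proposed warning/critical semantics concrete, here is a minimal sketch of the threshold logic in shell. It is not part of restic; the function name check_age and the message wording are made up for illustration, and the exit codes follow the usual Nagios/Icinga convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Hypothetical helper, not a restic command: map a backup age in seconds
# to a Nagios/Icinga status line and the matching exit code.
# Usage: check_age AGE_SECONDS WARNING_SECONDS CRITICAL_SECONDS
check_age() {
    age=$1 warn=$2 crit=$3
    # Anything that is not a plain non-negative integer counts as UNKNOWN.
    case "$age" in
        ''|*[!0-9]*)
            echo "UNKNOWN - could not determine backup age"
            return 3 ;;
    esac
    if [ "$age" -ge "$crit" ]; then
        echo "CRITICAL - last backup is ${age}s old"
        return 2
    elif [ "$age" -ge "$warn" ]; then
        echo "WARNING - last backup is ${age}s old"
        return 1
    fi
    echo "OK - last backup is ${age}s old"
    return 0
}
```

A monitoring wrapper would compute the age from the newest snapshot and then call something like check_age "$age" 86400 172800, passing the function's return code on as its own exit status.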

However, I am not even really sure whether this belongs in restic core or whether it should live as a Perl/whatever script in a contrib area.

Whatever features are implemented should IMO be implemented in a neutral way, not tied to one particular external application. As you described it here, it’s very specific to Nagios. It would be better to make such a feature generic, with output formatted in a way that makes sense for, and is coherent with, restic overall.

I’ve not used Icinga, so this may not be useful, but I’ve handled this by backing up with a shell script that then sends a curl request to my Sensu API. Sensu lets you set a TTL on a check, so I get an alert when the TTL expires (i.e. because cron did not run my backup, or because the backup failed).
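
A rough sketch of that "ping on success" pattern, in case it helps. The URL is a placeholder and the function names are made up; the only idea taken from the post is that the monitoring side gets a request after a successful backup and raises an alert when the request stops arriving.

```shell
#!/bin/sh
# Placeholder endpoint (assumption): replace with the check-in URL of
# whatever monitoring service is in use.
PING_URL=${PING_URL:-https://example.com/ping/backup-job}

# Send the check-in request; fail on HTTP errors, stay quiet otherwise.
notify() {
    curl -fsS -m 10 --retry 3 -o /dev/null "$PING_URL"
}

# Run the command given as arguments and check in only if it succeeded.
# If the command fails, or cron never runs it, no request is sent and
# the TTL on the monitoring side expires.
backup_then_notify() {
    "$@" && notify
}

# Example cron payload (repository and path are illustrative):
#   backup_then_notify restic -r /repo backup /srv/data
```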

I think the neutral way is already covered by restic snapshots. This is also what restic-tools uses for its monitor command: https://github.com/binarybucks/restic-tools/blob/master/bin/backup#L80.

I sat down and wrote a simple check using Perl and Monitoring::Plugin: https://github.com/evgeni/check_restic/blob/master/check_restic.pl

Sorry for bumping this old thread :wink:

For a monitor shell script I tried the following approach using the popular jq tool:

  # restic snapshots --json --path /srv/data | jq -r '.[-1]'
  {
  "time": "2019-03-08T18:00:09.792955797Z",
  "parent": "ee047edc6dbfc50dd9b179d471f48079d4a4f0dd31bf60658766b6f9bed9e012",
  "tree": "8653335ec23446702c2f18531e9fbdb4d720b40378c0f88f7a995facb59814f9",
  "paths": [
    "/srv/data",
    "/srv/data2"
  ],
  "hostname": "data01",
  "id": "09a799501eb2aa3407c0c85055756275508f9c087222178e33168ea9a05f5b07",
  "short_id": "09a79950"
  }

One pitfall is the fractional seconds in the ISO 8601 timestamps, which jq still cannot parse (issue):

# restic snapshots --json --path /srv/data | jq -r '.[-1].time|fromdate'
jq: error (at <stdin>:1): date "2019-03-08T18:00:09.792955797Z" does not match format "%Y-%m-%dT%H:%M:%SZ"

The following seems to work:

# restic snapshots --json --path /srv/data | jq -r '.[-1].time|strptime("%Y-%m-%dT%H:%M:%S.%Z")|mktime'
1552068009

One can now use the timestamp for further checks in their shell script.
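
For example, the "further check" could be a simple age comparison. The sketch below uses made-up helper names. strip_fraction assumes UTC timestamps ending in Z, as in the output above, and is only needed if the raw timestamp goes to a parser that rejects fractional seconds.

```shell
#!/bin/sh
# Drop the fractional seconds from a timestamp such as
# 2019-03-08T18:00:09.792955797Z (assumes a trailing Z, i.e. UTC).
strip_fraction() {
    case "$1" in
        *.*) printf '%sZ\n' "${1%%.*}" ;;
        *)   printf '%s\n' "$1" ;;
    esac
}

# Succeed if a snapshot taken at EPOCH is at most MAX_AGE seconds old.
# NOW defaults to the current time; pass it explicitly for testing.
# Usage: is_fresh EPOCH MAX_AGE [NOW]
is_fresh() {
    epoch=$1 max_age=$2 now=${3:-$(date +%s)}
    [ $((now - epoch)) -le "$max_age" ]
}

# Example, reusing the jq expression from the post above:
#   ts=$(restic snapshots --json --path /srv/data |
#        jq -r '.[-1].time|strptime("%Y-%m-%dT%H:%M:%S.%Z")|mktime')
#   is_fresh "$ts" 86400 || echo "last backup is older than 24h"
```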

I really like https://healthchecks.io/

What is considered the best-practice approach for monitoring restic backups?
Calling the snapshot list on the CLI and post-processing the returned JSON response?

I just scan the snapshots/ folder for files and get the most recent timestamp. It’s good enough for me.
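
A sketch of what such a scan could look like when the repository is a plain directory on the server. The function name is made up, and the check relies on find's -mmin test, which GNU, BSD and BusyBox find provide but POSIX does not.

```shell
#!/bin/sh
# Succeed if at least one file under REPO/snapshots was modified within
# the last MINUTES minutes. Returns 2 if the directory does not exist.
# Usage: has_recent_snapshot REPO MINUTES
has_recent_snapshot() {
    repo=$1 minutes=$2
    [ -d "$repo/snapshots" ] || return 2
    # -mmin -N matches files modified less than N minutes ago.
    [ -n "$(find "$repo/snapshots" -type f -mmin "-$minutes" | head -n 1)" ]
}

# Example: alert if nothing arrived in the last 24 hours.
#   has_recent_snapshot /srv/restic/myrepo 1440 || echo "no snapshot in 24h"
```

This only says that some snapshot file arrived recently; it does not tell you which host or path it belongs to.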

My backups are initiated via cron, consisting of a backup and listing of changes over the last two snapshots, and the output is mailed to me. Normally I don’t have a lot of daily changes, but the listing of changes gives me a chance to see if something was added that shouldn’t have been, and more importantly if something was deleted that shouldn’t have been.

I did a mix of @rawtaz’s solution and yours: getting the snapshots list from my cron job, tailing it to the last 4 lines and emailing it, so I can see whether the cron job has run and restic created the snapshot, or whether an error occurred.

Thanks guys for your answers and input!

One of my customers can’t afford any monitoring, so from time to time I manually check all systems. I wrote a little script that simply checks all repos and lists whether the last backup run (“yesterday”) created a backup:

restic -r /repo snapshots | grep "$(date -d 'yesterday' '+%Y-%m-%d')"

Probably the simplest form of control but better than nothing :stuck_out_tongue_winking_eye:

Maybe looking at forget gives us some ideas.

  • Without any option, it should output the date of the last backup - disregarding the host, etc. (Or maybe it should even refuse to do something as this might be a dangerous default.)
  • There should be a group-by option: e.g. if you group by host, it should output the date of the last backup for each host
  • It should be possible to filter by tag and host
  • There should be a JSON output option

This might not be complete, yet.
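
Until something like this exists, the "group by host" part can be approximated outside restic. A minimal sketch with a made-up function name: it reads "HOSTNAME TIMESTAMP" lines and keeps the newest timestamp per host. It assumes ISO 8601 timestamps in the same time zone and format, so that plain string comparison orders them correctly.

```shell
#!/bin/sh
# Read "HOSTNAME TIMESTAMP" lines on stdin and print the newest
# timestamp per host, sorted by host name.
latest_per_host() {
    awk '{
             # Appending "" forces a string comparison.
             if (!($1 in latest) || ($2 "") > (latest[$1] ""))
                 latest[$1] = $2
         }
         END { for (h in latest) print h, latest[h] }' | sort
}

# Example feed, using the hostname and time fields of the JSON output:
#   restic snapshots --json |
#       jq -r '.[] | "\(.hostname) \(.time)"' | latest_per_host
```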

There’s a problem with this though - you’re relying on the client-side information to determine whether your backups ran or not. I check on the server side only.

The forget command is not one that is expected to run after every backup run, nor is it meant to get you snapshot information. The things you describe are what the snapshots command is for :slight_smile:

Now I’m confused: doesn’t the client reflect the status on the remote server? I mean, when executing the snapshots command, doesn’t this provide information retrieved from the repository on the server?

Ah, maybe a misunderstanding: I’m executing the snapshots command against the remote server via restic, whereas you were talking about scanning a local directory, right?

I was just saying that if you want to inspect when snapshots were last made, the best source of truth is the server where you run rest-server: look at when the files in the repository’s snapshots/ directory were created.

Hmm… I’m confused again. Are you saying that running restic snapshots against the remote server is not reliable?