How to access more details about a failure?

I am writing a simple orchestrator for my backups, in Python, so that this is portable between OSes (specifically Windows 11 and Linux). It works great except that sometimes I get notified about failed backups.

My targets (different depending on the backup profile) are local file systems, REST servers, and S3.

The failures are transient - if a backup fails on a target, it recovers on the next backup cycle. Nevertheless, I would like to know why it failed. The causes can be multiple and at different layers: the network is down, a disk is full, a repo is corrupted, a specific restic error, …

The only information I found is about exit codes (Scripting — restic 0.18.0 documentation), so my question is whether more details are available somewhere.

To make the question concrete, let’s take two cases:

  • network failure:
    • can restic tell me “the REST server did not respond” or “cannot connect at all”?
    • or should I expect this to surface at the network connection level (network connection exceptions)?
  • disk full:
    • can restic tell me that it could not write a backup because there was no space left?
    • otherwise I have no way to tell without monitoring the target (which I may not be able to do)

Hi 👋

You can run restic with the --json flag and capture the process's stdout when it finishes. If the exit code is not 0 or 3, you might want to parse and log these JSON objects to see the actual error. As far as I remember, not all failure cases are fully covered by exit codes.

What you are looking for is in that output. I generally only check the object with 'message_type': 'summary'.

The output also contains other useful fields, such as total_bytes_processed, which you can use to track the backup target’s size over time if you need to.
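A minimal sketch of the --json approach, assuming restic is on the PATH and that credentials (for example RESTIC_PASSWORD) are already set in the environment. The function names parse_restic_output and run_backup are made up for illustration. The "error" and "exit_error" message types are what I believe recent restic versions emit, so check them against the documentation for your version before relying on them.

```python
import json
import subprocess


def parse_restic_output(text):
    """Split restic's line-delimited JSON output into errors, summary, and leftovers."""
    errors, summary, unparsed = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            # Not every line is guaranteed to be JSON; keep it for the log.
            unparsed.append(line)
            continue
        if not isinstance(msg, dict):
            unparsed.append(line)
            continue
        kind = msg.get("message_type")
        if kind == "summary":
            summary = msg
        elif kind in ("error", "exit_error"):  # assumed message types, see above
            errors.append(msg)
    return errors, summary, unparsed


def run_backup(repo, path):
    """Run one backup and return (exit_code, errors, summary, unparsed)."""
    proc = subprocess.run(
        ["restic", "-r", repo, "backup", path, "--json"],
        capture_output=True,
        text=True,
    )
    # Fatal errors may be reported on stderr, so both streams are parsed.
    errors, summary, unparsed = parse_restic_output(proc.stdout + "\n" + proc.stderr)
    return proc.returncode, errors, summary, unparsed
```

When the exit code is not 0, logging errors and unparsed together with the exit code should give the orchestrator whatever detail restic itself reported for that run.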

I think case 1 (network failure) is easy for your orchestrator to catch: run a quick repository query such as restic snapshots before the backup starts, to make sure the destination repository is reachable.
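A possible pre-flight check along those lines, again only a sketch: repository_reachable and describe_exit_code are hypothetical helper names, the 60-second timeout is an arbitrary choice, and the exit-code table reflects my reading of the restic 0.18.0 scripting documentation, so verify it for the version you run.

```python
import subprocess

# Exit codes as I understand them from the restic 0.18.0 scripting docs.
EXIT_CODES = {
    0: "success",
    1: "command failed",
    2: "Go runtime error",
    3: "backup could not read some source data",
    10: "repository does not exist",
    11: "failed to lock repository",
    12: "wrong password",
    130: "interrupted",
}


def describe_exit_code(code):
    """Translate a restic exit code into a short human-readable label."""
    return EXIT_CODES.get(code, f"unknown exit code {code}")


def repository_reachable(repo, timeout=60, run=subprocess.run):
    """Return (ok, detail) after a quick 'restic snapshots' query.

    'run' is injectable so the check can be tested without restic installed.
    """
    try:
        proc = run(
            ["restic", "-r", repo, "snapshots", "--json"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError:
        return False, "restic executable not found"
    except subprocess.TimeoutExpired:
        return False, f"no answer within {timeout} s"
    if proc.returncode == 0:
        return True, "ok"
    # Prefer restic's own message; fall back to the exit-code label.
    return False, proc.stderr.strip() or describe_exit_code(proc.returncode)
```

If the check fails, the orchestrator can log the detail string and skip or postpone the backup, which separates "cannot reach the repository" from failures that happen during the backup itself.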