Restic check hangs, reports no error

Good morning,

I’ve been using Restic without incident since February. Recently I ran some routine security updates on my server, and the next day Restic failed to back up. The good news is that I still have all the data I care about here on my home server, but I want to fix this ASAP since Restic is my only off-site backup.

Since then I’ve been trying to figure out what went wrong. Below is my setup, what I’ve tried, and what I suspect happened.

Setup:
I have an initial install script for Restic, run through Ansible, that sets up the environment variables and connects to the right repository in Backblaze B2. The script also sets up the systemd service that runs the backup script. That backup script runs restic check, backup, and prune, reports any errors, and times out if restic check takes more than a few minutes.

When I update the server, I often rerun the install script. It checks whether restic check fails and, if it does, assumes this is a first-time run and inits the remote repo. (This is likely important later.)
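
A failed or timed-out restic check is not the same as a missing repository, so a first-run test that treats them alike can try to init a repo that already exists. Below is a minimal sketch of a stricter test. It assumes restic 0.17.0 or later, where `restic cat config` is documented to exit with code 10 when the repository does not exist. The names `classify` and `probe_repo` and the 120-second timeout are illustrative and are not part of the script described above.

```python
import subprocess

# Assumption: restic >= 0.17.0, which documents exit code 10 for
# "repository does not exist". Older versions do not make this distinction.
REPO_MISSING = 10


def classify(returncode):
    """Map a `restic cat config` exit code (None means timeout) to a state."""
    if returncode is None:
        return "timeout"   # slow or unreachable backend; says nothing about the repo
    if returncode == 0:
        return "exists"    # config was read and decrypted; never init here
    if returncode == REPO_MISSING:
        return "missing"   # the only state in which init is appropriate
    return "error"         # wrong password, lock, network failure, ...


def probe_repo(cmd=("restic", "cat", "config"), timeout_s=120):
    """Run the probe command and classify the outcome."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return classify(None)
    return classify(result.returncode)
```

With this shape, the install script would call `restic init` only when `probe_repo()` returns `"missing"`, and would stop and report in the `"timeout"` and `"error"` cases instead of guessing.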

What I’ve tried (nothing has worked yet):

  1. At first I thought this might just be a fluke run, so I restarted the PC and finished the rest of the updates.
  2. Next I ran the restic check command from the CLI to look for errors, but after waiting about 15 minutes with the verbose switch I aborted it. The only error I got was “Fatal: Unable to open config file. Stat: Context canceled”.
  3. Looked through the forums a bit and started by checking for hardware-related bottlenecks, but more than half of my RAM (8GB remaining), storage, and swap space is still available, so I doubt that’s the issue.
  4. Looked in the forums a bit longer and tried again (for about a minute) with the --no-cache option. I received the same config file error as above.
  5. I double-checked the Backblaze keys to see whether they had expired, but they are definitely still valid.
  6. Opened up the Backblaze B2 bucket to see what the config file had in it. It looks suspicious, but I cannot confirm whether the “creation date” it has listed (which is the day the repo was made) would reflect later modifications to this file, so I’m at a bit of a loss. Here’s the content of that file:
{
  "code": "unauthorized",
  "message": "",
  "status": 401
}
  7. Currently running the check for much longer to see whether it eventually fails on its own or actually finishes. The repo I’m backing up holds about 300GB, so I am unsure whether check is simply slow.

Hypothesis:

  1. My best guess is that the Ansible script somehow registered a false failure and attempted to recreate the repo, breaking the config file by writing over it. Unfortunately, I have no secondary backup of this file. This is my leading hypothesis because I ran the script literally the day before the failure, so it seems like the probable cause.
  2. I have my doubts about this, but some firewall or other rule could be blocking the connection to the bucket API. This has happened to me once before, when a Pi-hole blocked B2 by mistake, but I am not using a Pi-hole on my network right now, so that can’t be the issue.
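
The second hypothesis can be checked roughly with a plain TCP probe. This is only a sketch: `can_reach` is a hypothetical helper, `api.backblazeb2.com` is B2's public API hostname (the endpoint a given bucket actually uses may differ), and a successful connection only rules out DNS or port blocking, not authorization problems.

```python
import socket


def can_reach(host, port=443, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("api.backblazeb2.com")` from the backup server and getting `False` would point toward a network-level block; `True` would shift suspicion back to the repository or to timing.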

How I could use some help:
I really need better insight into the problem, since I am grasping at straws here. Any help is appreciated since I’m a noob in this area, but I could especially use a way to figure out what’s going wrong under the hood, and some expert eyes to confirm whether this config file is the problem.

Thank you!

This is resolved now.

In short, I decided to let restic check run for as long as it needed and timed it to see whether it would hang indefinitely. It ultimately took about 30 minutes. That is far, far longer than I would have expected, but all is well since it is running just fine now.
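
For anyone hitting the same thing: a check that needs about 30 minutes will always trip a timeout of a few minutes, so it helps to tell "timed out" apart from "failed" and to log how long the run took. Here is a minimal sketch of that; `run_timed` is a hypothetical helper, not something restic or my service provides.

```python
import subprocess
import time


def run_timed(cmd, timeout_s):
    """Run cmd and return (status, elapsed_seconds).

    status is "ok" (exit code 0), "failed" (non-zero exit code),
    or "timeout" (the process was killed after timeout_s seconds).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(list(cmd), timeout=timeout_s)
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    return status, time.monotonic() - start
```

For example, `run_timed(["restic", "check"], timeout_s=3600)` would allow an hour (an arbitrary budget, comfortably above the roughly 30 minutes observed here) and report the elapsed time, which makes it easier to pick a realistic timeout for the service.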