Faulty network card crashes my server

Hello,

I have been using Restic 0.9.6 without a problem for many months on my CentOS 8 server. It is not a beefy machine, but it gets the job done. The repo size is ~500 GB.

Today I upgraded to 0.10.0 and ran my backup.
After a while my server became unresponsive and I had to restart it with the hardware button.
There is nothing in the log files, so I guess the panic information was not written.
At the point of the crash I was connected via SSH, which also did not show any helpful info.

Can anybody help me here?
Thanks!

Please run it over SSH or something similar so you can grab the entire command you run and all of its output and paste it here. You can also send the restic process a SIGABRT when it appears to have stalled; hopefully that will give you a stack trace that you can paste here too. See if any of that is doable :slight_smile:
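For reference, sending the signal could look roughly like the sketch below. It is only an illustration: `send_abort` is a hypothetical helper name (not part of restic), and it assumes you have a second SSH session open and know the PID of the running restic process.

```shell
# send_abort is a hypothetical helper, not part of restic.
# It sends SIGABRT to every PID given as an argument.
send_abort() {
    if [ "$#" -eq 0 ]; then
        echo "usage: send_abort PID..." >&2
        return 1
    fi
    kill -ABRT "$@"
}
```

From the second session you could then run something like `send_abort $(pgrep -x restic)`, assuming `pgrep` is installed and the process is named `restic`. Any stack trace would show up in the terminal where restic itself is running, so that output needs to be captured as well.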

I’m running the following command from a simple bash script:
restic backup $RESTIC_REPO $SOURCE --cleanup-cache --ignore-inode --exclude="parent.lock" --verbose --verbose

I’ll try to get a stack trace, but I’m not sure I can catch the right moment before it stops.

Please make a complete paste including the entire command, the values of the environment variables (obfuscated to not show sensitive information of course), and the entire output of the command.
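One way to collect all of that in a single file is sketched below. This is just an assumption about how you might do it: `run_logged` is a hypothetical helper name, and the log file name in the usage line is made up.

```shell
# run_logged is a hypothetical helper, not part of restic.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
# It writes the command line plus everything the command prints
# (stdout and stderr) to LOGFILE while still showing it on screen.
run_logged() {
    log=$1
    shift
    {
        # Record the command line first, then run the command itself.
        printf '+ %s\n' "$*"
        "$@"
    } 2>&1 | tee "$log"
    # Note: the exit status seen by the caller is tee's, not the command's.
}
```

Usage would be along the lines of `run_logged restic-run.log restic backup $RESTIC_REPO $SOURCE --verbose --verbose`. Because `tee` writes the file as the output arrives, whatever was logged before a hang should mostly still be there after a reboot, although the last lines may be lost if the machine locks up before they are flushed to disk.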

So far I have not been able to send the signal.
During the backup, errors like this are showing up:
error: Readdirnames /mnt/path/file.jpg: failed: readdirent: no such file or directory no such file or directory
However, the files in question exist. I cannot remember seeing that before.

@uok I have asked you several times now to provide a paste with the following in it:

  • The full command you use to run restic.
  • The value of the environment variables you use in that command.
  • The entire output from restic when this command is run.

Yet you only provide piece by piece of it, and still not all of it. Am I being unclear, or what is the reason that you are not providing the information I ask you for?

I posted the command above, and I do not use any special environment variables. Restic only informs me about new/modified files until it crashes. As I wrote, I was not able to send the signal in time.

How much free memory does the server have? What is behind /mnt/path/file.jpg? A network filesystem or a local filesystem? Where is the repository stored? What are the first few lines output by restic?

About 1 GB of memory is free during the restic run (I know this is not much, but it was never a problem in the past). The files are read from a mount that points to a Windows share on a PC, and the repository is stored on a local disk on the server.
When Restic starts, it reads the repo info, and then there is the normal output with the list of new/modified files and the progress status.

The problem is that the server also hosts the firewall for the network, the webserver, files, etc., so if it crashes it takes me some time to get everything running again.

I will check back later, thanks!

You posted the command, but you use variables in it that you still haven’t told us the value of. Furthermore, there is definitely more output from restic than that specific error message you showed above. And finally, I have asked you to put all the things I asked for in one single paste merely so it’s easier for us to get a clear picture of the entire run.

In summary, there is evidently information that you have been asked for but haven’t provided. If you want people to help, it’s expected that you follow their instructions to get the information they ask for. Please do that (this is the fourth time I’m asking you).

I understand it’s hard to get the signal sent, but that’s not what I’m talking about, I’m just talking about the other things I asked for several times by now.

It’s really, really, really simple: just do what I ask and then we can continue to debug this for you. But if you don’t provide the information we ask for, we simply cannot help you. This should be obvious to anyone. And FWIW, we want to help you.

Sorry for the late reply and the lack of detail in my bug report.
For completeness:

SOURCE="/mnt/windows"
REPO="--repo /backup --password-file cred.txt"
mount -t cifs -o ro \\path\to\windows-computer $SOURCE
restic backup $REPO $SOURCE --cleanup-cache --ignore-inode --exclude="parent.lock" --verbose --verbose

I finally found some time, and after many hours of testing the cause turned out to be a faulty network card.
Now Restic runs like a :rocket: