Rest-server crashes or exits unexpectedly

pep · October 12, 2018, 1:03pm

We’ve been observing issues with rest-server unexpectedly exiting.

We have it running on a not-so-great Linux distribution (Synology) on an amd64 machine, using a statically linked binary (not so hard with Go) and custom init scripts.

Here are the relevant bits of the init scripts that start the daemon:

RUNARGS="--debug --log /dev/stdout --cpu-profile $PROFFILE --path $BACKUP_PATH --tls --tls-cert $TLS_CERT --tls-key $TLS_KEY --private-repos --append-only"
$REST_BIN $RUNARGS >> $LOGFILE 2>&1 &!

(no syslog, no systemd, no what-have-you)

We have about 60 machines doing backups to this endpoint, of different sizes, ranging from 10GB to 2TB.
We only recently switched from bacula to restic, so we had to improvise something for the scheduling, as restic doesn’t provide anything. Using a combination of ansible/AWX (ansible tower), there are always between 3 and 6 of these machines running backups.

For the issue at hand, rest-server sometimes falls down, for unknown reasons. I don’t see anything at all in dmesg about this, nor in the logs. Logs generally end with the last http request that a machine has made.
It may be a bad combination of big machines running at the same time, since this doesn’t happen every time and we have had successful runs with all these machines.
This might not even be related to this daemon at all, but it’s the only daemon affected that we can see.

The storage box itself (synology) has 8 cores, 8 GB RAM, 6GB swap. At the moment it seems pretty much idle, but I will try and keep an eye on it.

Any advice as to how I can provide more information?
I don’t mind patching rest-server to add more logging.

fd0 · October 12, 2018, 1:22pm

That’s very odd indeed. Can you run the rest server in the foreground, so we can see if there’s anything printed on stderr?

Are you sure stderr is written to the logfile? You can test by inserting the following line here: https://github.com/restic/rest-server/blob/master/cmd/rest-server/main.go#L99

fmt.Fprintf(os.Stderr, "testing output to stderr\n")

Is there anything in the kernel log (check with dmesg)? Maybe an out-of-memory situation?

What you can also try is using rclone as the REST backend, it offers the rclone serve restic command which provides such a server.

pep · October 12, 2018, 5:50pm

I haven’t reproduced yet, but I did patch rest-server and I do get that message on stderr.

Is there anything in the kernel log (check with dmesg)? Maybe an out-of-memory situation?

As I said above, I get exactly nothing in dmesg. I will keep an eye on memory consumption on that storage box.

What you can also try is using rclone as the REST backend, it offers the rclone serve restic command which provides such a server.

I’ll give it a try, thanks for the suggestion.

fd0 · October 12, 2018, 8:25pm

Ah, I overlooked that, sorry!

pep · November 27, 2018, 10:33pm

Sorry for not coming back to you earlier. This is related to https://github.com/restic/rest-server/issues/80.
This can be closed (or lost in the meanders of this forum).