We’ve been observing issues with rest-server unexpectedly exiting.
We have it running on a not-so-great Linux distribution (Synology), on an amd64 machine, using a statically linked binary (not so hard with Go) and custom init scripts.
Here are the relevant bits of the init scripts that start the daemon:
RUNARGS="--debug --log /dev/stdout --cpu-profile $PROFFILE --path $BACKUP_PATH --tls --tls-cert $TLS_CERT --tls-key $TLS_KEY --private-repos --append-only"
$REST_BIN $RUNARGS >> $LOGFILE 2>&1 &!
(no syslog, no systemd, no what-have-you)
We have about 60 machines doing backups to this endpoint, of different sizes, ranging from 10GB to 2TB.
We only recently switched from Bacula to restic, so we had to improvise something for the scheduling, as restic doesn’t provide anything. Using a combination of Ansible and AWX (Ansible Tower), there are always between 3 and 6 of these machines running backups.
As for the issue at hand: rest-server sometimes exits, for unknown reasons. I don’t see anything at all about this in dmesg, nor in the logs. The logs generally end with the last HTTP request that a machine has made.
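Since nothing records how the process ends, one thing I could try is wrapping the daemon so that its exit status gets written to the log. Below is a minimal sketch, not something we run yet. The helper names `describe_exit` and `run_and_log` are made up for illustration, and it relies on the usual shell convention that a status above 128 means the process was killed by signal (status - 128). A Go program that dies from an unrecovered panic normally exits with status 2, so even that would tell us something.

```shell
#!/bin/sh
# Hypothetical helper: turn a shell exit status into a readable message.
describe_exit() {
    rc=$1
    if [ "$rc" -gt 128 ]; then
        # By shell convention, status > 128 means "killed by signal (status - 128)".
        echo "killed by signal $((rc - 128))"
    else
        echo "exited with status $rc"
    fi
}

# Hypothetical wrapper: run the given command in the foreground,
# then print a timestamped line describing how it ended.
run_and_log() {
    "$@"
    rc=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 $(describe_exit "$rc")"
    return "$rc"
}

# Assumed usage in the init script, replacing the direct call:
#   run_and_log $REST_BIN $RUNARGS >> $LOGFILE 2>&1 &
```

With this in place, the last line of the log should say whether rest-server exited on its own or was killed by a signal (for example, signal 9 from the kernel or another process).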
It may be a bad combination of big machines running at the same time, as this doesn’t happen all the time, and we have had successful runs with all of these machines.
This might not even be related to this daemon at all, but it’s the only daemon affected that we can see.
The storage box itself (Synology) has 8 cores, 8 GB RAM, and 6 GB swap. At the moment it seems pretty much idle, but I will try to keep an eye on it.
Any advice as to how I can provide more information?
I don’t mind patching rest-server to add more logging.