Restic backup just gets killed

I have an OpenVZ-based virtual machine running CentOS 7 at some provider. I have been using restic there on a daily basis (via cron) to back up to Wasabi (S3), without any problems for three years.
Today I discovered that backups have not worked since the end of June.

When I run restic manually with -vv, it runs for about five minutes and then just says “Killed”.

I am using “restic 0.13.1 compiled with go1.18 on linux/amd64”. I also tried to init a new repo - same error.

Although I have 8GB of RAM on this machine, I am quite sure that this is related to past issues like

or

Unfortunately I cannot add a swapfile; that is not possible on OpenVZ VMs.

By the way, I cannot find (grep) anything useful in the system logs or with dmesg.

My restic call is quite simple:

/usr/local/bin/restic backup -vvv --host my-hostname /home

As suggested in the other threads above, I tried adding “export GOGC=20”, but that doesn’t change anything.

Maybe someone has an idea what I could try next?

How large is the index folder of the repository? That should give a rough estimate of the memory required by restic.

Besides that, you could try running restic under strace; maybe that shows something useful? Or run it using GNU time (not the usual shell builtin), which should also show how much memory was used.

Might the problem be tied to a specific file?
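To make the GNU time suggestion concrete, here is a rough sketch. The helper name peak_rss_kb is made up for illustration, and it assumes GNU time is installed at /usr/bin/time (on CentOS that is the "time" package). It runs any command and then prints the "Maximum resident set size" value that GNU time reports, in kilobytes.

```shell
# Hypothetical helper: run a command under GNU time and print its peak RSS in kB.
# Assumes GNU time at /usr/bin/time; the shell builtin `time` has no -v or -o option.
peak_rss_kb() {
    stats=$(mktemp)
    # -v: verbose resource report, -o: write that report to a file
    # so it does not get mixed up with the command's own output.
    /usr/bin/time -v -o "$stats" "$@"
    rc=$?
    # The report contains a line like "Maximum resident set size (kbytes): N".
    awk -F': ' '/Maximum resident set size/ { print $2 }' "$stats"
    rm -f "$stats"
    return "$rc"
}

# Usage, with the restic call from the first post:
# peak_rss_kb /usr/local/bin/restic backup --host my-hostname /home
```

The number is printed after the command's own output. If the process gets killed, the last value is still a useful hint at how much memory it had reached by then.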

I also have an OpenVZ VM at some service provider. I only have 2GB of RAM, though. My workaround is the following, but obviously it’s less than ideal if you have any substantial amount of data to back up. I only have about 100MB of config and data files.

ssh remotehost 'tar -cPf - /path/to/files | zstd -2q' | zstdcat | restic backup --compression max -H akrabu-ovz --stdin --stdin-filename akrabu-ovz.tar

Obviously you’ll need zstd on your remote host, but you could probably use gzip just as easily. Also zstdcat is equivalent to zstd -dcf.

I used to use zstd -2q --rsyncable and just backed up the compressed tarball, but since Restic has built-in compression, I now compress only for the transfer speed, then decompress on the fly and let Restic handle the compression. The deltas are ever-so-slightly smaller this way.

Again, I know this is less than ideal if you have a substantial amount of data to back up, but I thought I’d mention it here should it prove useful for you or anyone else doing something similar with a small dataset.

Thank you all for your answers. I am starting to believe more and more that this is somehow OpenVZ related. Maybe the hoster has implemented some CPU or RAM limits, since it was working for years before.
The problem is not tied to a specific file; it happens with different directories, I tested that. The index folder in the repo (a new repo, with no finished backup so far) contains just one file of 6MB…

A run with strace ends like this:

--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=26861, si_uid=0} ---
rt_sigreturn({mask=[]})                 = 2264378868
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=31478469}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=31510199}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=39790781}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=39823538}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=39857971}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=39888580}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=39929996}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=51, ws_col=209, ws_xpixel=0, ws_ypixel=0}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=40013993}) = 0
ioctl(1, TIOCGPGRP, [26859])            = 0
getpgid(0)                              = 26859
[0:34] 9.72%  20980 files 4.896 GiB, total 456618 files 50.395 GiB, 0 errors ETA 5:21
) = 91
/home/mysql/mysql-bin.000013
)           = 9
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=40247684}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=40278782}) = 0
futex(0x15a3ce8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=42242818}) = 0
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=26861, si_uid=0} ---
rt_sigreturn({mask=[]})                 = 824672807784
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=61672326}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=61703598}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=77189311}) = 0 errors ETA 5:21
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=77224105}) = 0
futex(0x15a3ce8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=94649638}) = 0
futex(0x15a3ce8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=94761013}) = 0
NULL) = 074%  20980 files 4.909 GiB, total 456618 files 50.395 GiB, 0 errors ETA 5:21
futex(0x15a3ce8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=98374199}) = 0
futex(0xc000200148, FUTEX_WAKE_PRIVATE, 1) = 1
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=26861, si_uid=0} ---
rt_sigreturn({mask=[]})                 = 824759794536
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=112170038}) = 0
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=26861, si_uid=0} ---
rt_sigreturn({mask=[]})                 = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=112279719}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=112333121}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=112369207}) = 0
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=26861, si_uid=0} ---rs ETA 5:21
rt_sigreturn({mask=[]})                 = 824759794536
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=127403710}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=127431483}) = 0
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=26861, si_uid=0} ---
rt_sigreturn({mask=[]})                 = 824759794536
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=146641822}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=146669763}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=150385316}) = 0errors ETA 5:21
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=150414123}) = 0
futex(0x15a3ce8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=159320107}) = 0
nanosleep({tv_sec=0, tv_nsec=3000}, NULL) = 0
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=26861, si_uid=0} ---
rt_sigreturn({mask=[]})                 = 824637303656
clock_gettime(CLOCK_MONOTONIC, {tv_sec=159761, tv_nsec=174982104}) = 0errors ETA 5:21
futex(0x15a3ce8, FUTEX_WAIT_PRIVATE, 0, NULL) = ?
+++ killed by SIGKILL +++015
Killed

But who sends this SIGKILL? It can only be the OpenVZ host, no?

I will also ask the hoster if they updated something or implemented some limits.

In fact it turned out to be some OpenVZ resource limits. The hoster must have introduced them in the last few weeks.
I discussed it with the hoster and now it is working again.
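For anyone who ends up here with the same symptom: OpenVZ containers usually expose their resource limits in /proc/user_beancounters, where the last column (failcnt) counts how often a limit was hit. The following is only a sketch, not what the hoster or I actually ran. The function name beancounter_failures is made up, it assumes the usual column layout (resource, held, maxheld, barrier, limit, failcnt), and reading the file normally requires root inside the container.

```shell
# Hypothetical helper: list OpenVZ resources whose failcnt is non-zero.
# Assumes the usual /proc/user_beancounters layout, where the last six
# columns are: resource held maxheld barrier limit failcnt.
beancounter_failures() {
    file=${1:-/proc/user_beancounters}
    # Skip the "Version:" and header lines (their last field is not a number),
    # then print the resource name and failcnt for every row that hit a limit.
    awk 'NF >= 6 && $NF ~ /^[0-9]+$/ && $NF + 0 > 0 { print $(NF-5), $NF }' "$file"
}

# Usage (as root inside the container):
# beancounter_failures
```

If a memory-related row shows a failcnt that grows every time restic gets killed, that points at a container limit rather than at restic itself.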
