Server unresponsive during restic backups

paulschreiber · January 12, 2023, 2:24am

After updating to v2, I’m pruning:

restic prune --repack-uncompressed

This is using a lot of CPU:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                      
 286157 root      20   0 2102964   1.3g  17740 R 176.2   5.7 418:07.77 restic

but the server is perfectly responsive. Web requests are served immediately and I have not received any warnings from UptimeRobot.

Given this plus the strace slowness, it sounds like we are I/O-bound rather than CPU-bound. Could this be a kernel or btrfs bug?

MichaelEischer · January 13, 2023, 10:31pm

Maybe; that is pretty hard to tell with the available information. It might also partially be a result of Core 2 Duos being really old by now. But in either case the system shouldn’t slow down that much… Does dmesg show anything particularly suspicious? Beyond that, I currently have no idea how to investigate this further.

What might be worth a try is to use nice and ionice to reduce the priority of restic, so that other tasks get a higher priority. Or you could check whether restic backup --read-concurrency 1 (available since v0.15.0) helps.

paulschreiber · January 14, 2023, 2:25pm

I updated to 0.15 yesterday.

Currently, dmesg only shows a bunch of PHP segfaults. Do you want me to run it again during the backup?

What flags should I use with nice and ionice? Do you want me to try both, or just ionice? (Please supply the precise syntax lest I misuse it like I did with --verbose.)

How should I prioritize trying different options? There are seven different possibilities:

nice
ionice
read-concurrency
nice / ionice
nice / read-concurrency
ionice / read-concurrency
nice / ionice / read-concurrency
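
For reference, the seven cases above are simply the non-empty subsets of three options (2^3 - 1 = 7). A minimal sketch that enumerates them with a bitmask, in case it helps to script the test runs; the function name list_combinations is made up for illustration:

```shell
#!/bin/sh
# Enumerate every non-empty combination of the three options.
# Each bit of the mask (1, 2, 4) switches one option on.
list_combinations() {
    for mask in 1 2 3 4 5 6 7; do
        combo=""
        if [ $((mask & 1)) -ne 0 ]; then combo="$combo nice"; fi
        if [ $((mask & 2)) -ne 0 ]; then combo="$combo ionice"; fi
        if [ $((mask & 4)) -ne 0 ]; then combo="$combo read-concurrency"; fi
        # Strip the leading space before printing.
        echo "${combo# }"
    done
}

list_combinations
```

The output order differs from the list above (it follows the bitmask), but the same seven combinations appear.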

MichaelEischer · January 14, 2023, 7:28pm

Well, the kernel log also keeps older entries. So it sounds like nothing relevant was printed to the kernel log during the backup.

Let’s just try everything at once: GOMAXPROCS=1 nice ionice restic backup --read-concurrency 1 .... If that helps, we can still figure out which of those options makes the difference. I’ve added GOMAXPROCS=1, which limits restic to at most one CPU core. That might also help.

paulschreiber · January 14, 2023, 7:52pm

Do I want ionice -c2 nice -n19 per the docs? Or just the default behaviour?

MichaelEischer · January 15, 2023, 10:33am

The default should already reduce the priority. But you can of course reduce it further.
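
A sketch of what "reducing it further" could look like with explicit levels, assuming GNU coreutils nice and util-linux ionice. Plain nice adds 10 to the niceness; -n 19 asks for the lowest CPU priority, and ionice -c2 -n7 selects the best-effort I/O class at its lowest level. The wrapper name run_low_priority and the /srv/data path are hypothetical, not taken from this thread:

```shell
#!/bin/sh
# Run any command with reduced CPU and I/O priority.
run_low_priority() {
    # GOMAXPROCS=1    : the Go runtime executes Go code on at most one thread at a time
    # nice -n 19      : lowest CPU scheduling priority
    # ionice -c2 -n7  : best-effort I/O class, lowest priority within that class
    GOMAXPROCS=1 nice -n 19 ionice -c2 -n7 "$@"
}

# Example invocation (hypothetical path; repository settings are assumed
# to come from the environment, e.g. RESTIC_REPOSITORY):
# run_low_priority restic backup --read-concurrency 1 /srv/data
```

Running nice with no arguments inside the wrapper (run_low_priority nice) prints the niceness the command actually received, which is a quick way to confirm the settings took effect.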

paulschreiber · January 15, 2023, 3:32pm

My log shows this:
[Sun, 15 Jan 2023 06:47:01 -0500] Starting backup
[Sun, 15 Jan 2023 08:08:30 -0500] Ending backup
[Sun, 15 Jan 2023 08:08:32 -0500] Starting prune
[Sun, 15 Jan 2023 08:14:30 -0500] Ending prune

Backup took about 80 minutes this time. Big improvement. Not sure whose “fault” it is (0.15, read concurrency, nice, ionice, max procs), but quite happy with the results.
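
The durations can be checked directly from the log timestamps. A small sketch, assuming GNU date (the -d option is not available in every date implementation); the helper name elapsed_minutes is made up for illustration:

```shell
#!/bin/sh
# Whole minutes elapsed between two timestamps (requires GNU date for -d).
elapsed_minutes() {
    start=$(date -d "$1" +%s)
    end=$(date -d "$2" +%s)
    # Integer division drops the leftover seconds.
    echo $(( (end - start) / 60 ))
}

# Backup: 06:47:01 to 08:08:30
elapsed_minutes "Sun, 15 Jan 2023 06:47:01 -0500" "Sun, 15 Jan 2023 08:08:30 -0500"  # prints 81
# Prune: 08:08:32 to 08:14:30
elapsed_minutes "Sun, 15 Jan 2023 08:08:32 -0500" "Sun, 15 Jan 2023 08:14:30 -0500"  # prints 5
```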

MichaelEischer · January 15, 2023, 7:46pm

Nice to hear that the problem is mostly solved. That backup duration looks much more reasonable. Considering the hardware used, it probably won’t get much faster. (20-30 mins might be possible, but that’s a very rough estimate.)