Server unresponsive during restic backups

After updating to v3, I’m pruning:

restic prune --repack-uncompressed

This is using a lot of CPU:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                      
 286157 root      20   0 2102964   1.3g  17740 R 176.2   5.7 418:07.77 restic                                       

but the server is perfectly responsive. Web requests are served immediately and I have not received any warnings from UptimeRobot.

Given this plus the strace slowness, it sounds like we are I/O-bound rather than CPU-bound. Could this be a kernel or btrfs bug?

Maybe; that is pretty hard to tell with the available information. It might also partially be a result of Core 2 Duos being really old by now. But in either case the system shouldn’t slow down that much… Does dmesg show anything particularly suspicious? Beyond that, I currently have no idea how to investigate this further.

What might be worth a try is to use nice and ionice to reduce the priority of restic, such that other tasks get a higher priority. Or you could try whether restic backup --read-concurrency 1 (available since v0.15.0) helps.

I updated to 0.15 yesterday.

Currently, dmesg only shows a bunch of PHP segfaults. Do you want me to run it again during the backup?

What flags should I use with nice and ionice? Do you want me to try both, or just ionice? (Please supply the precise syntax lest I misuse it like I did with --verbose.)

How should I prioritize trying different options? There are seven different possibilities:

  • nice
  • ionice
  • read-concurrency
  • nice / ionice
  • nice / read-concurrency
  • ionice / read-concurrency
  • nice / ionice / read-concurrency
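The seven combinations listed above are simply the non-empty subsets of three options (2^3 - 1 = 7). A minimal sketch that enumerates them with a 3-bit mask, in case it helps to script the test runs; the function name list_combinations is hypothetical, and the output contains only the option names, not full commands:

```shell
#!/bin/sh
# Print every non-empty combination of the three tuning options.
# Each bit of "mask" switches one option on: 1=nice, 2=ionice, 4=read-concurrency.
list_combinations() {
    mask=1
    while [ "$mask" -le 7 ]; do
        combo=""
        if [ $((mask & 1)) -ne 0 ]; then combo="$combo nice"; fi
        if [ $((mask & 2)) -ne 0 ]; then combo="$combo ionice"; fi
        if [ $((mask & 4)) -ne 0 ]; then combo="$combo read-concurrency"; fi
        # Strip the leading space left by the first append.
        echo "${combo# }"
        mask=$((mask + 1))
    done
}

list_combinations
```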

Well, the kernel log also keeps older entries, so anything logged during the last backup would still be there. It sounds like nothing relevant was printed to the kernel log.

Let’s just try everything at once: GOMAXPROCS=1 nice ionice restic backup --read-concurrency 1 .... If that helps, we can still try to figure out which of those options makes the difference. I’ve added GOMAXPROCS=1, which limits restic to using at most one CPU core. That might also help.
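For reference, the combined command suggested above could be wrapped like this. It is only a sketch: the function name low_priority_backup and the DRY_RUN switch are hypothetical, the explicit nice and ionice levels are one possible choice rather than anything required, and the repository and password are assumed to be configured through the usual restic environment variables. Note that ionice priorities are only honoured by some I/O schedulers (BFQ, for example).

```shell
#!/bin/sh
# Run a restic backup at reduced CPU and I/O priority.
# Usage: low_priority_backup <path-to-back-up>
low_priority_backup() {
    # GOMAXPROCS=1         : the Go runtime executes on at most one CPU core
    # nice -n 19           : lowest CPU scheduling priority
    # ionice -c2 -n7       : best-effort I/O class, lowest level within it
    # --read-concurrency 1 : read one file at a time (restic 0.15.0 or later)
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} env GOMAXPROCS=1 nice -n 19 ionice -c2 -n7 \
        restic backup --read-concurrency 1 "$1"
}
```

Calling DRY_RUN=1 low_priority_backup /some/path prints the full command line without starting a backup, which makes it easy to check the syntax first.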

Do I want ionice -c2 nice -n19 per the docs? Or just the default behaviour?

The default should already reduce the priority. But you can of course reduce it further.

My log shows this:
[Sun, 15 Jan 2023 06:47:01 -0500] Starting backup
[Sun, 15 Jan 2023 08:08:30 -0500] Ending backup
[Sun, 15 Jan 2023 08:08:32 -0500] Starting prune
[Sun, 15 Jan 2023 08:14:30 -0500] Ending prune

Backup took about 80 minutes this time. Big improvement. Not sure whose “fault” it is (0.15, read concurrency, nice, ionice, max procs), but quite happy with the results.

Nice to hear that the problem is mostly solved :slight_smile:. That backup duration looks much more reasonable. Considering the hardware used, it probably won’t get that much faster. (20-30 mins might be possible, but that’s a very rough estimate.)