Server unresponsive during restic backups

After updating to v3, I’m pruning:

restic prune --repack-uncompressed

This is using a lot of CPU:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                      
 286157 root      20   0 2102964   1.3g  17740 R 176.2   5.7 418:07.77 restic                                       

but the server is perfectly responsive. Web requests are served immediately and I have not received any warnings from UptimeRobot.

Given this plus the strace slowness, it sounds like we are I/O-bound and not CPU-bound. Could this be a kernel or btrfs bug?
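One rough way to check the I/O-bound guess is to sample the iowait counter in /proc/stat while the backup runs. The sketch below assumes a Linux system; `iowait_pct` is a hypothetical helper name, and the result is only an approximation because it ignores the steal and guest counters.

```shell
#!/bin/sh
# Sketch: share of CPU time spent waiting on I/O over an interval (Linux only).
# iowait_pct is a hypothetical helper name, not an existing tool.
iowait_pct() {
    interval="${1:-5}"
    # First line of /proc/stat: cpu user nice system idle iowait irq softirq ...
    read -r label u1 n1 s1 i1 w1 q1 sq1 rest < /proc/stat
    sleep "$interval"
    read -r label u2 n2 s2 i2 w2 q2 sq2 rest < /proc/stat
    total=$(( (u2 + n2 + s2 + i2 + w2 + q2 + sq2) - (u1 + n1 + s1 + i1 + w1 + q1 + sq1) ))
    # Guard against a zero-length sample to avoid dividing by zero.
    [ "$total" -gt 0 ] || total=1
    echo $(( 100 * (w2 - w1) / total ))
}

# Example: sample for 5 seconds while the backup is running.
# iowait_pct 5
```

A consistently high value during the backup, together with low user CPU time, would support the I/O-bound reading; a low value would point back at the CPU.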

Maybe; that is pretty hard to tell with the available information. It might also partly be a result of Core 2 Duos being really old by now. But in either case the system shouldn’t slow down that much. Does dmesg show anything particularly suspicious? Beyond that, I currently have no idea how to investigate this further.

What might be worth a try is to use nice and ionice to reduce the CPU and I/O priority of restic, so that other tasks take precedence. Or you could check whether restic backup --read-concurrency 1 (available since v0.15.0) helps.

I updated to 0.15 yesterday.

Currently, dmesg only shows a bunch of PHP segfaults. Do you want me to run it again during the backup?

What flags should I use with nice and ionice? Do you want me to try both, or just ionice? (Please supply the precise syntax lest I misuse it like I did with --verbose.)

How should I prioritize trying different options? There are seven different possibilities:

  • nice
  • ionice
  • read-concurrency
  • nice / ionice
  • nice / read-concurrency
  • ionice / read-concurrency
  • nice / ionice / read-concurrency
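The count of seven follows from three independent options with the "none of them" case left out (2^3 - 1 = 7). As a quick sanity check, a small sketch that enumerates the same combinations; `combos` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print every non-empty combination of the three options.
# Each bit of the mask (1..7) switches one option on.
combos() {
    for mask in 1 2 3 4 5 6 7; do
        line=""
        [ $(( mask & 1 )) -ne 0 ] && line="$line nice"
        [ $(( mask & 2 )) -ne 0 ] && line="$line ionice"
        [ $(( mask & 4 )) -ne 0 ] && line="$line read-concurrency"
        # Drop the leading space left by the first append.
        echo "${line# }"
    done
}

combos
```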

Well, the kernel log also keeps older entries, so it sounds like nothing relevant was printed to it.

Let’s just try everything at once: GOMAXPROCS=1 nice ionice restic backup --read-concurrency 1 .... If that helps, then we can still try to figure out which of those options makes the difference. I’ve added GOMAXPROCS=1, which limits restic to at most one CPU core. That might also help.
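For precise syntax, here is a minimal sketch of the combined invocation with explicit priority values rather than the defaults. The wrapper name `low_prio` and the backup path are placeholders, not part of restic, and the chosen values (best-effort I/O class at level 7, niceness 19) are simply the lowest priorities those tools offer.

```shell
#!/bin/sh
# Sketch: run any command with reduced CPU and I/O priority.
# low_prio is a hypothetical helper name, not part of restic.
low_prio() {
    # GOMAXPROCS=1 limits the Go runtime (and thus restic) to one CPU core.
    # ionice -c2 -n7: best-effort I/O class at its lowest priority level.
    # nice -n19: lowest CPU scheduling priority.
    if command -v ionice >/dev/null 2>&1; then
        GOMAXPROCS=1 ionice -c2 -n7 nice -n19 "$@"
    else
        # ionice is Linux-specific; fall back to CPU niceness only.
        GOMAXPROCS=1 nice -n19 "$@"
    fi
}

# Example (the path is a placeholder):
# low_prio restic backup --read-concurrency 1 /path/to/data
```

Wrapping the settings in one function makes it easy to drop a single option later and see which one actually makes the difference.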

Do I want ionice -c2 nice -n19 per the docs? Or just the default behaviour?

The default should already reduce the priority. But you can of course reduce it further.

My log shows this:
[Sun, 15 Jan 2023 06:47:01 -0500] Starting backup
[Sun, 15 Jan 2023 08:08:30 -0500] Ending backup
[Sun, 15 Jan 2023 08:08:32 -0500] Starting prune
[Sun, 15 Jan 2023 08:14:30 -0500] Ending prune

Backup took about 80 minutes this time. Big improvement. Not sure whose “fault” it is (0.15, read concurrency, nice, ionice, max procs), but quite happy with the results.
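The exact durations can be read off the log timestamps. A small sketch of the arithmetic, assuming both timestamps fall on the same day; `to_secs` and `elapsed` are hypothetical helper names:

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight.
to_secs() {
    h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
    # Strip one leading zero so values like "08" are not parsed as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Elapsed time between two same-day timestamps, printed as "Nm Ss".
elapsed() {
    d=$(( $(to_secs "$2") - $(to_secs "$1") ))
    echo "$(( d / 60 ))m $(( d % 60 ))s"
}

elapsed 06:47:01 08:08:30   # backup: 81m 29s
elapsed 08:08:32 08:14:30   # prune:  5m 58s
```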


Nice to hear that the problem is mostly solved. That backup duration looks much more reasonable. Given the hardware, it probably won’t get much faster. (20-30 mins might be possible, but that’s a very rough estimate.)