High load on Raspberry Pi

Hello,
I’m trying to use restic to back up a large directory from a Raspberry Pi.
I’m launching it with:

ionice -c2 -n7 nice -n 19 restic backup /directory --exclude-file=restic.excludes -v

with the following alias:

alias restic="restic -o rclone.args='serve restic --stdio --b2-hard-delete --drive-use-trash=false --tpslimit 3 --tpslimit-burst 6 --fast-list'"

However, the process is making the Pi unresponsive, and I’m forced to unplug it to re-establish an SSH connection.
The directory I’m trying to back up is a few terabytes (5.6T according to du), the backend is Google Drive (via rclone), and this is the second device I’m adding; the first one is a much beefier server, however.
Why is this happening, and are there any options to fix it?

Thank you

Hi devster,
Are you running out of RAM, maybe? Run top and see how it looks. For a ~700 GB repository (including many small files), memory was getting dangerously close to full on my RPi 3 with 1 GB RAM. I switched to an RPi 4 with 4 GB and all is smooth again :wink:

FYI, the next limit I hit was the restic cache filling my SD card (although that produced an error, rather than hanging the machine). To solve that, I first redirected the cache to another drive, and then decided to just upgrade the SD card from 16 to 64 GB.

Also, you might want to check the logs for file system problems. SD cards do funny things at times, especially if they have served some time in your Pi. Here at home I am backing up about 1.2 TB using a 4 GB Pi with a 32 GB quality SD card and have not had a problem yet.

That could very well be the case; unfortunately the Pi becomes unresponsive, so it’s fairly difficult to check RAM usage. Is there a way to lower the amount of data restic reads into memory or cache?

I didn’t think about disk cache, I’ll try to redirect it to an external drive.

I think it would be useful to know if it was the RAM. How about using dstat? For example, to write the status to disk every 60 seconds, use:
dstat --vmstat 60 > dstat.out
To install dstat if you don’t already have it:
sudo apt install dstat

I disabled swap with sudo systemctl stop dphys-swapfile, moved RESTIC_CACHE_DIR to an external drive (not the SD card), and added Nice=-20 to sshd.service to avoid freezes (it didn’t help; tmux still froze).
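Roughly, the steps looked like this (a sketch; the cache path is just an example, and the systemd drop-in is written from memory):

sudo systemctl stop dphys-swapfile                    # disable swap for this boot
export RESTIC_CACHE_DIR=/mnt/external/restic-cache    # example path on the external drive
sudo systemctl edit sshd.service                      # add "[Service]" and "Nice=-20" to the override
sudo systemctl restart sshd.service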
This is an incomplete log, as the Pi froze and the restart corrupted the dstat.out file.

$ dstat --output dstat.out --nocolor --vmstat 30
---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- --total-cpu-usage--
run blk new| used  free  buff  cach|  in   out | read  writ| int   csw |usr sys idl wai stl
  0   0 1.7| 209M  227M  117M  338M|   0     0 | 917k   88k| 771   778 |  2   1  94   3   0
6.0 3.0 0.9| 251M  149M  117M  374M|   0     0 |2451k 3004B|1023  1410 |  4   1  93   2   0
6.0   0 0.1| 259M  133M  121M  377M|   0     0 |9557B 5461B|1580  2225 |  3   1  95   0   0
 30 1.0 0.8| 280M 45.5M  154M  408M|   0     0 | 220k 1242k|5399  8264 | 17   4  77   1   0
 46   0 0.1| 347M 24.5M  164M  360M|   0     0 |  65k 1502k|6186  8992 | 21   4  73   1   0
 92 2.0 0.1| 441M 24.5M  145M  292M|   0     0 |  19k 2491k|  11k   15k| 43   8  49   1   0
 81 3.0 0.1| 571M 24.6M 91.2M  216M|   0     0 | 277k 3110k|  10k   14k| 42   7  50   1   0
109 3.0 0.1| 655M 24.6M 71.0M  152M|   0     0 | 156k 2859k|  10k   14k| 42   8  49   1   0
 81 4.0 0.1| 753M 24.0M 43.5M 82.6M|   0     0 | 134k 3244k|  10k   14k| 44   7  48   1   0
 75  25  13| 826M 28.7M 4392k 56.5M|   0     0 |5902M 5755k| 431k  828k| 11  28   8  53   0 missed 23
110  39 8.9| 799M 24.1M 24.7M 67.2M|   0     0 |2953M 3726k| 220k  420k| 11  28   8  53   0
 50  62 3.7| 826M 29.8M 2552k 57.0M|   0     0 |6927M 2659k| 401k  777k| 17  34   3  47   0 missed 37
 94  73 8.6| 836M 27.8M 8284k 43.5M|   0     0 |4851M 2469k| 283k  547k| 17  34   3  47   0
 17  11 0.2| 864M 27.8M  336k 23.0M|   0     0 |  22M 1612k|7020    11k| 26   7  23  43   0 missed 4
 23  21 0.1| 867M 25.9M  388k 21.9M|   0     0 |  27M 1325k|6823    11k| 22   7  22  48   0 missed 4
 24  55 0.1| 866M 22.9M  268k 25.8M|   0     0 |  28M 1006k|5705  9256 | 19   7  24  51   0 missed 2
 24  56 0.1| 866M 20.9M  268k 28.4M|   0     0 |  28M  965k|5558  9034 | 18   7  24  51   0

It seems memory isn’t the issue (or at least not the only one).
I had left a process running overnight without swap and with the --no-cache option, but that was killed because of memory (dmesg had the out-of-memory message).

Hi @devster, Welcome to the forums!

My experience has been that the Raspberry Pi, and other single-board computers running off a MicroSD card, will essentially lock up when out of RAM and starting to use swap space on the MicroSD. This has happened to me several times, and I don’t know if it’s a Linux issue, a MicroSD issue, or a Raspberry Pi/SBC issue. Either way, I think it’s likely that its closing gracefully now and your disabling of swap are not unrelated. If you want to test, you could turn the swap back on and run the test again. I would definitely avoid using swap on a MicroSD card, both for the card’s longevity and because in my experience it’s done more harm than good.

A few possible workarounds might help though:

  1. If this is a headless server, I would recommend lowering the GPU memory (“VRAM”) allocation on the Raspberry Pi as low as is tolerable. I believe this is somewhere in the raspi-config options: sudo raspi-config
  2. You could try enabling ZRAM on the Pi. This essentially allocates a portion of the RAM to be used as compressed virtual swap (yes, you really can “download more RAM!”, at least on Linux, with a few potential drawbacks), allowing some applications to function better on low-RAM devices like the RPi. I don’t know how it will work with restic, but it might be worth a try; see the sketch after this list. I believe the script here should still work, even though it’s somewhat dated: https://github.com/novaspirit/rpi_zram
  3. Another potential option is changing which single-board computer you use. A Raspberry Pi 4 with 4GB of RAM would likely be much better suited to this application, but even then, I’m not sure if that’s enough RAM. I also personally like Digital Loggers’ Atomic Pis, which are Intel x86 Atom single-board computers that can be found in some countries for about $35. They have 2GB of RAM for around the price of a 1GB model of the Raspberry Pi. ( For Amazon US: https://www.amazon.com/dp/B07DVYDDV8/ref=cm_sw_r_tw_dp_U_x_Z9dxEbDR8H0HX ). They’re certainly less polished and more work than Raspberry Pis, but I find them very compelling for the price, and I like that I can run standard Linux OSes because of the x86 CPU.
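A minimal sketch of workarounds 1 and 2 (the sizes and compression algorithm are arbitrary examples; adjust them to your kernel and RAM):

echo "gpu_mem=16" | sudo tee -a /boot/config.txt      # lowest GPU memory split for headless use; reboot afterwards
sudo modprobe zram                                    # manual ZRAM swap, as an alternative to the script above
echo lz4 | sudo tee /sys/block/zram0/comp_algorithm   # lzo/lz4/zstd, depending on the kernel
echo 512M | sudo tee /sys/block/zram0/disksize
sudo mkswap /dev/zram0
sudo swapon -p 100 /dev/zram0                         # high priority, so it is used before any disk swap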

Hopefully this helps you debug. I hope that one of the workarounds is helpful.
Good luck,
jedi453

3 Likes

Thanks for the suggestions @jedi453.
As I mentioned, I’m now running with no swap enabled (so no swapping, but still using the cache), and the process still freezes the Pi. It doesn’t close gracefully for now, nor does it get shut down quickly, as there seems to be at least 20 MB of RAM still available for most of the run (according to dstat).
In the meantime, however, the system is effectively unusable.

  1. VRAM is already at the lowest possible value
  2. I’ll try ZRAM; that could be a good option, as the CPU seems to be idle most of the time anyway
  3. Changing boards isn’t something I was planning on doing. An RPi 4 would probably be my choice, however, as I use almost exclusively Ubuntu and they have fairly good support for that

On the latest run I also tried export GOGC=20, but this also froze the Raspberry Pi.

I guess I have to keep using duply or similar for now.

It might also be worth reducing the CPU affinity with taskset. Are you seeing any errors in dmesg? If there are acute resource shortages, they may be reported there.

How would I change affinity? I’m only finding commands to limit a process to specific cores, but would that help?
Also, I’m not seeing high CPU usage; most of it is idle or wait, I would guess from I/O activity on the cache disk.

There are no errors in dmesg besides the out of memory ones unfortunately.

Ah, if it’s definitely a RAM problem then CPU affinity won’t help. man taskset gives instructions on how to change this, for future reference.
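For future reference, usage looks roughly like this (core numbers and the PID are arbitrary examples):

taskset -c 0,1 restic backup /directory    # start restic pinned to cores 0 and 1
taskset -pc 0,1 12345                      # or change the affinity of an already-running PID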

At this point I’d be tempted to open an issue on GitHub, maybe it’s something that can be improved.

Sorry if this is an uninformed suggestion, but couldn’t it be a shitty network interface that’s locking things up? I don’t know how good the RPi ones are, but I would presume it’s not a fancy Intel NIC.

FWIW, I too have seen slowness and lockups on RPis. We have one at a club I’m in; all it does is run Raspbian with Firefox (we also tried the default browser, with no relevant difference), and just surfing a couple of pages with regular JavaScript and some SVG makes it all come to a crawl at best, or hang at worst (after a while of crawling). Completely unbearable.

Could very well be. The Pi should have an SMSC LAN9514 as its LAN chip. The Model 3, which I am using, connects it to the SoC over the USB 2.0 bus. This could be one of the causes (high network usage combined with high USB usage for the external drive cache).
I’ll try to limit restic and rclone bandwidth (a sketch of how is below). Which other tests could I run to check?
However, running with --no-cache didn’t seem to work (the Pi ran out of memory).
Is there a way to force restic to first create the local cache and then, in a separate step, run the backup? Maybe that could solve it.
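Both restic and rclone can throttle themselves; something like this should work (the rates are arbitrary examples):

restic backup /directory --limit-upload 1024 --limit-download 1024            # rates in KiB/s
restic -o rclone.args='serve restic --stdio --bwlimit 1M' backup /directory   # or throttle on the rclone side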

+1 for ZRAM, then re-run dstat and see if it goes higher.

1 Like

The --no-cache option only disables the on-disk cache used by restic. The memory usage should not differ by much more than 100 MB. As you back up to Google Drive, which is a high-latency backend, you will want to use the cache for reasonable performance.

How large is the index folder of your backup repository? How many files does the directory you are trying to back up contain (you could use e.g. find backup-dir -type f | wc -l)?
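A sketch of how to check both, assuming the repository is reachable through an rclone remote called gdrive: at path restic-repo (adjust both to your setup):

rclone size gdrive:restic-repo/index    # total size of the repository’s index files
find /directory -type f | wc -l         # number of files in the backup source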

Having a lot of small files (i.e. less than 0.5 MB) leads to a larger index than a folder with larger files. Assuming the directory contains large files, I get the following estimates for minimum memory usage:
5.6 TB / 0.5 MB/chunk * 190 B/chunk = 2.1 GB
5.6 TB / 8 MB/chunk * 190 B/chunk = 133 MB

0.5 MB is the minimum chunk size, if the file is long enough; 8 MB is the maximum chunk size. Restic tries to create chunks of 1 MB on average. Currently the index requires somewhere around 190 bytes per chunk (just a rough estimate; the absolute required minimum is about 130 bytes, but 190 bytes is closer to the usual memory usage). So at 1 MB average chunks, 5.6 TB works out to roughly 5.6 million chunks, and you will end up with at least 1.05 GB of memory usage for just the in-memory index, not accounting for any Go garbage collection overhead. You can add a few additional hundred MB for reading file chunks and cache management data, which makes it a close call on a 32-bit system.

ZRAM might help a bit; a compression factor of 2 or 3 could be possible. Without it you will most certainly run out of memory.
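Once ZRAM is enabled, zramctl (part of util-linux) shows the compression you are actually getting:

zramctl    # compare the DATA and COMPR columns for the effective ratio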

The memory usage optimizations being worked on in https://github.com/restic/restic/issues/2523 are probably enough to make restic work for you; however, it might take some time before they are ready for prime time.

1 Like

@MichaelEischer thanks for the suggestions and the insight.
Most of them are big files; I get 38079 files for a total size of 6.067 TB.
I’ll try with ZRAM and will follow the issue you linked for news.

Hi @devster,

I just wanted to point out that since you’re using Ubuntu, you might just be able to install the package zram-config to enable ZRAM. I’m not sure if it’s available in the ARM repositories, but I’d imagine it is. It’s probably better to do it that way than with the script I listed, if it’s available.
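If it is there, installing it should be as simple as:

sudo apt install zram-config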

Good luck,
jedi453

Thanks, I ended up using zram-tools on Raspbian and your suggestion on Ubuntu (multiple RPis with different OSes).

1 Like