However, the process is making the Pi unresponsive and I’m forced to unplug it to re-establish an SSH connection.
The directory I’m trying to back up is a few terabytes (5.6T according to du), the backend is Google Drive (via rclone), and this is the second device I’m adding; the first one is a much beefier server, however.
Why is this happening and are there any options to fix it?
Hi devster,
Are you running out of RAM, maybe? Run top and see how it looks. For a ~700 GB repository (including many small files), memory was getting dangerously close to full on my RPi 3 with 1 GB of RAM. I switched to an RPi 4 with 4 GB and all is smooth again.
FYI, the next limit I hit was the restic cache filling my SD card (although that produced an error rather than hanging the machine). To solve that, I first redirected the cache to another drive, and then decided to just upgrade the SD card from 16 to 64 GB.
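In case it helps, redirecting the cache is just a matter of pointing restic somewhere else (the path below is only an example): either export RESTIC_CACHE_DIR=/mnt/usb/restic-cache before running restic, or pass --cache-dir /mnt/usb/restic-cache on each invocation.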
Also, you might want to check the logs for file system problems. SD cards do funny things at times, especially if they have served some time in your Pi. Here at home I am backing up about 1.2 TB with a 4 GB Pi and a quality 32 GB SD card, and have not had a problem yet.
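If you want to check without digging through full logs, something like dmesg --level=err,warn or journalctl -p err -b usually surfaces SD card and filesystem errors quickly (assuming journald is in use).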
That could very well be the case, unfortunately it becomes unresponsive and it’s fairly difficult to check RAM usage. Is there a way to lower the amount of data restic reads into memory or cache?
I didn’t think about disk cache, I’ll try to redirect it to an external drive.
I think it would be useful to know if it was the RAM. How about using dstat? For example, to write the status to disk every 60 seconds, use: dstat --vmstat 60 > dstat.out
To install dstat if you don’t already have it: sudo apt install dstat
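If the SSH session is at risk of dropping, you could also run it detached, e.g. nohup dstat --vmstat 60 > dstat.out 2>&1 & so the logging keeps going; syncing the file to disk now and then improves the odds it survives a hard reset.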
I disabled swap with sudo systemctl stop dphys-swapfile, moved RESTIC_CACHE_DIR to an external drive (not the SD card), and added Nice=-20 to sshd.service to avoid freezes (it didn’t help; tmux still froze).
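(For anyone wanting to reproduce the Nice change, a systemd drop-in should do it: sudo systemctl edit ssh.service and add Nice=-20 under a [Service] section; depending on the distro the unit may be called sshd.service instead.)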
This is an incomplete log, as the Pi froze and the restart corrupted the dstat.out file.
It seems memory isn’t the issue (not the only issue at least).
I had left a process running overnight without swap and with the --no-cache option, but it was killed because of memory pressure (dmesg had an out-of-memory message).
My experience has been that the Raspberry Pi, and other single-board computers running off a MicroSD card, will essentially lock up when they run out of RAM and start using swap space on the MicroSD. This has happened to me several times, and I don’t know if it’s a Linux issue, a MicroSD issue, or a Raspberry Pi/SBC issue. Either way, I think it’s likely that the process now being killed cleanly and your disabling of swap are not unrelated. If you want to test, you could turn swap back on and run the test again. I would definitely avoid using swap on a MicroSD card, both for the card’s longevity and because in my experience it does more harm than good.
A few possible workarounds might help though:
If this is a headless server, I would recommend lowering the GPU memory (VRAM) allocation on the Raspberry Pi as low as is tolerable. I believe this is somewhere in the raspi-config options: sudo raspi-config
You could try enabling ZRAM on the Pi. This essentially allocates a portion of RAM to be used as compressed virtual swap (yes, you really can “Download more RAM!”, at least on Linux, albeit with a few potential drawbacks), allowing some applications to function better on low-RAM devices like the RPi (I don’t know how it will work with restic, but it might be worth a try). I believe the script here should still work, even though it’s somewhat dated: https://github.com/novaspirit/rpi_zram
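If raspi-config doesn’t expose it, the same effect can be had by setting gpu_mem=16 in /boot/config.txt and rebooting; 16 MB is about as low as the firmware allows, as far as I know.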
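If that script feels too dated, a rough manual equivalent using util-linux would be something along these lines (the size is just an example):
sudo modprobe zram
sudo zramctl --find --size 1G
sudo mkswap /dev/zram0
sudo swapon -p 100 /dev/zram0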
Another potential option is changing which single-board computer you use. A Raspberry Pi 4 with 4 GB of RAM would likely be much better suited to this application, but even then, I’m not sure if that’s enough RAM. I also personally like Digital Loggers’ Atomic Pis, which are Intel x86 Atom single-board computers that can be found in some countries for about $35. They have 2 GB of RAM for around the price of a 1 GB model of the Raspberry Pi. ( For Amazon US: https://www.amazon.com/dp/B07DVYDDV8/ref=cm_sw_r_tw_dp_U_x_Z9dxEbDR8H0HX ). They’re certainly less polished and more work than Raspberry Pis, but I find them very compelling for the price, and I like that I can run standard Linux OSes because of the x86 CPU.
Hopefully this helps you debug, and one of the workarounds turns out to be useful.
Good luck,
jedi453
Thanks for the suggestions @jedi453.
As I mentioned, I’m now running with no swap enabled (so no swapping, but still using the cache), and the process still freezes the Pi. It doesn’t close gracefully for now, nor does it get killed quickly, as there seem to be at least 20 MB of RAM still available for most of the run (according to dstat).
In the meantime, however, the system is effectively unusable.
VRAM is already at the lowest possible value.
I’ll try ZRAM; that could be a good option, since the CPU seems to be idle most of the time anyway.
Changing boards isn’t something I was planning on doing. An RPi 4 would probably be my choice, however, as I use Ubuntu almost exclusively and they have fairly good support for it.
In the latest run I also tried export GOGC=20, but this also froze the Raspberry Pi.
I guess I have to keep using duply or similar for now.
It might also be worth reducing the CPU affinity with taskset. Are you seeing any errors in dmesg? If there are acute resource shortages, they may be reported there.
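By way of example (core numbers and paths are placeholders), something like taskset -c 0,1 restic backup /data would pin restic to two of the Pi’s four cores, leaving the others free for sshd and the rest of the system.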
How would I change affinity? I’m only finding commands to limit it to specific cores, but would that help?
Also, I’m not seeing high CPU usage; most of it is idle or I/O wait, which I would guess is due to activity on the cache disk.
There are no errors in dmesg besides the out of memory ones unfortunately.
Sorry if this is an uninformed suggestion, but couldn’t it be a shitty network interface that’s locking things up? I don’t know how good the RPi ones are, but I would presume it’s not a fancy Intel NIC.
FWIW, I too have seen slowness and lockups on RPis. We have one at a club I’m in; all it does is run Raspbian with Firefox (we also tried the default browser, no relevant difference), and just surfing a couple of pages with regular JavaScript and some SVG makes it all come to a crawl at best, or hang at worst (after a while of crawling). Completely unbearable.
Could very well be. The Pi should have an SMSC LAN9514 as its LAN chip. The Model 3, which I am using, connects it to the SoC over the USB 2.0 bus. This could be one of the causes (high network usage combined with heavy USB usage for the external cache drive).
I’ll try to limit restic and rclone bandwidth. Which other tests could I run to check?
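For the record, what I’m planning to try first is restic’s own throttling, e.g. restic backup --limit-upload 2048 --limit-download 2048 ... (values are in KiB/s, and those numbers are just a first guess); rclone also has a --bwlimit flag, though I’m not sure yet how best to pass it through the rclone backend.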
However, running with --no-cache didn’t seem to work (the Pi ran out of memory).
Is there a way to force restic to first create the local cache and then, in a separate step, run the backup? Maybe that could solve it.
The --no-cache option only disables the on-disk cache used by restic. The memory usage should not differ by much more than 100 MB. As you back up to Google Drive, which is a high-latency backend, you will want to use the cache for reasonable performance.
How large is the index folder of your backup repository? How many files does the directory you are trying to back up contain (you could use e.g. find backup-dir -type f | wc -l)?
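(Since the repository lives on Google Drive, something like rclone size remote:path/to/repo/index should report the index folder size; the remote name and path here are placeholders.)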
Having a lot of small files (i.e. less than 0.5 MB) leads to a larger index than a folder with larger files. Assuming the directory contains large files, I get the following estimates for minimum memory usage:
5.6 TB / 0.5 MB/chunk * 190 B/chunk = 2.1 GB
5.6 TB / 8 MB/chunk * 190 B/chunk = 133 MB
0.5 MB is the minimum chunk size, if the file is long enough; 8 MB is the maximum chunk size. Restic tries to create chunks of 1 MB on average. Currently the index requires around 190 bytes per chunk (just a rough estimate; the absolute required minimum is about 130 bytes, but 190 bytes is closer to the usual memory usage). So you will end up with at least 1.05 GB of memory usage for just the in-memory index, not accounting for any Go garbage collection overhead. You can add a few additional hundred MB for reading file chunks and cache management data, which makes it a close call on a 32-bit system.
ZRAM might help a bit; a compression factor of 2 or 3 could be possible. Without it you will most certainly run out of memory.
The memory usage optimizations that are worked on in https://github.com/restic/restic/issues/2523 are probably enough to make restic work for you, however, these might take some time before they are ready for prime time.
@MichaelEischer thanks for the suggestions and the insight.
Most of them are big files; I get 38079 files, for a total size of 6.067 TB.
I’ll try with ZRAM and will follow the issue you linked for news.
I just wanted to point out that since you’re using Ubuntu, you might be able to simply install the zram-config package to enable zram. I’m not sure if it’s available in the ARM repositories, but I’d imagine it is. If it’s available, it’s probably a better way to do it than the script I listed.
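If it is available, it should amount to sudo apt install zram-config plus a reboot; the service then sets up a compressed swap device automatically.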