I have been running a daily restic backup of a large repository (around 1.2TB) for about a year. It first ran on a VM and was recently moved to Kubernetes.
In both cases, RAM usage is very high while the index is being loaded.
It can consume the whole 30GB of RAM on the VM, and gets OOM-killed (exit code 137) inside a container limited to 12GB of RAM.
In one of the threads I read that restic's memory consumption should be roughly comparable to the index size, which in this case is around 6-7GB (still much less than what restic actually consumes). I tried GOGC=10, but it didn't really help.
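For reference, GOGC is a Go runtime environment variable that controls how aggressively the garbage collector runs; the attempt would look roughly like this (repository location and backup path are hypothetical):

    GOGC=10 restic -r /srv/restic-repo backup --no-scan /data

A low GOGC value trades extra CPU time for a smaller heap, but it cannot shrink the index that restic has to hold in memory.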
Is there anything I can do to limit the memory usage?
You have neither specified which restic version you’re using nor which commands are run when the problem occurs. Without that information it’s not possible to give any proper advice.
To get an idea of the expected memory requirements for the repository, please run restic stats --mode debug.
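For example (repository location hypothetical; the usual RESTIC_PASSWORD or --password-file setup applies):

    restic -r /srv/restic-repo stats --mode debug

Among other things, this prints blob counts and size distributions, which should give a good picture of how many blobs the repository contains.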
A 6-7GB index is excessively large for a 1.2TB repository. It roughly means that there are 150 million blobs in the repository (assuming repository format version 2). Has the repository ever been pruned?
The rule of thumb is that 1GB of RAM is needed for every 7 million blobs. So no matter which options you specify, at that repository size the memory usage cannot go much below 22GB. (That is with the latest restic version; older versions can use much more memory.)
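Spelled out, that estimate is simple arithmetic: 150 million blobs ÷ 7 million blobs per GB ≈ 21.4GB, hence the ~22GB floor.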
After the release of 0.16.x, the daily backups were upgraded to use it.
Recently I upgraded to 0.17.3.
The described issues were visible in the recent version as well.
The command in question is restic backup --no-scan
I couldn't run a prune, as it was saturating my storage. Instead, I used restic forget to reduce the number of historic snapshots from 390 to 90. This decreased the index size to around 1.3GB, and the memory usage for backup dropped to ~3.5GB.
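The exact retention flags aren't quoted here, but trimming daily snapshots from 390 down to 90 would be something along these lines (the flags are real, the policy and repository path are hypothetical):

    restic -r /srv/restic-repo forget --keep-last 90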
Regarding restic stats, please see the result after the cleanup below. The biggest difference is visible in the last section (tree): before the cleanup it was around 150 million (110GB).
I think at this point I have just two questions:
Does the repo look healthy to you after the cleanup, or would you still investigate?
Any chance restic could support distributed operation in the future (so that it could run on several smaller nodes instead of a single one)?
That looks much more reasonable, although the number of tree blobs is still rather high compared to the number of data blobs, in my opinion. Are there by any chance filesystem snapshots involved in the backup creation process?