As someone with more than a passing familiarity with restic (though not intimate knowledge of the source code), I see no reason why using the same or different hostnames would have any effect on the performance of parallel backups. I’m curious about the reasoning that led to this conclusion, because I can’t even fabricate a plausible-yet-wrong train of thought that would lead there.
The hostname is used in exactly two places (AFAIK) when performing a backup: the value is used in selection of the parent snapshot (only if `--parent` is not given), and the value is recorded in the new snapshot object. I cannot see how either of these operations would be slower because a concurrent restic process is using the same hostname.
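To make the first point concrete, here is a rough model of what parent selection amounts to. This is not restic's actual code: the `Snapshot` class and `pick_parent` function are hypothetical names, and I am assuming that "latest snapshot with the same hostname and the same backup paths" is a fair approximation of the default behaviour.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified stand-in for a restic snapshot record.
    id: str
    time: datetime
    hostname: str
    paths: tuple


def pick_parent(snapshots: Sequence[Snapshot], hostname: str,
                paths: Sequence[str],
                explicit_parent: Optional[str] = None) -> Optional[str]:
    """Return the ID of the snapshot to use as parent, or None."""
    if explicit_parent is not None:
        # Mirrors the --parent case: the hostname is not consulted at all.
        return explicit_parent
    wanted = tuple(sorted(paths))
    # Assumption: candidates must match both hostname and backup paths.
    candidates = [s for s in snapshots
                  if s.hostname == hostname and tuple(sorted(s.paths)) == wanted]
    if not candidates:
        return None
    # The most recent matching snapshot wins.
    return max(candidates, key=lambda s: s.time).id
```

In this model the hostname is only a filter over an already-loaded snapshot list. Nothing in it involves locking or waiting on another process, which is why I would not expect a shared hostname to slow anything down.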
When the jobs were taking much longer and failing, how did the RAM consumption of the restic processes compare with what you see now, using multiple repositories? I ask because, assuming an identical amount of raw, non-deduplicated data, a single repository will have a bigger index than any one of the indexes in a multi-repository setup. Restic memory use, in my experience, scales nearly linearly with the size of the repository index. So using multiple repositories should lower the memory consumption of each restic process.
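A back-of-the-envelope sketch of that argument follows. The constants are made up for illustration (`bytes_per_blob` and `base_bytes` are not measured restic figures), and the split case assumes the data divides evenly across repositories with nothing duplicated between them.

```python
def per_process_memory(blobs_in_index: int, bytes_per_blob: int,
                       base_bytes: int) -> int:
    """Linear model: fixed overhead plus a cost per index entry."""
    return base_bytes + bytes_per_blob * blobs_in_index


def total_memory(total_blobs: int, processes: int, shared_repo: bool,
                 bytes_per_blob: int = 100,
                 base_bytes: int = 50 * 2**20) -> int:
    """Combined memory of `processes` concurrent backups.

    shared_repo=True:  every process loads the full index.
    shared_repo=False: one repository per process, data split evenly.
    The default constants are hypothetical placeholders, not measurements.
    """
    blobs = total_blobs if shared_repo else total_blobs // processes
    return processes * per_process_memory(blobs, bytes_per_blob, base_bytes)


if __name__ == "__main__":
    shared = total_memory(10_000_000, processes=10, shared_repo=True)
    split = total_memory(10_000_000, processes=10, shared_repo=False)
    print(f"shared repo: {shared / 2**30:.1f} GiB, split repos: {split / 2**30:.1f} GiB")
```

Under this model the index-dependent part of the combined footprint grows with the number of concurrent processes when they share one repository, but stays roughly constant when each process has its own.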
A larger index, and therefore higher memory consumption, could cause any of the following, in order:
1. Longer restic startup times since more index data has to be read into memory.
2. Higher/any swap use, which would dramatically slow down restic (and probably other stuff on the same system).
3. Thrashing / swap death as processes on the box compete for their memory to be paged in.
4. The OOM killer remedying the situation by killing restic, which would likely be the biggest consumer of memory on the system.
2 and 3 could easily explain a 30-fold increase in backup times, and failed backups would be explained by 4.
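If you still have kernel logs from the period when backups were failing, the OOM-killer theory is easy to check. Below is a small sketch that scans log lines (for example the output of `dmesg` or `journalctl -k`) for kills of a given process. The function name `find_oom_kills` is my own, and I am assuming the kernel message contains the phrase "Killed process <pid> (<name>)"; the exact wording varies between kernel versions.

```python
import re
from typing import Iterable, List, Tuple

# Assumed shape of the kernel's OOM kill message; wording differs by version.
_OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines: Iterable[str],
                   process_name: str = "restic") -> List[Tuple[int, str]]:
    """Return (pid, line) for every OOM kill of `process_name` in the log."""
    hits = []
    for line in log_lines:
        match = _OOM_KILL.search(line)
        if match and match.group(2) == process_name:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys
    # Usage: dmesg | python this_script.py
    for pid, line in find_oom_kills(sys.stdin):
        print(pid, line)
```

Any hits whose timestamps line up with the failed jobs would support the memory explanation.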
If several of these restic processes were running on the same physical machine (which depends on the configuration of your cluster), these problems would be amplified on that machine.
tl;dr: The hostname being the same is, to the best of my knowledge, a red herring. The use of multiple repositories is likely what solved the problem because this means a smaller index per repository, which lowers the memory consumption of restic.