I’ve been using a commercial backup application on my Linux servers (VMs actually) for some years now, with great success. It is file-based, dedupes, encrypts, etc., and sends the data directly to B2 or S3 or whatever. Sadly, due to insanity at their management level, the annual cost of this software is going up by a factor of 10 and I simply can’t afford it.
I’ve therefore been going through the alternatives, and I narrowed things down to restic, which seems to me to be the bee’s knees in terms of almost every feature I need and then some. OK, so there’s no central GUI to select files for backup and restore across multiple servers like I currently have, but I can live without that. We almost never need to restore anything anyway.
I do have a concern about using restic on one particular server though, and I’m hoping someone with experience of backing up large amounts of data, consisting of millions of small files, could chime in to reassure me (or otherwise!) please?
For context, the servers I back up are multi-function, multi-account hosting servers. They run postfix email, apache web and mysql databases. My normal methodology is to use the built-in backup facility on the hosting control panel we use to create daily backups (daily incremental and weekly full) of all user account data, then use my file-based backup utility - soon to be restic - to essentially encrypt and copy these backups, along with the content of a few specific directories, to a cloud storage service like S3 or B2.
[A daily disk image snapshot of each system is also done, but that’s for disaster recovery, not individual file recovery]
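For anyone wanting the concrete shape of that, here’s roughly what I have in mind for the restic side - a minimal sketch, assuming a B2 bucket; the bucket name, paths and password file are placeholders, not my real layout:

```
# Credentials and repository location (placeholders, not real values)
export B2_ACCOUNT_ID="<application-key-id>"
export B2_ACCOUNT_KEY="<application-key>"
export RESTIC_REPOSITORY="b2:example-backup-bucket:server1"
export RESTIC_PASSWORD_FILE="/root/.restic-password"

# One-time repository initialisation
restic init

# Daily run: encrypt and copy the control panel backups, plus a few
# specific directories, up to cloud storage
restic backup /backup /etc /usr/local/etc
```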
Surprisingly, at least to me, the dedupe in my old backup application, and in restic during testing, works even on the highly compressed backups created by the control panel, so the whole scheme works very well for me. The magic of chunks, I suppose.
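Incidentally, restic itself can show how well the dedupe is doing, which is how I checked during testing:

```
# Logical size: what restoring every snapshot in full would total
restic stats --mode restore-size

# Physical size: the deduplicated data actually held in the repository
restic stats --mode raw-data
```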
I have one server that’s a problem though. It has nearly 2TB of email on it, spread across many accounts. The control panel’s backup cannot cope with it, taking hours and hours at a high load average to back up every day, whether the backup is incremental or full.
The solution I came up with was to exclude email from the control panel backups, leaving just web, databases and configs, and to use my old backup application to back up the contents of all the email directories as well as the control panel backup files. This worked: around 6GB of daily changes sent to cloud storage, the entire backup process normally taking less than an hour, with only moderate server load for a short period.
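Concretely, the restic equivalent I’m planning looks something like this - a sketch only, with example paths (my actual mail store and control panel backup directory will differ, and the exclude of regenerable index files is just an illustration):

```
# Nightly job: mail directories plus the control panel backup files.
# nice/ionice keep the load moderate, as the old tool managed to do.
nice -n 19 ionice -c 2 -n 7 restic backup \
    /var/vmail \
    /backup/control-panel \
    --exclude 'dovecot.index*' \
    --tag nightly
```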
Naturally, I want to do the same using restic. I can think of no reason why restic would struggle or cause problems if the old backup worked perfectly well doing the same thing. But I’m a cautious person, and I don’t like rushing into things without a second opinion. And clearly restic will use a different technical methodology to my existing backup, different default chunk sizes, and so on.
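In case it’s relevant to anyone answering: as I understand it, restic splits files into content-defined chunks of roughly 512KiB to 8MiB, so the chunk sizes will certainly differ from my old tool’s. If tuning is ever needed, newer restic versions (0.14+ with repository format v2) expose a couple of knobs - again a sketch, with the mail path as a placeholder:

```
# Larger packs mean fewer objects in the bucket; compression 'max'
# trades CPU for upload size ('auto' is the default)
restic backup --pack-size 64 --compression max /var/vmail
```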
So, does anyone here backup large quantities of email like this with restic? What is your experience? Are there any “gotchas” I should look out for? Obviously I can just go ahead and test it, and I will. But I would love some feedback first.
The disks on these VMs are all solid-state, local RAID6 - I don’t have I/O performance worries under normal circumstances, and I/O waits are normally negligible. There’s plenty of CPU grunt and RAM too, and I can always allocate more if need be.
Thanks,
F.