Backing up a busy email server

I’ve been using a commercial backup application on my Linux servers (VMs actually) for some years now, with great success. It is file-based, dedupes, encrypts etc, and sends the data directly to B2 or S3 or whatever. Sadly, due to insanity at their management level, the annual cost of this software is going up by a factor of 10 and I simply can’t afford it.

I’ve therefore been going through the alternatives, and I narrowed things down to restic, which seems to me to be the bee’s knees in terms of almost every feature I need and then some. OK, so there’s no central GUI to select files for backup and restore on multiple servers like I currently have, but I can live without that. We almost never need to restore anything anyway.

I do have a concern about using restic on one particular server though, and I’m hoping someone with experience of backing up large amounts of data, consisting of millions of smaller files, could chime in to reassure me (or otherwise!) please?

For context, the servers I back up are multi-function, multi-account hosting servers. They run postfix email, apache web and mysql databases. My normal methodology is to use the built-in backup facility of the hosting control panel we use to create daily backups (daily incremental and weekly full) of all user account data, then use my file-based backup utility - soon to be restic - to essentially encrypt and copy these backups, along with the content of a few specific directories, to a cloud storage service like S3 or B2.
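In case it’s useful context, the restic side of that will look roughly like this for me (the bucket name, credentials and paths below are placeholders rather than my real setup):

```bash
# Hypothetical setup: bucket name, credentials and paths are placeholders.
export B2_ACCOUNT_ID="000xxxxxxxxxxxx"
export B2_ACCOUNT_KEY="K000xxxxxxxxxxxxxxxxxxxx"
export RESTIC_REPOSITORY="b2:my-backup-bucket:server1"
export RESTIC_PASSWORD_FILE="/root/.restic-password"

restic init    # one-off: create the encrypted repository

# Daily: copy the control panel's backup output plus a few config directories.
restic backup /var/backups/control-panel /etc /root/scripts
```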

[A daily disk image snapshot of each system is also done, but that’s for disaster recovery not individual file recovery]

Surprisingly, at least to me, the dedupe on my old backup application, and on restic in testing, works even on the highly compressed backups created by the control panel so it all works very well for me. The magic of chunks, I suppose.

I have one server that’s a problem though. It has nearly 2TB of email on it, spread across many accounts. The control panel’s backup cannot cope with it, taking hours and hours and hours at high load average to back up every day, whether the backup is incremental or full.

The solution I came up with was to exclude email from the control panel backups, leaving just web and databases and configs, and to use my old backup application to back up the contents of all the email directories as well as the control panel backup files. This worked, with a daily change of 6GB being sent to cloud storage, the entire backup process normally taking less than an hour, and with only moderate server load for a short period.

Naturally, I want to do the same using restic. I can think of no reason why restic would struggle or cause problems if the old backup worked perfectly well doing the same thing. But I’m a cautious person, and don’t like rushing into things without a second opinion. And clearly restic will use a different technical methodology from my existing backup, with different default chunk sizes and so on.
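To be concrete, the daily job I have in mind looks roughly like this (the paths, excludes and retention numbers are illustrative rather than final):

```bash
# Rough sketch of the intended daily run; excludes and retention are
# illustrative, not final.
restic backup /var/qmail/mailnames /var/backups/control-panel \
    --exclude "*.lock" \
    --exclude "**/Maildir/tmp/**" \
    --tag daily

# Keep the repository from growing without bound.
restic forget --keep-daily 14 --keep-weekly 8 --keep-monthly 12 --prune
```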

So, does anyone here backup large quantities of email like this with restic? What is your experience? Are there any “gotchas” I should look out for? Obviously I can just go ahead and test it, and I will. But I would love some feedback first.

The disks on these VMs are all solid-state, local RAID6 - I don’t have I/O performance worries under normal circumstances and I/O waits are normally negligible. There’s plenty of CPU grunt and RAM too, and I can always allocate more if need be.

Thanks,

F.

I don’t have any experience with the challenges you write about, but there are quite a few threads in this forum about backing up large datasets, like this one. CERN is especially worth mentioning, as they do huge backups using restic, are discussed in multiple threads, and have quite a few internal presentations about the topic available online.

From what I have read so far, most trouble with huge backups comes from having too little RAM available which, as you write, should not be a problem in your case.

One potentially relevant question is how those emails are stored. restic currently requires a significant amount of memory for folders that contain 100k+ files (subfolders are not relevant).

With a lot of changing data you should probably create a filesystem snapshot and back up from that snapshot to avoid inconsistencies.
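For example, if the mail data lived on LVM, a snapshot-based run could look roughly like this (the volume group, logical volume and mount point names are made up):

```bash
# Hypothetical snapshot-based backup, assuming the mail lives on an LVM
# logical volume; vg0/mail and the mount point are made up.
lvcreate --snapshot --size 10G --name mail-snap /dev/vg0/mail
mkdir -p /mnt/mail-snap
mount -o ro /dev/vg0/mail-snap /mnt/mail-snap

restic backup /mnt/mail-snap --tag mail-snapshot

umount /mnt/mail-snap
lvremove -y /dev/vg0/mail-snap
```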

Thanks for the replies.

If subfolders aren’t relevant then there will easily be more than 100K files in total being backed up under the include path for the email directory :frowning:

@MichaelEischer when you say significant amount of memory, what sort of amounts are we talking about? The VM has 24GB allocated to it, and for whatever it might be worth (not a lot?) “free” currently says 4.4GB used, 1.5GB shared (the rest is buffer/cache, and there’s even some actually free).

So, in principle there’s some headroom available, but is it enough?

I do not think we can do filesystem snapshots within this system. It is ext4 and LVM isn’t used, and I can’t do anything with the snapshots made by the host node itself.

Is doing individual restic backups of each domain’s individual email folder a practical option? Iterating through them all would certainly be possible, but there are 300+ of them and I’m thinking it could result in a very big and confusing mess.
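Just to illustrate what I mean, the sort of loop I’m picturing (the tag naming is made up) would be:

```bash
# Illustration only: one restic snapshot per domain, tagged by domain name.
# With 300+ domains this produces a lot of snapshots to keep track of.
for dir in /var/qmail/mailnames/*/; do
    domain=$(basename "$dir")
    restic backup "$dir" --tag "mail-$domain"
done
```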

Can you just try a backup and see if it works within a reasonable timeframe? The first run will take a long time and will only show whether it works at all with the 20 gigs of free RAM. The second backup a day later will show how long a “normal” backup will take.

Idea for consistency without a snapshot: maybe try to make two backups right after each other. The first one grabs all new mails and will take longer. The second one just adds the mails that arrived during the first run. It’s not perfect but maybe perfect enough.
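In other words, something along these lines (the path is a placeholder):

```bash
# Two passes back to back: the second, much quicker pass picks up mail
# that arrived while the first (longer) pass was walking the maildirs.
restic backup /path/to/mail --tag mail
restic backup /path/to/mail --tag mail
```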

That sounds like you understood the exact opposite of what I wanted to say: the memory usage only depends to a limited extent on the total number of files. That is, 1 million files spread across a few hundred folders are completely unproblematic. But 1 million files in a single folder will lead to significant memory usage, in the range of 1-2GB RAM.

Other than that, the memory requirements for recent restic versions can roughly be estimated as follows: 1GB RAM per 7 million unique files + 1GB RAM per 7TB of data. Depending on your data set this can considerably overestimate the amount of memory necessary. prune will probably require about double the memory (that is, until restic 0.17.0 is released).
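As a rough worked example for the server in this thread, assuming (purely a guess on my part) something like 10 million files in the ~2TB of mail:

```
data:   2 TB  / 7 TB per GB   ≈ 0.3 GB RAM
files:  10 M  / 7 M per GB    ≈ 1.4 GB RAM
backup total                  ≈ 1.7 GB RAM
prune                         ≈ 3.5 GB RAM (roughly double, pre-0.17.0)
```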

One data point to add from experience: when backing up 1.7 million PDF files (50 GB in total, 30 KB on average), restic would like to consume up to 10 GB of RAM. Since all those files live in a single directory, I’m debating whether some kind of sharding (i.e. sub-directories with a maximum number of files) would be useful/sensible. From what I can gather here, this seems to be the case.

Yes, almost. Restic, Spotify and others don’t implement a limit on the file count per folder, but instead simply place the files in 256 folders named “00” to “FF”, distributing them across those folders effectively at random.
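If you wanted to shard an existing flat directory along the same lines, a rough one-off sketch (directory names are placeholders) could be:

```bash
# Hypothetical one-off sharding of a flat directory into 256 buckets
# ("00".."ff"), keyed on the first two hex characters of each file name's
# SHA-256. Source and destination directories are placeholders.
src=/data/pdfs-flat
dst=/data/pdfs-sharded

find "$src" -maxdepth 1 -type f | while read -r f; do
    bucket=$(basename "$f" | sha256sum | cut -c1-2)
    mkdir -p "$dst/$bucket"
    mv "$f" "$dst/$bucket/"
done
```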

We use Restic to back up mail servers (mailcow and Kerio).
It’s perfect for the job!
I find it very fast and light on the servers.
We back up every 15 minutes to the offsite Restic server.
This takes about 30 seconds for ±500GB of mail (300MB of change).
SQL is a little bit harder because you need to dump the database to a folder before the backup.
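For the SQL part, an alternative to dumping into a folder first is to pipe the dump straight into restic’s stdin mode, roughly like this sketch (connection details and credentials omitted):

```bash
# Sketch: stream the dump straight into the repository instead of writing
# it to a folder first. Connection details and credentials are omitted.
mysqldump --all-databases --single-transaction \
    | restic backup --stdin --stdin-filename all-databases.sql
```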

This is planned to be fixed via memory usage when backing up directories containing many empty files · Issue #2446 · restic/restic · GitHub, which will start chunking directory metadata in the same way as files are chunked right now.

:slight_smile: Thanks Michael and sorry for my slow response.

I’m looking at /var/qmail/mailnames/[300 to 600 domain-name subfolders]/Maildir/[some other subfolders]/100 to 1000+ messages

From what you are saying, that should be OK. But if it was /var/qmail/mailnames/1,000,000+ messages then it would potentially be a problem. Do I have it right this time?
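If it helps, I was planning to sanity-check that with something like the following sketch, which lists any single directory that directly contains more than 100k files:

```bash
# Sketch: flag any single directory under the mail tree that directly
# contains more than 100,000 files (subdirectories are not counted).
find /var/qmail/mailnames -type d | while read -r d; do
    n=$(find "$d" -maxdepth 1 -type f | wc -l)
    [ "$n" -gt 100000 ] && echo "$n  $d"
done
```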

(and thank you again for your reply and sorry for my slow response. I’m struggling with an issue that prevents me from being able to follow up on things as quickly as I should at the moment)

I was debating backing up more frequently than once a day, like once an hour, and here you are doing it every 15 minutes! Does restic therefore deal with things in some specific friendly way when it comes across a file that is being written to by a different process at the point that it wants to read it to back it up? (e.g. a new message being written to disk, or some existing stats or config file that’s being modified on disk).

And yes, clearly I don’t really know how filesystems work if I’m asking that question, but is it the filesystem or the application that has to figure out how to deal with those situations? I’m outside my knowledge zone here.

Sometimes files move while the backup is being made.
By making backups more often, each one is faster and more reliable.
And when we miss a file because of a move/change, we will catch it 15 minutes later.
Backing up the mail is very solid.
No strange things on the mail server.

No worries, the forum posts don’t mind sitting there for a few days :slight_smile: .

Problematic as in potentially requiring a lot of memory. There is one hard limit though: the metadata for a directory must not be larger than 4GB. If you hit that case, then restic 0.16.4+ will throw an error during backup. However, that should only happen at several million files, at which point the filesystem performance will also be degraded.