Mbox or maildir?

Hello,

I have a large(ish) directory of emails stored in the maildir format (approx. 33 GB, in 93,000 small files). I might have to convert all of it to the mbox format, which would mean a much smaller number of larger files (up to 900 MB per file). As far as backing up this data with restic is concerned, is there a clear preference for one format over the other? I am not sure whether it’s better to have many small files or a few large files.

Many thanks for any comments you might have on this.

As far as I know, the mbox file format is very simple: it’s more or less just one mail after the other in a single file. So if you use it, I guess restic’s deduplication should work fine with it, since it’s chunk-based.
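To illustrate why chunk-based deduplication copes with an mbox that grows in the middle, here is a toy sketch of content-defined chunking. It is not restic's chunker: restic uses a different rolling hash and much larger chunks, and the window size, mask, and random stand-in data below are all assumptions chosen to keep the example small.

```python
import hashlib
import random

WINDOW = 16   # bytes covered by the rolling sum (toy value, an assumption)
MASK = 0xFF   # cut when the low 8 bits are zero, so chunks average ~256 bytes


def split_chunks(data: bytes) -> list[bytes]:
    """Cut `data` after every byte where the sum of the last WINDOW bytes has zero low bits."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that just left the window
        if i >= WINDOW - 1 and (rolling & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def chunk_ids(data: bytes) -> set[str]:
    """Identify each chunk by its SHA-256, the way a deduplicating store would."""
    return {hashlib.sha256(c).hexdigest() for c in split_chunks(data)}


rng = random.Random(0)
mbox = bytes(rng.getrandbits(8) for _ in range(100_000))    # stand-in for an mbox file
new_mail = bytes(rng.getrandbits(8) for _ in range(2_000))  # stand-in for one new message
edited = mbox[:50_000] + new_mail + mbox[50_000:]           # message inserted mid-file

old_ids, new_ids = chunk_ids(mbox), chunk_ids(edited)
print(f"chunks before: {len(old_ids)}, reused after the insert: {len(old_ids & new_ids)}")
```

Because each cut point depends only on the bytes just before it, the boundaries after the inserted message line up again, and only the chunks around the insertion point are new. The whole file still has to be read and re-chunked to find that out, though.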

I back up maildirs with hundreds of thousands of files daily myself, and have no problems with that either.

So as far as restic is concerned, I think you should not worry about which one you use.

From restic’s perspective a few large files are preferable. Both variants work, but for folders with nearly a hundred thousand files, restic will require quite a bit of memory (think a few hundred MB extra while processing that folder) to back them up.

You are talking about “folders” with a lot of files, so I have to ask: is there a difference between the number of files per repository and the number of files per folder?

e.g. case 1: we have 100k files in one directory

e.g. case 2: we have 10k files in each of 10 directories, totaling the same 100k files but not surpassing 10k files per directory?

Are those two cases the same with regard to restic’s memory consumption during backup/restore/check/prune?

On disk, a filesystem folder in a maildir setup holds the same number of files as there are emails in the corresponding IMAP folder, afaik, which is why I believe the distinction is important (a mailbox may have 100k files, but not in a single folder).

I personally dislike mbox a lot, because if I move an email from one IMAP folder into another one, huge files have to be rewritten (and subsequently rescanned by restic). I doubt that restic performs exceptionally well if on every run, huge files need to be completely rescanned because of some small changes.

Dovecot’s mbox documentation also recommends against using it.

I like maildir because once a file is written to disk, it doesn’t have to be written again, only renamed/moved between folders on disk. When a mail is moved between IMAP folders, restic only has to rescan small files (the size of the actual email), not huge ones.

There are other choices of mailbox formats, for example the multi-dbox format, which saves multiple messages per file but multiple files per mail folder.

There are two main contributors to the memory usage of restic: the blob index and the directory metadata. The index size will be nearly identical in both cases. The total directory metadata will also be similar, however, in case 1 there is a single large directory metadata blob whereas in case 2 there are 10 smaller ones. Currently, restic builds a single metadata object per folder and has to keep that whole object in memory. For folders with hundreds of thousands of files, that will require a notable amount of memory.
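Since the memory cost described above depends on the number of entries per folder rather than the total, it can help to look at how a maildir is actually distributed. A minimal sketch, assuming a plain directory tree; the helper names and the small demo tree are made up for illustration and have nothing to do with restic itself:

```python
import os
import tempfile


def entries_per_directory(root: str) -> dict[str, int]:
    """Map each directory under `root` to the number of entries directly inside it."""
    counts = {}
    for dirpath, dirnames, filenames in os.walk(root):
        counts[dirpath] = len(dirnames) + len(filenames)
    return counts


def largest_directories(root: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` directories with the most direct entries, largest first."""
    counts = entries_per_directory(root)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Demo on a throwaway maildir-like tree (hypothetical folder names and sizes).
with tempfile.TemporaryDirectory() as root:
    for folder, n_mails in (("INBOX", 30), ("Archive", 5)):
        cur = os.path.join(root, folder, "cur")
        os.makedirs(cur)
        for i in range(n_mails):
            open(os.path.join(cur, f"msg{i}"), "w").close()
    top = largest_directories(root, top=3)
    for path, n in top:
        print(f"{n:6d}  {os.path.relpath(path, root)}")
```

Pointing `largest_directories` at a real mail store shows whether any single folder comes close to the "hundreds of thousands of files" range, or whether the files are spread over many smaller folders as in case 2.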

Checking thousands of files isn’t free either. My estimate would be that 100k files take about as long to check (e.g. 10 seconds) as deduplicating at least 1 GB of data (at 100MB/s). Additionally, directory metadata is only deduplicated if the directory content matches exactly. Thus, it’s a bit hard to predict which variant will be the most efficient in the end.
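The estimate above can be turned into a back-of-the-envelope calculation. This sketch only uses the two figures from the estimate (100k files checked in about 10 seconds, data processed at 100 MB/s); the amount of changed mail and the number of mbox files are assumptions, and it ignores metadata handling and upload time entirely.

```python
# Figures taken from the rough estimate: 100k files ~ 10 s, and 100 MB/s throughput.
FILE_CHECK_SECONDS = 10 / 100_000   # ~0.1 ms to check one file for changes
READ_BYTES_PER_SECOND = 100e6       # speed of re-reading and deduplicating changed files


def scan_seconds(total_files: int, changed_bytes: float) -> float:
    """Rough backup time: check every file, then re-read the files that changed."""
    return total_files * FILE_CHECK_SECONDS + changed_bytes / READ_BYTES_PER_SECOND


# Maildir as in the question: 93,000 files, with 50 MB of new mail (assumed).
maildir = scan_seconds(93_000, 50e6)
# mbox: about 40 files (assumed), two 900 MB files touched by that same mail (assumed).
mbox = scan_seconds(40, 2 * 900e6)

print(f"maildir ~{maildir:.1f} s, mbox ~{mbox:.1f} s")
```

With these particular assumptions the two variants land in the same order of magnitude, which fits the point that it is hard to predict which one wins; changing the number of touched mbox files shifts the result quickly.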

400 GB Kerio mail server: 3,35 minutes to back up offsite, with ±9 GB of changes that day.
From SSD to an HDD NAS.
Mailcow mail server: 143 GB total, 172 MB of changes, 55 sec.
It’s fast!

What’s the configuration? maildir, mbox or something in between? How many files?

Kerio uses Maildir.
987441 folders and files (du --inodes /Data/Kerio/mail/).
The config folder is 75422 files, 2.5 GB.
Storage is a mirrored SATA SSD.
Backup storage is a 4-disk 7200 rpm NAS.

The backend is rest-server in append-only mode (the mail server cannot delete the backup, and the backup server cannot access the mail server).

Just restored a 50GB mailbox.
Restic saved the day.


Even without taking restic into account I would definitely prefer Maildir instead of mbox everywhere - always better to lose/corrupt some e-mails than everything at once.
