Backing up a mail store: could restic be for me?

olc · July 6, 2018, 5:41pm

I’m looking for an efficient way to back up and archive ~10TB of data from a mail storage server (lots of very small files). I discovered restic recently and wondered whether it could be suitable for such a use case… What do you think?

Thanks and best regards,

Olivier

stevesbrain · July 7, 2018, 1:01pm

I use it to back up my (significantly smaller) mail server, and it works well for me. The only thing I’d mention is that, at that size, you may benefit from compression (which is not yet available in restic), in which case I’d consider borg backup as a potential candidate. However, I’d only head in that direction if you’re looking to save money or space. If neither is a concern, restic will (in my experience) work fine for you, although I’ve not tested it on a dataset that large.

olc · July 7, 2018, 4:13pm

Hi @stevesbrain, and thanks for your post!

The cost per terabyte is relatively low, so the lack of compression is not a big deal. For a mail store, deduplication is probably more important anyway. What I’m mostly concerned about is the ability to back up tons of very small files quickly. The best thing is probably to give it a try, though!

Regards,

Olivier

fd0 · July 8, 2018, 9:36am

Please do that, and report back! I’m very interested in how restic performs in this case. On the one hand, it needs to manage a lot of small files (= a lot of metadata); on the other hand, the contents of the small files are bundled together into larger files, so the number of files in the repo will be far lower than the number of files in the source directories.

764287 · July 8, 2018, 12:11pm

I’m using restic on 2 small mail servers (~50GB on SATA and ~100GB on SSD) with great success. As most of the files are static, restic spends the majority of time scanning. This results in backups which complete in 10-120 seconds.

Keep in mind that with a lot of files the cache can grow quite significantly. In my case restic requires 15-20GB for ~500k files and ~400 snapshots.

olc · July 8, 2018, 1:55pm

Is there anything to optimise regarding caching? I have read almost everything in the doc but some things on concurrency and caching are still a bit unclear in my mind.

–
Olivier

olc · July 9, 2018, 6:19am

I started with a web server NAS which stores about 1.5TB. The files are rather similar in size to those in my mail stores.

scan finished in 42956.241s: 19322655 files, 1.517 TiB
...
[12:12:25] 15.74%  2389327 files 244.611 GiB, total 19322655 files 1.517 TiB, 7 errors ETA 65:20:01
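
As a rough sanity check on the progress line above, the throughput and the remaining time can be recomputed from the printed figures. This is only a sketch: it assumes the ETA is a plain linear extrapolation by bytes, which fits the numbers shown but is not confirmed here as restic's actual method, and the variable names are illustrative.

```shell
#!/bin/sh
# Figures copied from the progress line above.
elapsed=$((12 * 3600 + 12 * 60 + 25))   # 12:12:25 expressed in seconds
done_gib=244.611                        # data processed so far
total_gib=$(awk 'BEGIN { printf "%.3f", 1.517 * 1024 }')   # 1.517 TiB in GiB

# Average throughput so far, in MiB/s.
rate_mib=$(awk -v d="$done_gib" -v t="$elapsed" \
    'BEGIN { printf "%.1f", d * 1024 / t }')

# Remaining time in hours, assuming the same average rate (linear extrapolation).
eta_hours=$(awk -v d="$done_gib" -v g="$total_gib" -v t="$elapsed" \
    'BEGIN { printf "%.1f", t * (g - d) / d / 3600 }')

echo "throughput: $rate_mib MiB/s, remaining: about $eta_hours h"
```

That works out to a little under 6 MiB/s on average, with a remaining time close to the 65 hours restic reports.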

I must confess that the (virtual) server where restic runs has only one vCPU and 1GB of RAM. Load was not very high during the scan, but now CPU usage is almost at its maximum (all used except ~15.3% idle, very stable). Load stays moderate, so I suspect restic takes care not to overload the machine, right?

Do these timings sound OK?

–
Olivier

fd0 · July 9, 2018, 7:06am

restic doesn’t do that, it’s the kernel’s job (which is much better at it anyway). You can use nice and/or ionice to assign a CPU and IO priority to the restic process, which tells the kernel to prefer other tasks when there’s a choice to make.
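
A minimal sketch of what that could look like. The helper name run_low_priority, the repository location and the source path are placeholders for illustration, not anything restic provides; ionice is only used when it is installed.

```shell
#!/bin/sh
# Run any command with the lowest CPU priority and, where available,
# the idle I/O scheduling class.
# run_low_priority is a hypothetical helper name, not part of restic.
run_low_priority() {
    if command -v ionice >/dev/null 2>&1; then
        # nice 19 = lowest CPU priority; ionice class 3 = idle I/O class
        nice -n 19 ionice -c 3 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Example (repository and source path are placeholders):
# run_low_priority restic -r /srv/restic-repo backup /var/vmail
```

Note that this only changes how the kernel arbitrates between competing processes. On an otherwise idle machine restic will still use whatever CPU and IO it can get.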

You may run into a problem with 1GiB of RAM though, especially with many files. I suspect the index (which is where restic keeps the information about which data blob is stored in which file) may grow larger than the available memory.

764287 · July 9, 2018, 7:07am

Nothing special that I’m aware of. But make sure that your cache directory has the CACHEDIR.TAG file in it (if using a non-default --cache-dir) and that you are using --exclude-caches, so you don’t unnecessarily back up your cache.
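
For illustration, here is one way to check or create that tag file by hand. The signature line is the one defined by the Cache Directory Tagging convention; the CACHE_DIR default, the repository location and the backup path are placeholders.

```shell
#!/bin/sh
# Mark a directory as a cache so that tools honouring the Cache Directory
# Tagging convention skip it. CACHE_DIR is a placeholder path.
CACHE_DIR="${CACHE_DIR:-/tmp/restic-cache-example}"
mkdir -p "$CACHE_DIR"

# The tag file must begin with this exact signature line.
printf 'Signature: 8a477f597d28d172789f06886806bc55\n' > "$CACHE_DIR/CACHEDIR.TAG"

# Example backup run (repository and source path are placeholders):
# restic -r /srv/restic-repo --cache-dir "$CACHE_DIR" backup --exclude-caches /
```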

stevesbrain · August 7, 2018, 6:14am

In terms of keeping it small, just run a forget + prune semi-regularly so you don’t have quite so many snapshots; this should keep it trim.
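
Something along these lines, for example from a weekly cron job. The retention values are purely illustrative assumptions, and the script assumes the repository and its password are supplied through the environment (RESTIC_REPOSITORY, RESTIC_PASSWORD_FILE).

```shell
#!/bin/sh
# Retention policy: illustrative values only, adjust to your own needs.
KEEP_ARGS="--keep-daily 7 --keep-weekly 5 --keep-monthly 12"

if command -v restic >/dev/null 2>&1 && [ -n "${RESTIC_REPOSITORY:-}" ]; then
    # forget removes snapshots outside the policy; --prune then deletes
    # the data that no remaining snapshot references.
    # KEEP_ARGS is deliberately unquoted so it splits into separate flags.
    restic forget $KEEP_ARGS --prune
else
    echo "would run: restic forget $KEEP_ARGS --prune"
fi
```

Fewer snapshots also means less metadata for restic to keep in its cache.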

olc · August 14, 2018, 9:14am

Still running some tests in my spare time, now backing up a 4TB mail store. I have the feeling that restic could parallelize the backup process much, much more: both the client being backed up and the minio target are ~95% idle. Is there anything I can change in that regard?

Thanks,

Olivier