Compression vs blob size - worth exploring?

Nev · May 2, 2022, 5:54pm

So I’m eagerly testing the new compression feature as we speak (great work devs!). Does anyone have any thoughts (or even experiences, by now) concerning whether larger blobs might lead to better compression? And would there be any downside? Maybe a deduping hit?

Further to various threads, including this one, I understand the blob parameters are in the chunker routines here. I’m guessing one would only need to increase MaxSize?

In case it’s relevant to blob-fiddling considerations, I already use 128MiB packs, by tweaking here.

Thanks!

MichaelEischer · May 2, 2022, 7:21pm

The pack size doesn’t matter for compression, as blobs are compressed individually. Increasing MaxSize won’t help much either, since it is only an upper bound and does little to increase the average blob size. The more relevant parameter would be splitmask.
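
To illustrate why the split mask matters more than MaxSize, here is a rough back-of-the-envelope model in Python. It is not restic's actual chunker. It assumes a cut point is declared whenever the low `mask_bits` bits of a rolling hash are zero, that each position past the minimum size matches independently with probability 2^-mask_bits, and that the defaults are a 512 KiB minimum, an 8 MiB maximum and a 20-bit mask (my understanding of the chunker constants, worth checking against the source). The function name and its parameters are made up for this sketch.

```python
import math

KiB = 1024
MiB = 1024 * KiB


def expected_chunk_size(min_size: int, max_size: int, mask_bits: int) -> float:
    """Expected chunk size in bytes under an idealized model (assumption).

    After min_size bytes, every position is a cut point with probability
    p = 2**-mask_bits; a cut is forced once the chunk reaches max_size.
    """
    p = 2.0 ** -mask_bits
    window = max_size - min_size  # positions where a hash-based cut can occur
    # For a geometric variable G, E[min(G, window)] = (1 - (1 - p)**window) / p.
    # log1p/expm1 keep the result accurate for very small p.
    return min_size + (-math.expm1(window * math.log1p(-p))) / p


if __name__ == "__main__":
    # Assumed defaults: 512 KiB min, 8 MiB max, 20-bit mask.
    cases = {
        "defaults": (512 * KiB, 8 * MiB, 20),
        "MaxSize doubled": (512 * KiB, 16 * MiB, 20),
        "one more mask bit": (512 * KiB, 8 * MiB, 21),
    }
    for label, args in cases.items():
        print(f"{label}: {expected_chunk_size(*args) / MiB:.2f} MiB")
```

Under these assumptions the average stays at roughly 1.5 MiB when MaxSize is doubled, because very few chunks ever reach the cap, while one extra mask bit moves it to about 2.45 MiB. Real data will deviate from this, since hash values of actual files are not perfectly uniform.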

Whether it’s worthwhile to explore larger blob sizes depends a lot on the data, and very likely it would only improve compression by a few percent.
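
Since the answer depends on the data, one way to get a feel for it is to measure. Below is a minimal sketch that cuts a file into fixed-size pieces, compresses each piece on its own, and adds up the results. It makes two simplifying assumptions: fixed-size pieces stand in for content-defined chunks, and `zlib` from the Python standard library stands in for restic's zstd compression. The helper `compressed_total` is a hypothetical name, not part of any restic tooling.

```python
import sys
import zlib


def compressed_total(data: bytes, chunk_size: int, compress=zlib.compress) -> int:
    """Total size in bytes when each fixed-size piece is compressed separately.

    `compress` is any callable taking bytes and returning bytes; zlib is
    only a stand-in here (assumption), not the compressor restic uses.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return sum(
        len(compress(data[start:start + chunk_size]))
        for start in range(0, len(data), chunk_size)
    )


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python blobsize.py FILE")
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    # Compare a few piece sizes on the same input.
    for size_kib in (256, 1024, 4096):
        total = compressed_total(data, size_kib * 1024)
        print(f"{size_kib:>5} KiB pieces -> {total} bytes compressed")
```

Note that zlib only looks back 32 KiB, so at these piece sizes it will probably understate whatever gain a compressor with a larger window would see. Passing a different callable, for example `lzma.compress`, may give a closer picture, at the cost of speed.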