Is it possible to compress only specific file types?

kdebisschop · January 29, 2024, 2:48pm

I have a partition that contains a mix of text files, which compress well, and PDF files, which are essentially already compressed. Would it be feasible to compress only certain file types? Or to exclude certain file types from being compressed again?

Of course, we know that file extension is not a 100% reliable way to determine if a file is already compressed or not, but it would be good enough in this case, I think.

kapitainsky · January 29, 2024, 3:28pm

1. It is not possible to use a different encryption level (or none) based on file type. All snapshots use the same encryption.
2. Compression can differ between snapshots, so in theory you could have one set of snapshots for your text files and another for your PDFs.
3. In practical terms I am not sure what you would gain by trying to implement point 2 beyond making things more complicated. The compression algorithm used (zstd) compresses files much faster than typical network connections can transfer them. On a modern computer you can still saturate a 10G LAN or a USB-connected SSD. Decompression is even faster (we are talking about GiB per second).
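For anyone who wants to try the two-snapshot-set idea anyway, here is a minimal Python sketch of how it might be scripted. It only splits a directory tree by extension and builds the restic command lines; it does not run them. The extension list, the function names, and the tag values are illustrative assumptions. `--compression`, `--tag`, and `--files-from-raw` are existing restic options, but check them against your restic version before relying on this.

```python
import os

# Extensions treated as "already compressed"; this list is an illustrative assumption.
PRECOMPRESSED = {".pdf", ".zip", ".gz", ".jpg", ".mp4"}


def split_by_extension(root, precompressed=PRECOMPRESSED):
    """Walk `root` and return (compressible, already_compressed) lists of file paths."""
    compressible, already = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()  # case-insensitive match
            (already if ext in precompressed else compressible).append(path)
    return compressible, already


def write_raw_list(paths, list_file):
    """Write NUL-separated paths, the format restic's --files-from-raw reads."""
    with open(list_file, "wb") as fh:
        for p in paths:
            fh.write(os.fsencode(p) + b"\0")


def restic_backup_cmd(list_file, compression, tag):
    """Build (but do not run) a restic command line for one snapshot set."""
    return ["restic", "backup", "--compression", compression,
            "--tag", tag, "--files-from-raw", list_file]
```

A run would write two list files and pass each command to `subprocess.run(cmd, check=True)`, for example `restic_backup_cmd("text.lst", "auto", "text")` and `restic_backup_cmd("pdf.lst", "off", "pdf")`. As noted above, extension matching is only a heuristic, and you end up maintaining two snapshot sets.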

kdebisschop · January 29, 2024, 3:53pm

Thanks.

The benefit of not trying to recompress files that are already compressed would be in CPU utilization on the host running the backup. If the file is already well compressed, those clock cycles are not helping much.

Yes, the added complexity of two parallel snapshot configurations is a significant downside. I was hoping it was conceptually simple enough to think about as a feature request. I’m not quite sure from the reply if the answer is addressing the feasibility in terms of architecture or the existence of configuration options. I should have been clearer in my question, as I already knew there was not an option for such a configuration.

kapitainsky · January 29, 2024, 4:20pm

What might make sense is to implement an early abort when data is not compressible, similar to the one that exists in ZFS.
But it would be quite a serious undertaking and IMO would bring very limited gains to backup software like restic, especially given the many much more important features waiting for implementation.
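To make the early-abort idea concrete, here is a rough Python sketch of one way it could work: compress a small leading sample at the fastest level and only compress the full blob if the sample shrank enough. This is not restic's or ZFS's actual logic. `zlib` stands in for zstd so the example runs with the standard library alone, and the sample size, the saving threshold, and the function names are all assumptions chosen for illustration.

```python
import zlib

SAMPLE_SIZE = 64 * 1024   # bytes probed before committing to full compression (assumed)
MIN_SAVING = 0.125        # require at least 12.5% saving on the sample (assumed)


def looks_compressible(data, sample_size=SAMPLE_SIZE, min_saving=MIN_SAVING):
    """Cheaply compress a leading sample and report whether it shrank enough."""
    sample = data[:sample_size]
    if not sample:
        return False
    probe = zlib.compress(sample, 1)  # fastest level: this is only a probe
    return len(probe) <= len(sample) * (1 - min_saving)


def pack_blob(data, level=6):
    """Return (is_compressed, payload), skipping compression for incompressible data."""
    if not looks_compressible(data):
        return False, data            # early abort: store the bytes as-is
    packed = zlib.compress(data, level)
    if len(packed) >= len(data):      # the sample can mislead; fall back to raw
        return False, data
    return True, packed
```

The trade-off is that the probe itself costs CPU, so the saving only materialises when a meaningful share of the data is incompressible and the real compression level is noticeably more expensive than the probe.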

But let’s see what others think.

sc2maha · January 29, 2024, 4:20pm

I’ve always thought using zstd frees you from having to think about this (at least at the default compression level), because zstd checks the file type before deciding compression level etc.

Is that not true?

kapitainsky · January 29, 2024, 4:55pm

zstd is a compression algorithm - it compresses/decompresses whatever you throw at it :)

I would be VERY surprised if restic's zstd implementation (which is probably some 3rd-party lib) did this. It would be VERY bad if it made that decision based on file type.

sc2maha · January 30, 2024, 4:59am

Looks like I misremembered as well as confused a couple of things. First, borg does this, but it is borg doing it, not “using a zstd feature”. Having misremembered that, I then extrapolated it to any tool using zstd!

George · May 14, 2025, 12:08pm

Kopia supports specifying which extensions to (never) compress.

kapitainsky · May 14, 2025, 1:01pm

I meant early abort and not extension based compression.

restic does not, and will not until somebody feels it is important enough to be worth implementing. Good that it is open source.