I didn’t see Restic 16.5 (default settings) do any compression or deduplication when a new, empty repository received its first backup. I tested this a couple of times and got the same result. Is this normal? Would a larger amount of data produce the same result, or do files have to be of a certain size before any savings occur?
Test 1: Source data: 166 GiB, 34,000 files, each 5 MiB of random ASCII characters. pigz gave 16.9% savings; restic gave 0% savings on the same data.
Test 2: Source data: 190 GiB, 78,000 files with sizes ranging from 1 KiB to 5 MiB, filled with random ASCII characters. pigz gave 16.3% savings; restic gave 0.01% savings on the same data.
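For anyone who wants to reproduce the pigz side of these tests at a small scale, here is a rough Python sketch. It assumes zlib's DEFLATE is a fair stand-in for pigz (both produce DEFLATE streams), and the helper names, sample size and seed are made up for illustration. It says nothing about what restic does with the same data.

```python
import random
import string
import zlib

# The 95 printable ASCII characters: letters, digits, punctuation and the space.
PRINTABLE_95 = string.ascii_letters + string.digits + string.punctuation + " "

def random_text(n_bytes, alphabet, seed=0):
    """Return n_bytes of uniformly random characters drawn from alphabet."""
    rng = random.Random(seed)  # fixed seed so repeated runs use the same data
    return "".join(rng.choices(alphabet, k=n_bytes)).encode("ascii")

def deflate_saving(data, level=6):
    """Fraction of space saved by DEFLATE (zlib) at the given level."""
    return 1 - len(zlib.compress(data, level)) / len(data)

# 256 KiB sample, far smaller than the real tests (assumed size for a quick run).
sample = random_text(256 * 1024, PRINTABLE_95)
print(f"DEFLATE saving on random printable ASCII: {deflate_saving(sample):.1%}")
```

Swapping `PRINTABLE_95` for `string.digits` or `string.ascii_lowercase` gives the restricted-alphabet variants.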
Normally, with inline deduplication/compression, I would have expected some savings when the source data is compressible. Some deduplication/compression products document the point at which deduplication starts; I couldn’t find anything like that in the restic documentation.
My guess is there may be a minimum size below which a file is not considered for deduplication. I’ve seen that in some deduplication apps, but there is no mention of it in the restic documentation.
I did see some very efficient removal of identical files but that was not in an empty repository.
Agreed: if the input is completely random, that is true, and that is why deduplication doesn’t like source data that is encrypted, compressed*, or generated by /dev/random or /dev/urandom. Restricting the data to the 95 printable ASCII characters should have reduced the randomness. I’ll test again with files filled with random data drawn from restricted sets of 60 and 26 characters. A quick test with 10 characters (0-9) gave about 50% savings.
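As a rough sanity check on those numbers: for uniformly random characters stored one per byte, no compressor can save more than 1 - log2(N)/8 on average, where N is the alphabet size. The sketch below just evaluates that bound. `max_saving` is a name I made up, and it assumes every character is equally likely and independent of its neighbours.

```python
import math

def max_saving(alphabet_size, bits_per_stored_char=8):
    """Upper bound on the space saving for uniformly random characters.

    Each character carries log2(alphabet_size) bits of information but
    occupies bits_per_stored_char bits on disk (8 for single-byte ASCII).
    """
    return 1 - math.log2(alphabet_size) / bits_per_stored_char

for size in (95, 60, 26, 10):
    print(f"{size:>3} characters: at most {max_saving(size):.1%} saving")
```

That works out to roughly 17.9% for 95 characters, 26.2% for 60, 41.2% for 26 and 58.5% for 10, so the 16.9% from pigz and the roughly 50% for digits both sit a little below their ceilings, as expected.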
My guess is that compression in restic works like it does in other deduplication apps: after the input has been chopped into chunks for deduplication, each unique chunk is individually compressed before it is stored in the repository. It is interesting that pigz found something to compress and restic did not.
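To make that guess concrete, here is a minimal sketch of the general chunk, deduplicate, then compress pattern. It is not restic's code: fixed-size chunks, SHA-256 chunk IDs held in a plain dict, and zlib are stand-ins chosen because they are in the Python standard library, and `store`, `restore` and `repo` are hypothetical names.

```python
import hashlib
import random
import zlib

def store(data, repo, chunk_size=4096):
    """Split data into chunks; compress and keep only chunks not yet in repo.

    Returns the list of chunk IDs needed to rebuild data.
    """
    ids = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        if chunk_id not in repo:                    # deduplication step
            repo[chunk_id] = zlib.compress(chunk)   # per-chunk compression
        ids.append(chunk_id)
    return ids

def restore(ids, repo):
    """Rebuild the original bytes from a list of chunk IDs."""
    return b"".join(zlib.decompress(repo[i]) for i in ids)

def repo_size(repo):
    """Total bytes of compressed chunk data held in repo."""
    return sum(len(blob) for blob in repo.values())

rng = random.Random(0)
file_a = "".join(rng.choices("0123456789", k=100_000)).encode("ascii")

repo = {}                        # empty "repository"
ids_a = store(file_a, repo)      # first backup: only compression can help
size_after_first = repo_size(repo)
ids_b = store(file_a, repo)      # identical file again: every chunk is a duplicate
size_after_second = repo_size(repo)

print(len(file_a), size_after_first, size_after_second)
```

In this toy model the first backup of random digits shrinks only through compression, and the second copy adds nothing, which matches the pattern of savings appearing mainly once the repository already holds the same data.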
*The only compressed data I’m aware of that is deduplication-friendly is data compressed by gzip or pigz with the --rsyncable option.