Hi guys!
We are testing restic for backing up raw disks of virtual machines, and things are going well! :)
We split the backups of our many projects across many different repositories, and now I'm trying to analyze the compression efficiency (at the default level).
Usually I see a compression space saving between 15-45%, which looks good and seems correct to me.
But we have one repo with these stats:
restic stats -r /backups/8806f10d-ada0-44d1-a87a-3e9502032b67
enter password for repository:
repository a5d6c1a1 opened (version 2, compression level auto)
scanning...
Stats in restore-size mode:
Snapshots processed: 36
Total File Count: 36
Total Size: 7.891 TiB
restic stats -r /backups/8806f10d-ada0-44d1-a87a-3e9502032b67 --mode raw-data
enter password for repository:
repository a5d6c1a1 opened (version 2, compression level auto)
created new cache in /root/.cache/restic
scanning...
Stats in raw-data mode:
Snapshots processed: 36
Total Blob Count: 5003453
Total Uncompressed Size: 3.119 TiB
Total Size: 531.810 GiB
Compression Progress: 100.00%
Compression Ratio: 6.01x
Compression Space Saving: 83.35%
6x ratio!
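The reported numbers are at least internally consistent; a quick sanity check of the ratio and space saving from the two size figures above:

```python
# Verify restic's reported compression ratio and space saving
# from the Total Uncompressed Size and Total Size values.
TiB = 1024 ** 4
GiB = 1024 ** 3

uncompressed = 3.119 * TiB   # Total Uncompressed Size
compressed = 531.810 * GiB   # Total Size (on-disk)

ratio = uncompressed / compressed
saving = 1 - compressed / uncompressed

print(f"ratio:  {ratio:.2f}x")    # ~6.01x
print(f"saving: {saving:.2%}")    # ~83.35%
```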
And my question is: is it possible that restic's CDC splits the data in such a way that a unique blob ends up containing long runs of zeros, which compression then collapses almost entirely, driving the ratio this high?
A very rough example:
byte stream:
abcde0000abcde00000
first iteration:
blob 1: abcde00
blob 2: 00abcde
second iteration:
blob 1: abcde0000
blob 2: abcde00
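To illustrate the idea that zero runs could explain such a ratio, here is a minimal sketch (not restic's actual chunker or compressor, just zlib as a stand-in): a blob that is mostly zero padding, like a sparse region of a raw VM disk, compresses by orders of magnitude, so even a handful of unique zero-heavy blobs would pull the repository-wide ratio up sharply.

```python
import zlib

# Hypothetical blob: a little real data followed by a long zero run,
# similar to an unallocated/sparse region in a raw disk image.
blob = b"abcde" + b"\x00" * 1_000_000

compressed = zlib.compress(blob)
ratio = len(blob) / len(compressed)
print(f"{len(blob)} bytes -> {len(compressed)} bytes, ratio ~{ratio:.0f}x")
```

Run against a zero-dominated blob like this, the ratio comes out in the hundreds, far beyond the 6x seen repo-wide, which suggests only a fraction of the blobs would need to be zero-heavy to produce that average.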