Data duplication


#1

I got to know about this from lwn.net and Froscon. Thanks for the nice software. I want to know about minimizing disk space. Assume I have:
directory1/file_AAA.iso
directory2/file_AAA.iso
directory2/file_BBB.iso

Condition 1: all three files above are identical.
If I run a simple ‘backup’, will it cost me 3x the disk space of one file (plus some JSON and other small overhead)?

Condition 2: if file_AAA.iso has the same SHA1SUM as file_BBB.iso, will it minimize the size to that of one ISO?

Thanks


#2

Hi, and welcome to the forum!

If all three files are identical, the space needed for the repo is roughly the size of one file. I’m not sure what the difference between your conditions 1 and 2 is. Please note that two files don’t need to be identical to have the same SHA-1 sum (see https://shattered.io/ for background information). restic also does not use SHA-1 anywhere.

The background on how restic operates on data is explained in a blog entry here: https://restic.github.io/blog/2015-09-12/restic-foundation1-cdc
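
To make the idea concrete, here is a minimal sketch of content-addressed deduplication. It is not restic's code: the `ChunkStore` class, the 64-byte `CHUNK_SIZE`, and the fake ISO data are all made up for illustration, and it cuts files into fixed-size chunks for simplicity, whereas restic uses the content-defined chunking described in the blog entry. The chunks are keyed by SHA-256, which is the hash restic uses to identify data.

```python
import hashlib

CHUNK_SIZE = 64  # hypothetical, tiny chunk size so the demo stays small


class ChunkStore:
    """Toy repository: each distinct chunk is stored once, keyed by its SHA-256."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def add_file(self, data):
        """Split data into fixed-size chunks and return the list of chunk IDs."""
        ids = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already present is not stored a second time.
            self.chunks.setdefault(digest, chunk)
            ids.append(digest)
        return ids

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


# Stand-in for an ISO image: 1024 bytes in which every 64-byte chunk is distinct.
iso = b"".join(i.to_bytes(4, "big") for i in range(256))

paths = [
    "directory1/file_AAA.iso",
    "directory2/file_AAA.iso",
    "directory2/file_BBB.iso",
]

store = ChunkStore()
# A "snapshot" here is just a mapping from each path to its list of chunk IDs.
snapshot = {path: store.add_file(iso) for path in paths}

logical_bytes = len(iso) * len(paths)
print("backed up:", logical_bytes, "bytes; stored:", store.stored_bytes(), "bytes")
```

In this toy model the three identical files add up to 3072 bytes of input, but the store holds only 1024 bytes of chunk data. File names and directories play no role, only the content does, so conditions 1 and 2 come out the same.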