I’m testing out the deduplication of restic.
I generated 128MB of data, in 16 x 8MB files:
```shell
for i in {00..15}; do dd if=/dev/urandom of=$i bs=1M count=8; done
```
Then I combined those files into a single 128 MB file:

```shell
cat ?? > combined
```
I did a backup of the individual files first, then of the combined file. The output of the combined run was:
```
% RESTIC_PASSWORD=X restic --verbose=4 backup --tag=test combined
open repository
repository 86b7fabe opened successfully, password is correct
lock repository
load index files
start scan on [combined]
start backup on [combined]
scan finished in 2.707s: 1 files, 128.000 MiB
new /combined, saved in 26.064s (45.673 MiB added)

Files:           1 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Data Blobs:     22 new
Tree Blobs:      1 new
Added to the repo: 45.679 MiB

processed 1 files, 128.000 MiB in 0:44
snapshot 46fd9426 saved
```
By my calculations, if 45.679 MiB was added, then 128.000 - 45.679 = 82.321 MiB was reused.
Reusing 82.321 of 128.000 MiB is 64.31% reuse.
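As a quick sanity check, the reuse figure can be recomputed with awk (the two input numbers are copied from the restic output above):

```shell
# Recompute reuse from the restic summary of the combined-file backup.
awk 'BEGIN {
  total = 128.000   # MiB processed ("processed 1 files, 128.000 MiB")
  added = 45.679    # MiB added    ("Added to the repo: 45.679 MiB")
  reused = total - added
  printf "reused %.3f of %.3f MiB = %.2f%% reuse\n", reused, total, reused/total*100
}'
```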
I redid the above test with 100 x 1 MB files. This time I backed up the combined file first, then the 100 individual files. This was the output:
```
Files:         100 new,     0 changed,     0 unmodified
Dirs:            0 new,     0 changed,     0 unmodified
Data Blobs:    137 new
Tree Blobs:      1 new
Added to the repo: 99.734 MiB
```
This is only about 0.3% dedup efficiency.
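The same arithmetic, again with the two input numbers taken from the restic output above, gives the quoted figure:

```shell
# Recompute dedup efficiency for the 100 x 1 MiB run.
awk 'BEGIN {
  total = 100.000   # MiB of source data (100 x 1 MiB files)
  added = 99.734    # MiB added ("Added to the repo: 99.734 MiB")
  printf "dedup efficiency: %.3f%%\n", (total - added)/total*100
}'
```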
Interestingly, only one file got any deduplication:

```
new /test/99, saved in 0.025s (751.531 KiB added)
```

This was the last of the 100 files (since the first was named 00), and it coincided with the last 1024 KiB of the combined file. All of the other 99 files were reported as (1024.000 KiB added).
Questions:
- Are these ballpark expected figures?
- Is there any way of getting these numbers closer to 100%?
- In the 100 x 1MB test, why was the last file the only one to receive deduplication?