Evaluate deduplication ratio

I’m in love with Restic but I have a question for you…

On my system, I have two backups: one for my account and another for my wife’s. I have noticed that we share many pictures, so it could make sense to merge the two into a single repository.

The question: how can I evaluate the savings from deduplicating the shared pictures?

My idea was:

a) Create a new empty repository
b) Exec backup for “husband”: restic --compression off backup /homedir/husband
c) …
d) Exec backup for “wife”: restic --compression off backup /homedir/wife
e) …

I imagine that in c) and e) I have to gather some stats about the repository… but what?

Thank you!

@davide-italy, why would you run with --compression off? I would use the default restic settings (compression auto), as that would be your real-life scenario. Besides, restic reports both uncompressed and compressed stats. Here are some proposed steps, with a few options, assuming Linux since you mention /homedir/:

c) Evaluate the husband’s data size and the repository size.
Data size (any one of):

  1. note the stats printed by the restic backup command
    or
  2. sudo du -hs /homedir/husband
    or
  3. restic stats latest -r /path/to/your/repository

Repository size after the husband’s backup:

  1. restic stats --mode raw-data -r /path/to/your/repository

You can then repeat the same steps for the wife’s data in step e). The saving is the difference between the size of the wife’s data and the amount of data the second backup added to the repository.
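Steps c) and e) could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and it assumes the repository and password are supplied through the RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE environment variables. It relies on the `total_size` field (in bytes) that `restic stats --json` prints.

```shell
# Hypothetical helper: pull the "total_size" value (bytes) out of the
# JSON that `restic stats --json` writes to stdout.
total_size_from_json() {
    sed -n 's/.*"total_size": *\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: size of the data stored in the repository, in bytes.
# Assumes RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE are already exported.
repo_size() {
    restic stats --mode raw-data --json | total_size_from_json
}

# Intended use (not run here):
#   before=$(repo_size)     # step c), after the husband's backup
#   after=$(repo_size)      # step e), after the wife's backup
#   echo "added by second backup: $(( after - before )) bytes"
```

If jq is installed, `jq .total_size` would be a sturdier replacement for the sed expression.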

Example:

>        /homedir/husband = 100 GB
>      repo after husband =  50 GB
>           /homedir/wife =  80 GB
> repo after husband+wife =  70 GB

Savings from shared data: 80 - (70 - 50) = 60 GB (with compression enabled, this figure also includes whatever compression saved on the wife’s data).
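The same arithmetic as a small shell function, in case you want to plug in your own numbers. The function name is hypothetical, and it only handles whole numbers given in the same unit.

```shell
# Hypothetical helper: savings on the second backup.
# Arguments (integers, same unit):
#   $1 size of the wife's data
#   $2 repository size after the husband's backup
#   $3 repository size after both backups
shared_saving() {
    wife_data=$1
    repo_before=$2
    repo_after=$3
    # Data the second backup actually added to the repository.
    added=$(( repo_after - repo_before ))
    echo $(( wife_data - added ))
}

shared_saving 80 50 70   # prints 60
```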

References:
restic backup stats example: "Backing up", restic 0.16.4 documentation
restic stats reference and example: "Manual", restic 0.16.4 documentation

@GuitarBilly thank you very much; my preference for disabling compression was to keep it out of the loop for this first test.

Anyway, tomorrow morning I’ll start the test… I’ll need a few days before I can come back here with my results!

Thank you!

Just run these two commands:

$ restic stats
repository 7723e83d opened (version 2, compression level auto)
[0:00] 100.00%  2 / 2 index files loaded
scanning...
Stats in restore-size mode:
     Snapshots processed:  2
        Total File Count:  141
              Total Size:  531.342 MiB

$ restic stats --mode files-by-contents
repository 7723e83d opened (version 2, compression level auto)
[0:00] 100.00%  2 / 2 index files loaded
scanning...
Stats in files-by-contents mode:
     Snapshots processed:  2
        Total File Count:  83
              Total Size:  313.132 MiB

The difference between the two sizes (here, 531 - 313 = 218 MiB) is what is saved by file-level deduplication.
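If you want the subtraction without rounding by hand, something like this would do. The function name is hypothetical; both arguments are the "Total Size" values, typed in by hand and in the same unit.

```shell
# Hypothetical helper: restore-size total minus files-by-contents total.
# LC_ALL=C keeps awk using "." as the decimal separator.
file_dedup_saving() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", a - b }'
}

file_dedup_saving 531.342 313.132   # prints 218.210
```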

Some notes on the “stats” modes I made some time ago after a lot of experimentation, which may be useful to you:

  • restore-size (default): total size if you restore all snapshots
    • total size: counts duplicate files multiple times
    • total file count: includes duplicate files and directories also
  • files-by-contents: same, but after file-level dedup
    • total size: file-level dedup, so it is smaller than in the previous mode
    • total file count: just unique files and no directories
  • raw-data: most useful for compression stats
    • total uncompressed size: chunk-level dedup, so it is smaller still than in the previous mode
  • blobs-per-file: useless
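Going by the notes above, the three sizes can be combined into a rough breakdown of where the dedup savings come from. This is a sketch with a made-up function name; the arguments are values you read off the three `restic stats` runs yourself, all in the same unit.

```shell
# Hypothetical helper. Arguments (same unit):
#   $1 restore-size "Total Size"
#   $2 files-by-contents "Total Size"
#   $3 raw-data "Total Uncompressed Size"
dedup_breakdown() {
    LC_ALL=C awk -v r="$1" -v f="$2" -v c="$3" 'BEGIN {
        printf "file-level: %.3f\n", r - f    # exact duplicate files
        printf "chunk-level: %.3f\n", f - c   # shared chunks in non-identical files
        printf "total: %.3f\n", r - c
    }'
}

# Intended use (numbers are placeholders, not from a real repository):
#   dedup_breakdown 100 70 60
```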

Hi,

I have finished doing my test:

Is this correct?

Thank you

From the “red” lines in your paste, I’d say you saved (had common data between the two directories) only about 39 GB.

I’m curious what you would see if you now ran the two commands I posted earlier in this thread, namely restic stats and restic stats --mode files-by-contents.

Hi,

Below are the full stats:

For your info, the two source directories are 99.9% JPG, PNG and movie files (the “Camera” folders of two Android phones).

So it looks like you saved 504 - 425 = 79.

This includes dedup due to duplicate files, as well as dedup due to some files having common chunks (even if the files are not exact duplicates of each other).

@davide-italy, I agree with @sc2maha’s observation: close to 80 GiB in savings.
On both backup runs you have significant savings!
Enjoy restic…