Understanding the difference between added / stored / processed

Hi everyone,

When backing up a new folder (first time, so no parent snapshot available), restic finished with this output:
Added to the repository: 3.698 GiB (1.516 GiB stored)
processed 5604 files, 7.227 GiB in 2:35

I see 3 different sizes and I am not sure what each of them means. Was every single file stored in the backup destination? I understand that the stored size can be smaller since the data is compressed, but here there is a huge difference between the initial total size of 7.227 GiB and the final 1.516 GiB stored.

You seem to be running an old restic version, which one is it?

restic 0.15.1 compiled with go1.19.5 on windows/amd64

I use --verbose=1 to get that information

Got it, I realize now that you just removed an empty line in that output.

It would be good if you provided all of the output from your restic command, since there are a lot more statistics in it than you included above. Can you do that?

Sorry, I have a habit of trying not to include any personal data, but now I realize there is no private data anywhere in the log. This is the whole log:

open repository
lock repository
no parent snapshot found, will read all files
load index files
start scan on [C:\DesktopB]
start backup on [C:\DesktopB]
creating VSS snapshot for [c:\]
successfully created snapshot for [c:\]
scan finished in 6.848s: 5604 files, 7.227 GiB

Files:        5604 new,     0 changed,     0 unmodified
Dirs:         2037 new,     0 changed,     0 unmodified
Data Blobs:   6527 new
Tree Blobs:    528 new
Added to the repository: 3.698 GiB (1.516 GiB stored)

processed 5604 files, 7.227 GiB in 2:35
snapshot b79c162e saved

In case it is relevant: the folder “DesktopB” shouldn’t contain any system files or protected files that are hard to copy (it is using VSS, so those would be copied too, though I think there are some limitations even with VSS).

I may be wrong, but I interpreted those numbers in the following way:

5604 files are backed up with an original total size of 7.227 GiB on C:\DesktopB.
The data chunking creates blobs with a total size of 3.698 GiB uncompressed. It is less than 7 GiB due to deduplication, i.e. apparently many of those files are similar and contain identical chunks of data.
Those blobs are stored in your backup using 1.516 GiB due to compression.
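
To put rough numbers on that interpretation, here is a small Python sketch that turns the three sizes from the log into percentages. It only does arithmetic on the reported figures and assumes the reading given above is right (processed = original size, added = new blobs before compression, stored = after compression). The helper name `saving` is made up for this example.

```python
# Figures copied from the backup summary in this thread (all in GiB).
processed = 7.227   # total size of the files that were read
added = 3.698       # new blobs added, before compression
stored = 1.516      # space those blobs take up in the repository


def saving(before: float, after: float) -> float:
    """Return the fraction of space saved going from `before` to `after`."""
    return 1 - after / before


dedup_saving = saving(processed, added)        # effect of deduplication
compression_saving = saving(added, stored)     # effect of compression
total_saving = saving(processed, stored)       # both combined

print(f"deduplication: {dedup_saving:.1%}")       # deduplication: 48.8%
print(f"compression:   {compression_saving:.1%}") # compression:   59.0%
print(f"overall:       {total_saving:.1%}")       # overall:       79.0%
```

So, under that reading, roughly half of the data was skipped as duplicate chunks, and the remaining half shrank by a bit under 60% when compressed.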

Thanks for this info! It makes sense. The deduplication gain is more than I would have expected. In theory they are all different files, but I can imagine that many of them have a lot in common, so if the chunks are small, many chunks may be identical even though they belong to different files.
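
A toy Python sketch of that idea, in case it helps anyone reading later. This is not how restic itself chunks data: as far as I know restic uses content-defined chunking with much larger, variable-size chunks, whereas this example uses tiny fixed-size chunks purely to make the effect visible. The file names, contents, and the `dedup_stats` helper are invented for the illustration.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny, an assumption for this demo only


def chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_stats(files: dict[str, bytes]) -> tuple[int, int]:
    """Return (bytes processed, bytes held in unique chunks)."""
    store = {}  # chunk hash -> chunk, standing in for the repository
    processed = 0
    for content in files.values():
        processed += len(content)
        for chunk in chunks(content):
            # A chunk whose hash is already known is not stored again.
            store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    unique = sum(len(c) for c in store.values())
    return processed, unique


# Two different files that share most of their content.
files = {
    "a.txt": b"HEADbodyAAAAtail",
    "b.txt": b"HEADbodyBBBBtail",
}
processed, unique = dedup_stats(files)
print(processed, unique)  # 32 20
```

The two files are different, yet only 20 of the 32 bytes need to be kept, because three of the four chunks in each file are identical.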

The high compression I think makes sense in this case since a lot of the files there are text files.