I’m not sure if I’m asking this in the right place. My apologies if I’m not.
I have a Dropbox account, and when I go to “Selective Sync”, it tells me that I have well over 500,000 files in my account.
I know I have a lot of duplicates that I need to prune out.
Recently, I started using Backblaze’s B2 service (on Fedora 29) along with restic. I did a complete backup of my “/home” folder which includes my Dropbox folder (and a full download of my Dropbox files).
When I log into my B2 control panel, it says I have 59,293 files total.
I know restic has a de-duplication feature, but there’s no way my Dropbox holds almost 10x as many files as it has unique ones.
My thinking is to run some other app on my machine to see how many dupes it can identify. Assuming I’m correct and I don’t have that many dupes, how can I find out what’s going on?
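For the duplicate count itself, a short script can stand in for a separate app. The following is only a minimal sketch: the function names are made up for illustration, it treats two files as duplicates only when their contents are byte-for-byte identical (same SHA-256), and it skips symlinks and unreadable files.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def count_duplicates(root):
    """Walk root and return (total_files, unique_contents, redundant_copies)."""
    by_digest = defaultdict(list)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                by_digest[file_digest(path)].append(path)
            except OSError:
                # Unreadable file: leave it out of the count.
                continue
            total += 1
    unique = len(by_digest)
    return total, unique, total - unique
```

Calling count_duplicates(os.path.expanduser("~/Dropbox")) (the path is an assumption; adjust it to wherever the Dropbox folder lives) returns the three counts. Hashing a few hundred thousand files reads every byte once, so expect it to take a while.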
File contents are packed (and deduplicated), not copied 1:1 into the repository.
How do the sizes of the folders compare with the size of the repo? Have you tried browsing the repo (restic ls) to ensure everything is there?
To expand on what matt said, restic stores “tree” (directory) objects and “blob” (file content) objects in the data directory. Files you back up may have their contents split into multiple blobs, which means that pieces of files can also be deduplicated, not just entire files.
These objects are combined into packs that seem to average 4-5MB on my systems. Each file under the data directory is a pack, which may contain multiple objects.
This means that if you have one file that’s 25MB, its contents may be stored across five packs, so one file on your source becomes five files in the backup.
By the same token, if you have 1,000 files that are 5KB each, they could all wind up in the same pack, so your thousand source files become a single file in the backup.
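The arithmetic behind those two examples can be sketched in a few lines. This is a rough model, not how restic actually packs data: it assumes every pack is exactly 5MB, ignores deduplication and tree objects, and the function name is made up for illustration.

```python
MB = 1000 * 1000
KB = 1000


def estimated_pack_count(file_sizes, pack_size=5 * MB):
    """Estimate how many packs are needed to hold the given file sizes.

    Assumes fixed-size packs and no deduplication; real repositories vary.
    """
    total = sum(file_sizes)
    # Integer ceiling division: a partially filled pack still counts as one.
    return -(-total // pack_size)


# One 25MB source file spread over several packs.
print(estimated_pack_count([25 * MB]))        # 5
# A thousand 5KB source files fitting into a single pack.
print(estimated_pack_count([5 * KB] * 1000))  # 1
```

Under this model the pack count depends only on the total number of bytes, not on how many source files those bytes came from.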
The tl;dr is that there is no fixed relationship between the number of files in the repository structure and the number of logical snapshotted files that the repository contains.