Your repo size is very modest, but you have almost 40 million files in it. For every file, some cache data is needed, e.g. name, blob id, data offset, etc. A fresh cache size of 16 GB translates to about 450 bytes of cache data per file, which sounds reasonable.
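A quick back-of-the-envelope check of that figure (a sketch; ~38 million is an assumed round number for "almost 40 million"):

```python
# Rough estimate of per-file cache overhead: a 16 GiB cache divided by
# ~38 million files (both figures are approximations from the thread).
cache_bytes = 16 * 1024**3      # 16 GiB fresh cache
n_files = 38_000_000            # "almost 40 million" files (assumption)

bytes_per_file = cache_bytes / n_files
print(f"~{bytes_per_file:.0f} bytes of cache data per file")  # ~452
```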
It is even possible for the cache to be larger than the repository itself - simply when the total metadata is larger than the content.
It is the same with any storage: 40 million small files occupy much more space on disk than their actual data size.
@zcalusic seriously? after every backup run? hmm, my backup runs twice a day, i don’t think i need to prune each day, let alone twice a day (yesterday i did a prune, maybe for the first time, and it ran for almost 10 hours!)
@kapitainsky unfortunately, i can’t access my machine this weekend, i will check later.
Yeah, the real reason is that the longer you wait with prune, the longer it will take to run. All my regular backup procedures take 5-10 minutes every night. It is true that the backup itself is usually done within a minute, and the prune step then takes 5 times as long, but as long as it is this fast, I don’t really care. In any case, I like it better than running it for 10 hours. Generally the logic is simple: a new snapshot arrives with fresh data, and the oldest one is then pruned.
Here, I ran all those stats/cache commands to give more info about my repo with lots of files. Now I see I have more files than you, yet… a much smaller cache. Once you collect all that data, we can compare and try to pinpoint the culprit for your extra-large cache.
Stats in restore-size mode:
Snapshots processed: 31
Total File Count: 55408579
Total Size: 1.742 TiB
Stats in raw-data mode:
Snapshots processed: 31
Total Blob Count: 6043415
Total Uncompressed Size: 229.965 GiB
Total Size: 133.211 GiB
Compression Progress: 100.00%
Compression Ratio: 1.73x
Compression Space Saving: 42.07%
Cache size: 1.894 GiB
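As a quick sanity check, the compression numbers in that raw-data output are internally consistent (plain arithmetic on the reported sizes):

```python
# Recompute restic's compression ratio and space saving from the
# uncompressed and compressed totals shown in the stats output above.
uncompressed_gib = 229.965
compressed_gib = 133.211

ratio = uncompressed_gib / compressed_gib               # reported as 1.73x
saving = (1 - compressed_gib / uncompressed_gib) * 100  # reported as 42.07%
print(f"ratio {ratio:.2f}x, space saving {saving:.2f}%")
```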
Cache size depends on many factors, but I would say the two major drivers are repo size and the number of unique files. In my artificial example with 100k files, I can grow the cache indefinitely if, e.g., I touch every file before taking a new snapshot. The files’ content is unchanged, but the metadata changes will be relatively huge. Of course it is a very edge-case situation, but I think it is a good demonstration of the metadata impact on cache size.
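A toy model of that edge case (the per-file sizes below are illustrative assumptions, not restic’s actual on-disk format):

```python
# 100k files that never change content but get "touched" before every
# snapshot: content deduplicates, metadata does not, so metadata grows
# linearly with the number of snapshots and eventually dwarfs the content.
n_files = 100_000
content_per_file = 4 * 1024   # 4 KiB of content each (assumption)
metadata_per_file = 450       # bytes of metadata per file entry (assumption)

content_total = n_files * content_per_file  # stored once thanks to dedup
for snapshots in (1, 10, 100):
    metadata_total = n_files * metadata_per_file * snapshots
    print(f"{snapshots:3d} snapshots: content {content_total / 2**20:.0f} MiB, "
          f"metadata {metadata_total / 2**20:.0f} MiB")
```

With these illustrative numbers, metadata overtakes content somewhere before 10 snapshots, which is the crossover the edge case describes.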
How does it work in real life? Differently for everybody. It all depends on the backed-up data.
@kapitainsky if you create metadata-only repo, of course your metadata cache will reach full 100% of your repo size. The only question is why would anyone want to do such a silly thing.
I’m more thinking about real-world usage patterns: why would @yogasu have such a big cache, while I don’t see anything similar here, even with more files. There must be an explanation. And possibly a bug to fix…
@yogasu Which version of restic are you using?
I ask because there was a bug in versions older than 0.14 where the “stats” output was incorrect for restore-size:
Still big, considering repo size is 27GB
From your prune output, it seems like the repo is perhaps 187GiB?:
repository contains 21526 packs (1161293 blobs) with 187.764 GiB
To see the size of the repo, you want to use the restic stats --mode raw-data as @kapitainsky suggested. The default stats mode will print the size of the backups in the repository if you restored them all, which doesn’t sound like it’s what you are after.
Only to see this edge-case behavior :) Indeed, it is unlikely in real life.
@yogasu’s data can, for example, generate massive metadata changes where yours does not. There is no simple formula linking the number of files to cache size.
Here is an example of metadata-change-driven backup size (it of course also results in a massive cache size):
So a massive factor is how many files change, as every change generates new metadata. E.g. I back up 1 million files and they never change: 10 snapshots later I have 10M files in my repo, but only 1M unique. In a very different situation, with every file changing before each new snapshot (even if it is only a metadata change), I will also end up with 10M files in the repo, but the metadata part will be many times larger.
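Putting those two scenarios into numbers (a sketch; the 450 bytes per entry is just the rough per-file figure from earlier in the thread, not restic’s actual format):

```python
# Two repos after 10 snapshots of 1 million files each.
# Scenario A: files never change -> metadata deduplicates to 1M unique entries.
# Scenario B: every file's metadata changes before each snapshot -> 10M entries.
n_files = 1_000_000
snapshots = 10
entry_bytes = 450   # rough per-file metadata size (assumption)

unchanged_metadata = n_files * entry_bytes            # scenario A
churned_metadata = n_files * snapshots * entry_bytes  # scenario B
print(f"A: {unchanged_metadata / 2**20:.0f} MiB, "
      f"B: {churned_metadata / 2**20:.0f} MiB "
      f"({churned_metadata // unchanged_metadata}x)")
```

Same 10M file entries in both repos, but scenario B stores 10x the metadata, which is exactly the gap between a small and a large cache.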