Massive cache size?

You’re welcome. Thank you for your work on restic. :slight_smile:

As for the cache, good to know that prune will at least clean some things up.

(Although the prune does seem to take some time in some cases. Currently sitting at almost 48h and counting because a lot of files got removed. :slight_smile: )

Can I safely nuke parts of the cache manually, or is there some index keeping track of things?

This machine is only supposed to run backup and forget (with --prune). Is there anything whose removal I can easily automate in that case without too much performance impact?
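Before deleting anything, it can help to see which parts of the cache actually dominate. This is a hypothetical helper (not part of restic); it just sums file sizes per top-level entry, which maps onto restic's per-repository cache layout of data, index, and snapshots subdirectories:

```python
import os

def dir_sizes(root):
    """Total bytes per top-level entry under root (e.g. a restic cache dir)."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir():
            total = 0
            for dirpath, _, files in os.walk(entry.path):
                for name in files:
                    total += os.path.getsize(os.path.join(dirpath, name))
            sizes[entry.name] = total
        else:
            sizes[entry.name] = entry.stat().st_size
    return sizes

# Example usage: dir_sizes(os.path.expanduser("~/.cache/restic"))
```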

The cache has now grown slightly to 40 GB. So for 1.2 TB of data that would mean about 3%.

This is still large enough that it could be prudent to make this more noticeable to new users. ~/.cache might not have the space for it in many installations. That’s how we first noticed this issue.

I’m now up to a 240 GB cache for a 1.6 TB repo, and it doesn’t show any sign of leveling out. That’s 15% of the repo so far. Are we sure there isn’t an issue here?
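For scale, the two ratios quoted in this thread work out as follows (plain arithmetic, nothing restic-specific; 1 TB taken as 1000 GB):

```python
def cache_fraction(cache_gb, repo_tb):
    """Cache size as a fraction of repository size."""
    return cache_gb / (repo_tb * 1000)

print(f"{cache_fraction(40, 1.2):.1%}")   # prints "3.3%"  (40 GB cache, 1.2 TB repo)
print(f"{cache_fraction(240, 1.6):.1%}")  # prints "15.0%" (240 GB cache, 1.6 TB repo)
```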

This is for a system that only does backup and forget. No pruning.

I am also still struggling with massive caches, such as 20-40 GB for VMs that are lucky to have that much space. The issue I noticed today is that my script hasn’t pruned in 1 year, and now I cannot do it: there is 1+ TB of data in the repo on B2, but the cache fills up the disk before the prune can complete, so it fails. (As a workaround I’m going to try again from a desktop with plenty of RAM and disk space.)

I gather the cache is there so duplicate blocks can be identified easily. But out of my 1+ TB, restic reports only around 15 GB is deduplicated data. If restic had a way to run without dedup, could the cache stay small and pruning run as fast as deleting a snapshot (in theory)? I love how restic has really simple snapshots, but the large cache and 5+ hr prunes are really difficult to work around.

As a workaround you can add --no-cache to your commands. This will increase the runtime, but at least prune can successfully finish.
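For example (the repository URL is a placeholder):

```shell
# Prune without using a local cache; slower, but local disk usage stays bounded.
restic -r b2:my-bucket:my-repo --no-cache prune
```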

Lately restic’s developers have merged a lot of PRs to improve performance. @alexweiss wrote a summary of the current prune issues and linked some PRs that improve the situation. It would be great if you could help test them.

If you have the feeling that restic caches too much, you might also consider using restic stats to get detailed statistics about your repository. The cache should hold snapshots, indexes, and packs containing tree blobs (which are listed separately by restic stats --mode repository with this PR).

For my repositories, the size and number of files reported by the statistics exactly match what’s inside the cache dir.

Of course, with usual filesystems you have to take into account that files occupy a multiple of the fs block size. This means that if you have lots of small files, this may blow up the used size of the cache dir.
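To illustrate the effect: with a 4 KiB filesystem block size, every non-empty file occupies at least one full block, so a cache full of small files uses far more disk than its logical size suggests (illustrative numbers, not measured from a real cache):

```python
import math

def on_disk(apparent_bytes, block=4096):
    """Minimum space a file occupies on a filesystem with the given block size."""
    if apparent_bytes == 0:
        return 0
    return math.ceil(apparent_bytes / block) * block

# 100,000 cached files of ~1 KiB each:
logical = 100_000 * 1024
physical = 100_000 * on_disk(1024)
print(f"{physical / logical:.1f}x blow-up")  # prints "4.0x blow-up"
```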

To repack small data files which contain tree blobs, you can use prune or cleanup-packs of PR #2513 with the repack flags.

Actually, rethinking this, it may be good to make the repack decision independently for data files containing tree blobs and those containing data blobs. I’ll try to improve #2513. :wink:

If you don’t want dedup then restic is probably the wrong tool… tar or 7z or just rclone with crypt would be more efficient.

Yeah more efficient, but while giving up any concept of snapshots, I think?

What I meant by disabling dedup is wanting something closer to rsync + hard links. From what I can tell, restic can match a block of zeros from the beginning of time with one in today’s snapshot. That’s amazing, but my backups don’t seem to have a lot of duplicated data. I think I’d prefer dedup limited to block-level changes within the same file, not across the entire repo. That way, dropping a snapshot and its data (prune) would have a much smaller dataset to crawl and update when finding orphaned blocks. Storage requirements would be higher, but I think the algorithm would be much faster.

Apples to oranges, I know, but ZFS operates similarly to this. Snapshots are lightning fast to create and forget, file changes are tracked at the block level, but true dedup is optional because it is system intensive and not necessarily a good fit for many datasets.

Not at all – an archive/directory per snapshot.

I’m not sure prune would be any faster since packs are not associated with snapshots at all. Such an association would need to be added.

Basically this would require a repository format change, and it would most likely have to be a repository-wide setting for the benefits to be realized.

It sounds like you really want something like rsnapshot.

I added the possibility to independently specify repacking options for tree blobs in PR #2513. So if you have lots of small tree packs in your cache, it’s worth trying out.

@jimp: If you encounter large prune times, it’s also worth trying out PR #2513.
The commands can basically do all the cleanup that prune does, but they are much faster, especially if you do not repack but only delete completely unused packs (which is the standard option for data packs).


I’ll try it out. Any chance it will break the repo entirely? I mean, is PR 2513 radically new, or is it just doing the same steps more efficiently?

I have two ~1 TB repos on B2 that I began pruning 3 days ago; after 1 day they said they would recover about 75% of the disk usage, but both failed to complete. The cache for one is 152 GB. The other is < 1 GB because it thinks it deleted the data, but it reported thousands of packs not deleted when I canceled it. They appeared to be stuck in a slow loop saying the lock on B2 was missing, but both also needed to be manually unlocked to start again this morning. I have 4 more of that same size to prune. CPU usage remains very low during the prune, less than 1%.

I checked B2 and I haven’t run into any caps, so I cannot explain why the prunes failed, but I’m wondering if some throttling is taking place because the unlocks took around 30 seconds. And… I now owe more in bandwidth charges than I do for storage. :face_with_raised_eyebrow:

Any chance it will break the repo entirely? I mean, is PR 2513 radically new, or is it just doing the same steps more efficiently?

It is a complete rewrite. Hence, yes, there is a chance that cleanup-packs breaks the repo.
It’s best to run it on a copy of your repo, followed directly by a check. If check reports an error, please report it to me!

It has been tested quite a lot without pack rewrites. However, I added defaults to repack some tree packs…

Depending on whether it makes sense, run restic check --read-data

Depending on whether it makes sense, run restic check --read-data

Nope. --read-data checks the integrity of the pack files. This check is meant for finding issues with your storage, like bit flips etc.
To test whether pruning was successful, IMO a check without --read-data is sufficient.
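Concretely, the two levels of verification look like this (the repository path is a placeholder):

```shell
# Verify repository structure and metadata after a test prune (fast):
restic -r /path/to/repo-copy check

# Additionally download and verify the contents of every pack
# (catches storage-level corruption such as bit flips; much slower and
# incurs download traffic on remote backends like B2):
restic -r /path/to/repo-copy check --read-data
```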

Ahh, gotcha! Thanks for clarifying.

I haven’t had a chance to try PR 2513 yet, because I’m letting the existing prune operations finish (I have a few more repos, though). Even just deleting packs is taking a very long time. Is this normal, or is B2 getting in the way? Does PR 2513 address it with more parallel requests?

On a cloud server to B2, multi-gig bandwidth:

repository ... opened successfully, password is correct
counting files in repo
building new index for repo
[4:51:12] 100.00%  209957 / 209957 packs
repository contains 209957 packs (2750342 blobs) with 1.001 TiB
processed 2750342 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 19 snapshots
[1:17] 100.00%  19 / 19 snapshots
found 461704 of 2750342 data blobs still in use, removing 2288638 blobs
will remove 0 invalid files
will delete 170280 packs and rewrite 25605 packs, this frees 913.599 GiB
[11:49:09] 100.00%  25605 / 25605 packs rewritten
counting files in repo
[21:11] 100.00%  23802 / 23802 packs
finding old index files
saved new indexes as [...]
remove 1823 old index files
[6:53:33] 13.99%  27413 / 195885 packs deleted

In the process of these very long prune operations, the cache is shrinking considerably. So at least in my case, large caches have been caused by the retention of data (lots of small files) for almost a year.

For reference, this is the workaround I have in place for now:

find ~/.cache/restic/ -type f -atime +14 -delete

I did that for a while, approximately 1 year, but I wedged myself into a corner when it came time to prune. On a couple of VMs I actually had to perform the prune on a desktop with over 200 GB of free space. The entire cache was needed to perform the prune, and it took 3 days to complete. However, once the prune was complete, the cache was small (around 1-10 GB, depending on the machine).

I suggest pruning every 14 days instead. It only takes around 1 hr if you do it at least once a month, and the cache stays small. This assumes you have a reasonable snapshot policy in place that results in around 25 snapshots retained. Your cache will always remain large if you keep every snapshot.

I think performance improvements for prune are needed so people are more likely to do it, but also clarity in the documentation would help a lot of people understand it really is needed on a semi-regular basis. Sure, you don’t have to do it, but for any active workload being backed up, your remote repo will quickly exceed 1 TB and the cache will become unmanageable.

I am typically running Restic pruning on a separate VM, not the one where I run the backup from, and would like to keep the cache size small.

Will removing just part of the cache, as indicated by @CendioOssman, cause any harm in terms of consistency of the backups, or is it better to remove the cache entirely every few weeks?

Many thanks.

The cache is really just a cache. That is, you can delete whatever files you want from it, and restic will download them again if necessary.

Thank you for the confirmation.