Hi, I have an issue where restic creates a large cache in /root/.cache/restic.
restic stats:
restic stats
repository 010e7818 opened successfully, password is correct
scanning...
Stats in restore-size mode:
Snapshots processed: 182
Total File Count: 39628216
Total Size: 27.719 GiB
and the cache size:
du -sch /root/.cache/restic
34G /root/.cache/restic
34G total
So my questions:
How is it possible that the cache is bigger than the backup itself?
Which commands create the cache? Or should I add the --no-cache parameter to all restic commands to avoid it? (I know the drawback is that it makes backups slower.)
It is taking time, and it looks like many things were pruned.
After it finished, I ran restic stats again:
restic stats
repository 010e7818 opened successfully, password is correct
scanning...
Stats in restore-size mode:
Snapshots processed: 182
Total File Count: 39628216
Total Size: 27.719 GiB
I tried deleting the cache manually and ran the backup again; now it is reduced:
du -sch /root/.cache/restic
16G /root/.cache/restic
16G total
So it went from 34 GB to 16 GB.
And lastly, restic cache:
restic cache
Repo ID Last Used Old Size
----------------------------------------
010e78184b 0 days ago 16.530 GiB
----------------------------------------
1 cache dirs in /root/.cache/restic
So for now, the cache is 16 GB.
That's still big: the repo size is 27 GB (am I right?), so 16 GB is more than 50% of it.
I'm still running the backup daily and will monitor whether the cache keeps growing.
Your repo size is very modest, but you have almost 40 million files in it. For every file you need some cache data, e.g. name, blob ID, data offset, etc. A fresh cache size of 16 GB translates to about 450 bytes of cache data per file, which sounds reasonable.
It is even possible for the cache to be larger than the repository itself, simply when all the metadata is larger than the content.
It is the same with any storage: 40 million small files occupy much more space on disk than their actual data size.
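The ~450 bytes-per-file estimate is easy to sanity-check; a quick sketch using the cache size and file count quoted in this thread:

```python
# Rough check of the ~450 bytes of cache data per file estimate,
# using the numbers reported earlier in the thread.
cache_bytes = 16.530 * 2**30      # fresh cache: 16.530 GiB
file_count = 39_628_216           # Total File Count from `restic stats`

bytes_per_file = cache_bytes / file_count
print(f"{bytes_per_file:.0f} bytes of cache data per file")  # ≈ 448
```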
I don't see any possibility that the cache could be larger than the repo itself. That is, unless there's some cache cleanup bug, which is always a possibility.
The cache contains only the metadata blobs, which are just one part of the whole repo (the other part being the data blobs, of course).
It's true that 40 million files is a lot, but I have one restic repo with close to that number of files, and its cache is much smaller…
I created a folder with 100k empty files and then a new restic repo with a single snapshot containing this folder:
restic stats
Stats in restore-size mode:
Snapshots processed: 1
Total File Count: 100002
Total Size: 95 B
restic cache
Repo ID Last Used Old Size
----------------------------------------
c0f1f5f509 0 days ago 2.171 MiB
----------------------------------------
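Running the same per-file arithmetic on this artificial repo (numbers taken from the stats and cache output above) shows a far smaller overhead, since a single snapshot of identical empty files dedupes almost all metadata:

```python
# Per-file cache overhead in the 100k-empty-files test repo.
cache_bytes = 2.171 * 2**20    # cache size: 2.171 MiB
file_count = 100_002           # Total File Count from `restic stats`

print(f"{cache_bytes / file_count:.1f} bytes of cache data per file")  # ≈ 22.8
```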
@zcalusic seriously? After every backup run? Hmm, my backups run twice a day; I don't think I need to prune every day, let alone twice a day. (Yesterday I did run prune, maybe for the first time, and it took almost 10 hours!)
@kapitainsky unfortunately I can't access my machine this weekend; I will check later.
Yeah, the real reason is that the longer you wait between prunes, the longer each prune takes. All my regular backup procedures take 5-10 minutes every night. The backup itself is usually done within a minute, and the prune step then takes about 5 times as long, but as long as it stays this fast, I don't really mind. In any case, I like it better than a 10-hour prune. The logic is simple: a new snapshot arrives with fresh data, and the oldest one is then pruned.
Here, I ran all those stats/cache commands to give more info about my repo with lots of files. Now I see I have more files than you, yet… a much smaller cache. Once you collect the same data, we can compare and try to pinpoint the culprit behind your extra-large cache.
Stats in restore-size mode:
Snapshots processed: 31
Total File Count: 55408579
Total Size: 1.742 TiB
Stats in raw-data mode:
Snapshots processed: 31
Total Blob Count: 6043415
Total Uncompressed Size: 229.965 GiB
Total Size: 133.211 GiB
Compression Progress: 100.00%
Compression Ratio: 1.73x
Compression Space Saving: 42.07%
Cache size: 1.894 GiB
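The compression figures above are internally consistent; a quick check using the reported raw-data numbers:

```python
# Verify Compression Ratio and Space Saving from the raw-data stats above.
uncompressed_gib = 229.965
compressed_gib = 133.211

ratio = uncompressed_gib / compressed_gib
saving = 1 - compressed_gib / uncompressed_gib
print(f"ratio {ratio:.2f}x, space saving {saving:.2%}")  # 1.73x, 42.07%
```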
Cache size depends on many factors, but I would say the two major drivers are repo size and the number of unique files. In my artificial example with 100k files, I can grow the cache indefinitely if, for example, I touch every file before taking each new snapshot. The file contents are unchanged, but the metadata changes are relatively huge. Of course this is a very edge-case situation, but I think it's a good demonstration of metadata's impact on cache size.
How does it work in real life? Differently for everybody. It all depends on the backed-up data.
@kapitainsky if you create a metadata-only repo, of course your metadata cache will reach 100% of the repo size. The only question is why anyone would want to do such a silly thing.
I'm thinking more about real-world usage patterns: why would @yogasu have such a big cache, when I see nothing similar here, even with more files? There must be an explanation. And possibly a bug to fix…
@yogasu Which version of restic are you using?
I ask because there was a bug in versions older than 0.14 where the “stats” output was incorrect for restore-size:
Still big, considering repo size is 27GB
From your prune output, it seems like the repo is perhaps 187GiB?:
repository contains 21526 packs (1161293 blobs) with 187.764 GiB
To see the size of the repo, you want to use restic stats --mode raw-data, as @kapitainsky suggested. The default stats mode prints the size the backups would occupy if you restored them all, which doesn't sound like what you are after.
Only to see this edge-case behavior :) Indeed, it is unlikely in real life.
@yogasu's data can, for example, generate massive metadata changes while yours does not. There is no simple formula linking the number of files to cache size.
Here is an example of metadata-change-driven backup size (which of course also results in a massive cache):
So a massive factor is how many files change, because changes generate new metadata. E.g. I back up 1 million files and they never change; 10 snapshots later I have 10 million files in my repo but only 1 million unique ones. In the very different situation where every file changes before each new snapshot (even if it is only a metadata change), I also end up with 10 million files in the repo, but the metadata part will be many times larger.
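That scaling can be sketched as a toy model (the function below is purely illustrative, not part of restic; it just counts unique metadata entries):

```python
# Toy model: repo metadata grows with the number of unique
# (file, metadata) combinations, not with the raw snapshot count.
def unique_metadata_entries(files: int, snapshots: int, change_fraction: float) -> int:
    """Approximate unique metadata entries after `snapshots` runs,
    if `change_fraction` of files get new metadata before each run."""
    return files + int(files * change_fraction) * (snapshots - 1)

# 1M files, 10 snapshots, nothing ever changes: 1M unique entries.
print(unique_metadata_entries(1_000_000, 10, 0.0))  # 1000000
# Every file touched before each snapshot: 10M unique entries.
print(unique_metadata_entries(1_000_000, 10, 1.0))  # 10000000
```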
@yogasu I think you have a very old version of restic. Your prune output looks like you are running 0.12.x or earlier. Note that 0.15.2 is the current version.
If you use the latest restic version
the prune command is much, much faster - enabling you to prune much more often
(About your issue:) metadata can be compressed, which usually also reduces the cache size by a factor of around 4. Search the forum for how to migrate the repo to the compressed format.
A good quick test would be to see how the repo data is split (data packs vs. tree packs, index size, etc.).
I am not sure restic exposes such stats, as the PR for it was abandoned, but they can be displayed using rustic (a restic spin-off written in Rust):
# rustic repoinfo
[INFO] repository local:./restic/: password is correct.
[INFO] using cache at /Users/kptsky/Library/Caches/rustic/c0f1f5f50978a40db533262ddd35608d476a3ed5d9f7713198afbca9e821577c
[INFO] scanning files...
repository files
| File type | Count | Total Size |
|-----------|-------|------------|
| Key | 1 | 466 B |
| Snapshot | 4 | 1.1 kiB |
| Index | 1 | 368 B |
| Pack | 2 | 2.2 MiB |
| Total | 8 | 2.2 MiB |
[00:00:00] scanning index... ████████████████████████████████████████ 1/1
| Blob type | Count | Total Size | Total Size in Packs |
|-----------|-------|------------|---------------------|
| Tree | 2 | 36.9 MiB | 2.2 MiB |
| Data | 1 | 95 B | 115 B |
| Total | 3 | 36.9 MiB | 2.2 MiB |
| Blob type | Pack Count | Minimum Size | Maximum Size |
|------------|------------|--------------|--------------|
| Tree packs | 1 | 2.2 MiB | 2.2 MiB |
| Data packs | 1 | 192 B | 192 B |
This should immediately give you a hint whether there is a bug putting too much data in the local cache, or whether your repo simply has an amount of metadata that is accurately reflected in the cache size.
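As a rough reading of the pack table above: virtually the entire test repo consists of tree (metadata) packs, which is exactly the part that ends up in the local cache. A quick calculation with the reported pack sizes:

```python
# Share of the repo taken up by tree (metadata) packs,
# using the pack sizes from the rustic repoinfo output above.
tree_pack_bytes = 2.2 * 2**20   # tree packs: 2.2 MiB
data_pack_bytes = 192           # data packs: 192 B

metadata_share = tree_pack_bytes / (tree_pack_bytes + data_pack_bytes)
print(f"metadata share of repo: {metadata_share:.2%}")  # ≈ 99.99%
```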