Massive cache size?

Hi @CendioOssman the cache does not only hold the index data.
Check and see if this answers your questions. Also, have you done some searching on the Forum or GitHub for restic cache?
Just to make sure that we leverage the knowledge we have already :slight_smile:

With a lot of small files I’ve seen repositories of ~100GB raw data using ~15GB of cache.

I have not found many details, unfortunately. The wording of that page also concerns me. It sounds like it will only clear out caches of repositories that are no longer used, with no word about removing stale stuff inside a repo cache.

So will it just continue to grow and grow? Do I basically need to have as much cache storage as I have repo storage? :slight_smile:

Hey @CendioOssman, nice to see you here! I’m still regularly using novnc, thanks a lot for your work!

restic will automatically cache snapshot and index files and the files from the data/ subdir which only contains metadata. When you run prune to remove data from the repo, it is also removed from the local cache (either on the next run of restic or directly during prune, depending on where you run it). The cache only contains data which is still present in the repo, so there’s no stale data.

However, there’s no process to delete data from the cache which is still in the repo, but hasn’t been accessed for some time. So if you, for example, use a command like find or ls to check if some file is in the backup, it’ll download most of the metadata files from the repo, and those files will stay in the cache.

You can just delete the cache if it grows too large, restic will rebuild it. If some metadata is needed, it’ll download and save the whole file containing the metadata to the cache.

I’ve seen cache sizes varying from 3-10% of the whole repository size.

I hope this helps a bit!

Removing data from your repository with restic prune removes the corresponding data in your cache. Therefore your repository will keep growing until you remove some snapshots.

I’m pretty sure that’s not possible, but you could run restic with --no-cache, even though the slowdown is quite noticeable. :wink:

You’re welcome. Thank you for your work on restic. :slight_smile:

As for the cache, good to know that prune will at least clean some things up.

(Although the prune does seem to take some time in some cases. Currently sitting at almost 48h and counting because a lot of files got removed. :slight_smile: )

Can I safely nuke parts of the cache manually, or is there some index keeping track of things?

This machine is only supposed to run backup and forget (with --prune). Is there anything whose removal I can easily automate in that case without too much performance impact?

The cache has now grown slightly to 40 GB. So for 1.2 TB of data that would mean about 3%.

This is still large enough that it could be prudent to make this more noticeable to new users. ~/.cache might not have the space for it in many installations. That’s how we first noticed this issue.
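To see how much space the cache actually takes before ~/.cache fills up, something like the following could be run periodically. It is only a minimal sketch: the path ~/.cache/restic is an assumption (the usual default location on Linux; it will differ if --cache-dir is used or on other platforms), and the helper name dir_size_bytes is made up for illustration.

```python
import os

def dir_size_bytes(root):
    """Total apparent size of all regular files below root (0 if root is missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total

# Assumed cache location; adjust to wherever your cache really lives.
cache_dir = os.path.expanduser("~/.cache/restic")
print(f"{cache_dir}: {dir_size_bytes(cache_dir) / 1e9:.2f} GB")
```

Note that this adds up apparent file sizes, so the space really occupied on disk can be somewhat higher when the cache holds many small files.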

I’m now up to 240 GB cache for a 1.6 TB repo and it doesn’t show any sign of leveling out. That means 15% of the repo so far. We sure there isn’t an issue here?

This is for a system that only does backup and forget. No pruning.
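For reference, the percentages quoted in this thread are simple ratios of cache size to repository size. A small sketch of the arithmetic, assuming decimal units (1 TB = 1000 GB); the function name is only for illustration:

```python
def cache_ratio_percent(cache_gb, repo_tb):
    """Cache size as a percentage of repository size (decimal units assumed)."""
    repo_gb = repo_tb * 1000
    return 100 * cache_gb / repo_gb

# The two data points reported above: 40 GB for 1.2 TB, then 240 GB for 1.6 TB.
for cache_gb, repo_tb in [(40, 1.2), (240, 1.6)]:
    ratio = cache_ratio_percent(cache_gb, repo_tb)
    print(f"{cache_gb} GB cache / {repo_tb} TB repo = {ratio:.1f}%")
```

So the cache grew sixfold while the repo grew by about a third, which is why the ratio jumped from roughly 3% to 15%.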

I am also still struggling with massive caches, such as 20-40GB for VMs that are lucky to have that much space. The issue I noticed today is that my script hasn’t pruned in a year, and now I cannot do it: there is 1+ TB of data in the repo on B2, and the cache fills up the disk before the prune can complete, so it fails. (As a workaround I’m going to try again from a desktop with plenty of RAM and disk space.)

I gather the cache is so the dup blocks can be easily known. But out of my 1+ TB restic reports around 15 GB is dedupped data. If Restic had a way to run without dedup, could the cache stay small and pruning run as fast as deleting a snapshot (in theory)? I love how Restic has really simple snapshots, but the large cache and 5+ hr prunes are really difficult to work around.

As a workaround you can add --no-cache to your commands. This will increase runtime, but at least prune can successfully finish.

Lately restic’s developers have added a lot of PRs to increase the performance. @alexweiss wrote a summary about current prune issues and linked some PRs to improve the situation. Would be great if you could help to test these out.

If you have the feeling that restic does cache too much, you might also consider using

to get detailed statistics about your repository. The cache should hold snapshots, index and packs containing tree blobs (which are separately listed using restic stat --mode repository with this PR).

For my repositories, the size and number of files produced by the statistics exactly fit to what’s inside the cache dir.

Of course, with usual filesystems you have to take into account that files occupy a multiple of the fs block size. This means that if you have lots of small files, this may blow up the used size of the cache dir.
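The block-size effect can be estimated with a bit of arithmetic. The sketch below assumes a 4096-byte block size and a made-up set of 10,000 files of 300 bytes each; real filesystems may behave differently (some store very small files inline), so treat it as a rough upper-bound illustration only.

```python
def allocated_bytes(file_sizes, block_size=4096):
    """Sum of file sizes after rounding each one up to whole blocks."""
    # -(-size // block_size) is integer ceiling division.
    return sum(-(-size // block_size) * block_size for size in file_sizes)

sizes = [300] * 10_000          # hypothetical: 10,000 tiny files
apparent = sum(sizes)           # what adding up the file sizes reports
on_disk = allocated_bytes(sizes)  # what the files occupy with 4 KiB blocks

print(f"apparent: {apparent} bytes, allocated: {on_disk} bytes")
print(f"blow-up factor: {on_disk / apparent:.2f}")
```

With these made-up numbers, 3 MB of file content ends up occupying about 41 MB, so comparing the statistics with plain disk usage output can be misleading for caches full of small files.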

To repack small data files which contain tree blobs you can use prune or cleanup-packs of PR #2513 with the repack-flags.

Actually, thinking about it again, it may be good to make the repack decision independently for data files containing tree blobs and for those containing data blobs. I’ll try to improve #2513 :wink:

If you don’t want dedup then restic is probably the wrong tool… tar or 7z or just rclone with crypt would be more efficient.

Yeah more efficient, but while giving up any concept of snapshots, I think?

What I meant by disabling dedup is that I’d like something closer to rsync+hard links. From what I can tell, restic can match a block of zeros from the beginning of time with one in today’s snapshot. That’s amazing, but my backups don’t seem to have a lot of duplicated data. I think I’d prefer dedup of block-level changes within the same file, but not across the entire repo. That way, dropping a snapshot and its data (prune) would have a much smaller dataset to crawl and update when finding orphaned blocks. Storage requirements would be higher, but I think the algorithm speed would be much improved.

Apples to oranges, I know, but ZFS operates similarly to this. Snapshots are lightning fast to create and forget, file changes are tracked on a block level, but true dedup is optional because it is system intensive and not necessarily a good fit for many datasets.

Not at all – an archive/directory per snapshot.

I’m not sure prune would be any faster since packs are not associated with snapshots at all. Such an association would need to be added.

Basically this would require a repository format change, and most likely it would have to be a repository-wide setting for the benefits to be realized.

It sounds like you really want something like rsnapshot.

I added the possibility to independently specify repacking options for tree blobs in PR #2513. So if you do have lots of small tree packs in your cache, it’s worth trying out.

@jimp: If you encounter large prune times, it’s also worth trying out PR #2513.
The commands can basically do all the cleanup that prune does but are much faster, especially if you do not repack but only delete completely unused packs (which is the standard option for data packs).

I’ll try it out. Any chance it will break the repo entirely? I mean, is PR 2513 radically new or just doing the same steps more efficiently?

I have two ~1TB repos on B2 that I began pruning 3 days ago, which after 1 day said they would recover about 75% of the disk usage, but they both failed to complete. The cache for one is 152GB. The other is < 1GB because it thinks it deleted the data, but reported thousands of packs not deleted when I canceled it. They appeared to be stuck on a slow loop saying the lock on B2 was missing, but both also needed to be manually unlocked to start again this morning. I have 4 more of that same size to prune. The CPU usage remains very low during the prune, less than 1% CPU.

I checked B2 and I haven’t run into any caps, so I cannot explain why the prunes failed, but I’m wondering if some throttling is taking place because the unlocks took around 30 seconds. And… I now owe more in bandwidth charges than I do for storage. :face_with_raised_eyebrow:

Any chance it will break the repo entirely? I mean, is PR 2513 radically new or just doing the same steps more efficiently?

It is a complete rewrite. Hence, yes, there is the chance that cleanup-packs breaks the repo.
Best you run it on a copy of your repo followed directly by a check. If check reports an error, please report it to me!

It has been tested quite a lot without pack rewrites. However, I added defaults to repack some tree packs…

Depending on whether it makes sense, run restic check --read-data

Nope. --read-data checks for the integrity of pack files. This check is meant for finding issues with your storage like bit swaps etc.
To test if pruning was successful IMO a check without --read-data is sufficient.

Ahh gotcha! Thanks for clarifying

I haven’t had a chance to try PR 2513 yet, because I’m allowing existing prune operations to finish (I have a few more repos, though). Even deleting packs is taking a very long time. Is this normal or just B2 getting in the way? Does PR 2513 address it with more requests in parallel?

On a cloud server to B2, multi-gig bandwidth:

repository ... opened successfully, password is correct
counting files in repo
building new index for repo
[4:51:12] 100.00%  209957 / 209957 packs
repository contains 209957 packs (2750342 blobs) with 1.001 TiB
processed 2750342 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 19 snapshots
[1:17] 100.00%  19 / 19 snapshots
found 461704 of 2750342 data blobs still in use, removing 2288638 blobs
will remove 0 invalid files
will delete 170280 packs and rewrite 25605 packs, this frees 913.599 GiB
[11:49:09] 100.00%  25605 / 25605 packs rewritten
counting files in repo
[21:11] 100.00%  23802 / 23802 packs
finding old index files
saved new indexes as [...]
remove 1823 old index files
[6:53:33] 13.99%  27413 / 195885 packs deleted
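As a rough cross-check of the log above, the remaining time for the pack deletion can be extrapolated from the last progress line. This is only a sketch: it assumes the deletion rate stays constant (which B2 throttling may well violate), and the function name and regular expression are my own, written against the progress format shown here.

```python
import re

def eta_hours(progress_line):
    """Remaining hours, extrapolated from a '[H:MM:SS] pct  done / total' line."""
    m = re.search(r"\[([\d:]+)\]\s+[\d.]+%\s+(\d+) / (\d+)", progress_line)
    if m is None:
        raise ValueError("not a progress line")
    # Elapsed time may be H:MM:SS or M:SS; fold the fields into seconds.
    elapsed = 0
    for part in m.group(1).split(":"):
        elapsed = elapsed * 60 + int(part)
    done, total = int(m.group(2)), int(m.group(3))
    if done == 0 or elapsed == 0:
        raise ValueError("no progress yet, cannot extrapolate")
    rate = done / elapsed  # packs per second so far
    return (total - done) / rate / 3600

line = "[6:53:33] 13.99%  27413 / 195885 packs deleted"
print(f"estimated time remaining: {eta_hours(line):.1f} hours")

# The total matches the earlier log line: packs to delete plus packs rewritten.
assert 170280 + 25605 == 195885
```

At a little over one pack per second, the remaining deletions would take well over a day at this pace, which suggests the time is going into per-request latency rather than bandwidth or CPU.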

In the process of these very long prune operations, the cache is shrinking considerably. So at least in my case, large caches have been caused by the retention of data (lots of small files) for almost a year.