Massive cache size?

You’re welcome. Thank you for your work on restic. :slight_smile:

As for the cache, good to know that prune will at least clean some things up.

(Although the prune does seem to take some time in some cases. Currently sitting at almost 48h and counting because a lot of files got removed. :slight_smile: )

Can I safely nuke parts of the cache manually, or is there some index keeping track of things?

This machine is only supposed to run backup and forget (with --prune). Is there anything whose removal I can easily automate in that case without too much performance impact?
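Before deleting anything, it can help to see which parts of the cache actually dominate. This is a hypothetical helper (not part of restic); it just sums file sizes per top-level entry, which maps onto restic's per-repository cache layout of data, index, and snapshots subdirectories:

```python
import os

def dir_sizes(root):
    """Total bytes per top-level entry under root (e.g. a restic cache dir)."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir():
            total = 0
            for dirpath, _, files in os.walk(entry.path):
                for name in files:
                    total += os.path.getsize(os.path.join(dirpath, name))
            sizes[entry.name] = total
        else:
            sizes[entry.name] = entry.stat().st_size
    return sizes

# Example usage: dir_sizes(os.path.expanduser("~/.cache/restic"))
```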

The cache has now grown slightly to 40 GB. So for 1.2 TB of data that would mean about 3%.

This is still large enough that it could be prudent to make this more noticeable to new users. ~/.cache might not have the space for it in many installations. That’s how we first noticed this issue.

I’m now up to a 240 GB cache for a 1.6 TB repo, and it doesn’t show any sign of leveling out. That’s 15% of the repo so far. Are we sure there isn’t an issue here?
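For scale, the two ratios quoted in this thread work out as follows (plain arithmetic, nothing restic-specific; 1 TB taken as 1000 GB):

```python
def cache_fraction(cache_gb, repo_tb):
    """Cache size as a fraction of repository size."""
    return cache_gb / (repo_tb * 1000)

print(f"{cache_fraction(40, 1.2):.1%}")   # prints "3.3%"  (40 GB cache, 1.2 TB repo)
print(f"{cache_fraction(240, 1.6):.1%}")  # prints "15.0%" (240 GB cache, 1.6 TB repo)
```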

This is for a system that only does backup and forget. No pruning.

I am also still struggling with massive caches, such as 20-40 GB for VMs that are lucky to have that much space. The issue I noticed today is that my script hasn’t pruned in 1 year, and now I cannot do it: there is 1+ TB of data in the repo on B2, but the cache fills up the disk before the prune can complete, so it fails. (As a workaround I’m going to try again from a desktop with plenty of RAM and disk space.)

I gather the cache is there so duplicate blocks can be identified easily. But out of my 1+ TB, restic reports only around 15 GB is deduplicated data. If restic had a way to run without dedup, could the cache stay small and pruning run as fast as deleting a snapshot (in theory)? I love how restic has really simple snapshots, but the large cache and 5+ hr prunes are really difficult to work around.

As a workaround you can add --no-cache to your commands. This will increase the runtime, but at least prune can successfully finish.
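For example (the repository URL is a placeholder):

```shell
# Prune without using a local cache; slower, but local disk usage stays bounded.
restic -r b2:my-bucket:my-repo --no-cache prune
```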

Lately restic’s developers have merged a lot of PRs to improve performance. @alexweiss wrote a summary of the current prune issues and linked some PRs that improve the situation. It would be great if you could help test them.

If you have the feeling that restic caches too much, you might also consider using restic stats to get detailed statistics about your repository. The cache should hold snapshots, indexes, and packs containing tree blobs (which are listed separately by restic stats --mode repository with this PR).

For my repositories, the size and number of files reported by the statistics exactly match what’s inside the cache dir.

Of course, with usual filesystems you have to take into account that files occupy a multiple of the fs block size. This means that if you have lots of small files, this may blow up the used size of the cache dir.
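To illustrate the effect: with a 4 KiB filesystem block size, every non-empty file occupies at least one full block, so a cache full of small files uses far more disk than its logical size suggests (illustrative numbers, not measured from a real cache):

```python
import math

def on_disk(apparent_bytes, block=4096):
    """Minimum space a file occupies on a filesystem with the given block size."""
    if apparent_bytes == 0:
        return 0
    return math.ceil(apparent_bytes / block) * block

# 100,000 cached files of ~1 KiB each:
logical = 100_000 * 1024
physical = 100_000 * on_disk(1024)
print(f"{physical / logical:.1f}x blow-up")  # prints "4.0x blow-up"
```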

To repack small data files which contain tree blobs, you can use prune or cleanup-packs of PR #2513 with the repack flags.

Actually, rethinking this, it may be good to make the repack decision independently for data files containing tree blobs and those containing data blobs. I’ll try to improve #2513. :wink:

If you don’t want dedup then restic is probably the wrong tool… tar or 7z or just rclone with crypt would be more efficient.

Yeah more efficient, but while giving up any concept of snapshots, I think?

What I meant by disabling dedup is wanting something closer to rsync + hard links. From what I can tell, restic can match a block of zeros from the beginning of time with one in today’s snapshot. That’s amazing, but my backups don’t seem to have a lot of duplicated data. I think I’d prefer dedup limited to block-level changes within the same file, not across the entire repo. That way, dropping a snapshot and its data (prune) would have a much smaller dataset to crawl and update when finding orphaned blocks. Storage requirements would be higher, but I think the algorithm would be much faster.

Apples to oranges, I know, but ZFS operates similarly to this. Snapshots are lightning fast to create and forget, file changes are tracked at the block level, but true dedup is optional because it is system intensive and not necessarily a good fit for many datasets.

Not at all – an archive/directory per snapshot.

I’m not sure prune would be any faster since packs are not associated with snapshots at all. Such an association would need to be added.

Basically this would require a repository format change, and it would most likely have to be a repository-wide setting for the benefits to be realized.

It sounds like you really want something like rsnapshot.

I added the possibility to independently specify repacking options for tree blobs in PR #2513. So if you have lots of small tree packs in your cache, it’s worth trying out.

@jimp: If you encounter large prune times, it’s also worth trying out PR #2513.
The commands can basically do all the cleanup that prune does, but they are much faster, especially if you do not repack but only delete completely unused packs (which is the standard option for data packs).


I’ll try it out. Any chance it will break the repo entirely? I mean, is PR 2513 radically new, or is it just doing the same steps more efficiently?

I have two ~1 TB repos on B2 that I began pruning 3 days ago; after 1 day they said they would recover about 75% of the disk usage, but both failed to complete. The cache for one is 152 GB. The other is < 1 GB because it thinks it deleted the data, but it reported thousands of packs not deleted when I canceled it. They appeared to be stuck in a slow loop saying the lock on B2 was missing, but both also needed to be manually unlocked to start again this morning. I have 4 more of that same size to prune. CPU usage remains very low during the prune, less than 1%.

I checked B2 and I haven’t run into any caps, so I cannot explain why the prunes failed, but I’m wondering if some throttling is taking place because the unlocks took around 30 seconds. And… I now owe more in bandwidth charges than I do for storage. :face_with_raised_eyebrow:

Any chance it will break the repo entirely? I mean, is PR 2513 radically new, or is it just doing the same steps more efficiently?

It is a complete rewrite. Hence, yes, there is a chance that cleanup-packs breaks the repo.
It’s best to run it on a copy of your repo, followed directly by a check. If check reports an error, please report it to me!

It has been tested quite a lot without pack rewrites. However, I added defaults to repack some tree packs…

Depending on whether it makes sense, run restic check --read-data

Depending on whether it makes sense, run restic check --read-data

Nope. --read-data checks the integrity of the pack files. This check is meant for finding issues with your storage, like bit flips etc.
To test whether pruning was successful, IMO a check without --read-data is sufficient.
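Concretely, the two levels of verification look like this (the repository path is a placeholder):

```shell
# Verify repository structure and metadata after a test prune (fast):
restic -r /path/to/repo-copy check

# Additionally download and verify the contents of every pack
# (catches storage-level corruption such as bit flips; much slower and
# incurs download traffic on remote backends like B2):
restic -r /path/to/repo-copy check --read-data
```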

Ahh, gotcha! Thanks for clarifying.

I haven’t had a chance to try PR 2513 yet, because I’m letting the existing prune operations finish (I have a few more repos, though). Even just deleting packs is taking a very long time. Is this normal, or is B2 getting in the way? Does PR 2513 address it with more parallel requests?

On a cloud server to B2, multi-gig bandwidth:

repository ... opened successfully, password is correct
counting files in repo
building new index for repo
[4:51:12] 100.00%  209957 / 209957 packs
repository contains 209957 packs (2750342 blobs) with 1.001 TiB
processed 2750342 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 19 snapshots
[1:17] 100.00%  19 / 19 snapshots
found 461704 of 2750342 data blobs still in use, removing 2288638 blobs
will remove 0 invalid files
will delete 170280 packs and rewrite 25605 packs, this frees 913.599 GiB
[11:49:09] 100.00%  25605 / 25605 packs rewritten
counting files in repo
[21:11] 100.00%  23802 / 23802 packs
finding old index files
saved new indexes as [...]
remove 1823 old index files
[6:53:33] 13.99%  27413 / 195885 packs deleted

In the process of these very long prune operations, the cache is shrinking considerably. So at least in my case, large caches have been caused by the retention of data (lots of small files) for almost a year.

For reference, this is the workaround I have in place for now:

find ~/.cache/restic/ -type f -atime +14 -delete

I did that for a while, approximately 1 year, but I wedged myself into a corner when it came time to prune. On a couple of VMs I actually had to perform the prune on a desktop with over 200 GB of free space. The entire cache was needed to perform the prune, and it took 3 days to complete. However, once the prune was complete, the cache was small (around 1-10 GB, depending on the machine).

I suggest pruning every 14 days instead. It only takes around 1 hr if you do it at least once a month, and the cache stays small. This assumes you have a reasonable snapshot policy in place that results in around 25 snapshots retained. Your cache will always remain large if you keep every snapshot.

I think performance improvements for prune are needed so people are more likely to do it, but also clarity in the documentation would help a lot of people understand it really is needed on a semi-regular basis. Sure, you don’t have to do it, but for any active workload being backed up, your remote repo will quickly exceed 1 TB and the cache will become unmanageable.

I am typically running Restic pruning on a separate VM, not the one where I run the backup from, and would like to keep the cache size small.

Will removing just part of the cache, as indicated by @CendioOssman, cause any harm in terms of consistency of the backups, or is it better to remove the cache entirely every few weeks?

Many thanks.

The cache is really just a cache. That is, you can delete whatever files you want from it, and restic will download them again if necessary.

Thank you for the confirmation.