Help with large cache

Hi All,

I have a fairly large repo (~10T) that I back up to and keep daily snapshots of. I used to run a script that did the backup and then a check. Recently the check has been blowing out my /tmp tmpfs, and now (after changing the cache dir) my / partition. The cache is huge, 85-100G. Is there a way I can limit this somehow? Am I not following a good practice in some way?

Hi @hibby50 and welcome to the restic community! :slight_smile:

So you could try using the restic cache command to clean out old cache data.

restic cache --help

The "cache" command allows listing and cleaning local cache directories.

EXIT STATUS
===========

Exit status is 0 if the command was successful, and non-zero if there was any error.

Usage:
  restic cache [flags]

Flags:
      --cleanup        remove old cache directories
  -h, --help           help for cache
      --max-age days   max age in days for cache directories to be considered old (default 30)
      --no-size        do not output the size of the cache directories

Hope this helps.

Hey @moritzdietz thank you for the suggestion!

I should have been clearer. My issue is not that the cache is accumulating; as a matter of fact, once the check is complete it deletes all of its cache. The issue is that while the check command is running the cache climbs to 100G, and only upon completion is it all cleaned up. I don’t necessarily have a good place on the system to house that 100G while it is running, and if it grows any larger I’ll really have some problems.

If that’s just the way it is, I suppose I could add another drive just for this purpose, but something doesn’t seem right to me (likely I am doing something wrong)

Ah! Ok, that clears things up. So I have two theories right now.

  1. Your restic check command uses a --cache-dir flag which causes the creation of a cache directory
  2. You don’t use that flag, but the restic check command has to download and store the blobs somewhere, so it uses your user’s cache dir by default
    (I don’t know about this one. I thought it would just load all blobs into RAM and not onto disk, so maybe someone else can jump in and confirm this for me)

By default the restic check command does not use a cache directory.

By default, the “check” command will always load all data directly from the repository and not use a local cache.

That’s from restic check --help.

If you are running the restic check command with a manual --cache-dir flag then it might be the reason.

What you can try is to set the following flag: --no-cache to tell restic to not create a cache directory at all.
Or, as you suggested, you can add a spare disk.

Interesting, I am not using any flags for cache, just the check command, so it seems something is not lining up with the documentation. I played around with the --no-cache flag but it makes it so slow. Perhaps I can shift to doing a weekly check vs a check after every nightly run.
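
A rough sketch of what that weekly gating could look like, in case it is useful. `should_check` and `CHECK_DAY` are names made up for illustration (they are not restic features), and `$REPO` / `$PASSFILE` in the usage comment are placeholders for the values in the script. The day number follows `date +%u`, where 1 is Monday and 7 is Sunday.

```shell
#!/bin/bash
# Sketch: run the integrity check on one weekday only, instead of after
# every nightly backup. should_check and CHECK_DAY are hypothetical names.

CHECK_DAY="${CHECK_DAY:-7}"   # 1 = Monday ... 7 = Sunday (numbering of `date +%u`)

# Succeeds (exit status 0) when the given day number equals CHECK_DAY.
# Without an argument it uses today's day of the week.
should_check() {
    local today="${1:-$(date +%u)}"
    [ "$today" -eq "$CHECK_DAY" ]
}

# Usage inside the nightly script (REPO and PASSFILE are placeholders):
#   if should_check; then
#       restic -r "$REPO" --password-file "$PASSFILE" check &>> "$logfile"
#   fi
```

On the other six days the script would skip the check, so the status handling for the mail subject would need to treat "check skipped" as a non-failure.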

I’ll attach my script so people can see the details. But it is certainly caching. If I run the check command outside of the script I can watch the cache fill up. As per the docs I tried to set TMPDIR to a directory on my SSD to keep it off my tmpfs, since that was causing things to swap, but that pushes my SSD to >88% usage and then other services start complaining the drive is getting full.
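
One way to keep the check from filling a drive would be to test for free space first and skip the check if there is not enough. This is only a sketch: `has_free_space` is a hypothetical helper, and the 100 GiB threshold in the usage comment is simply the cache size observed above, not a number restic reports.

```shell
#!/bin/bash
# Sketch: refuse to start a check when the directory that will hold the
# temporary cache has less free space than expected.
# has_free_space is a hypothetical helper, not part of restic.

# Usage: has_free_space DIR REQUIRED_KIB
# Succeeds when the filesystem containing DIR has at least REQUIRED_KIB
# KiB available.
has_free_space() {
    local dir="$1" required_kib="$2" avail_kib
    # df -P prints POSIX-format output and -k reports 1024-byte blocks;
    # field 4 of the second line is the available space.
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge "$required_kib" ]
}

# Usage (100 GiB expressed in KiB; the threshold is an assumption):
#   if ! has_free_space "${TMPDIR:-/tmp}" $((100 * 1024 * 1024)); then
#       echo "not enough space for the check cache, skipping check" >&2
#   fi
```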

This script runs daily, then I have another practically identical one that backs different dirs to the same repo.

root@<redacted>:~/backups# cat daily-backup.sh 
#!/bin/bash

#export TMPDIR=/var/cache/restic

# %M is minutes; %m would insert the month a second time
logfile=/root/backups/logs/`date +"%Y%m%d-%H:%M"`.log

# rely on pgrep's exit status; the bare [ ! ... ] test errors out when several PIDs match
if ! pgrep -x restic > /dev/null; then
        restic -r rclone:<redacted> --verbose --password-file=/root/backups/.restic-passwd backup /storage /home /etc /opt /root --exclude "media/.downloads/*" --exclude "nvr/*" --exclude "kvm/*" --exclude ".snapshots/*" &> $logfile

        backup_status=$?

        restic -r rclone:<redacted> --verbose --password-file=/root/backups/.restic-passwd check &>> $logfile

        check_status=$?

        if [ $(($backup_status + $check_status)) = 0 ]; then
                cat $logfile | mailx -s "Full Backup Completed Successfully" <redacted>
        elif [ ! $backup_status = 0 ]; then
                cat $logfile | mailx -s "Full Backup FAILED" <redacted>
        elif [ ! $check_status = 0 ]; then
                cat $logfile | mailx -s "Backup Integrity Check FAILED" <redacted>
        fi
else
        echo backup job is still running from yesterday > $logfile
        cat $logfile | mailx -s "Backup SKIPPED" <redacted>
fi

The line in question

restic -r rclone:<redacted> --verbose --password-file=/root/backups/.restic-passwd check &>> $logfile

You’re backing up 10 TB. It’s perfectly expected that you need to spend 100 GB for a cache, if you want a cache. If you don’t find this reasonable, simply use the --no-cache option. I’m not sure what other middle ground you’re expecting :wink:

Do you really have a 10 TB disk that cannot spare 100 GB for a cache? That’s certainly pushing it :smiley:

Hahaha :joy: if it is expected that’s fine. It seemed out of place to me and has only recently become an issue after almost 2 years.

The bulk of my storage is spinning rust, so I wasn’t sure of the implications of having the cache on a slower medium, since I had already moved it from the default RAM to SSD. The spinning disk is also what’s being backed up. What I’ll do is make a dir on the HDDs and exclude it from the backup command. If I run into performance issues I’ll just add another SSD.

Thanks for taking a look and confirming this type of cache usage is expected

If it’s any comfort, I have like 150 GB cache for much smaller repositories (like 50 GB) :smiley: It’s however not expected, it’s due to an issue with file metadata on macOS.

Note that adding compression helps a lot with large caches, as all cached files are metadata saved in JSON format. In my experience compression shrinks the cache down to about 1/5 of its size.

I have a repo with ~200 GB size and ~100 MB cache size. Scaling this up would mean 5 GB for your 10 TB repo. But it of course depends a lot on the saved content. Small files, lots of dirs and lots of snapshots need more metadata than large files, few dirs and few snapshots.
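
As a back-of-the-envelope sketch of that scaling (plain linear extrapolation, treating 10 TB as 10,000 GB; `estimate_cache_mb` is a made-up helper, and real caches will deviate for the reasons above):

```shell
#!/bin/bash
# Linear extrapolation of cache size from a reference repository.
# estimate_cache_mb is a hypothetical helper for illustration only.
# Usage: estimate_cache_mb TARGET_REPO_GB REF_REPO_GB REF_CACHE_MB
estimate_cache_mb() {
    local target_gb="$1" ref_gb="$2" ref_cache_mb="$3"
    # Multiply before dividing to limit integer rounding loss.
    echo $(( target_gb * ref_cache_mb / ref_gb ))
}

estimate_cache_mb 10000 200 100   # prints 5000 (MB), i.e. about 5 GB
```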

That part of the help text wants to say that check by default does not use an already existing cache, but instead creates a new one. So by default there is a cache and it is created in the temp directory or if a cache directory is set, then in the cache directory (requires a rather recent restic version). The check command also has the --with-cache option that lets it reuse an already existing local cache. The main downside of that option is that this won’t detect data corruption at the server for files that are already cached.
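
To summarise the cache-related ways of calling check that came up in this thread, here is a small sketch. `check_opts` is a hypothetical helper that only prints the options and does not run restic; the comments restate the behaviour described above rather than anything verified against a particular restic version.

```shell
#!/bin/bash
# Sketch: cache-related variants of "restic check" mentioned in this thread.
# check_opts is a hypothetical helper; it prints options, nothing more.

# Usage: check_opts MODE [DIR]
check_opts() {
    local mode="$1" dir="${2:-}"
    case "$mode" in
        default)
            # New temporary cache in the temp directory, removed afterwards.
            echo "check" ;;
        no-cache)
            # No local cache at all (reported above as very slow).
            echo "--no-cache check" ;;
        cache-dir)
            # Temporary cache created under DIR (needs a rather recent restic).
            [ -n "$dir" ] || { echo "cache-dir mode needs DIR" >&2; return 1; }
            echo "--cache-dir $dir check" ;;
        with-cache)
            # Reuse the existing cache; corruption on the server is not
            # detected for files that are already cached.
            echo "check --with-cache" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1 ;;
    esac
}

# Usage (REPO is a placeholder, /mnt/hdd/restic-cache a hypothetical path):
#   restic -r "$REPO" $(check_opts cache-dir /mnt/hdd/restic-cache)
```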

There’s a proof-of-concept PR that lets check reuse and verify an already existing local cache: “Verify cache on check” by MichaelEischer (Pull Request #3747, restic/restic on GitHub)

Aha! That’s very good to know :point_up: Thanks for the link and information Michael