Restic creates a big cache

@zcalusic seriously? After every backup run? Hmm, my backup runs twice a day, I don’t think I need to prune each day, let alone twice a day (yesterday I did prune, maybe it was the first time I did, and it ran for almost 10 hours!)

@kapitainsky unfortunately, I can’t access my machine this weekend, I will check later.

Yeah, the real reason is that the longer you wait with prune, the longer it will take to run. All my regular backup procedures take 5-10 minutes every night. It is true that the backup itself is usually done within a minute, and the prune step then takes 5 times as long, but as long as it is this fast, I don’t really care. In any case, I like it better than running it for 10 hours. :slight_smile: Generally the logic is simple: a new snapshot arrives with fresh data, and the oldest one is then pruned.

Here, I ran all those stats/cache commands to give more info about my repo with lots of files. Now I see I have more files than you, yet… a much smaller cache. When you collect all that data, we can compare and try to pinpoint what could be the culprit for your extra large cache.

Stats in restore-size mode:
     Snapshots processed:  31
        Total File Count:  55408579
              Total Size:  1.742 TiB

Stats in raw-data mode:
     Snapshots processed:  31
        Total Blob Count:  6043415
 Total Uncompressed Size:  229.965 GiB
              Total Size:  133.211 GiB
    Compression Progress:  100.00%
       Compression Ratio:  1.73x
Compression Space Saving:  42.07%

Cache size: 1.894 GiB
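
For reference, those numbers can be gathered with the standard restic commands, presumably something like the following (assuming the repository and password are configured via the usual environment variables):

restic stats                   # default restore-size mode: size of all snapshots if restored
restic stats --mode raw-data   # actual size of the data stored in the repository
restic cache                   # list local cache directories and their sizes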

Cache size depends on many factors, but I would say the two major drivers are repo size and the number of unique files. In my artificial example with 100k files I can grow the cache indefinitely if, e.g., I touch every file before taking a new snapshot. The file contents are unchanged, but the metadata changes will be relatively huge. Of course it is a very edge-case situation, but I think it is a good demonstration of the impact of metadata on cache size.

How does it work in real life? Differently for everybody. It all depends on the backed-up data.
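
A minimal sketch of that artificial experiment (the repository path and data directory are hypothetical): touching every file changes only metadata, but it still results in new tree blobs for every snapshot, and those are what end up in the cache.

restic -r /tmp/testrepo init
restic -r /tmp/testrepo backup /data/testfiles

# change only metadata (mtime), leaving file contents untouched, then snapshot again
find /data/testfiles -type f -exec touch {} +
restic -r /tmp/testrepo backup /data/testfiles

# repeat a few times and watch the cache grow even though no content changed
restic cache
du -sh ~/.cache/restic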

1 Like

@kapitainsky if you create a metadata-only repo, of course your metadata cache will reach a full 100% of your repo size. The only question is why anyone would want to do such a silly thing. :slight_smile:

I’m thinking more about real-world usage patterns: why would @yogasu have such a big cache, when I don’t see anything similar here, even with more files? There must be an explanation. And possibly a bug to fix…

@yogasu Which version of restic are you using?
I ask because there was a bug in versions older than 0.14 where the “stats” output was incorrect for restore-size:

Still big, considering the repo size is 27 GB

From your prune output, it seems like the repo is perhaps 187 GiB:

repository contains 21526 packs (1161293 blobs) with 187.764 GiB

To see the size of the repo, you want to use restic stats --mode raw-data, as @kapitainsky suggested. The default stats mode prints the size the backups in the repository would take if you restored them all, which doesn’t sound like what you are after.

The manual has some additional details about the stats command: Manual — restic 0.16.3 documentation

Only to see this edge-case behavior :) Indeed, it is unlikely in real life.

@yogasu’s data can, for example, generate massive metadata changes while yours does not. There is no simple formula linking the number of files to cache size.

Here is an example of backup size driven by metadata changes (it of course also results in a massive cache size):

So a massive factor is how many files change, since that generates new metadata. E.g. I back up 1 million files and they never change: 10 snapshots later I have 10M files in my repo but only 1M unique ones. In a very different situation, with every file changing before each new snapshot (even if it is only a metadata change), I will also end up with 10M files in the repo, but the metadata part will be many times larger.

@yogasu I think you have a very old version of restic. Your prune output looks like you are running 0.12.x or earlier. Note that 0.15.2 is the current version.

If you use the latest restic version:

  • the prune command is much, much faster - enabling you to prune much more often
  • (about your issue:) metadata can be compressed, which usually also reduces the cache size to around a quarter of its former size. Search the forum for how to change the repo to the compressed format (a short sketch follows below).
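
A minimal sketch of that migration with a current restic version (the repository is assumed to be configured via environment variables); upgrade_repo_v2 switches the repo to the compressed format, and the extra prune flag rewrites the already existing uncompressed packs:

restic migrate                        # list the migrations available for this repository
restic migrate upgrade_repo_v2        # upgrade the repository to format version 2 (enables compression)
restic prune --repack-uncompressed    # repack old packs so existing metadata gets compressed too
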
2 Likes

A good quick test would be to see how the repo data is split (data packs vs tree packs, index size, etc.).

I am not sure restic has such stats, as the PR was abandoned, but they can be displayed using rustic (a restic spin-off written in Rust):

# rustic repoinfo
[INFO] repository local:./restic/: password is correct.
[INFO] using cache at /Users/kptsky/Library/Caches/rustic/c0f1f5f50978a40db533262ddd35608d476a3ed5d9f7713198afbca9e821577c
[INFO] scanning files...
repository files

| File type | Count | Total Size |
|-----------|-------|------------|
| Key       |     1 |      466 B |
| Snapshot  |     4 |    1.1 kiB |
| Index     |     1 |      368 B |
| Pack      |     2 |    2.2 MiB |
| Total     |     8 |    2.2 MiB |

[00:00:00] scanning index...              ████████████████████████████████████████          1/1
| Blob type | Count | Total Size | Total Size in Packs |
|-----------|-------|------------|---------------------|
| Tree      |     2 |   36.9 MiB |             2.2 MiB |
| Data      |     1 |       95 B |               115 B |
| Total     |     3 |   36.9 MiB |             2.2 MiB |

| Blob type  | Pack Count | Minimum Size | Maximum Size |
|------------|------------|--------------|--------------|
| Tree packs |          1 |      2.2 MiB |      2.2 MiB |
| Data packs |          1 |        192 B |        192 B |

This should immediately give you a hint whether there is some bug causing too much data in the local cache, or whether your repo simply has an amount of metadata that is reflected in the cache size.
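
If installing rustic is not an option, a cruder check on the restic side is to see which part of the local cache dominates, assuming the usual cache layout with snapshots/, index/ and data/ subdirectories (the data/ part of the cache holds the tree/metadata packs):

# per-component size of the local cache (path taken from the posts above)
du -sh /root/.cache/restic/*/snapshots /root/.cache/restic/*/index /root/.cache/restic/*/data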

@kapitainsky @zcalusic here is the restic stats raw-data output…

It shows 43 GB, not sure why I suddenly got this number.

 restic stats --mode raw-data
 repository 010e7818 opened successfully, password is correct
 scanning...
 Stats in raw-data mode:
 Snapshots processed:   188
    Total Blob Count:   616953
          Total Size:   43.000 GiB
restic version 
 restic 0.11.0 compiled with go1.14.5 on linux/amd64

@alexweiss @shd2h indeed, I just realized it is still on 0.11!

I think I need to upgrade, but I need to read first about what changed, to avoid data errors or something from such a big upgrade.

It tells you what your full repo size on disk/cloud is.

restic stats tells you what the size of restoring the latest snapshots would be.

These numbers are never the same.

LOL - yes, this is an ancient version.

The upgrade is very straightforward and safe. But of course it is good to make a backup first - especially since your repo is not big.

https://restic.readthedocs.io/en/stable/045_working_with_repos.html#upgrading-the-repository-format-version
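
A minimal sketch of that kind of cautious upgrade (the repository path and the location of the copy are hypothetical; self-update only works if restic was installed from the official binaries, otherwise use your package manager):

# keep a copy of the (small) repo around, just in case
cp -a /srv/restic-repo /srv/restic-repo.bak

# update the restic binary and confirm the new version
restic self-update
restic version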

1 Like

Hmm okay… that’s interesting, I never knew about that before.

So do you think that if I upgrade Restic it will help make the cache smaller?

Or will it help prune run faster, so I can run it more often like @zcalusic said?

Definitely upgrade to the latest restic; it contains a lot of improvements that are relevant to you.

Yes, because at a minimum all metadata will be compressed. How much smaller depends on your data. And given how old a version of restic you are using, a huge amount of fixes and improvements has been implemented in the meantime.

Re prune - first, it is much faster in newer restic, as @alexweiss mentioned. Second, the more often you run it, the less work it has to do. The worst thing to do is to run it very rarely - then it is slow.
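
As an illustration, the retention policy and prune can be combined into one regularly scheduled command; the policy below is just an example, not a recommendation:

# keep the last 30 daily snapshots and prune unreferenced data in the same run
restic forget --keep-daily 30 --prune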

Okay, thank you everyone, I will try to test and implement the new Restic in our system.

I remember that last time I did, I couldn’t restore a backup because of a different version of Restic. Basically, Restic couldn’t restore because the snapshot was created by a different version than the currently active Restic. But let me check again soon to make sure.

It can only happen when you migrate to a V2 repo and then try to use an old restic to restore. It is a one-way road :) You move to the new one and should forget about using the old one.
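
If in doubt about which format a repository currently uses, the repo config can be inspected; a quick sketch:

# prints the repository config as JSON; the "version" field is 1 for old repos and 2 for compressed (v2) repos
restic cat config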

Okay, so I updated Restic to the latest version:

restic version
restic 0.15.2 compiled with go1.20.3 on linux/amd64

After 7 days, I checked again.

restic stats --mode raw-data
repository 010e7818 opened (version 1)
scanning...
Stats in raw-data mode:
     Snapshots processed:  192
        Total Blob Count:  626434
              Total Size:  43.852 GiB
restic cache
Repo ID     Last Used   Old  Size
----------------------------------------
010e78184b  0 days ago        20.040 GiB
----------------------------------------
1 cache dirs in /root/.cache/restic

So it is still about 50% of the backup size; I tried to prune.

restic prune
repository 010e7818 opened (version 1)
loading indexes...
loading all snapshots...
finding data that is still in use for 192 snapshots
[19:05] 100.00%  192 / 192 snapshots
searching used packs...
collecting packs for deletion and repacking
[0:02] 100.00%  6892 / 6892 packs processed

to repack:         16414 blobs / 1.938 GiB
this removes:        198 blobs / 1.624 GiB
to delete:           335 blobs / 1.643 GiB
total prune:         533 blobs / 3.267 GiB
remaining:        626981 blobs / 43.936 GiB
unused size after prune: 86.202 MiB (0.19% of remaining size)

repacking packs
[0:22] 100.00%  72 / 72 packs repacked
rebuilding index
[0:02] 100.00%  6800 / 6800 packs processed
deleting obsolete index files
[0:00] 100.00%  19 / 19 files deleted
removing 96 old packs
[0:03] 100.00%  96 / 96 files deleted
done

and the restic cache is down to 17 GB.

restic cache
Repo ID     Last Used   Old  Size
----------------------------------------
010e78184b  0 days ago        16.844 GiB
----------------------------------------
1 cache dirs in /root/.cache/restic


du -sch /root/.cache/restic
17G	/root/.cache/restic
17G	total

I still consider this too big, but I’m really not sure what I should do next.

For now my plan is just to reduce the backup frequency.

Your cache is almost 40% of the repo size, which indeed is really big. To double-check that it is correct, I would just delete the entire cache (it is 100% safe) and see if it returns to this value during subsequent backups/prunes.
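
A sketch of that test, using the cache path from the earlier posts (the cache is rebuilt on demand, so deleting it only costs some re-downloading on the next runs):

restic cache                  # note the current cache size
rm -rf /root/.cache/restic    # drop the entire local cache
# ...run the usual backups and prune for a few days...
restic cache                  # see how large the cache grows back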

There are many people requesting a cache limit option to be added:

But it is only a discussion at the moment.

Yeah, that’s what I do, deleting the cache folder regularly, because in 2-3 days it will be back at 16 GB and start growing a few MB each day (I create 2 backup snapshots daily, and delete all snapshots older than 30 days).

I’m even thinking of putting it in cron to delete the cache regularly :smiley:

Btw, if I put --no-cache in all my Restic commands, will it mean the cache directory (/root/.cache/restic) won’t be created at all? I know it will have drawbacks, but in my case it will save some storage.
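
For reference, --no-cache is a global option, so it would have to be added to every command, roughly like this (the backup path is hypothetical); whether that completely avoids creating /root/.cache/restic is exactly the question above:

restic backup --no-cache /path/to/data
restic prune --no-cache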