Another memory usage question

Hi again :waving_hand:

I have not looked at the memory situation on the clients’ side for a while, but as the repositories grew I started to see higher and higher usage, so I wanted to ask preemptively what I can do before it grows to a problematic size.

Some facts:

  • Restic version: restic 0.18.1-dev (compiled manually) compiled with go1.25.3 X:greenteagc on linux/amd64
  • forget & prune are run automatically on each repository every ~3 days
    • Relevant prune arguments are --cleanup-cache --max-unused 20% --max-repack-size 300G -o s3.connections=20 (see the sketch after this list)
  • Each client has its own retention, but there are two main categories
    • Normal snapshots, taken roughly daily from each client and kept for about 2 months
    • Persistent snapshots, kept forever for one reason or another
  • Relevant backup arguments on the client side: --cache-dir /opt/restic_local_cache --cleanup-cache --no-scan --json --stuck-request-timeout 15m --read-concurrency 2
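
For reference, the scheduled forget & prune step is assembled roughly like the sketch below. The --keep-* flags are illustrative guesses based on the two categories above (the exact retention differs per client); the prune arguments are the ones listed in the list:

~> restic forget --keep-within 2m --keep-tag persistent \
       --prune --max-unused 20% --max-repack-size 300G \
       --cleanup-cache -o s3.connections=20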

Here is an example repository; clients use between ~1.5 GB and ~1.8 GB of memory on it, I assume depending on how recently the last prune ran:

~> sudo -E restic stats --mode debug
repository c6048bae opened (version 2, compression level auto)
[0:42] 100.00%  1370 / 1370 index files loaded
Collecting size statistics

File Type: key
Count: 1
Total Size: 473 B
Size            Count
---------------------
100 - 999 Byte  1
---------------------
File Type: lock
Count: 1
Total Size: 182 B
Size            Count
---------------------
100 - 999 Byte  1
---------------------
File Type: index
Count: 1370
Total Size: 582.929 MiB
Size                    Count
-----------------------------
        100 - 999 Byte  366
      1000 - 9999 Byte  294
    10000 - 99999 Byte  400
  100000 - 999999 Byte  92
1000000 - 9999999 Byte  218
-----------------------------
File Type: data
Count: 388215
Total Size: 6.208 TiB
Size                      Count
--------------------------------
          100 - 999 Byte  251
        1000 - 9999 Byte  444
      10000 - 99999 Byte  479
    100000 - 999999 Byte  837
  1000000 - 9999999 Byte  4022
10000000 - 99999999 Byte  382182
--------------------------------
Blob Type: data
Count: 6782042
Total Size: 6.160 TiB
Size                    Count
-------------------------------
          10 - 99 Byte  148566
        100 - 999 Byte  557712
      1000 - 9999 Byte  955459
    10000 - 99999 Byte  404197
  100000 - 999999 Byte  2062349
1000000 - 9999999 Byte  2653759
-------------------------------


Blob Type: tree
Count: 5549504
Total Size: 4.398 GiB
Size                  Count
-----------------------------
        10 - 99 Byte  4
      100 - 999 Byte  4731544
    1000 - 9999 Byte  791957
  10000 - 99999 Byte  25337
100000 - 999999 Byte  662
-----------------------------

So it is a chunky one. Snapshot-wise, it holds a total of ~9000 snapshots from ~800 clients (~900 of them are tagged persistent, so they are never removed).

I’d be glad for any pointers. I don’t think GOGC would rescue me at this point; asking for nearly 2 GB of RAM on 800+ clients for every run is a bit costly. Should I separate the persistent snapshots? Or, more importantly, what is the main driving force behind the memory usage, and can I somehow optimize it?

Thanks!

Hi @gurkan

The main driving force for memory usage is the in-memory index, which needs (according to restic’s in-code documentation) around 62 bytes per entry plus overhead (mainly from storing it in a map, and GC overhead, IIRC).
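
As a rough back-of-envelope for the repository above (the ~2x factor for map and GC overhead is my own guess, not a measured number):

  6,782,042 data blobs + 5,549,504 tree blobs = 12,331,546 index entries
  12,331,546 entries × 62 B ≈ 730 MiB of raw index entries
  × ~2 for map and GC overhead ≈ 1.4–1.5 GiB

which is in the same ballpark as the ~1.5–1.8 GB you are seeing.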

So, besides potential optimizations within restic (which are out of scope for this question), the only thing you can do as a user is to reduce the number of blobs per repository. This can be done by using multiple repositories instead of a single big one. Note that memory usage does not depend on the number of snapshots, only on the number of blobs that actually exist in the repository being used.

And yes, tuning GOGC could also help to a certain degree, as it may decrease the GC overhead. (But I’m not sure how much that is nowadays - I remember having read about a GC-optimized implementation of the maps used within restic.)
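
If you want to experiment with that, a lower GOGC value shrinks the heap growth target of the Go runtime at the cost of more GC CPU time; the value below is just an example, not a recommendation:

~> GOGC=50 restic prune --max-unused 20% --max-repack-size 300G --cleanup-cache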

Aww shoot… Thanks. I was thinking maybe I could repack more aggressively, or somehow force the size of “something” up so that there would be fewer of them in terms of count, but (as far as I understand from the design document) blob size is not something I can fiddle with.

It also sounds like better deduplication could help, but checking the individual snapshot stats I can see that each one already has at least a 50% dedup gain, so I guess it is already using a lot less memory than it otherwise would :sweat_smile:
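
For what it’s worth, one repository-wide way to sanity-check the dedup gain is to compare these two existing stats modes:

~> restic stats --mode restore-size   # total logical size of all snapshot contents
~> restic stats --mode raw-data       # deduplicated size of the blobs actually stored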

Thanks again, I think the sensible approach here is to divide the repository.
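
If I go down that route, something along these lines should let me seed a separate repository with the persistent snapshots first (repository paths and password files below are just placeholders):

~> restic -r /backups/persistent-repo --password-file /etc/restic/persistent.pass \
       copy --from-repo /backups/main-repo --from-password-file /etc/restic/main.pass \
       --tag persistent

After verifying the copies, the persistent snapshots could then be forgotten from the main repository so that the next prune can drop their blobs.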