Default "Pack Size" is too small

Hi All,

I am in the process of setting the “pack size” on EVERY SINGLE CLIENT (all my PCs, laptops, VMs, Raspberry Pis, etc.) to the maximum, because the default of 16MB is just too small. Currently I’m at about 28 machines, the bulk of which are VMs and RPis. (I work from my home office and have a number of development and test environments set up for different clients.)

What was the thought process behind such a tiny default pack size?

My repository is about 7TB, and with the default tiny pack size, it has over a million files.
The store is on a 120TB ZFS NAS with 7M files total, so while it’s not the biggest dataset by far, it consumes the most metadata.

My thoughts are:

  1. Make the pack size a REPOSITORY setting, so I don’t have to remember to configure each client individually
    • Clients can override if needed
  2. Make the default a reasonable size, like say 256MB
    • Allow the “backend” to suggest a reasonable default. (S3 backend vs rest backend vs file backend can have different ideas on what a reasonable default might be)
  3. Make the maximum a reasonable size, like 1-4GB.
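The precedence suggested in points 1 and 2 above could be sketched as follows. This is a minimal illustration, not restic’s actual API: the function name and the idea of numeric zero meaning “unset” are assumptions for the sketch.

```go
package main

import "fmt"

// EffectivePackSize is a hypothetical resolver for the suggestion above:
// an explicit client override wins, then a value stored in the repository
// config, then a backend-suggested default, then the global fallback.
// A value of 0 means "not set". None of these names exist in restic.
func EffectivePackSize(clientOverride, repoConfig, backendSuggested uint) uint {
	const fallback = 16 << 20 // current restic default: 16 MiB
	switch {
	case clientOverride != 0:
		return clientOverride
	case repoConfig != 0:
		return repoConfig
	case backendSuggested != 0:
		return backendSuggested
	default:
		return fallback
	}
}

func main() {
	// Nothing configured anywhere: fall back to the 16 MiB default.
	fmt.Println(EffectivePackSize(0, 0, 0) >> 20) // prints 16
	// Repository config set to 256 MiB; backend suggestion is ignored.
	fmt.Println(EffectivePackSize(0, 256<<20, 64<<20) >> 20) // prints 256
}
```

With such a scheme, adding a new machine would pick up the repository-wide value automatically, which is exactly what point 1 asks for.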

I’m guessing that a significant increase in maximum pack size might require a change in repository format (larger indexes etc), and a new repository version. What would be the impact of that? Would it make the repository significantly larger?

FWIW, here is what’s implemented in rustic:

  • All options which control the targeted pack size are saved in the repository config file and can be set in the init or config command (so no extra command line options have to be given for other commands)
  • The targeted pack size is by default scaled depending on the repository size. For instance, your 7 TB repo would get a targeted pack size of about 115 MB. Of course all scaling parameters are customizable.
  • The targeted pack size is calculated independently for tree and data packs. Usually tree packs cover a very small amount of the repo size, and therefore a smaller pack size is OK for those (and gives benefits when forgetting/pruning snapshots)
  • The prune command also repacks packs which are too small (or, optionally, too large).
  • Side remark: the maximum pack size is always 4 GiB, as the position of a blob within the pack is always saved as a 32-bit integer.
  • Another remark: I don’t know why restic chose to use a pretty small targeted pack size of 16 MiB and limits the targeted pack size to 128 MiB. In practice, I didn’t encounter any problems with larger pack sizes.
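A repository-size-scaled target, as described above, could be sketched like this. The base size (32 MiB), grow factor (32) and square-root scaling are assumptions chosen so that a 7 TB repository lands near the quoted ~115 MB figure; they are not the actual parameters of rustic or restic.

```go
package main

import (
	"fmt"
	"math"
)

// TargetPackSize sketches a pack size that grows with the square root of
// the repository size, clamped at the 4 GiB hard limit imposed by 32-bit
// blob offsets. All tuning constants here are illustrative assumptions.
func TargetPackSize(repoBytes float64) float64 {
	const (
		base    = 32 << 20 // assumed base pack size: 32 MiB
		grow    = 32       // assumed grow factor
		maxSize = 4 << 30  // hard limit: 4 GiB (32-bit blob offsets)
	)
	return math.Min(base+grow*math.Sqrt(repoBytes), maxSize)
}

func main() {
	for _, tb := range []float64{1, 7, 100} {
		fmt.Printf("%5.0f TB repo -> %4.0f MB packs\n",
			tb, TargetPackSize(tb*1e12)/1e6)
	}
}
```

The square-root scaling keeps pack counts manageable for big repositories without forcing tiny repos to use huge packs.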

Hope this gives some inspiration if restic decides to further improve the tuning of pack sizes!

It is a problem with restic how conservative it is about any change, and at the same time that conservatism is one of restic’s best characteristics. There is no contradiction for me here.

It is my data and my backup - I do not want to spend time experimenting to chase some cool new additions.

Not that it cannot be made better, and here rustic is surely a strong force for change. But as long as it hasn’t moved out of beta and doesn’t provide proper documentation, it is only something good to keep an eye on.


Tell me more about config commands. “restic config” says unknown command.

Edit: Sorry, I didn’t realise you weren’t talking about restic itself…

@alexweiss It’s getting rather annoying that too many threads in the restic forum are derailed into discussions about rustic and now look like this: “question: why does restic do X? answer: in rustic …”.

That’s one more example that restic and rustic are too similar as names (just as predicted). Spraying rustic links all over the restic forum and github issues obviously doesn’t help and just amounts to advertising, so please stop that.


I’ve opened https://github.com/restic/restic/issues/4371 .

Judging from the number of issues opened after enforcing that lock files are refreshed before they become stale, an upload speed of just 50KB/s (or 400kbps) isn’t entirely uncommon. At that speed a 16MB file already takes over 5 minutes to upload. With a larger pack size, an interrupted backup would lose even more work.
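The trade-off above is easy to quantify: on a slow link, the time to upload one pack is an upper bound on how much finished work an interruption can throw away. A back-of-the-envelope sketch (link speed and pack sizes are just the illustrative numbers from this thread):

```go
package main

import "fmt"

// UploadSeconds returns how long a single pack of the given size takes
// to upload at the given sustained throughput. On an interrupted backup,
// a partially uploaded pack is lost, so this bounds the wasted work.
func UploadSeconds(packBytes, bytesPerSec float64) float64 {
	return packBytes / bytesPerSec
}

func main() {
	const slowLink = 50e3 // 50 KB/s, i.e. 400 kbit/s
	for _, mb := range []float64{16, 128, 256} {
		s := UploadSeconds(mb*1e6, slowLink)
		fmt.Printf("%3.0f MB pack at 50 KB/s: %5.1f min lost if interrupted\n",
			mb, s/60)
	}
}
```

At 50 KB/s a 16 MB pack takes about 5.3 minutes, while a 128 MB pack takes over 40 minutes, which illustrates why a small default protects users on slow or flaky links.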

@MichaelEischer I hope it was clear that I wanted to give some ideas for restic improvements, which is what this thread is about. IMO it is even better if there is already an implementation at hand to test out these ideas, but you may think differently about this.

Of course you are free to ignore my ideas or suggestions. Of course you are also free to ignore them just because they are already implemented in rustic, which is some kind of competitor for restic. BTW: You should try it out, it’s a great piece of software! (<- And THAT was an advertisement :wink: )

This topic is posted under “Features and Ideas”, however I agree with @MichaelEischer that it is not appropriate to promote a “Competing Product”. Just because both products are open source does not make it appropriate.

I appreciate the assistance you offer in this forum, but I believe you should limit your comments about competing features to something like “Feature X is not implemented in restic.” Maybe even spend some time correcting statements found via Google like “Rustic is a wrapper around the Restic backup program.”, which may lead people to look here for support (taken from https://crates.io/crates/rustic-backup).

FYI, that crate has absolutely nothing to do with GitHub - rustic-rs/rustic: rustic - fast, encrypted, and deduplicated backups powered by Rust

The source code indeed confirms that it is a Rust wrapper around the restic Go binary: rustic/src/restic.rs at master · bnavetta/rustic · GitHub (for whatever reason).