What does repack-small do?

What is the exact meaning of --repack-small? prune -h explains it as

repack pack files below 80% of target pack size

The documentation does not mention this flag. What is the “target pack size”? Are small files packed into packs which end up smaller than a certain size, or where do those small packs come from? Looking into the repository’s data folder, the files are about 18 MB in size; apparently the target is 16 MiB. Is this the “target size”?

Why are those not repacked? Because it’s not worth the effort? What would be a good reason to do so?

Say you backed up initially with the default pack size (16 MiB). It isn’t a “hard limit” so yes, pack sizes could possibly be up to 18 MiB or so, but will generally “average out” to 16 MiB.

Now say you’ve started using a pack size of 128 MiB, either with “RESTIC_PACK_SIZE=128” or “--pack-size 128”.

Finally, say you wish the old 16 MiB packs weren’t so… small… and you’d like to… repack them… to a larger pack size (128 MiB).

That’s what --repack-small does. It will repack packs smaller than 80% of the current RESTIC_PACK_SIZE or --pack-size setting (it’s worth noting command-line switches always take priority over environment variables). :slight_smile:
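As a sketch of that selection rule (the 80% threshold comes from the help text quoted above; the variable and function names here are illustrative, not restic’s actual code):

```shell
# Sketch of the --repack-small selection rule: a pack qualifies for
# repacking when it is below 80% of the target pack size.
# Names are illustrative only; this is not restic's implementation.
pack_size_mib=128
target_bytes=$((pack_size_mib * 1024 * 1024))
threshold=$((target_bytes * 80 / 100))

is_small() {  # $1 = pack file size in bytes
  if [ "$1" -lt "$threshold" ]; then echo repack; else echo keep; fi
}

is_small $((16 * 1024 * 1024))    # an old default-size pack  → repack
is_small $((120 * 1024 * 1024))   # already near the new target → keep
```

So after raising the pack size, the old 16 MiB packs fall well under the threshold and get rewritten into larger packs on the next prune.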

The main reason to do this is if you think you might run into issues with too many files (number of files, not size of files) on a certain backend. Or if your local backend struggles to enumerate a high number of files in a folder (think an APFS volume on a spinning SMR drive, which is notoriously slow at directory enumeration :fearful:). Raising the pack size limit stuffs more data into each pack: many small blobs are bundled into one pack, and overly large files are split into blobs that fill packs up to the target size. By using a larger pack size, you’ll decrease the sheer number of pack files necessary to store your data.
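To put a rough number on that (a back-of-the-envelope calculation; the 500 GB figure is just an example, and this ignores compression and the soft-limit slack):

```shell
# Back-of-the-envelope: how many pack files ~500 GB of packed data needs
# at the default pack size vs. a raised one. Ignores compression and the
# fact that the pack size is a soft limit, so real counts will differ.
data_gib=500
echo "$((data_gib * 1024 / 16)) packs at 16 MiB"    # → 32000 packs at 16 MiB
echo "$((data_gib * 1024 / 128)) packs at 128 MiB"  # → 4000 packs at 128 MiB
```

An eightfold larger pack size means roughly an eighth as many files for the backend to list and track.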

The downside is that when you have to prune, you may need to rewrite more data than you would with a smaller pack size, which may be costly or time-consuming depending on your backend.


Note that you can also get small pack files if you e.g. run your backup very frequently and only small changes have to be added. At the end of a backup, pack files are written out even if they are not completely “full”.
In this case, repacking the resulting small pack files into bigger ones can also be done using --repack-small.


In my repo the small packs are old packs (500 GB out of 2 TB). I switched to the repo version 2 when it was introduced and with --repack-small I get a similar prune result as with --repack-uncompressed, so the small and the uncompressed packs are about the same. Did the default pack size change at some point? I never changed it explicitly and used default compression settings (now auto). In another, more recent repo, the packs affected by --repack-small are rare (40 MB). I used it for the old repo now, just because it feels cleaner to use the compressed format throughout.

But my conclusion from your answers is that (as long as the number of files is not crucial) it does not really matter if some packs are small.

Small packs do have a few more disadvantages:

  • slightly higher memory consumption (as all pack ids need to be stored in the index)
  • when restoring, (very) small packs may prevent some optimizations around fetching many blobs in a row, i.e. restore could be slower.
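As a rough illustration of the first point (assuming only that restic identifies each pack by a SHA-256 id, which is 32 bytes; real index entries store considerably more per blob, so this is a lower bound):

```shell
# Rough illustration: per-pack index overhead grows with the pack count.
# A pack id is a SHA-256 hash (32 bytes); actual in-memory index entries
# hold more than just the id, so treat this as a lower bound.
id_bytes=32
small_packs=32000   # e.g. ~500 GB of data at ~16 MiB per pack
large_packs=4000    # the same data at ~128 MiB per pack
echo "$((small_packs * id_bytes)) vs $((large_packs * id_bytes)) bytes of ids"
```

The absolute numbers are tiny either way, which matches the conclusion below: the overhead exists, but it rarely matters in practice.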

But in general, having small packs or lots of packs is not much of a problem. There are even advantages, like a lower probability that a pack becomes partly used after a forget and might need repacking.