Is there a way to set the Min and/or Max Chunk/Pack size when Init or Backup to a Repository?
There’s no way to do that without changing the code. The sizes are chosen as a compromise, originally to accommodate local/sftp backends. You can play with the constant values in the code.
What are you trying to do? What’s your use case?
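For reference, here is a sketch of the kind of hard-coded value meant by "play with the constants." In restic's source of this era the minimum pack size was a constant (roughly `minPackSize = 4 * 1024 * 1024` in internal/repository/packer_manager.go — exact file and name vary by version, so check your checkout). This toy packer is not restic's actual code, but it shows why pack files cluster just above that threshold:

```go
package main

import "fmt"

// Assumed default threshold; restic's real constant may differ by version.
const minPackSize = 4 * 1024 * 1024

// packBlobs groups blob sizes into packs, flushing each pack as soon as it
// reaches minPackSize. Raising the constant directly raises pack file sizes.
func packBlobs(blobSizes []int) []int {
	var packs []int
	cur := 0
	for _, s := range blobSizes {
		cur += s
		if cur >= minPackSize {
			packs = append(packs, cur)
			cur = 0
		}
	}
	if cur > 0 {
		packs = append(packs, cur) // final partial pack
	}
	return packs
}

func main() {
	// 100 blobs of ~1 MiB each flush into packs of ~4 MiB.
	blobs := make([]int, 100)
	for i := range blobs {
		blobs[i] = 1 << 20
	}
	fmt.Println("packs produced:", len(packBlobs(blobs)))
}
```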
No worries. I’d come across a post that talked about it and it seemed like it might have been a hidden flag or something, but it was probably just forked code.
The main reason is that I have some large Repos that are made up predominantly of very large ProRes video files. These files “never” (maybe in some distant future) get deleted, and only more get added. They do, however, get reorganized and renamed.
Having them split up into 1-8MB pieces makes using Cloud storage quite slow and restrictive… I’m currently using Wasabi, but was hoping to migrate to Google Drive. Google’s Team Drives have a limit of 400,000 files per drive, and all my Repos already have >1M files. I’m not sure if Google My Drive has the same limitation, but figured it would be worth getting ahead of the growth issue in general.
Forcing the file sizes to 100-250MB apiece seemed like it would probably help a lot for my situation, although I dunno what this would mean for the handful of smaller project/meta files that do change occasionally.
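As a rough sanity check on that 400,000-file cap (the pack sizes here are illustrative — real packs vary around the target, and index/metadata files add a few more):

```go
package main

import "fmt"

// capacityAtLimit estimates how much data (in TB) fits before the pack files
// alone hit a backend's file-count cap. Purely back-of-envelope arithmetic.
func capacityAtLimit(packSizeBytes, fileLimit int64) float64 {
	return float64(packSizeBytes*fileLimit) / 1e12 // bytes -> TB
}

func main() {
	const limit = 400_000 // Google Team Drive item cap
	fmt.Printf("~4 MiB packs:  %.1f TB before hitting the cap\n", capacityAtLimit(4<<20, limit))
	fmt.Printf("~250 MB packs: %.1f TB before hitting the cap\n", capacityAtLimit(250_000_000, limit))
}
```

With the default ~4 MiB packs the cap is reached well under 2 TB; at ~250 MB packs the same limit stretches to roughly 100 TB.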
For me at least, it’s only the Repo that has backups from my Laptops where it makes sense to have smaller chunks, because files on there change often. Most of my other content is static.
The bigger issue with large pack sizes is that even tree objects are stored in packs. When you forget some snapshots and prune, restic tries to discard all unused objects, which requires the packs containing these objects to be rewritten and consolidated.
If you have just a few large monolithic packs, prune is going to be a very painful process as it will likely have to rewrite most of the packs in the repository when you do your weekly/monthly prune.
You’d probably have to do some testing to determine the best pack size: one that meets your goal of having fewer files, but doesn’t force prune operations to download most of the data in the repository.
On the other hand, if you never delete anything, you might not really need to delete any snapshots, as you wouldn’t recover very much space, so the prune issue might be a moot point.
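As a back-of-envelope illustration of why bigger packs get touched more often by prune (a simplified model, not restic's actual prune logic — it just assumes unused blobs end up scattered uniformly across packs):

```go
package main

import (
	"fmt"
	"math"
)

// rewriteFraction estimates the share of packs prune must rewrite: a pack
// holding n blobs needs rewriting if it contains at least one unused blob,
// which under uniform scattering happens with probability 1-(1-p)^n.
func rewriteFraction(unusedBlobFraction float64, blobsPerPack int) float64 {
	return 1 - math.Pow(1-unusedBlobFraction, float64(blobsPerPack))
}

func main() {
	const p = 0.01 // assume 1% of blobs become unused after a forget
	// Assuming ~1 MiB average blobs: a 4 MiB pack holds ~4, a 250 MiB pack ~250.
	fmt.Printf("4 MiB packs:   %.0f%% rewritten\n", 100*rewriteFraction(p, 4))
	fmt.Printf("250 MiB packs: %.0f%% rewritten\n", 100*rewriteFraction(p, 250))
}
```

Even a tiny fraction of unused blobs ends up touching almost every large pack, which is why prune can be so expensive on a monolithic-pack repository.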
Good point, I didn’t realize that about the trees. I’ll see if I can track down and play with the values to see what kind of difference it makes for me.
Please report back if you do! I’m interested in your experiences.
I have a similar use case to @jonreeves - I want to back up a few TB of data to hierarchical storage (national infrastructure for scientific computations) that only allows 200k files per user. Using the standard pack size, this is not possible. I would like to ask the following:
- Has there been any development regarding the pack size?
- Is it “dangerous” to fiddle with the pack size in the source from the perspective of losing/corrupting data?
- @jonreeves - how did the testing go - did you encounter any problems?
I modified the Pack Size to be about 250MB (most of my files are immutable and are between 100MB-2GB in this repo) and have been using it for the last month. I’ve stored about 4.7TB with it and haven’t had any issues so far doing multiple ‘restore’ commands. I did note that using ‘mount’ crashed out every time, but that may not have anything to do with the Pack Size; I haven’t tried ‘mount’ on the official binary. I haven’t tried a ‘restore’ using the official binary yet to see what happens, but it didn’t look like a ‘restore’ would complicate anything, only the commands that write.
Observations so far…
- File Count: As expected, far fewer files. So that’s good.
- Transfer Rates: Much quicker sending to Google Drive and Wasabi.
- Check Times: Seem a little quicker, but pruning surprisingly still took a few hours.
- Waiting Times: It seems like packing has to finish before an upload can start, so I end up with a longer idle time between each transfer while the packer works. This means the overall transfer rate takes a hit, but it’s still better. (would be good to have the packer working on the next pack while the upload is going, or upload as it’s packing like Duplicacy)
- Moving the Repo: Using Rclone to download the whole repo was massively faster because of the Pack Size too. But moving it between a local machine and a NAS was where it really helped because those transfers were not parallel.
So far so good. I plan on using this moving forwards at least for this Repo and the few others I have that are mainly immutable.
It’s worth noting that I started the repo from scratch again after I modified the pack size. I did not continue an existing one.