Is there a way to set the Min and/or Max Chunk/Pack size when Init or Backup to a Repository?
There’s no way to do that without changing the code. The sizes are chosen as a compromise, originally to accommodate local/sftp backends. You can play with the constant values in the code.
What are you trying to do? What’s your use case?
No worries. I’d come across a post that mentioned it, and it seemed like it might have been a hidden flag or something, but it was probably just forked code.
The main reason is that I have some large Repos that are made up predominantly of very large ProRes video files. These files “never” (maybe in some distant future) get deleted, and only more get added. They do however get reorganized and renamed.
Having them split up into 1-8MB pieces makes using Cloud storage quite slow and restrictive… I’m currently using Wasabi, but was hoping to migrate to using Google Drive. Google’s Team Drives have a limit of 400,000 files per drive, and all my Repos already have > 1m files. I’m not sure if Google My Drive has the same limitation, but figured it would be worth getting ahead of the growth issue in general.
Forcing the file sizes to 100-250MB a piece seemed like it would probably help a lot for my situation. Although I dunno what this would mean for the handful of smaller project/meta files that do change occasionally.
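As a rough sanity check of the file-count math (a back-of-the-envelope model with illustrative numbers, not restic’s actual packing logic), here is how the pack count for a repo of a given size compares between small and large average pack sizes:

```go
package main

import "fmt"

// estimatePacks returns the approximate number of pack files needed to hold
// repoBytes of data at a given average pack size, rounding up. This is a
// simple ceiling division, not restic's real packing algorithm.
func estimatePacks(repoBytes, avgPackBytes int64) int64 {
	return (repoBytes + avgPackBytes - 1) / avgPackBytes
}

func main() {
	const TB = int64(1) << 40
	const MB = int64(1) << 20

	repo := 5 * TB // e.g. a ~5 TB media repository (hypothetical size)

	fmt.Println(estimatePacks(repo, 5*MB))   // ~5 MB packs: over a million files
	fmt.Println(estimatePacks(repo, 250*MB)) // ~250 MB packs: ~21k files
}
```

With packs in the single-digit-MB range, a multi-TB repo blows straight past a 400,000-file limit; at ~250 MB per pack it stays far under it.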
For me at least, the only Repo where smaller chunks make sense is the one holding backups from my Laptops, because the files there change often. Most of my other content is static.
The bigger issue with large pack sizes is that even tree objects are stored in packs. When you forget some snapshots and prune, restic tries to discard all unused objects, which requires the packs containing these objects to be rewritten and consolidated.
If you have just a few large monolithic packs, prune is going to be a very painful process as it will likely have to rewrite most of the packs in the repository when you do your weekly/monthly prune.
You’d probably have to do some testing to determine the best pack size that both meets your goal of having fewer files, but doesn’t make prune operations have to download most of the data in the repository.
On the other hand, if you never delete anything, you might not really need to delete any snapshots, as you wouldn’t recover very much space, so the prune issue might be a moot point.
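A toy model makes the prune concern above concrete: if unused objects end up scattered uniformly at random across packs, the chance that any given pack contains at least one unused object (and so must be downloaded and rewritten) grows quickly with the number of objects per pack. This is purely illustrative; restic’s actual prune behavior differs in detail:

```go
package main

import (
	"fmt"
	"math"
)

// fractionRewritten estimates the share of packs containing at least one
// unused object, assuming unused objects are scattered uniformly at random.
// unusedFrac is the fraction of all objects that are unused; objectsPerPack
// is the average number of objects per pack file. Both numbers below are
// made up for illustration.
func fractionRewritten(unusedFrac, objectsPerPack float64) float64 {
	return 1 - math.Pow(1-unusedFrac, objectsPerPack)
}

func main() {
	// With 1% of objects unused:
	fmt.Printf("%.2f\n", fractionRewritten(0.01, 5))   // few objects per pack
	fmt.Printf("%.2f\n", fractionRewritten(0.01, 250)) // many objects per pack
}
```

In this model, 1% unused objects touches only ~5% of small packs but over 90% of large ones, which is why prune can end up rewriting most of a repository built from big monolithic packs.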
Good point, I didn’t realize that about the trees. I’ll see if I can track down and play with the values to see what kind of difference it makes for me.
Please report back if you do! I’m interested in your experiences.
I have a similar use case to @jonreeves - I want to back up a few TB of data to hierarchical storage (national infrastructure for scientific computations) that only allows 200k files per user. Using the standard pack size, this is not possible. I would like to ask the following:
- Has there been any development regarding the pack size?
- Is it “dangerous” to fiddle with the pack size in the source, from the perspective of losing/corrupting data?
- @jonreeves - how did the testing go - did you encounter any problems?
I modified the Pack Size to be about 250MB (most of my files are immutable and between 100MB-2GB in this repo) and have been using it for the last month. I’ve stored about 4.7TB with it and haven’t had any issues so far doing multiple `restore` commands. I did note that using `mount` crashed out every time, but that may have nothing to do with the Pack Size; I haven’t tried `mount` on the official binary.
I haven’t tried a `restore` with the official binary yet to see what happens, but it didn’t look like a `restore` would complicate anything, only the commands that write.
Observations so far…
- File Count: As expected, far fewer files. So that’s good.
- Transfer Rates: Much quicker sending to Google Drive and Wasabi.
- Check Times: Seem a little quicker, but surprisingly, pruning still took a few hours.
- Waiting Times: It seems like packing has to finish before an upload can start, so I end up with a longer idle time between each transfer while the packer works. This means the overall transfer rate takes a hit, but it’s still better. (It would be good to have the packer working on the next pack while the upload is going, or to upload while packing, like Duplicacy.)
- Moving the Repo: Using Rclone to download the whole repo was massively faster because of the Pack Size too. But moving it between a local machine and a NAS was where it really helped because those transfers were not parallel.
So far so good. I plan on using this moving forwards at least for this Repo and the few others I have that are mainly immutable.
It’s worth noting that I started the repo from scratch after I modified the pack size; I did not continue an existing one.
I’m about to start my own experimenting with minPackSize, and found this thread. A few questions:
Would it be too difficult to have two different pack sizes, one for tree objects and another for data objects? That way we could tune them separately (perhaps keeping the trees’ minPackSize as is, and enlarging the data’s).
My backend here is an ‘unlimited’ GoogleDrive from a GSuite business account, fully legitimate (ie, not one of those shady EDUs some people use, and it has over 5 users, so I don’t depend on Google looking the other way). Therefore I have very little incentive to delete anything.
But could never deleting anything cause any issues for restic, apart from the obvious (eg, `restic snapshots` taking longer as there are more snapshots to list)?
Thanks in advance,
The biggest issue that comes to mind is memory consumption, as how much memory restic uses scales approximately linearly with the number of objects in the repository. This impacts most operations (backup and prune, most notably).
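That linear scaling can be sketched numerically. The per-object overhead below is a placeholder assumption chosen for the example, not a figure from restic’s code; the point is only that memory grows in direct proportion to the object count:

```go
package main

import "fmt"

// indexMemEstimate gives a back-of-the-envelope estimate of in-memory index
// size: one entry per object, times an assumed per-entry overhead in bytes.
// bytesPerEntry is a hypothetical number, not measured from restic.
func indexMemEstimate(numObjects, bytesPerEntry int64) int64 {
	return numObjects * bytesPerEntry
}

func main() {
	const assumedBytesPerEntry = 100 // hypothetical overhead per indexed object

	// E.g. a repo with ~5 million objects (blobs + trees).
	objects := int64(5_000_000)
	fmt.Printf("%d MiB\n", indexMemEstimate(objects, assumedBytesPerEntry)>>20)
}
```

Doubling the object count doubles the estimate, which is why repositories with many small objects push memory use up regardless of total repository size.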
Thanks for the info, this looks like something I should prepare for, as `restic backup` currently uses over 50% of the backup server’s total RAM …
My plan is to monitor `restic backup` usage and, as soon as it crosses some reasonable threshold, start `restic forget`'ing snapshots, oldest first.
Question: the way I understand it, `restic forget` is pretty fast – but would it suffice to bring memory consumption during `restic backup` down? Or would I need to do a `restic prune` (which I understand is really slow and would – due to locking – prevent any `restic backup` from running in the interim)? As our Google Drive is legitimately unlimited (GSuite account with over 5 users), I would rather not do a prune if I can avoid it.
forget only removes snapshot files. It doesn’t remove any objects.
Yes, you would need to prune.
Ouch! This is going to be painful…
UPDATE: it seems that not doing `restic forget` plus `restic prune` periodically will also make `restic backup` run progressively slower – I’ve created a new topic to discuss this, please see it here.