Is there a way to set the min and/or max chunk/pack size when running init or backup against a repository?
There's no way to do that without changing the code. The sizes are chosen as a compromise, originally to accommodate local/sftp backends. You can play with the constant values in the code.
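If it helps, here is a minimal sketch of the kind of edit involved. I'm assuming the relevant constant is `minPackSize` in `internal/repository/packer_manager.go`, which is where it lived in restic versions around the time of this thread; the exact file, name, and default value may differ in your checkout, so treat this as an illustration rather than a recipe.

```go
// internal/repository/packer_manager.go (assumed location -- verify in your restic version)
//
// The packer keeps adding blobs to an in-memory pack until it reaches this
// size, then finalizes and uploads it, so raising the constant yields fewer,
// larger pack files in the repository.
const minPackSize = 4 * 1024 * 1024 // stock value: ~4 MiB

// To target ~250 MiB packs (as tried later in this thread), rebuild restic
// with something like:
// const minPackSize = 250 * 1024 * 1024
```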
What are you trying to do? What's your use case?
No worries. I'd come across a post that talked about it and it seemed like it might have been a hidden flag or something, but it was probably just forked code.
The main reason is that I have some large repos made up of predominantly very large ProRes video files. These files "never" (maybe in some distant future) get deleted, and only more get added. They do, however, get reorganized and renamed.
Having them split up into 1-8 MB pieces makes using cloud storage quite slow and restrictive… I'm currently using Wasabi, but was hoping to migrate to Google Drive. Google's Team Drives have a limit of 400,000 files per drive, and all my repos already have > 1M files. I'm not sure if Google My Drive has the same limitation, but I figured it would be worth getting ahead of the growth issue in general.
Forcing the file sizes to 100-250 MB apiece seemed like it would probably help a lot in my situation, although I don't know what this would mean for the handful of smaller project/meta files that do change occasionally.
For me at least, it's only the repo holding backups from my laptops where smaller chunks make sense, because the files there change often. Most of my other content is static.
The bigger issue with large pack sizes is that even tree objects are stored in packs. When you forget some snapshots and prune, restic tries to discard all unused objects, which requires the packs containing these objects to be rewritten and consolidated.
If you have just a few large monolithic packs, prune is going to be a very painful process, as your weekly/monthly prune will likely have to rewrite most of the packs in the repository.
You'd probably have to do some testing to determine the best pack size, one that both meets your goal of having fewer files and doesn't force prune operations to download most of the data in the repository.
On the other hand, if you never delete anything, you might not really need to delete any snapshots, as you wouldn't recover very much space, so the prune issue might be a moot point.
Good point, I didn't realize that about the trees. I'll see if I can track down and play with the values to see what kind of difference it makes for me.
Please report back if you do! I'm interested in your experiences.
I have a similar use case to @jonreeves: I want to back up a few TB of data to hierarchical storage (national infrastructure for scientific computing) that only allows 200k files per user. Using the standard pack size, this is not possible (some rough arithmetic after these questions illustrates why). I would like to ask the following:
- Has there been any development regarding the pack size?
- Is it "dangerous" to fiddle with the pack size in source from the perspective of losing/corrupting data?
- @jonreeves - how did the testing go - did you encounter any problems?
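To put some rough numbers on the file-count problem (these are back-of-the-envelope figures of my own, not restic internals), here is a quick sketch of how the pack size drives the number of files in the repository:

```go
package main

import "fmt"

// packCount returns roughly how many pack files a repository of totalBytes
// needs if packs average packBytes each, ignoring index and snapshot files,
// deduplication, and compression. Purely illustrative arithmetic.
func packCount(totalBytes, packBytes int64) int64 {
	return (totalBytes + packBytes - 1) / packBytes
}

func main() {
	const MiB = int64(1) << 20
	const TiB = int64(1) << 40

	total := 3 * TiB // "a few TB" of data, as an example

	for _, pack := range []int64{5 * MiB, 250 * MiB} {
		fmt.Printf("~%3d MiB packs -> ~%d pack files\n", pack/MiB, packCount(total, pack))
	}
	// ~5 MiB packs: well over 600,000 files, far beyond a 200k-per-user limit.
	// ~250 MiB packs: roughly 13,000 files, comfortably under it.
}
```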
I modified the pack size to be about 250 MB (most of my files are immutable and are between 100 MB and 2 GB in this repo) and have been using it for the last month. I've stored about 4.7 TB with it and haven't had any issues so far running multiple `backup`, `check`, `prune`, or `restore` commands. I did notice that using `mount` crashed every time, but that may have nothing to do with the pack size; I haven't tried `mount` on the official binary.
I haven't tried a `restore` with the official binary yet to see what happens, but it didn't look like a restore would complicate anything, only the commands that write.
Observations so far…
- File Count: As expected, far fewer files. So that's good.
- Transfer Rates: Much quicker sending to Google Drive and Wasabi.
- Check Times: Seem a little quicker, but pruning surprisingly still took a few hours.
- Waiting Times: It seems like packing has to finish before an upload can start, so I end up with a longer idle time between each transfer while the packer works. This means the overall transfer rate takes a hit, but it's still better. (It would be good to have the packer preparing the next pack while the upload is going, or to upload while packing, like Duplicacy does.)
- Moving the Repo: Using rclone to download the whole repo was massively faster because of the pack size too. But moving it between a local machine and a NAS was where it really helped, because those transfers were not parallel.
So far so good. I plan on using this going forward, at least for this repo and the few others I have that are mainly immutable.
Note: I started the repo from scratch again after I modified the pack size; I did not continue an existing one.
I'm about to start my own experiments with minPackSize, and found this thread. A few questions:
Would it be too difficult to have two different pack sizes, one for tree objects and another for data objects? That way we could tune them separately (perhaps keeping the trees' minPackSize as is, and enlarging the data's). A rough sketch of the idea appears after these questions.
My backend here is an "unlimited" Google Drive from a G Suite business account, fully legitimate (i.e., not one of those shady EDU accounts some people use, and it has over 5 users so I don't depend on Google looking the other way). Therefore I have very little incentive to delete anything.
But could never deleting anything cause any issues for restic, apart from the obvious (e.g., `restic snapshots` taking longer as there are more snapshots to list)?
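For what it's worth, here is a purely hypothetical sketch of the first idea; none of these names exist in restic, it only illustrates giving tree packs and data packs separate size thresholds:

```go
package main

import "fmt"

// Hypothetical constants -- not restic's actual code or API; they only
// illustrate tuning tree packs and data packs independently.
const (
	minTreePackSize = 4 * 1024 * 1024   // keep tree packs small so prune rewrites stay cheap
	minDataPackSize = 250 * 1024 * 1024 // let data packs grow large to reduce the file count
)

// minPackSizeFor picks the threshold depending on whether a pack will hold
// tree blobs or data blobs.
func minPackSizeFor(isTreePack bool) int {
	if isTreePack {
		return minTreePackSize
	}
	return minDataPackSize
}

func main() {
	fmt.Println("tree packs finalize at:", minPackSizeFor(true))
	fmt.Println("data packs finalize at:", minPackSizeFor(false))
}
```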
Thanks in advance,
-- Durval.
The biggest issue that comes to mind is memory consumption: restic's memory use scales approximately linearly with the number of objects in the repository. This impacts most operations (backup and prune, most notably).
Hello @cdhowie,
Thanks for the info, this looks like something I should prepare for, as `restic backup` currently uses over 50% of the backup server's total RAM…
My plan is to monitor `restic backup` memory usage and, as soon as it crosses some reasonable threshold, start `restic forget`-ing snapshots, oldest-first.
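In case it is useful to anyone, here is a rough, Linux-only sketch of that kind of watchdog; the backup path, the 4 GiB threshold, and the polling interval are arbitrary examples of mine, not restic features:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// Arbitrary example threshold (4 GiB, expressed in KiB); tune to your server.
const thresholdKiB = 4 * 1024 * 1024

// rssKiB reads the resident set size of a process from /proc (Linux only).
func rssKiB(pid int) (int64, bool) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, false
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text()) // e.g. "VmRSS:  123456 kB"
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			kb, err := strconv.ParseInt(fields[1], 10, 64)
			return kb, err == nil
		}
	}
	return 0, false
}

func main() {
	// Example invocation; replace the path (and add repo/password flags) as needed.
	cmd := exec.Command("restic", "backup", "/data")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, "starting restic:", err)
		os.Exit(1)
	}

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case err := <-done:
			if err != nil {
				fmt.Fprintln(os.Stderr, "restic backup failed:", err)
			}
			return
		case <-ticker.C:
			if kb, ok := rssKiB(cmd.Process.Pid); ok && kb > thresholdKiB {
				// Only flags the condition; the actual `restic forget` of the
				// oldest snapshots is left as a manual or separately scripted step.
				fmt.Printf("restic RSS is %d KiB, over the threshold; time to forget old snapshots\n", kb)
			}
		}
	}
}
```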
Question: the way I understand it, `restic forget` is pretty fast, but would it suffice to bring memory consumption during `restic backup` down? Or would I need to do a `restic prune` (which I understand is really slow and, due to locking, would prevent any restic backups from running in the interim)? As our Google Drive is legitimately unlimited (G Suite account with over 5 users), I would rather not do a prune if I can avoid it.
Cheers,
-- Durval.
No. `forget` only removes snapshot files. It doesn't remove any objects.
Yes, you would need to prune.
Ouch! This is going to be painful…
UPDATE: it seems that not doing `restic forget` plus `restic prune` periodically will also make `restic backup` run progressively slower. I've created a new topic to discuss this, please see it here.
@jonreeves how did you set min pack size?