I consistently get slow speeds of ~4 MB/s (half of my line speed) when backing up to Google Drive through restic and rclone. I suspect it’s because restic uploads a lot of tiny files, which is a worst-case scenario for Drive. I’ve tried increasing the number of connections with -o rclone.connections=64, but it doesn’t seem to help: restic doesn’t appear to make use of the extra connections.
Hi @Pneumaticat, it might be worth testing some Drive uploads as a baseline to compare against. I upload a lot to Drive, and it has some pretty heavy rate limiting of both data and API calls. My upload line speed is 970 MB/s, but even large files upload to Drive at only a tiny fraction of that speed.
You could try copying your restic repo back and forth from Drive with rclone (no restic involved). That would test with the same file sizes. If you get the same slow speed, then I would guess you are correct: it is the file sizes that limit restic’s speed.
Maybe there is (or could be) some option to increase the block size restic uses? If you did that, you might get less deduplication but a faster overall speed.
No, not easily. And I’m very reluctant to expose this as a user-configurable option. If you want to experiment with it, though, it’s easy to change in the source code here:
You can try setting minPackSize to something much larger than the default, like 64MiB (or even 256MiB), and see how it goes. I think restic will cope just fine with larger packs, but I haven’t tried it in a long time. So please report back!
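To see why larger packs can help on a rate-limited backend like Drive, here is a rough back-of-the-envelope model in Go. It assumes each uploaded file costs a fixed per-request overhead plus its transfer time at line speed, with files uploaded one after another; the 0.5 s overhead and 8 MiB/s line speed are made-up illustrative numbers, not measurements of Drive, and effectiveMiBps is a hypothetical helper.

```go
package main

import "fmt"

// effectiveMiBps models the throughput of uploading packs one after another
// when every file costs a fixed per-request overhead (API call, latency,
// rate limiting) plus its transfer time at the given line speed.
func effectiveMiBps(packMiB, overheadSec, lineMiBps float64) float64 {
	secondsPerPack := overheadSec + packMiB/lineMiBps
	return packMiB / secondsPerPack
}

func main() {
	const overheadSec = 0.5 // assumed per-file overhead, not a measured Drive value
	const lineMiBps = 8.0   // assumed upstream line speed

	for _, packMiB := range []float64{4, 64, 256} {
		fmt.Printf("%3.0f MiB packs: %.2f MiB/s\n",
			packMiB, effectiveMiBps(packMiB, overheadSec, lineMiBps))
	}
}
```

With these numbers, 4 MiB packs reach exactly half of the line speed (4 MiB/s), while 64 MiB packs get about 7.5 MiB/s, because the fixed overhead is amortised over more data. More parallel connections could hide some of that overhead, but not if the backend throttles requests per account.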
I think there’s no one-size-fits-all pack size. There are several trade-offs to weigh, and for now we go with a small pack size by default. In the long run I’d like to use a dynamic value that increases the pack size when restic detects that the backend connection has high bandwidth.
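One way such a dynamic value could work, purely as a hypothetical sketch (this is not restic code; choosePackSize, maxPackSize and the 10-second target are invented for illustration): size each pack so that uploading it takes roughly a fixed time at the measured bandwidth, clamped between the 4MiB default and an upper bound.

```go
package main

import "fmt"

const (
	MiB         = 1 << 20
	minPackSize = 4 * MiB   // the 4MiB default discussed in this thread
	maxPackSize = 128 * MiB // hypothetical cap to keep packs memory-friendly
)

// choosePackSize is a hypothetical heuristic: pick a pack size that takes
// about targetSeconds to upload at the measured bandwidth, clamped to
// [minPackSize, maxPackSize].
func choosePackSize(bytesPerSecond, targetSeconds float64) int64 {
	size := int64(bytesPerSecond * targetSeconds)
	if size < minPackSize {
		return minPackSize
	}
	if size > maxPackSize {
		return maxPackSize
	}
	return size
}

func main() {
	for _, bw := range []float64{100 * 1024, 4 * MiB, 100 * MiB} {
		fmt.Printf("%.0f B/s -> %d MiB packs\n", bw, choosePackSize(bw, 10)/MiB)
	}
}
```

A slow 100 KiB/s uplink keeps the 4 MiB minimum, a 4 MiB/s link gets 40 MiB packs, and a very fast link is capped at 128 MiB. The cap matters because of the memory constraints discussed later in this thread.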
Which I’m pretty sure is maxing out my current connection to Google Drive at this time (it tends to vary).
I haven’t noticed any other adverse effects of changing the pack size from my few minutes of usage; restic init and backup appear to work fine. I’ll report back again after using it for a little while longer.
Thank you for all your help, and your awesome work on restic!
Looks like you are partly hitting the pretty aggressive Google Drive rate limiting. Although it is not really a solution to this issue, you could consider using a B2 or S3 account for your backups. Those services simply charge you a few cents for high API rates rather than slowing you down. You may be trying to leverage a Drive ‘unlimited’ storage plan; even so, B2 storage is pretty cheap, and for less drama and fewer problems like this, it may be worth a couple of dollars a month.
As mentioned here, I’m preparing to test this. It’s kinda “life or death”, as I really need to speed up my restic backup: it’s taking almost 24 hours to update just 24 hours of changed data, and I need to reduce it to 12 hours or less.
EDIT: not so much a “life or death” situation anymore, as I managed to work around it by moving a large part of the backup to a weekly update, on Friday nights when it has the whole weekend to work. But I would still very much like to speed this up.
@fd0 (or anyone with enough knowledge), can you please tell me:
Would a restic binary with a larger minPackSize be able to ‘interoperate’ on an already existing repository, i.e. read and write to it alongside ‘standard’ (i.e. 4MiB pack size) restic?
The way I understand it, packs are generated when blobs get written, so to realize the full benefit of raising minPackSize, the repository would need to be re-initialized and regenerated from scratch, correct? Or would it be possible to somehow “repack” an existing repository into a new one with larger packs?
Also, @Pneumaticat, could you please give us an update? Are you still running restic with 256MB (or MiB, I presume) minPackSize? How is it working out for you? Have you tested restic restore? What about memory usage during backup/restore?
I believe so. The pack size is not fixed; rather, there is a maximum. When building packs, restic keeps adding data until the size gets too big, then uploads that pack and starts building a new one.
At the end of a backup, there’s usually not enough data left for a full pack and so a very small pack can be written with whatever is left.
All that to say: restic already needs to work with small and variable pack sizes, so changing this variable should not affect restic’s ability to work with existing packs.
My understanding is that uploads are throttled more heavily than downloads, so the existing smaller packs should not pose that much of a problem.
There is no way that I know of to rewrite everything. However, when you prune, any pack that contains an object that is no longer used will be repacked; if multiple packs contain objects that need to be deleted, their remaining contents will be combined into new packs. After several prune operations, you should see the average pack size in the repository get larger.
Yes, there’s no other place in the source that requires a small pack size; restic will just take the files as they are stored in the backend. The small default is a trade-off so that people with tiny upstream bandwidth can use restic, and I have plans to adjust the pack size based on the backend upload speed. But that’s not something I can implement in the short term, sorry about that.
The reason why you also see larger files is that restic serialises metadata (file names, list of IDs of the content, modes, timestamps) as JSON and saves that as a tree blob in a pack file. If a directory is very large (in number of files or size of files), the JSON document may grow much larger than 4MiB. Restic will still upload and process such a file.
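To get a feel for how quickly such a tree blob can outgrow 4MiB, the Go sketch below serialises a made-up directory listing with encoding/json. The node struct and its JSON field names are hypothetical and trimmed down (restic’s real tree nodes carry more fields); the only assumption taken from restic is that content IDs are SHA-256 hashes written as 64 hex characters.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// node is a hypothetical, trimmed-down directory entry; restic's real tree
// nodes carry more fields, and these JSON names are illustrative only.
type node struct {
	Name    string   `json:"name"`
	Mode    uint32   `json:"mode"`
	ModTime string   `json:"mtime"`
	Content []string `json:"content"` // IDs of the file's data blobs
}

type tree struct {
	Nodes []node `json:"nodes"`
}

// treeJSONSize serialises a directory with the given number of files, each
// split into chunksPerFile data blobs, and returns the JSON size in bytes.
func treeJSONSize(files, chunksPerFile int) int {
	id := strings.Repeat("a", 64) // a SHA-256 ID written as 64 hex characters
	t := tree{Nodes: make([]node, files)}
	for i := range t.Nodes {
		content := make([]string, chunksPerFile)
		for j := range content {
			content[j] = id
		}
		t.Nodes[i] = node{
			Name:    fmt.Sprintf("file-%06d.dat", i),
			Mode:    0644,
			ModTime: "2019-01-01T00:00:00Z",
			Content: content,
		}
	}
	b, err := json.Marshal(t)
	if err != nil {
		panic(err)
	}
	return len(b)
}

func main() {
	const limit = 4 << 20 // the 4MiB default pack size
	for _, files := range []int{1000, 50000} {
		size := treeJSONSize(files, 2)
		fmt.Printf("%d files: %d bytes (larger than 4MiB: %v)\n", files, size, size > limit)
	}
}
```

Even with this reduced set of fields, a single directory of 50,000 two-chunk files produces a tree blob well above 4MiB, so the pack holding it is larger than the usual size.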
Btw, the constant is called minPackSize because that’s the size a file has to reach before it can be considered “full enough” to be uploaded to the repo.
Some of the constraints I had in mind (off the top of my head):
The transfer of a single file should finish in reasonable time, even for users with low upstream bandwidth. Truncated files are of no use; only successfully uploaded files will be considered by e.g. rebuild-index.
A single file should fit into memory easily, so we can load a file during check and decrypt and verify all blobs stored in it. This takes low-end embedded systems with e.g. 512MiB of RAM into account (not considering the ongoing problems with the index being loaded into memory completely). So a file size of e.g. 1GiB is probably too much.
The file size should not be too small; after all, one of the ideas behind bundling blobs together into files is reducing the number of files stored in a backend.
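The first two constraints can be put into numbers with a small Go sketch. The 1 Mbit/s uplink and the rule that a pack may use at most a quarter of RAM are hypothetical assumptions chosen for illustration, as are the helper names uploadSeconds and fitsInMemory; only the 512MiB and 1GiB figures come from the list above.

```go
package main

import "fmt"

const MiB = 1 << 20

// uploadSeconds is the time needed to transfer one pack at the given upstream speed.
func uploadSeconds(packBytes, upstreamBytesPerSec float64) float64 {
	return packBytes / upstreamBytesPerSec
}

// fitsInMemory reports whether a pack stays within a chosen fraction of RAM,
// leaving room for the decrypted copy, the index and everything else.
func fitsInMemory(packBytes, ramBytes, fraction float64) bool {
	return packBytes <= ramBytes*fraction
}

func main() {
	const upstream = 125000.0 // 1 Mbit/s, an assumed low-end uplink
	const ram = 512 * MiB     // the low-end embedded system mentioned above
	const fraction = 0.25     // hypothetical memory budget for a single pack

	for _, pack := range []float64{4 * MiB, 64 * MiB, 256 * MiB, 1024 * MiB} {
		fmt.Printf("%5.0f MiB: %7.0f s upload, fits in memory: %v\n",
			pack/MiB, uploadSeconds(pack, upstream), fitsInMemory(pack, ram, fraction))
	}
}
```

Under these assumptions a 4MiB pack uploads in about half a minute, while a 256MiB pack takes over half an hour and already exceeds the 128MiB budget, and a truncated upload of that size is wasted entirely.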
When designing the repository format, I only had the local and sftp backends in mind and did not think about latency at all. It was simple back in the day.
I hope this helps in understanding the constraints a bit better.
It’s a minimum, the minimal size a file must have to be uploaded to the backend. And you’re right about the last pack file being smaller (in some cases at least).
Hey, just curious about the current state of building from source. I’ve been trying to build, and the Docker container has an issue with the flags GitHub dependency it uses. I’ve been using build.go to build the executable, but changing the constant doesn’t seem to have any effect at all. I’m working with a symmetric gigabit connection, while restic maxes out at 100Mb upload, btw.