Nearly every forum thread that mentions GDrive as a backend has some kind of major issue with it working reliably or quickly. @fd0 maybe it’s time that we take the stance that GDrive is not recommended for use with restic. There are options out there that are much better in terms of price, speed, and reliability.
Good idea, but where? In the manual, in the rclone section maybe?
Maybe “not recommended” is too harsh, but we should have a document describing the issues with gdrive. See below for more details.
Well, in terms of speed and reliability, sure. But from a price perspective, nothing beats $3 for unlimited storage (when buying from a shady eBay seller).
Maybe when running restic init against a GDrive backend we could display a warning?
If this is $3 per month then you can get 600GB for the same price on B2 and have far superior performance and reliability.
If it’s $3 one time, then yeah, you can’t beat that price – but I’m not sure I want my backups on something that I obtained from a “shady seller.”
My own years-long experience with Google Drive (after also years-long experience with both Amazon S3 and Amazon Cloud Drive) is that yes, GD can sometimes be less reliable and slower, but not significantly so and certainly not all the time, and that the occasional performance/reliability issues are more than offset by its convenience and price. I use GDrive here exclusively and would not even dream of going back to Amazon.
@fd0, please do not do this. This would in fact make GDrive a “second-class citizen in the restic world” and also restic a “second-class citizen in the GDrive world”, and would make it very hard for those of us trying to use either in a business setting to continue to do so, besides opening us up to a lot of blame for “choosing the wrong software” (i.e., restic).
And besides, technically I don’t think it makes a lot of sense: as restic does not interact directly with GDrive but only through rclone, any and all restrictions on GDrive’s “worthiness” should be implemented by rclone, not restic.
PS: my EDU account is legit; I got it from the university where I graduated, not from eBay or anywhere else, and the GSuite business account I use is also legit, with over 5 users, so it’s officially unlimited instead of depending on Google to ‘look the other way’. Not trying to ‘judge’ anyone, but perhaps a lot of the folks complaining about GDrive are using those “shady eBay accounts” (not saying it’s your case, @Zottelchen, and even if it is, no offense meant) and are therefore getting shafted more often than the ones going the official route?
I am also against listing Google Drive as “not recommended”.
Maybe a cautionary note on performance is appropriate.
I also got my free unlimited Google account from my university, and have been using it as restic backend for nearly a year without any problem. I don’t want to see either tool being viewed as second class citizen among the other tool’s users.
My take on it…
- GDrive makes themselves a second-class citizen by implementing strong rate limits and altering behavior based on user agent.
- GDrive is billed as a human-operated collaboration system moreso than a machine-operated storage system.
- Reliability is paramount for any backup system; GDrive is unreliable for this use case by design. We should not encourage people to store their backups in a system with reliability issues.
I think users should decide for themselves which backend and provider they use. We don’t have to maintain any special backend code for gdrive, as the heavy lifting is done by rclone, so this backend does not need any engineering resources.
But problems with gdrive come up in the forum and the issue tracker regularly (and we spend time responding to them), so I’d like to have a list of typical issues with gdrive collected somewhere so that we can start pointing people to it. How about that?
That’s an interesting argument I haven’t seen before. I can understand that.
While technically true, it doesn’t really affect casual users who are only backing up their not-so-critical personal data.
For casual / not-so-technical users, Google Drive is much more convenient and intuitive than cloud storage services like B2. They (like me) probably already have some spare Google Drive storage. They are looking for a backup tool that works well with Google Drive, not looking for a backend that works well with restic. I was certainly like that. A “not-recommended / discouraged” statement on restic’s side can be viewed as restic not supporting Google Drive, and may drive people away from restic (instead of Google Drive).
I fully support this.
Fully agreed on both counts.
Great idea, @fd0! Then the restic community can centralize all that info in a single place. Even better, why not do that for all backends? It’s not like it’s just GDrive having issues (for example, I had a lot of issues with Amazon Cloud Drive back in the day, and I’ve been reading here in the forum about the current issues with Backblaze B2). We could have a single page on the ‘wiki’ (i.e., https://restic.readthedocs.io which apparently can be edited via github – will have to look this up and see how it works) with a ‘section’ for each backend. How about that?
When using rclone with Google Drive you should get your own client ID. Doing so solved my issues with rclone and GDrive. See this for instructions on getting your own client ID. I don’t think there is enough evidence to discourage people from using Google Drive. A few users have problems and complain in the forum, but thousands may be using it without issue and say nothing. ;>)
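For reference, getting your own client ID means your transfers are no longer throttled together with every other rclone user sharing rclone’s built-in key. As a sketch (the remote name `gdrive` and the redacted values are placeholders, not from this thread), the relevant section of `rclone.conf` ends up looking like:

```
[gdrive]
type = drive
client_id = <your-client-id>.apps.googleusercontent.com
client_secret = <your-client-secret>
token = {"access_token":"...","expiry":"..."}
```

restic would then reach it through a repository URL such as `rclone:gdrive:backup`.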
You mean, everyone who is using GDrive/rclone/restic hasn’t done that already!?
Seriously, this is another piece of info that should figure prominently on that readthedocs page I’ve mentioned above.
I started using restic because I was looking for something that could work with Google Drive. I stopped using Google Drive as my main backup service, but I still have a repository stored there with daily snapshots; something like a “spare repository”, just in case.
I don’t think you should discourage people from using it, but rather “warn them” (?) about possible issues, as @fd0 said. Also, @durval’s idea of doing something in the docs for every possible backend sounds great. If possible, include simple setup steps for each backend, and in the Google Drive section advise following the rclone instructions for creating your own client_id, list possible errors, and explain how to determine whether an error comes from restic, from rclone, or is just the rate limit or some other restriction imposed by Google. No need to go deep into details, but something like “error XXX means you hit the Google rate limit” would point people toward answers on how to deal with it. Also, linking the solved forum threads about problems with Google Drive could be helpful.
In my most humble opinion, I share @cdhowie’s point of view about Google Drive, but if you discourage people from using it, they may read that as “OK, restic does not work with Google Drive” and look for another backup tool. Instead, if you say something like “restic works with Google Drive, but you may run into some Google limitations, and here’s how you can deal with them”, people could then understand that Google Drive is not optimized to work with any backup solution outside Google’s own “backup tool”, which only works with Windows (and Mac?), I think.
Google Drive is not the optimal solution for storing backups, and I think it never will be, but if I have some spare space there to store some backups, I will figure out how to do it.
Especially if we point out that the design problems are with Google Drive and that any backup tool is likely to run into similar problems (unless they are doing single-file backups like tar incremental archives).
Properly supporting Google Drive targets users in companies that are using Google Suite with unlimited Google Drive Storage: https://gsuite.google.com/pricing.html
Hi! rclone author here
We’ve added several nudges to the user when setting up a backend for drive with rclone to use their own key. One in the docs, and one in the configuration process.
Google used to be happy to raise the queries-per-second quota of the rclone key whenever I asked, but they started refusing a couple of years ago, which means that the rclone key is permanently maxed out.
Drive seems to have a fundamental limit on files per second - from the rclone docs:
Drive has quite a lot of rate limiting. This causes rclone to be limited to transferring about 2 files per second only. Individual files may be transferred much faster at 100s of MBytes/s but lots of small files can take a long time.
So lots of small files are really bad for google drive performance. The individual blobs of data restic uploads seem to be about 10MB - ideally these would be much bigger with google drive.
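To put rough numbers on that (my own back-of-the-envelope arithmetic, not a measurement): if Drive caps you at about 2 files per second, the achievable throughput scales linearly with pack size.

```shell
# Throughput ceiling at ~2 files/second for different pack sizes.
# The 2 files/s figure comes from the rclone docs quoted above;
# the pack sizes are illustrative.
for pack_mb in 1 10 100; do
    echo "${pack_mb}MB packs -> $((2 * pack_mb)) MB/s ceiling"
done
```

So with 10MB packs the small-file rate limit caps you at roughly 20 MB/s, no matter how fast your link is.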
Uploads of large files to Google Drive can be sped up a lot by using the chunk size parameter, at the cost of using more RAM. However, the files restic currently uploads are too small to benefit.
And finally, the latest beta of rclone will use --fast-list when interacting with Google Drive, which I think should speed up listing the objects a lot. Not 100% sure what effect that will have on restic, but it should be good!
Fancy meeting you here
I can personally attest to that – in fact, that was what led me to restic.
The individual blobs of data restic uploads seem to be about 10MB - ideally these would be much bigger with google drive.
Great tip! And hopefully it should be easy to implement. Actually, I did some grepping in the restic source tree and found this:
./internal/repository/packer_manager_test.go:51:const maxBlobSize = 1 << 20
Weirdly enough, that would be 1MiB, not 10MiB… hmmm. And looking at my repo’s “data” directory, I see a range of sizes starting at 118 bytes and going all the way to 51336177 (~51MB) bytes… so perhaps it’s not so easy :-/
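For anyone wanting to check their own repo the same way, here is a quick sketch of how to summarize the pack size distribution under a repository's data/ directory (the REPO path is a placeholder; this assumes GNU find for -printf):

```shell
# Print the count, minimum, and maximum size of pack files
# under a restic repository's data/ directory.
REPO=/path/to/repo    # placeholder: point this at your repository
find "$REPO/data" -type f -printf '%s\n' | sort -n | awk '
    NR == 1 { min = $1 }            # first (smallest) size after sorting
    { max = $1; n++ }               # last size seen is the largest
    END { printf "packs: %d, min: %d bytes, max: %d bytes\n", n, min, max }'
```

That makes it easy to see whether your packs cluster near the small end, where Drive's per-file rate limit hurts most.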
Anyway, which size would you think ideal? 100MiB? Less? More?
So, once we up the blob size as discussed above, we could benefit by upping --drive-chunk-size when calling rclone… I see the default is 8M; what would you suggest we start with?
Here is a quick example showing that increasing the blob size from 10MB to 100MB might give us a 4x speedup and increasing the chunk size from 8M to 32M another 20%.
I left the default --transfers at 4, so rclone uploads that many files at once. When using rclone serve it is up to restic how many transfers are done at once, and I’m not sure what the answer to that is.
(I did part of a test with 250M chunks which was running at about 75MB/s, but I’ve filled the quota of my drive, and even though I’ve deleted stuff, Google won’t let me upload anything at the moment!)
10 MB files

```
$ rclone size 1GB-of-10MB-files
Total objects: 100
Total size: 1000 MBytes (1048576000 Bytes)
```

default chunk size 8M

```
$ rclone sync -P 1GB-of-10MB-files TestDrive:1GB-of-10MB-files
Transferred:    1000M / 1000 MBytes, 100%, 15.537 MBytes/s, ETA 0s
Errors:         0
Checks:         0 / 0, -
Transferred:    100 / 100, 100%
Elapsed time:   1m4.3s
```

32M chunk size

```
$ rclone sync --drive-chunk-size 32M -P 1GB-of-10MB-files TestDrive:1GB-of-10MB-files-2
Transferred:    1000M / 1000 MBytes, 100%, 17.058 MBytes/s, ETA 0s
Errors:         0
Checks:         0 / 0, -
Transferred:    100 / 100, 100%
Elapsed time:   58.6s
```

100 MB files

```
$ rclone size 1GB-of-100MB-files
Total objects: 10
Total size: 1000 MBytes (1048576000 Bytes)
```

default chunk size 8M

```
$ rclone sync -P 1GB-of-100MB-files TestDrive:1GB-of-100MB-files
Transferred:    1000M / 1000 MBytes, 100%, 51.124 MBytes/s, ETA 0s
Errors:         0
Checks:         0 / 0, -
Transferred:    10 / 10, 100%
Elapsed time:   19.5s
```

chunk size 32M

```
$ rclone sync --drive-chunk-size 32M -P 1GB-of-100MB-files TestDrive:1GB-of-100MB-files-2
Transferred:    1000M / 1000 MBytes, 100%, 60.730 MBytes/s, ETA 0s
Errors:         0
Checks:         0 / 0, -
Transferred:    10 / 10, 100%
Elapsed time:   16.4s
```
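Working those measured MB/s figures out explicitly (my own arithmetic from the timings above), the file-size change alone gives about 3.3x, the bigger chunk size adds roughly another 19%, and the two combined come to almost 4x:

```shell
# Speedup ratios derived from the measured transfer rates above.
awk 'BEGIN {
    small_8m  = 15.537   # 10MB files, default 8M chunks
    large_8m  = 51.124   # 100MB files, default 8M chunks
    large_32m = 60.730   # 100MB files, 32M chunks
    printf "file size effect:  %.2fx\n", large_8m / small_8m
    printf "chunk size effect: %.2fx\n", large_32m / large_8m
    printf "combined:          %.2fx\n", large_32m / small_8m
}'
```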
BTW I discovered a post where @fd0 describes how to increase the pack size, so if you want to experiment with something that would be the thing!
“4x + 20%” would be hitting all the bases here. Seriously, my current problem is that, to update 24h of changed data on my ~61.7M files / ~25.51 TiB repository, restic backup is taking almost 24h, so I’m running the risk of overflowing my backup window and starting to pile up. To be on the safe side, I need to reduce it by 50%, so ‘just’ a 2x improvement would be enough.
Examining the above and also the restic Design Document, I can see it’s the pack size (and not the blob size) that needs to be enlarged.
Very encouraging that @Pneumaticat reported back in the above thread saying that, just by changing the pack size, restic started “maxing out my [his] current connection to Google Drive”, which looks just like the ticket here.
Unfortunately I can’t start testing this right now, because the machine where I have enough memory to run restic with this repo is already running restic backup in production almost 24x7 during the week, and this weekend I need to test and document the restic restore procedure. But next weekend I should be able to patch my restic tree here and try this out. As this is starting to get off-topic here, I’m moving the discussion to the topic you pointed to.
Thanks again, Nick!