Should we discourage users from using Google Drive?

My take on it…

  • GDrive makes itself a second-class citizen by imposing strict rate limits and altering behavior based on user agent.
  • GDrive is billed as a human-operated collaboration system more so than a machine-operated storage system.
  • Reliability is paramount for any backup system; GDrive is unreliable for this use case by design. We should not encourage people to store their backups in a system with reliability issues.

I think users should decide for themselves which backend and provider they use. We don’t have to maintain any special backend code for gdrive, as the heavy lifting is done by rclone, so this backend does not need any engineering resources.

But problems with gdrive come up in the forum and the issue tracker regularly (and we spend time responding to them), so I’d like to have a list of typical issues with gdrive collected somewhere so that we can start pointing people to it. How about that?


That’s an interesting argument I haven’t seen before. I can understand that. :slight_smile:


While technically true, it doesn't really affect casual users who are only backing up personal data that isn't all that critical.
For casual / not-so-technical users, Google Drive is much more convenient and intuitive than cloud storage services like B2. They (like me) probably already have some spare Google Drive storage. They are looking for a backup tool that works well with Google Drive, not a backend that works well with restic. I was certainly like that. A "not recommended / discouraged" statement on restic's side can be read as restic not supporting Google Drive, and may drive people away from restic (instead of Google Drive).

I fully support this.


Hello @fd0,

Fully agreed on both counts.

Great idea, @fd0! Then the restic community can centralize all that info in a single place. Even better, why not do that for all backends? It's not like it's just GDrive having issues (for example, I had a lot of issues with Amazon Cloud Drive back in the day, and I've been reading here in the forum about the current issues with Backblaze B2). We could have a single page on the 'wiki' (i.e., https://restic.readthedocs.io, which apparently can be edited via GitHub – will have to look this up and see how it works) with a section for each backend. How about that?

Cheers,
– Durval.

When using rclone with Google Drive you should get your own client ID. Doing so solved my issues with rclone and GDrive. See this for instructions on getting your own client ID. I don't think there is enough evidence to discourage people from using Google Drive. A few users have problems and complain in the forum, but thousands may be using it without issue and say nothing. ;>)
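
For anyone wondering what the end result looks like: after following rclone's instructions, the Drive section of rclone's config file (typically ~/.config/rclone/rclone.conf on Linux) contains your own credentials, roughly like this (the remote name and all values below are placeholders):

$ cat ~/.config/rclone/rclone.conf
[gdrive]
type = drive
client_id = 123456789-example.apps.googleusercontent.com
client_secret = REPLACE-WITH-YOUR-OWN-SECRET
scope = drive
token = {"access_token":"...","token_type":"Bearer","expiry":"..."}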


You mean, everyone who is using GDrive/rclone/restic hasn't done that already!? :astonished:

Seriously, this is another piece of info that should figure prominently on that readthedocs page I mentioned above.


I started using restic because I was looking for something that could work with Google Drive. I stopped using Google Drive as my main backup service but I still have a repository stored there with daily snapshots; something like a “spare repository” just in case.

I don't think you should discourage people from using it, but rather "warn them" (?) about possible issues, as @fd0 said. Also, @durval's idea to do something in the docs for every possible backend sounds great. If possible, include simple setup steps for each backend, and in the Google Drive section advise following the rclone instructions on creating your own client_id, list possible errors, and explain how to determine whether those errors come from restic, from rclone, or are just the rate limit or some other limitation imposed by Google. No need to go deep into details, but something like "error XXX means you hit the Google rate limit" may prompt the person to look for answers on how to deal with it. Linking to solved forum threads about problems with Google Drive could also be helpful.
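
As a sketch of what I mean by telling the sources apart (the "gdrive:" remote name is just an example), exercising the remote with rclone alone and reading its verbose output should make it reasonably clear whether Google is throttling you, if I understand rclone's logging correctly:

$ rclone -vv lsd gdrive:
$ rclone -vv copy ./some-test-file gdrive:restic-test
# HTTP 403 responses with reason "userRateLimitExceeded" or "rateLimitExceeded"
# come from the Drive API quotas; rclone retries them with backoff, so they show
# up as "pacer" / "low level retry" lines rather than as restic errors.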

In my most humble opinion, I share @cdhowie's point of view about Google Drive, but if you discourage people from using it, they may read it as "OK, restic does not work with Google Drive" and look for another backup tool. Instead, if you say something like "restic works with Google Drive, but you may run into some Google limitations, and here's how you can deal with them", people could then understand that Google Drive is not optimized to work with any backup solution outside of Google's own "backup tool" (which only works with Windows, and Mac?), not just restic, I think.

Google Drive is not the optimal solution for storing backups, and I think it never will be, but if I have some spare space there to store some backups, I will figure out how to do it.


Especially if we point out that the design problems are with Google Drive and that any backup tool is likely to run into similar problems (unless they are doing single-file backups like tar incremental archives).


Properly supporting Google Drive targets users in companies that are using Google Suite with unlimited Google Drive Storage: https://gsuite.google.com/pricing.html


Hi! rclone author here :slight_smile:

We've added several nudges for the user to use their own key when setting up a Drive backend with rclone: one in the docs, and one in the configuration process.

Google used to be happy to raise the queries per second of the rclone key whenever I asked, but they started refusing a couple of years ago, which means that the rclone key is permanently maxed out.

Drive seems to have a fundamental limit on files per second - from the rclone docs:

Drive has quite a lot of rate limiting. This causes rclone to be limited to transferring about 2 files per second only. Individual files may be transferred much faster at 100s of MBytes/s but lots of small files can take a long time.

So lots of small files are really bad for google drive performance. The individual blobs of data restic uploads seem to be about 10MB - ideally these would be much bigger with google drive.

Uploads of large files to Google Drive can be sped up a lot by using the chunk size parameter, at the cost of using more RAM. However, the files restic uploads are currently too small to benefit.

And finally, the latest beta of rclone will use --fast-list when interacting with Google, which I think should speed up listing the objects a lot. Not 100% sure what effect that will have on restic, but it should be good!
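
If anyone wants to try a bigger chunk size from the restic side, restic lets you override the arguments it passes to rclone, so (if I read the restic docs right) something along these lines should work; the remote and repository names are placeholders, and the bigger chunk costs RAM as mentioned above:

$ restic -r rclone:gdrive:restic-repo \
      -o rclone.args="serve restic --stdio --drive-chunk-size 64M" \
      backup /home/user/data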


Howdy Nick,

Fancy meeting you here :slight_smile:

I can personally attest to that – in fact, that was what led me to restic.

The individual blobs of data restic uploads seem to be about 10MB - ideally these would be much bigger with google drive.

Great tip! And hopefully it should be easy to implement. Actually, I did some grepping in the restic source tree and found this:

 ./internal/repository/packer_manager_test.go:51:const maxBlobSize = 1 << 20

weirdly enough, that would be 1 MiB, not 10 MiB… hummmrmrm… :thinking: And looking at my repo's "data" directory, I see a range of sizes starting at 118 bytes and going all the way to 51336177 bytes (~51 MB)… so perhaps it's not so easy :-/
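
In case anyone wants to pull the same numbers from their own repository, something like this does it (GNU find; the repository path is a placeholder):

$ find /path/to/repo/data -type f -printf '%s\n' | sort -n > pack-sizes.txt
$ head -n1 pack-sizes.txt; tail -n1 pack-sizes.txt       # smallest and largest pack, in bytes
$ awk '{ sum += $1 } END { printf "%d packs, average %.1f MiB\n", NR, sum/NR/1048576 }' pack-sizes.txt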

Anyway, which size would you think is ideal? 100 MiB? Less? More?

So, once we up the blob size as discussed above, we could benefit by upping --drive-chunk-size when calling rclone… I see the default is 8M, what would you suggest we start with?

Thanks,
– Durval.


Here is a quick example showing that increasing the blob size from 10MB to 100MB might give us a 4x speedup and increasing the chunk size from 8M to 32M another 20%.

I left --transfers at its default of 4, so rclone uploads that many files at once. When using rclone serve it is up to restic how many transfers are done at once, and I'm not sure what the answer to that is.

(I did part of a test with 250M chunks, which was running at about 75MB/s, but I've filled the quota of my drive, and even though I've deleted stuff Google won't let me upload anything at the moment!)

10 MB files

$ rclone size 1GB-of-10MB-files
Total objects: 100
Total size: 1000 MBytes (1048576000 Bytes)

default chunk size 8M

$ rclone sync -P 1GB-of-10MB-files TestDrive:1GB-of-10MB-files
Transferred:         1000M / 1000 MBytes, 100%, 15.537 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:          100 / 100, 100%
Elapsed time:      1m4.3s

32M chunk size

$ rclone sync --drive-chunk-size 32M -P 1GB-of-10MB-files TestDrive:1GB-of-10MB-files-2
Transferred:         1000M / 1000 MBytes, 100%, 17.058 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:          100 / 100, 100%
Elapsed time:       58.6s

100 MB files

$ rclone size 1GB-of-100MB-files
Total objects: 10
Total size: 1000 MBytes (1048576000 Bytes)

default chunk size 8M

$ rclone sync -P 1GB-of-100MB-files TestDrive:1GB-of-100MB-files
Transferred:         1000M / 1000 MBytes, 100%, 51.124 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:           10 / 10, 100%
Elapsed time:       19.5s

chunk size 32M

$ rclone sync --drive-chunk-size 32M -P 1GB-of-100MB-files TestDrive:1GB-of-100MB-files-2
Transferred:         1000M / 1000 MBytes, 100%, 60.730 MBytes/s, ETA 0s
Errors:                 0
Checks:                 0 / 0, -
Transferred:           10 / 10, 100%
Elapsed time:       16.4s
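
If you want to repeat this against your own Drive, test trees of the same shape can be generated with something like this (random data, sizes matching the listings above):

$ mkdir -p 1GB-of-10MB-files 1GB-of-100MB-files
$ for i in $(seq -w 1 100); do head -c 10485760  /dev/urandom > 1GB-of-10MB-files/file-$i;  done
$ for i in $(seq -w 1 10);  do head -c 104857600 /dev/urandom > 1GB-of-100MB-files/file-$i; done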

BTW I discovered a post where @fd0 describes how to increase the pack size, so if you want to experiment with something that would be the thing!


Hello Nick,

4x + 20% would be hitting all the bases here :slight_smile: Seriously, my current problem is that, to update 24h of changed data on my ~61.7M files / ~25.51 TiB repository, restic backup is taking almost 24h, so it risks overflowing my backup window and starting to pile up. To be on the safe side, I need to reduce it by 50% – so 'just' a 2x improvement would be enough.

Wow, @ncw, as they say: signed, sealed, delivered! :slight_smile: Seriously, thank you very much for digging this up. And of course thanks to @fd0 for bringing it up in the first place.

Examining the above and also the restic Design Document, I can see it's the pack size (and not the blob size) that needs to be enlarged.

Very encouraging that @Pneumaticat reported back in the above thread that, just by changing its pack size, restic started "maxing out my [his] current connection to Google Drive"; that sounds like just the ticket here.

Unfortunately I can't start testing this right now, because the machine where I have enough memory to run restic with this repo is already running restic backup in production almost 24x7 during the week – and this weekend I need to test and document the restic restore procedure. But next weekend I should be able to patch my restic tree here and try this out. As this is starting to get off-topic here, I'm moving the discussion to the topic you pointed to.

Thanks again, Nick!
– Durval.


The docs on restic.readthedocs.io are generated from the files in the doc/ directory using Sphinx. There's a helpful Makefile, so installing Sphinx and running make html in the doc/ directory should build everything. If you get stuck, please create an issue on GitHub so we can improve the process.
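
For the record, the short version is something like this (the exact Sphinx packages you need may differ a bit):

$ pip install sphinx sphinx_rtd_theme
$ cd doc
$ make html
# with the default Sphinx Makefile the result ends up in _build/html/index.html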

I'd love to switch to something different (preferably Hugo) and host the docs myself instead of using Sphinx and ReadTheDocs, but I don't have the time. If anybody wants to give this a try, please let me know!

I'm not sure what you're trying to say; did that sentence get truncated when posting? The linked page only lists pricing, so your post reads as "You should pay Google if you want to use features of their services". Is that what you wanted to say? :slight_smile:

If I look at my own backups I see an average pack size of 4.7MiB. The largest ones are about 11MiB.

I'd argue that this is too small for any cloud storage system, as the round-trip times will really eat into the upload bandwidth. Uploading multiple packs at once will help with this, of course. However, smaller packs also mean more transactions, which cost actual money on some storage platforms (e.g. S3, B2).

So an option to increase the pack size would:

  • increase transfer speeds on remote backends (especially on google drive)
  • decrease costs (on all backends which charge per transaction)
  • produce fewer files (so less likely to breach the 400,000-file limit on Team Drives, for example; rough numbers below)
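
To put rough numbers on that last point, taking the ~5 MiB average seen above and the 100 MB size from the earlier test as the two cases:

$ echo "$(( 400000 * 5 / 1024 )) GiB"      # ~5 MiB packs: the 400,000-file cap is hit around 1.9 TiB
1953 GiB
$ echo "$(( 400000 * 100 / 1024 )) GiB"    # ~100 MiB packs: the cap would not bite until ~38 TiB
39062 GiB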

I'm not sure what the downsides would be, though. More memory use, for sure.

What do you think, @fd0? Could we make this a parameter to restic?


I believe the point is that many organizations already use G Suite, which comes with unlimited storage. Users in such organizations have a strong incentive to use a backup solution that can make use of their (practically free) unlimited Google Drive storage. Supporting Google Drive (rather than discouraging it) can win these users for restic.

I myself am such a user (unlimited Google Drive through a university alumni account). Supporting Google Drive was a high priority when I was looking for a backup solution.

Fully agree with you. My case is just like that.

Custom pack sizes have been supported since restic 0.14.0:

https://restic.readthedocs.io/en/latest/047_tuning_backup_parameters.html#pack-size
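
For example (repository and paths are placeholders; if I read the linked docs correctly, the value is the target pack size in MiB):

$ restic -r rclone:gdrive:restic-repo backup --pack-size 64 /home/user/data
# or set it via the environment:
$ export RESTIC_PACK_SIZE=64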
