Low Upload Bandwidth Scenario

Hello, I’m a home backup user looking to transition from Crashplan to B2. I’m looking at Restic and some other Linux tools to do automated uploads to B2, and I like how Restic handles the snapshots. However, I’m on a connection with a very low 1Mbps uplink, so my first snapshot will be an approximately 200GB upload, followed by small nightly incremental snapshots. I also need to limit the window in which backups take place, because a saturated uplink during daily use is unbearable. So the ability to schedule during the early AM is a big must!

My question is, how would Restic handle a scenario where it will take about a month to upload the initial snapshot? Can it resume uploads? Will it break everything up into piecemeal chunks? How about when it’s about halfway through the initial snapshot upload, say 10 days from now, and there are a few days of incremental snapshots already queuing up? How does that get handled?

My current Crashplan setup handles this nicely. It just has a % progress indicator and resumes on a set nightly schedule. I have it uploading between 2 and 8am, and it’s perfect. I’m wondering if Restic would be similar before I take the plunge. Thanks for any input!

Very much welcome to the restic community, symtry!

Regarding scheduling, that’s something you manage outside of restic.
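For example, a small wrapper script triggered by cron could confine the run to your 2-8am window. This is just a sketch - the bucket name, paths, and password file are placeholders, and the `timeout` call (from GNU coreutils) is one of several ways to stop the run before the morning:

```sh
#!/bin/sh
# restic-nightly.sh - started by cron at 02:00, e.g.:
#   0 2 * * * /usr/local/bin/restic-nightly.sh >> /var/log/restic-nightly.log 2>&1

# B2 credentials and repository settings (placeholders).
export B2_ACCOUNT_ID="your-b2-account-id"
export B2_ACCOUNT_KEY="your-b2-account-key"
export RESTIC_REPOSITORY="b2:my-bucket:home-backup"
export RESTIC_PASSWORD_FILE="/root/.restic-password"

# Stop after 6 hours so the uplink is free again by 08:00.
# Re-running the next night simply resumes; already-uploaded data stays in the repo.
timeout 6h restic backup /home/user/photos /home/user/docs
```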

You can indeed start the initial backup, and if it is interrupted for some reason, the data that has already been backed up/uploaded will not have to be re-sent. Restic will see that those blobs of data are already in the repository and not upload them again. So you can just restart it, and in the end you’ll reach the goal of having one initial snapshot.
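As a concrete sketch (bucket name and paths are made up), resuming is literally just running the same command again:

```sh
# One-time setup: create the repository in B2.
restic -r b2:my-bucket:home-backup init

# Start the initial backup. If it gets interrupted or you cancel it,
# run the exact same command again - data already in the repository
# is recognized and not uploaded a second time.
restic -r b2:my-bucket:home-backup backup /home/user/photos /home/user/docs
```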

Yes, restic splits the data into chunks, which it also deduplicates. It’s all described in the design document, but in short, that’s how it works.

I don’t really see the point of starting another backup of the same dataset before the initial one has finished, as restic will naturally just continue to back up the initial data that has not yet been uploaded - your latest changes might come last, or first; it entirely depends on which files those changes are in.

My suggestion would instead be to start the initial backup for the “big” dataset, and then until that is done, run smaller backups for those specific parts of your dataset/filesystem that you know changed and really want to have backed up ASAP. That way you should be able to make sure that those specific files/folders are backed up, and the big one can be continued more long-term until it’s done. See what I mean?

However, an IMO even better approach is to start by backing up only those files/folders that you feel are most important to you, to get that done first of all; then, once those are done, you can run the backup for the entire dataset. That way you prioritize things better.
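As a rough sketch of that order of operations (directory names are just examples):

```sh
# 1. The small, important data first - this finishes quickly.
restic -r b2:my-bucket:home-backup backup /home/user/docs

# 2. Then the big run. Anything already uploaded in step 1 is
#    deduplicated, so it is not transferred again.
restic -r b2:my-bucket:home-backup backup /home/user/photos /home/user/docs
```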

Yes, restic can run backups concurrently; I often run three backups to the same repo at the very same time. Not a problem. And since you can always cancel a backup and re-run it later, you can do that if you want to split the big backup run across several days, running it only during certain hours of the day.

You can use the --limit-download and --limit-upload arguments to control the maximum rates in KiB/s.
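For example, to cap the backup’s upload rate at roughly 100 KiB/s (the value and paths are just placeholders):

```sh
restic -r b2:my-bucket:home-backup backup --limit-upload 100 /home/user/photos
```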

Hope that helps. Feel free to ask any follow-up questions!

Thanks for your detailed response, rawtaz - much appreciated!

The reason I wanted to keep running the same backup on the dataset while it’s still being uploaded is because I envisioned just making a simple daily cron job that would run on the same 2 directories (photos and docs). So understanding that on the first day I will have triggered the backup of about 200GB, if I ran the same command the next night, I would want the system to continue the upload and then, I guess, enqueue any other additions to the dataset to be uploaded when the initial part is done. The goal is to keep things hands-off after the initial setup and testing.

From your response I gather I could initiate a daily job to work on only the photos (the largest dataset by far), and then another independent job on the documents, which could be done in a day or two. My only concern is with the set of photos. I just wanted to make sure that if someone puts a few photos into the directory while I am still only 50% through the initial backup (say day 10 of 20), it will not corrupt anything - that the additional data would be managed appropriately and uploaded whenever the algorithm decides it’s ready for it.

And to manage the upload schedule, would you just build a kill command into the cron script? Thanks again for your help!

Hey, there’s another way: if you have approximately the same amount of space available locally, you could use restic to back up into a directory on some machine, and then use e.g. rclone to synchronize the changes to B2 during the night. This way, you can start very quickly and have daily snapshots, which will then eventually be uploaded to B2.
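A rough sketch of what that could look like - the local path, the rclone remote name “B2”, the bucket, and the bandwidth limit are all assumptions:

```sh
# Nightly: quick local backup to a repository on a local disk or NAS.
restic -r /mnt/backup/restic-repo backup /home/user/photos /home/user/docs

# Then trickle the repository files up to B2 overnight.
# --bwlimit caps rclone's bandwidth (roughly 100 KiB/s here).
rclone sync /mnt/backup/restic-repo B2:my-bucket/restic-repo --bwlimit 100k
```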

In the case of a restore, you can always use restic to access B2 directly (be sure to test that).
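Testing that could be as simple as pointing restic at the copy in B2 and listing/restoring a snapshot (the repository name is a placeholder; the password is the same one used for the local repository):

```sh
restic -r b2:my-bucket:restic-repo snapshots
restic -r b2:my-bucket:restic-repo restore latest --target /tmp/restore-test
```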

Did you discover the mount command yet?
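It lets you browse all snapshots as a regular (FUSE) filesystem, which is very handy for spot-checking what’s in the repository; something like this (the mount point is arbitrary):

```sh
restic -r b2:my-bucket:restic-repo mount /mnt/restic
# Then browse /mnt/restic/snapshots/latest/ and copy files out as needed.
```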