I recently changed my ISP, which increased my upload speed from 2 Mbps to 42 Mbps. My primary restic backup goes to a locally attached USB drive, and I use Dropbox to store a second copy of it. I use the restic copy command (with the rclone backend) to duplicate the USB repo to Dropbox. What I found was that restic cannot take full advantage of the new upload speed. However, if I use the rclone copy command outside of restic, it WILL take full advantage of the increased upload speed. My question: what is the disadvantage of going this route? Does restic add further checking during the copy upload that I will not have with rclone alone? It must be doing something more, because there is obviously more work going on with restic-rclone-copy than with rclone-copy alone. This particular test is a 334 GB repo that takes about a day with rclone alone, but will take much longer with restic-rclone. Both methods are equally tweaked in the configuration to achieve the full upload speed.
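As a rough sanity check on those numbers, the best-case upload time for a 334 GB repo can be estimated from the link speed alone. This is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s), ignores protocol overhead, and the function name is made up for illustration.

```python
def ideal_transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Best-case hours needed to upload size_gb over a link of link_mbps."""
    bits = size_gb * 1e9 * 8               # repo size in bits
    seconds = bits / (link_mbps * 1e6)     # link speed in bits per second
    return seconds / 3600

for mbps in (2, 42):
    print(f"{mbps} Mbps: {ideal_transfer_hours(334, mbps):.1f} hours")
```

At 42 Mbps this gives roughly 17.7 hours, which matches the "about a day" seen with rclone alone; at 2 Mbps the same repo would need about 371 hours.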
restic copy works “inside” the repository while rclone works “outside.” This means that restic copy can work in multiple scenarios where rclone cannot:
- The source and destination repositories do not need to have the same master encryption key. The downside is that restic has to decrypt and encrypt each blob that it copies between repositories to account for this.
- You can selectively copy only some snapshots; restic will copy only the blobs used by those snapshots that don’t exist in the destination repository. This obviously requires more work: the source snapshots have to be crawled to find the blobs that don’t exist in the destination, then restic has to create new packs for those and transfer them to the remote repository.
In comparison, file copy/sync tools like rclone/rsync are only suitable when you are trying to duplicate an entire repository and all of its contents indiscriminately. Off-site mirroring is a good example of this.
On the other hand, if you wanted the off-site repository to have a different master key, or retain a different set of snapshots, then rclone/rsync would not be appropriate anymore and you would need to use restic copy instead.
restic copy is way more flexible, but will generally require more CPU time, read IOPS, and memory to transfer the same amount of data.
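To make the difference concrete, here is a toy model in Python. It is not restic's actual implementation or data format: repositories are modelled as plain dicts from blob ID to content, snapshots as sets of blob IDs, and all names are hypothetical. It only illustrates why a selective, blob-level copy has to do more bookkeeping than a file-level mirror.

```python
def mirror(src_repo: dict) -> dict:
    """File-level mirror (the rclone/rsync approach): copy everything as-is."""
    return dict(src_repo)

def selective_copy(src_repo: dict, snapshots: dict, wanted: list, dst_repo: dict) -> list:
    """Blob-level copy of selected snapshots (the idea behind restic copy).

    Returns the sorted IDs of the blobs that actually had to be transferred.
    """
    needed = set()
    for name in wanted:
        needed |= snapshots[name]              # walk each selected snapshot
    missing = sorted(needed - set(dst_repo))   # skip blobs the destination already has
    for blob_id in missing:
        # A real tool would decrypt, re-encrypt and repack here.
        dst_repo[blob_id] = src_repo[blob_id]
    return missing

src = {"a": b"1", "b": b"2", "c": b"3", "d": b"4"}
snaps = {"monday": {"a", "b"}, "tuesday": {"b", "c"}, "wednesday": {"c", "d"}}
dst = {"a": b"1"}

copied = selective_copy(src, snaps, ["monday", "tuesday"], dst)
print(copied)       # ['b', 'c']
print(sorted(dst))  # ['a', 'b', 'c']
```

The mirror is a single pass over the files, while the selective copy first has to work out which blobs are needed and which are missing, which is where the extra CPU, reads, and memory go.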
rclone is also generally better about doing transfers in parallel. You can even specify the number of parallel transfers with the --transfers flag. On the other hand, I believe restic limits the number of archiver threads to 2, so in practice you won’t see more than two parallel transfers, which could easily account for the slower overall speed.
Hello cdhowie! Your response is very helpful, and I do appreciate it. It does prompt some additional questions that I hope you might be able and willing to answer. Is there a place where I can find which rclone switches are not available through restic's internal rclone backend? For example, you mentioned that only 2 parallel threads are available through restic/rclone. I think the default in rclone alone is 4, and I set it to 8 to achieve the throughput. Also, I set dropbox-chunk-size to the maximum of 149. Other tweaks were doubled. But if restic/rclone does not accept these changes, then this could certainly explain part of the difference.
Also, if one uses the same master encryption key for both local and remote repositories, could you not mix rclone-only and restic-rclone copies to the same remote repository? My idea is to use the rclone-only copy for the initial large transfer of data to the remote repository, then use the restic-rclone copy thereafter for the much smaller incremental changes that are not so time consuming. I guess I will just have to try and see if it works.
restic is such a powerful backup/restore tool. It has been very interesting to learn how to use it.
In the copy command, the transfer is not parallelized at all ATM. This might also explain the speed difference that @freelsjd is observing (given that there is enough CPU power for the crypto).
This should work in principle. But by using the “rclone-only” copy, you basically have two exactly identical repositories (also with identical repository IDs). The cache identifies repositories by the repo ID, and I don’t know what effects you get with the copy command and identical repo IDs.
I would advise you to just wait the extra time for the initial copying (this is just a one-time action) using restic copy. Also, there has recently been much work in progress to parallelize stuff in restic, so I think parallelized data transfer within restic copy is just a matter of time.
Feel free to open an issue on github if this is an important feature for you!
FWIW, I have used this to copy between two repositories with identical IDs and everything seemed to work correctly.
I found a workaround. I decided to take alexweiss’ recommendation, stick with what I know works consistently, and stay with the restic internal copy command and the rclone backend. This keeps me clear of data quality issues.
To gain the speed I want, I can break down my large repo into logical smaller repos. So, instead of 1 large repo I now have 7 smaller ones, and I can run 2-3 of the copy jobs at a time. I have plenty of memory, CPU, and upload bandwidth to support 2-3 restic/rclone jobs at a time, and I run them at night when I don’t need to use the machine interactively. Problem solved.
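A minimal sketch of how "2-3 jobs at a time" could be scheduled from Python. The repo names are placeholders and copy_one is a stand-in that only sleeps; in a real script it would invoke the actual restic copy command for that repo (for example via subprocess.run), which is deliberately left out here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_limited(jobs, worker, max_parallel=3):
    """Run worker(job) for every job, at most max_parallel at a time.

    Returns a dict mapping each job to the worker's return value.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = pool.map(worker, jobs)   # results come back in input order
        return dict(zip(jobs, results))

def copy_one(repo_name):
    """Placeholder for one copy job; replace the sleep with the real command."""
    time.sleep(0.1)
    return 0  # pretend exit code

repos = [f"repo{i}" for i in range(1, 8)]  # hypothetical names for the 7 smaller repos
exit_codes = run_limited(repos, copy_one, max_parallel=3)
failed = [name for name, code in exit_codes.items() if code != 0]
print("failed:", failed)
```

Capping the pool size keeps the machine from starting all 7 jobs at once, and collecting the exit codes makes it easy to see in the morning which repos need a re-run.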
Yeah, this should work. If you ever want to make a “combined” backup again, you may have a look at the state of this PR:
If you use a version where this is merged in, your “combining” backup will most likely be much faster. But even if you “combine” without it, only that one backup is affected…
About the copy parallelization issue, there is actually
Which is hopefully merged soon. This does not yet parallelize the data copy, but I can work on a PR for that once #3106 is finished.