Multiple (parallel) backups to the same repo. Good Idea?

Confuset · May 3, 2019, 8:06am

I'm just playing around with restic to back up my desktop Windows PC, which has roughly 650 GB of data spread across 3 different drives.

I noticed two things.
The first backup of the 650 GB took 14 hours, but it ran at around 98 to 99 Mb/s on my 100 Mb/s network. Really nice work on that.

The second backup took around 1 hour, with almost everything idle except the I/O queue on one disk, which was at 100% the whole time. So disk IOPS was probably the bottleneck here.

For now I do the backup in one call including all 3 disks. But since the backup mostly utilizes only one of them I was thinking about how to run all 3 backups in parallel.

So is it a good idea to call ‘restic backup’ 3 times against the same repo, each with different folders on different disks?

cdhowie · May 3, 2019, 12:37pm

To be honest, the best advice I can give is “try it and see.” It could theoretically reduce the time of the overall backup, depending on how the disks are connected and whether they share I/O bandwidth (e.g. USB) as well as what they are mostly doing (seeking vs. transmitting data across the I/O bus).
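
For anyone who wants to run that experiment, here is a minimal sketch of one way to start one `restic backup` process per drive and wait for all of them. It assumes `restic` is on the PATH and that the repository password is already supplied through the environment; the repository location and drive paths in the usage comment are placeholders, not values from this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_backup_command(repo, path):
    """Return the argument list for one `restic backup` call."""
    return ["restic", "-r", repo, "backup", path]


def run_parallel_backups(repo, paths, runner=subprocess.run):
    """Start one restic process per path and wait for all of them.

    Returns a dict mapping each path to the exit code of its backup.
    `runner` is injectable so the function can be tried without restic.
    """
    commands = [build_backup_command(repo, p) for p in paths]
    # One worker thread per command; each thread just blocks on its process.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        results = list(pool.map(runner, commands))
    return {p: r.returncode for p, r in zip(paths, results)}


# Hypothetical usage (placeholder repo and drive letters):
# codes = run_parallel_backups("/path/to/repo", ["C:\\", "D:\\", "E:\\"])
# print(codes)
```

Timing this against the single-call backup on the same data is the simplest way to see whether the disks really were the bottleneck.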

Another perspective is if this is a scheduled job running in the middle of the night for example, does it matter if it takes an hour? (Is it worth complicating the backup procedure to save a bit of time when nobody is using the system?)

Confuset · May 3, 2019, 1:00pm

Hmm, the problem is that I turn off my PC when it is not in use, so this matters, and the backup cannot run during the night…

So running ‘backup’ on the same repo is not a problem… I will try that, see what happens, and report back.

cdhowie · May 3, 2019, 1:01pm

You could have your backup script power off the computer when the backup finishes? Then you could start a backup and go to bed.
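
A rough sketch of such a script follows, assuming a Windows machine where `shutdown /s /t 0` powers the PC off immediately. Only powering off after a successful backup is a design choice made here, so that a failed run leaves the machine on with the error still visible; the backup command in the usage comment is a placeholder.

```python
import subprocess

# Windows command that powers the machine off with no delay.
WINDOWS_SHUTDOWN = ["shutdown", "/s", "/t", "0"]


def backup_then_shutdown(backup_cmd, shutdown_cmd=WINDOWS_SHUTDOWN,
                         runner=subprocess.run):
    """Run the backup, then power off only if it exited with code 0.

    Returns True if the shutdown command was issued, False otherwise.
    `runner` is injectable so the logic can be tried without side effects.
    """
    result = runner(backup_cmd)
    if result.returncode != 0:
        # Leave the PC on so the failure can be inspected.
        return False
    runner(shutdown_cmd)
    return True


# Hypothetical usage (placeholder repo and path):
# backup_then_shutdown(["restic", "-r", "/path/to/repo", "backup", "D:\\"])
```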

Dj0k3 · May 3, 2019, 3:36pm

I have a similar use case, but in my case it is just one drive. On my primary drive I have a lot of documents that I work on every day. It is not much in size, but there are a lot of small files and a lot of directories. There is just one directory that contains big files, so what I do is back up all directories excluding the “E” directory (E as an example name), which is the one that contains the big files. Because I want to easily find snapshots for this “E” directory, I use --host to override the hostname, giving it a name that is easy to remember and letting me apply forget rules that differ from those for the other directories. I first run a backup for all directories excluding “E” and then a backup only for the “E” directory, and I have never had a problem. It is faster now because the “E” directory does not change a lot, but when there are changes it can take a while (not hours, though). I’m not on Windows, but I suppose it should work the same.
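
The split described above could look roughly like the sketch below, which only builds the command lines. It uses restic's `--exclude` and `--host` options and a `forget` call filtered by host; the repository path, directory names, host label, and the `--keep-last` value are illustrative assumptions, not the poster's actual settings.

```python
def split_backup_commands(repo, root, big_dir, big_host):
    """Build two backup commands: everything except big_dir, then big_dir alone.

    The second command overrides the hostname so its snapshots are easy to
    find and can get their own forget policy.
    """
    rest = ["restic", "-r", repo, "backup", root, "--exclude", big_dir]
    big = ["restic", "-r", repo, "backup", big_dir, "--host", big_host]
    return rest, big


def forget_command(repo, host, keep_last):
    """Build a forget command limited to snapshots recorded under `host`."""
    return ["restic", "-r", repo, "forget",
            "--host", host, "--keep-last", str(keep_last)]


# Hypothetical usage (all values are placeholders):
# rest, big = split_backup_commands("/path/to/repo", "/home/me", "/home/me/E", "bigfiles")
# policy = forget_command("/path/to/repo", "bigfiles", 5)
```

The two backup commands are meant to be run one after the other, as in the post.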

mano28193 · October 4, 2022, 10:58am

Dear all, I have a similar question. I am new to restic and would like to know how many parallel operations from the same device can be run without issues. Considering the locking of the repository, could you propose an ideal parallelization factor?