Slow restoring speed

Hello,

I’ve been using restic for several months, saving a daily snapshot of four PCs (all of them running Windows 10 Pro) into two different repos (each served by rest-server, also on Windows 10 Pro machines), all of them on the same 1Gbps LAN.

I’m very happy with my setup since restic is really fast at taking new snapshots: the fastest PC (which has a 4th-generation Core i7 and 16 GB of RAM) could add 12GB to the fastest repo (which is on a PC with a 6th-generation Core i5 and 16 GB of RAM) in 14 minutes (around 14.6 MBps), and the slowest PC took 6 minutes to add 2GB (around 5.7 MBps).

I know these results are a bit relative since they depend on a lot of factors, like the amount of changed information, the raw power of the involved PCs, the bandwidth of the connections between them, etc. Anyway, this is a really good solution for my needs, so first of all I want to say thank you to the restic community for developing and supporting such a wonderful tool.

However, when I restore from the fastest repo to my fastest PC I barely get 2.4 MBps (and it’s even worse on a slower PC). I’ve also tried mounting the hard drive that holds the repo locally on the fastest PC, even from a live Linux distro, but I get the same results: the restore starts at 6.2 MBps for the first minute or two, but then it gets slower and slower until it settles around those 2.4 MBps.

By the way, I’ve set Windows Defender to exclude restic (as explained here), and I prune the repos every one or two weeks to keep only the last 6 daily, 4 weekly, 6 monthly, and 1 yearly snapshots.
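
For reference, that retention policy corresponds to a forget invocation like the following (just a sketch: the repo URLs are placeholders for my two rest-server repos, and the leading echo only prints the commands instead of running them):

```shell
# Retention: keep 6 daily, 4 weekly, 6 monthly and 1 yearly snapshots,
# then prune unreferenced data. Repo URLs below are placeholders.
for repo in rest:http://repo1:8000/ rest:http://repo2:8000/; do
    echo restic -r "$repo" forget --prune \
        --keep-daily 6 --keep-weekly 4 --keep-monthly 6 --keep-yearly 1
done
```

Dropping the echo runs the commands for real; restic would then also need the repository password (e.g. via RESTIC_PASSWORD or --password-file).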

On the other hand, it really surprises me that both the repo PC and the PC I restore the snapshot to barely use around 2% of their CPU and 230 MB of memory for the restic process, so apparently there are more than enough resources available to improve those results.

I would like to keep using restic to back up my PCs, but I think that 2.4 MBps is only OK if you just need to restore a few gigabytes (it would take 1 hour to restore 8.4 GB), and not that great if you have to restore some hundreds of them.
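
That “1 hour for 8.4 GB” figure is just arithmetic on the observed rate, which can be sanity-checked like this (decimal units, as restic reports them):

```shell
# At 2.4 MB/s, restoring 8.4 GB (= 8400 MB) takes 8400 / 2.4 = 3500 s,
# i.e. just under one hour.
awk 'BEGIN { printf "%.0f seconds\n", 8400 / 2.4 }'
```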

My questions are:

a) Is this restoring speed normal for a domestic setup?

b) With such low CPU and memory usage, shouldn’t it be faster?

c) Does anybody have any tips to improve this slow restore speed?

Thank you very much in advance.

What disks do you have in the repo and client machines, and what IOPS and disk load are you seeing when you restore?

I’m using a 2.5" SATA3 SSD on the restoring machine, which has these benchmarks (according to CrystalDiskMark):

-----------------------------------------------------------------------
CrystalDiskMark 6.0.2 x64 (C) 2007-2018 hiyohiyo
                          Crystal Dew World : https://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

   Sequential Read (Q= 32,T= 1) :   551.237 MB/s
  Sequential Write (Q= 32,T= 1) :   492.553 MB/s
  Random Read 4KiB (Q=  8,T= 8) :   401.607 MB/s [  98048.6 IOPS]
 Random Write 4KiB (Q=  8,T= 8) :   345.674 MB/s [  84393.1 IOPS]
  Random Read 4KiB (Q= 32,T= 1) :   111.235 MB/s [  27157.0 IOPS]
 Random Write 4KiB (Q= 32,T= 1) :    98.476 MB/s [  24042.0 IOPS]
  Random Read 4KiB (Q=  1,T= 1) :    24.843 MB/s [   6065.2 IOPS]
 Random Write 4KiB (Q=  1,T= 1) :    62.809 MB/s [  15334.2 IOPS]

  Test : 1024 MiB [C: 63.6% (50.4/79.2 GiB)] (x5)  [Interval=5 sec]
  Date : 2019/07/06 13:57:49
    OS : Windows 10 Professional [10.0 Build 17134] (x64)
    Restoring machine - SSD SATA3

On the fast repo, I’m using a 2.5" external mechanical hard drive in a USB3 enclosure, which has these benchmarks:

-----------------------------------------------------------------------
CrystalDiskMark 6.0.2 x64 (C) 2007-2018 hiyohiyo
                          Crystal Dew World : https://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

   Sequential Read (Q= 32,T= 1) :   147.629 MB/s
  Sequential Write (Q= 32,T= 1) :   125.392 MB/s
  Random Read 4KiB (Q=  8,T= 8) :     1.314 MB/s [    320.8 IOPS]
 Random Write 4KiB (Q=  8,T= 8) :     7.458 MB/s [   1820.8 IOPS]
  Random Read 4KiB (Q= 32,T= 1) :     1.269 MB/s [    309.8 IOPS]
 Random Write 4KiB (Q= 32,T= 1) :     7.315 MB/s [   1785.9 IOPS]
  Random Read 4KiB (Q=  1,T= 1) :     0.483 MB/s [    117.9 IOPS]
 Random Write 4KiB (Q=  1,T= 1) :     4.782 MB/s [   1167.5 IOPS]

  Test : 1024 MiB [S: 0.0% (0.4/4656.9 GiB)] (x5)  [Interval=5 sec]
  Date : 2019/07/06 14:39:25
    OS : Windows 10 Professional [10.0 Build 17134] (x64)
    Repo machine - USB3 HDD SATA3

It looks like I’ve got a bottleneck caused by the HDDs that store the repos.

I’ve tried moving the smallest repo to an external SSD, and this way I’m getting around 30MBps when restoring (using just 10% of the CPU and 200MB of RAM), which is more than 10 times faster than using the repo from the external HDD.

However, I’m still unsure about the root cause of this speed. If I copy the repo from the external SSD to my internal SSD using rclone, copying only 1 file at a time (I don’t really know whether restic parallelizes the restore process, so I’m assuming the worst possible scenario), using this command:

rclone sync -P --transfers=1 SOURCE DESTINATION

then I get an average speed of 115MBps, which is almost 4 times the 30MBps that restic achieves. Taking the 10% CPU usage into account, shouldn’t restic be faster when restoring?
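
The ratio above comes straight from the two measured rates:

```shell
# rclone sustained 115 MB/s vs restic's 30 MB/s restoring from the same SSD.
awk 'BEGIN { printf "rclone vs restic: %.1fx\n", 115 / 30 }'
```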

So, my questions still are:

a) Is this restoring speed normal for a domestic setup?

b) With such low CPU and memory usage, shouldn’t it be faster?

c) Does anybody have any tips to improve this restore speed?

Thanks.

My suspicion is that copying the repo requires far fewer seeks. Restic, I believe, restores using multiple threads; this is good for CPU utilization but bad for I/O when the repo is stored on an HDD. When restoring many small files, restic is also not going to read entire pack files, but rather just the required blobs out of larger packs, so the access pattern is significantly less linear even on a single thread.

What restic version do you use? Can you try restore using restic built from https://github.com/ifedorenko/restic/tree/out-of-order-restore-no-progress branch? The problem you describe, i.e. restore is fast at the beginning, then slows down, sounds very much like https://github.com/restic/restic/issues/2074 and that branch is supposed to fix underlying restore inefficiency.

The latest 0.9.x (and the branch) restore implementation is multi-threaded and should read each repository file at most once, using a single continuous range request. You should see restore performance comparable to rclone, at least if the target filesystem can keep up with the writes.

I was using official restic 0.9.5 compiled with go1.12.4 on windows/amd64, but I’ve tried the branch you mention @ifedorenko, and I have to say… what a difference! Your branch is almost 3 times faster than the official one in some cases, so thank you very much.

For my first test, I restored a snapshot which only contains one big 13.4GB file, and these were the results:

  • From an external SSD repo to an internal SSD:

    • Your branch writes the info at an average rate of 86.8MBps, taking 170MB of RAM, around 24% of my CPU, and 02:37. Curiously enough, the process seems idle for the first 30 seconds or so (no CPU usage, no disk I/O), but after that it bursts into writing with peaks above 160MBps.

    • The current official restic release achieves an average writing speed of 30.9MBps, takes 152MB of RAM, around 9% of CPU, and 07:24 to complete the same restore job.

  • From an external HDD repo to an internal SSD:

    • Your branch writes the info at an average rate of 36.5MBps (with peaks at 65MBps), taking 148MB of RAM, around 7% of my CPU, and 06:16. There is also an initial 35-second delay before the process actually starts to show some I/O.

    • The current official restic release achieves an average writing speed of 19.9 MBps, takes 154MB of RAM, around 9% of CPU, and 11:28 to complete the same restore job.

So in this first test, @ifedorenko’s branch is 3 times faster than the official 0.9.5 release.

For the second test I restored a snapshot whose stats (in restore-size mode) are Total File Count: 23 and Total Size: 47GiB, and these were its figures:

  • From a remote repo (so the bottleneck in this case should be the network bandwidth) which is stored on an external HDD, served from a restic rest-server connected to the Internet with a 600Mbps (megabits per second) symmetrical connection, restoring to the internal SSD of a PC which also has a 600Mbps symmetrical connection:

    • Your branch writes the info at an average rate of 24.5MBps (with peaks at 45MBps), taking 588MB of RAM, around 5% of my CPU, and 32:41. This time there is a 2 minutes initial delay.

    • The current official restic release achieves an average writing speed of 10.1MBps, takes 530MB of RAM, around 15% of CPU, and 01:19:06 to complete this restore job.

In this case your branch performs 2 times better, which is also very significant.

In my third, and last, test I restored a snapshot whose stats (in restore-size mode) are Total File Count: 14147 and Total Size: 80GiB, yielding these numbers:

  • From an external HDD repo to an internal SSD:

    • Your branch writes the info at an average rate of 45MBps (with peaks at 65MBps), taking 372MB of RAM, around 12% of my CPU, and 30:04. This time there is no initial delay.

    • The current official restic release achieves an average writing speed of 32.4MBps, takes 346MB of RAM, around 7% of CPU, and 42:07 to complete this restore job.

This time it “only” performs 28% better, which may be a little less impressive, but is still relevant in my opinion.

Though my personal results should not be taken as a general rule (and I’d really appreciate it if someone could confirm or replicate some of them), I think it’s fair to say that these differences are not negligible, and that @ifedorenko’s branch improves restic’s restore performance.

Are there any trade-offs? If not, would @fd0 please consider merging this branch into the main one, so the official releases can offer this boost in performance too?

Thanks a lot.

Current master guarantees all files are restored start-to-finish, while the branch does not. Start-to-finish is the most “intuitive” restore behaviour and makes it easier to see restore progress and manually recover in case of crashes. Start-to-finish comes at the cost of extra code and runtime memory. And it fundamentally cannot write to the same file concurrently (so your “13.4GB file” test was single-threaded in 0.9.5).

There is also a bug/inefficiency in master that runs an O(N^2) lookup of file blobs, which results in very poor restore performance for files larger than a few hundred MB. I think it is possible to optimize this in the start-to-finish implementation, but that would further complicate already complicated code.

Optimizing that already complicated code sounds like a long term target.

In the meantime, would it be possible to add a restore flag that selects the out-of-order algorithm (something like --fast, or maybe --out-of-order)? That way, those who prefer the predictability of the start-to-finish approach would have it by default (including anyone who eventually needs it to manually recover from a crash, as you suggest), while those who’d rather have the performance improvement offered by the out-of-order branch could opt in with the new flag.

This is really up to @fd0, but personally I think the choice between “fast” and “ordered” restore is too esoteric. And, to be frank, ordered restore is just plain over-engineering; there is no real use case for it. Even for crash recovery, nobody is going to check individual files; “start over” is the most likely reaction to a crash during restore. (I implemented ordered restore, so I can call it names :slight_smile:)
