Testing Restic with ~4TiB Data

Hi!

I just tested restic on some data and would like to share the results and discuss some observations.

Setup

restic version: restic 0.16.3 compiled with go1.21.6 on linux/amd64
data: 121191 files, 3.869 TiB, mostly video and photo files
source machine: Ubuntu 22.04.3, ZFS, compression on (zstd, default), dedup off
destination machine: Synology NAS, ext4

The source machine connects to the destination machine via SFTP over a 1000 Mbps network.
The repo is initialized without any parameters.

Results

The backup took about 42.5 hours to finish.

restic reported “3.655 TiB added to repo, 3.531 TiB stored on disk”.

With ncdu I manually measured that the source files (on ZFS) have an apparent size of 3.8 TiB and an on-disk size of 3.6 TiB. (The apparent size differs from the number reported by restic. I am not sure why, but I guess there might be a bug in my exclude filters.)

There were 220568 files in the repo.

Observations

If I read it correctly, dedup saved ~200 GiB of data (3.869 TiB - 3.655 TiB), which surprised me.

Meanwhile, the final repo size on disk (~3.5 TiB) is similar to the original data size on disk (~3.6 TiB). I didn’t check whether ZFS and restic use the same compression level, but I expected the restic repo to be smaller, because dedup had already saved quite a few bytes.
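
To make the comparison above easier to follow, here is a small sketch of the arithmetic using only the figures quoted in this post. It assumes that "added to repo" is the size after dedup but before compression, and that "stored on disk" is the size after compression; the variable names are just for illustration.

```python
# Figures quoted in the post, all in TiB.
source_size = 3.869      # total size of the files restic processed
added_to_repo = 3.655    # assumed: after dedup, before compression
stored_on_disk = 3.531   # assumed: after compression

# ncdu measurements of the source on ZFS, in TiB (rounded in the post).
zfs_apparent = 3.8
zfs_on_disk = 3.6

GIB_PER_TIB = 1024

dedup_saved_gib = (source_size - added_to_repo) * GIB_PER_TIB
compression_saved_gib = (added_to_repo - stored_on_disk) * GIB_PER_TIB

# Ratio of compressed size to uncompressed size (lower is better).
restic_ratio = stored_on_disk / added_to_repo
zfs_ratio = zfs_on_disk / zfs_apparent

print(f"dedup saved:        {dedup_saved_gib:.0f} GiB")
print(f"compression saved:  {compression_saved_gib:.0f} GiB")
print(f"restic compression: {1 - restic_ratio:.1%} smaller")
print(f"ZFS compression:    {1 - zfs_ratio:.1%} smaller")
```

Under these assumptions, dedup accounts for roughly 219 GiB and restic's compression for roughly 127 GiB. Both restic and ZFS shrink this data by only a few percent, which is plausible for video and photo files that are already compressed.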

It also seems that the system was not fully utilized during the backup. I was not able to determine the bottleneck. Here is what I have checked:

  • The CPU usage is low on both machines
    • On the source machine there are 8 cores, but restic was using at most 200%.
  • The ssh process also used ~10% CPU
    • I was using a customized sftp.command, basically ssh with a specific private key.
  • Network load was ~50MB/s
    • If I had to guess, I’d say the bottleneck is the network, but I have no idea.
  • Disks were not fully utilized. I know the HDDs on both machines can handle at least 100 MB/s.
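
As a rough sanity check on the numbers above, the average transfer rate can be derived from the backup duration and compared with the observed network load and the link capacity. This is only a sketch: it assumes the ~3.531 TiB "stored on disk" is roughly what went over the wire, that "MB/s" means 10^6 bytes per second, and it ignores protocol overhead.

```python
TIB = 1024 ** 4                    # bytes per TiB

stored_bytes = 3.531 * TIB         # assumed to approximate the bytes sent
duration_s = 42.5 * 3600           # backup duration in seconds

# Average rate over the whole run, in MB/s (10^6 bytes per second).
avg_mb_s = stored_bytes / duration_s / 1e6

link_mb_s = 1000 / 8               # 1000 Mbps link = 125 MB/s, no overhead
observed_mb_s = 50                 # network load seen during transfers

link_utilization = observed_mb_s / link_mb_s

print(f"average rate:     {avg_mb_s:.1f} MB/s")
print(f"observed rate:    {observed_mb_s} MB/s")
print(f"link capacity:    {link_mb_s:.0f} MB/s")
print(f"link utilization: {link_utilization:.0%}")
```

With these assumptions the average comes out at about 25 MB/s, roughly half of the ~50 MB/s seen while data was flowing, and the observed load is about 40% of the nominal link capacity. So the link itself does not look saturated.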

Lastly, it seems that restic became completely idle from time to time.
The screenshot below shows CPU and Network usage.
Is this expected?

It turned out that the source machine was somehow regularly blocking the IP address of the destination machine for one minute at a time. :rofl:
So the “idle problem” has nothing to do with restic.

Just found the bug about SFTP.
After switching to rclone as the backend, restic can now fully utilize the bandwidth.

Thanks for sharing your results.
You might want to look into restic’s REST server (restic/rest-server on GitHub), a high-performance HTTP server that implements restic’s REST backend API. It should provide better performance than SFTP.

I run the REST server on a DS923+ via Docker, which works great. On a LAN you could also consider plain HTTP instead of HTTPS, which avoids transport encryption and might reduce CPU overhead even further.

But if rclone already works for you, there is no need to change. This is just an additional idea to consider.

Thanks! Yes, I heard rest-server performs much better. I plan to give it a try.