Prune performance

Hi,

I run both Duplicacy and Restic (0.9.5) on the same dataset (~615GB). The former sends data to Backblaze B2 for offsite protection, while Restic stores data on a local NAS device for instant recovery. While I know these are different apps, I find the following data points very interesting. As a reminder, this is the same dataset with similar retention policies:

Backup:
Duplicacy(B2): 5 minutes
Restic(Local): 7 minutes

Prune:
Duplicacy(B2): 18 minutes
Restic(Local): 4 hours and 12 minutes

I am shocked by the time difference in the prune process. (Note that for restic, I run “forget” with “--prune”.) Also remember that Duplicacy is dealing with remote object storage on B2, while Restic has full Gigabit Ethernet access to a local NAS device. Is there a problem here? Is there a way for me to optimize this, or should I just expect 14x slower prune speeds?
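
For reference, the ratios implied by the figures above can be checked in a few lines of Python. This is only arithmetic on the quoted times, nothing is measured here; the prune ratio works out to 14x and the backup ratio to 1.4x.

```python
from datetime import timedelta

# Times exactly as quoted in the post above.
duplicacy_backup = timedelta(minutes=5)
restic_backup = timedelta(minutes=7)
duplicacy_prune = timedelta(minutes=18)
restic_prune = timedelta(hours=4, minutes=12)

# Dividing one timedelta by another yields a plain float ratio.
backup_ratio = restic_backup / duplicacy_backup
prune_ratio = restic_prune / duplicacy_prune

print(f"backup: restic takes {backup_ratio:.1f}x as long as Duplicacy")
print(f"prune:  restic takes {prune_ratio:.1f}x as long as Duplicacy")
```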

TIA!

Questions

  • What operating system is your computer using?
  • What antivirus program are you using?
  • What is the key bottleneck? Is it using a lot of CPU, a lot of I/O, etc?

If using Windows, please open the Resource Monitor during a prune and check which process(es) are accessing the disk. As I discovered in this thread, antivirus programs can massively increase the time required to do backups and prunes. I found I had to give restic an exception from virus scanning. In the end I got rid of Avira and used Windows Defender; I can’t remember whether I had to create an exception for that.

Hi,

Sorry for the delay in responding. I am running Restic on Ubuntu Linux. It is running on a small 64bit single board computer (SBC) with 4GB of RAM. It is not running anti-virus, and is very lightly loaded.

I actively track all CPU and memory usage. During backup, CPU usage peaks at about 60% and typically stays around 50%. Memory is fine and never exceeds 25%. I do not really have good I/O data and will explore tracking that more effectively.

In general, it does not feel like I am bottlenecking on the SBC, but I could be wrong. I still am wondering why the garbage collection process is so much slower on Restic…

The CPU iowait% is one metric you could use.
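
As a rough sketch of how iowait could be tracked during a prune, the snippet below samples the aggregate “cpu” line of /proc/stat on Linux and reports the share of CPU time spent waiting on I/O between samples. The function names and the sampling interval are my own choices for illustration, and it assumes a kernel that exposes at least the first eight counters on that line.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))

def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    total = sum(after[f] - before[f] for f in FIELDS)
    if total <= 0:
        return 0.0
    return 100.0 * (after["iowait"] - before["iowait"]) / total

def read_cpu_sample(path="/proc/stat"):
    """Read the current counters (Linux only)."""
    with open(path) as fh:
        return parse_cpu_line(fh.readline())

def watch(interval=5.0, samples=12):
    """Print iowait% once per interval; run this while the prune is going."""
    prev = read_cpu_sample()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu_sample()
        print(f"iowait: {iowait_percent(prev, cur):5.1f}%")
        prev = cur
```

Calling watch() in a second terminal during the prune would print one reading every five seconds. A consistently high value would suggest the process is waiting on storage rather than on the CPU.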

Restic has to read each pack header, then crawl every tree that’s used by any snapshot. If the repository is stored on an HDD then it’s likely that the disk is spending a lot of time seeking. Can you test on an SSD?

It’s possible that you’d see similar numbers with restic against B2 if the local storage has slow seek performance; if the requests to B2 are made in parallel then it’s entirely possible that B2 can outperform a local disk when it comes to fetching many small files. However, this is untested speculation.

I am seeing similar problems running forget --prune with Backblaze B2 as the storage backend and a backup size of ~2TB: the operation takes more than 8 hours. I’m running on Linux, so no antivirus is involved (nor a local disk).

I can imagine. My speeds are very slow using local disk which is relatively high performance. I can only imagine how it would perform with cloud-based object storage. My Duplicacy instance screams with B2 which leaves me scratching my head since it is the same dataset. I guess that it is just a difference in the design…

My one concern is that my 4 hour prune is with 600GB of data, and so it is not that much information. It is likely to get worse over time, and I worry that at some point, it will become unmanageable.

I just ran restic prune / forget on my local Windows PC on a 279GB repository, which is sitting on a locally attached SATA 7200RPM spinning HGST disk. This PC is an i7 2600 with 16GB RAM and OS on SSD - it’s not a new PC but it’s fast enough for everything I do including video editing.

The repository backs up photos and videos - 40,000 files, about 45% very small xmp metadata, 45% raw files that are about 20MB each, 8% jpeg files a few MB, and 2% videos that vary between 20MB and 500MB.

The “forget” command took 10 seconds and the “prune” command took 6 minutes. My execution log from the console is below. I could see restic.exe was using all the CPU time - the virus scanner wasn’t doing anything, no other process was doing much.

Can you please do the following, showing us the output of each step?

  • run “restic version”
  • Copy your restic repo onto a local disk (SSD if you have enough capacity, spinning disk fine)
  • Run the forget/prune across the network, and provide the log similar to below
  • Run the forget/prune on the copy on the local disk, and provide the log similar to below
  • Run a disk benchmarking program against your local disk and NAS. I’d like to see throughput for both large files and with random I/O.

When you run restic prune please include the “-v” flag
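
If it helps, the steps above could be scripted so that every command is timed the same way. This is only a sketch: the repository path is a placeholder, the keep policy is copied from my own log below and should be replaced with yours, and the helper names are made up for this example. Note that forget really does remove snapshots, so only run it with the policy you actually want.

```python
import subprocess
import time

def restic_commands(repo):
    """Build the command lines to time. The keep policy is just an example."""
    forget = ["restic", "--repo", repo, "forget",
              "--keep-daily", "7", "--keep-weekly", "8", "--keep-monthly", "24"]
    prune = ["restic", "--repo", repo, "prune", "-v"]
    return [["restic", "version"], forget, prune]

def timed_run(cmd):
    """Run one command, letting its output go to the console; return (exit code, seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd)
    return result.returncode, time.monotonic() - start

def run_all(repo):
    """Run version, forget and prune in order and print how long each took."""
    for cmd in restic_commands(repo):
        code, seconds = timed_run(cmd)
        print(f"{' '.join(cmd)} -> exit {code}, {seconds:.1f} s")
```

Running run_all() once against the NAS path and once against the local copy would give two directly comparable sets of timings.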

Here’s my log

restic version
restic 0.9.5 compiled with go1.12.4 on windows/amd64

time
13:57:32.48

restic.exe --repo c:\Photos forget --keep-daily 7 --keep-weekly 8 --keep-monthly 24

time
13:57:41.26

restic.exe --repo c:\Photos prune
repository xxx opened successfully, password is correct
counting files in repo
building new index for repo
[5:25] 100.00% 57184 / 57184 packs
repository contains 57184 packs (226553 blobs) with 279.044 GiB
processed 226553 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 22 snapshots
[0:02] 100.00% 22 / 22 snapshots
found 226547 of 226553 data blobs still in use, removing 6 blobs
will remove 0 invalid files
will delete 1 packs and rewrite 0 packs, this frees 78.879 KiB
counting files in repo
[0:02] 100.00% 57183 / 57183 packs
finding old index files
saved new indexes as
remove 21 old index files
[0:00] 100.00% 1 / 1 packs deleted
done

time
The current time is: 14:04:09.01
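
To put the log above in perspective, here is a small back-of-the-envelope calculation using only numbers that appear in it. It suggests that rebuilding the index accounts for most of the run, at a few milliseconds per pack. The wall-clock span is taken from the two time readings around the prune, so it may include a little idle time.

```python
def parse_elapsed(text):
    """Convert a restic progress timer such as '[5:25]' or '1:02:03' to seconds."""
    seconds = 0
    for part in text.strip("[]").split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# Figures copied from the log above.
index_seconds = parse_elapsed("[5:25]")   # "building new index for repo"
packs = 57184
start = 13 * 3600 + 57 * 60 + 41.26       # 13:57:41.26
end = 14 * 3600 + 4 * 60 + 9.01           # 14:04:09.01
total_seconds = end - start

ms_per_pack = 1000 * index_seconds / packs
index_share = index_seconds / total_seconds

print(f"index rebuild: {ms_per_pack:.1f} ms per pack")
print(f"index rebuild share of the prune: {index_share:.0%}")
```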

I would love to run these tests, but I do not have enough local disk space to move the repository to a local disk. (I have a 32GB eMMC for the OS and local storage while I am protecting 600GB+.)

I do have the logs that you mention and can share them. I can also run some performance tests. Do you have a preferred disk performance tool in Linux? I can track something down but figured that I would ask if you recommend anything before I start searching. (I have used dd in the past.)

Can you plug in a USB disk? An SSD if you have one, but a spinning disk is fine, even if you have to make a smaller repo to test with. The main thing I’m trying to work out is whether high latency, low bandwidth, or a low number of transactions per second to the NAS could be causing the problem.

I don’t know any Linux disk testing programs sorry.
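
Failing a proper tool, something like the following could give a first impression of sequential throughput versus small random reads on a given path. It is a crude sketch with made-up function names, not a substitute for a real benchmark: the OS page cache can make repeat runs look far faster than the disk really is, so the test file should be larger than RAM or not recently read.

```python
import os
import random
import time

def sequential_throughput(path, chunk=1024 * 1024):
    """Read the whole file front to back and return MiB/s."""
    size = os.path.getsize(path)
    start = time.monotonic()
    with open(path, "rb", buffering=0) as fh:
        while fh.read(chunk):
            pass
    elapsed = max(time.monotonic() - start, 1e-9)
    return size / (1024 * 1024) / elapsed

def random_read_latency(path, reads=200, block=4096, seed=0):
    """Read small blocks at random offsets and return the mean latency in ms."""
    size = os.path.getsize(path)
    rng = random.Random(seed)
    start = time.monotonic()
    with open(path, "rb", buffering=0) as fh:
        for _ in range(reads):
            # Pick an offset that leaves room for one full block where possible.
            fh.seek(rng.randrange(max(size - block, 1)))
            fh.read(block)
    return 1000 * (time.monotonic() - start) / reads
```

Pointing both functions at a large pack file on the NAS mount and then at a file on the local disk would show whether the gap is mostly in bandwidth or in per-request latency.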

Not being able to run this on both a local disk and the NAS significantly reduces the information we would get from it. Given that my system runs roughly 40 times faster than yours (on a repository a little under half the size), I expect it’s a problem with your setup rather than with restic.

Can you run this test on a PC against both the NAS and a local disk inside the PC?

bonnie++ is more or less the standard disk benchmarking tool on Linux.

I am running an extended multi-day performance test with iozone, which I started before the bonnie++ recommendation. I will share the results when they are complete…

That doesn’t seem necessary. I would cancel it and run a quick test.