Prune performance

Hi,

I run both Duplicacy and Restic (0.9.5) on the same dataset (~615GB). The former sends data to Backblaze B2 for offsite protection while Restic stores data on a local NAS device for instant recovery. While I know these are different apps, I find the following data points very interesting. As a reminder, this is the same dataset with similar retention policies:

Backup:
Duplicacy (B2): 5 minutes
Restic (Local): 7 minutes

Prune:
Duplicacy (B2): 18 minutes
Restic (Local): 4 hours and 12 minutes

I am shocked by the time difference on the prune process. (Note that for restic, I run "forget" with "--prune". Also remember that Duplicacy is dealing with remote object storage on B2 while Restic has full Gigabit Ethernet access to a local NAS device.) Is there a problem here? Is there a way for me to optimize this, or should I just expect 12x slower prune speeds?
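For reference, my invocation looks roughly like this (the repo path and retention values here are placeholders, not my exact policy):

# Illustrative forget+prune run; adjust the repo path and retention flags.
restic --repo /mnt/nas/restic-repo forget --prune \
    --keep-daily 7 --keep-weekly 8 --keep-monthly 24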

TIA!

Questions

  • What operating system is your computer using?
  • What antivirus program are you using?
  • What is the key bottleneck? Is it using a lot of CPU, a lot of I/O, etc.?

If using Windows, please open the Resource Monitor during a prune and check which process(es) are accessing the disk. As I discovered in this thread, antivirus programs can massively increase the time required to do backups and prunes. I found I had to give restic an exception from virus scanning. In the end I got rid of Avira and used Windows Defender; I can't remember if I had to create an exception for that or not.

Hi,

Sorry for the delay in responding. I am running Restic on Ubuntu Linux, on a small 64-bit single-board computer (SBC) with 4GB of RAM. It is not running antivirus and is very lightly loaded.

I actively track all CPU and memory usage. During backup, CPU usage peaks at about 60% and typically stays around 50%. Memory is fine and never exceeds 25%. I do not really have good I/O data and will explore tracking that more effectively.

In general, it does not feel like I am bottlenecking on the SBC, but I could be wrong. I am still wondering why the garbage collection process is so much slower on Restic...

The CPU iowait% is one metric you could use.
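If the sysstat package is installed, something like the following would show it, along with per-device utilization, while the prune runs (the 5-second interval is just an example):

# Print CPU iowait% and extended per-device stats every 5 seconds;
# a disk pinned near 100% util with low throughput suggests seek-bound I/O.
iostat -x 5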

Restic has to read each pack header, then crawl every tree that's used by any snapshot. If the repository is stored on an HDD then it's likely that the disk is spending a lot of time seeking. Can you test on an SSD?

It's possible that you'd see similar numbers with restic against B2 if the local storage has slow seek performance; if the requests to B2 are made in parallel then it's entirely possible that B2 can outperform a local disk when it comes to fetching many small files. However, this is untested speculation.

I am seeing similar problems running forget --prune with Backblaze B2 as storage and a backup size of ~2TB - it takes over 8 hours to perform this operation. I'm running on Linux, so no antivirus is involved (and the backend is not a local disk).

I can imagine. My speeds are very slow using a local disk that is relatively high-performance. I can only imagine how it would perform with cloud-based object storage. My Duplicacy instance screams with B2, which leaves me scratching my head since it is the same dataset. I guess it is just a difference in the design...

My one concern is that my 4-hour prune is with 600GB of data, which is not that much information. It is likely to get worse over time, and I worry that at some point it will become unmanageable.

I just ran restic prune / forget on my local Windows PC on a 279GB repository, which is sitting on a locally attached 7200RPM spinning HGST SATA disk. This PC is an i7-2600 with 16GB RAM and the OS on an SSD; it's not a new PC, but it's fast enough for everything I do, including video editing.

The repository backs up photos and videos: 40,000 files, about 45% very small XMP metadata files, 45% raw files of about 20MB each, 8% JPEG files of a few MB each, and 2% videos that vary between 20MB and 500MB.

The ā€œforgetā€ command took 10 seconds and the ā€œpruneā€ command took 6 minutes. My execution log from the console is below. I could see restic.exe was using all the CPU time - the virus scanner wasnā€™t doing anything, no other process was doing much.

Can you please do the following, showing us the output of each step?

  • Run "restic version"
  • Copy your restic repo onto a local disk (SSD if you have enough capacity, spinning disk fine)
  • Run the forget/prune across the network, and provide a log similar to the one below
  • Run the forget/prune on the copy on the local disk, and provide a log similar to the one below
  • Run a disk benchmarking program against your local disk and the NAS (e.g. the rough dd test below). I'd like to see throughput for both large files and with random I/O.
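If nothing dedicated is handy, even a rough dd pass gives the large-file half of the picture (paths and sizes below are examples, and it says nothing about random I/O):

# Sequential write test; conv=fdatasync flushes to disk so the page
# cache doesn't inflate the result. Drop caches before the read-back,
# or it will mostly measure RAM.
dd if=/dev/zero of=/mnt/nas/testfile bs=1M count=8192 conv=fdatasync
dd if=/mnt/nas/testfile of=/dev/null bs=1M
rm /mnt/nas/testfile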

When you run restic prune, please include the "-v" flag.

Here's my log:

restic version
restic 0.9.5 compiled with go1.12.4 on windows/amd64

time
13:57:32.48

restic.exe --repo c:\Photos forget --keep-daily 7 --keep-weekly 8 --keep-monthly 24

time
13:57:41.26

restic.exe --repo c:\Photos prune
repository xxx opened successfully, password is correct
counting files in repo
building new index for repo
[5:25] 100.00% 57184 / 57184 packs
repository contains 57184 packs (226553 blobs) with 279.044 GiB
processed 226553 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 22 snapshots
[0:02] 100.00% 22 / 22 snapshots
found 226547 of 226553 data blobs still in use, removing 6 blobs
will remove 0 invalid files
will delete 1 packs and rewrite 0 packs, this frees 78.879 KiB
counting files in repo
[0:02] 100.00% 57183 / 57183 packs
finding old index files
saved new indexes as
remove 21 old index files
[0:00] 100.00% 1 / 1 packs deleted
done

time
The current time is: 14:04:09.01

I would love to run these tests, but I do not have enough local disk space to move the repository to a local disk. (I have a 32GB eMMC for the OS and local storage while I am protecting 600GB+.)

I do have the logs that you mention and can share them. I can also run some performance tests. Do you have a preferred disk performance tool in Linux? I can track something down but figured that I would ask if you recommend anything before I start searching. (I have used dd in the past.)

Can you plug in a USB disk? SSD if you have one, spinning disk fine, even if you have to make a smaller repo to test with. The main thing I'm trying to work out is whether there is high latency, low bandwidth, or low transactions per second to the NAS that could cause a problem.

I don't know any Linux disk testing programs, sorry.

Not being able to run this on both a local disk and the NAS significantly reduces the information we would get from this. Given my system runs two orders of magnitude faster than yours, I expect it's a problem with your setup rather than with restic.

Can you run this test on a PC against both the NAS and a local disk inside the PC?

bonnie++ is more or less the standard disk benchmarking tool on Linux.
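A typical invocation might look like this (mount points and sizes are examples; use a test size of roughly twice your RAM so the page cache can't hide the I/O, and add -u <user> if running as root):

# Large-file throughput plus a small-file/seek test (-n 128 = 128*1024 files),
# run once against the NAS mount and once against a local disk for comparison.
bonnie++ -d /mnt/nas/benchtest -s 8192 -n 128
bonnie++ -d /tmp/benchtest -s 8192 -n 128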


I am running an extended multi-day performance test with iozone, which started before the Bonnie++ recommendation. I will share the results when they are complete...

That doesn't seem necessary. I would cancel it and run a quick test.

Was there any particular outcome on this? I was doing some testing on a local HDD and the prune times can be really wacky (12 seconds vs 15 minutes). That's on an HDD backend with ~490GB of backups, for testing purposes.

I have started a prune on my 'real' backend of OneDrive (via rclone) and it has been running for 7 hours!

(Borg can do something similar in about 5 minutes or so.)

If you have quite a bit of RAM, the system I/O cache can hold much of the data restic needs to prune. If you prune multiple times in a row (to test performance) then subsequent prune operations can complete very quickly if the I/O cache can satisfy most of the I/O requests.

You would need to run echo 3 > /proc/sys/vm/drop_caches between invocations to clear the I/O cache.
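For example, a cold-cache timing run might look like this (the repo path is a placeholder; needs root):

# Flush dirty pages, drop the page cache, then time a cold prune.
sync
echo 3 > /proc/sys/vm/drop_caches
time restic --repo /mnt/hdd/repo prune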

The current implementation of prune has many known issues.
Most of these are cured by PR 2718.
I also expect speed improvements of a factor of 10 or even more - maybe much, much more for remote repositories.

This however still needs code reviews and some testing for all kinds of scenarios to make it into master. If you can test it in a test setting or try it out on a copy of your repository, I'm happy to get feedback.
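If you want to try it without risk, something along these lines would work (the binary name and paths are placeholders for wherever you put the build):

# Run the PR build's prune against a throwaway copy, never the real repo.
cp -a /mnt/hdd/repo /mnt/hdd/repo-prune-test
./restic-pr2718 --repo /mnt/hdd/repo-prune-test prune -v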


@cdhowie the I/O cache explains the wide volatility in the results, so the 'quick' prune result is unrepresentative and I'll probably drop it. Thanks for the tip on clearing the cache if I ever need to revisit this.

@alexweiss I am more than happy to help with testing; I am running both Windows and Linux (Ubuntu) if that is relevant, and can test on both an HDD and an rclone (OneDrive) backend - both are testing repos so no worries there. The only piece is that I am not a coder, so I would need some support with a prebuilt binary or some such. Any suggestions on how to go about that?

@kellytrinh I made two prebuilt binaries. They contain PR 2718 and PR 2749.

linux binary
windows binary

Please make sure that you use binaries from untrusted sources only in pure test environments and not in environments containing sensitive data!


Proud to update - the new version with the PR worked flawlessly [tested on Linux] first time!

Took 10 minutes (stats below on the job size to give context).

I really like the additional reporting as well; the old process just seemed to hang without reporting its status (my previous 12-hour adventure hit an "error timeout to onedrive" after 8 hours, so I wasn't sure whether it was still going or not). This new version is much better at showing which step is being taken.

Well done; makes restic much much stronger!!!

[ PS: Windows worked as well - but obviously there was less to do since it had already been pruned! ]
[ I will be doing some more testing and will report here if there are findings. ]

====

used: 414960 blobs / 447.161 GiB
duplicates: 0 blobs / 0 B
unused: 11888 blobs / 8.540 GiB
unreferenced: 2.389 GiB
total: 426848 blobs / 458.090 GiB
unused size: 1.86% of total size

to repack: 615 blobs / 7.792 MiB
-> prunes: 254 blobs / 3.394 MiB
to delete: 9427 blobs / 7.289 GiB
delete unreferenced: 2.389 GiB
total prune: 9681 blobs / 9.682 GiB
unused size after prune: 0.28% of total size

I am really looking forward to @alexweiss's fix to prune, but this original thread was comparing prune on Restic to Duplicacy. I see a few ways that Duplicacy is fast. This is just based on casual observation and may be wrong.

  • It is an index-only operation
    In Duplicacy the remote file names are a hash of the contents and indexes are cached locally. After doing a full directory listing of the remote side, and possibly downloading some "snapshot" files created by other hosts writing to the same repository, the prune command has everything it needs to determine what to prune. @alexweiss's new prune is similar.

  • It is more willing to trade off wasted space for performance
    Because of Duplicacy's chunking model, it doesn't have packs like restic and doesn't need to deal with partially populated packs when doing a prune. Instead, the backup operation always creates fully populated chunks, and the "snapshot" equivalent lists the chunks needed for each backup. Chunks no longer referenced become stale. So if a backup changes a small file in the middle of an existing chunk, a whole new chunk is uploaded and the old chunk becomes stale. The trade-off on chunk size is internal fragmentation for large chunks versus external fragmentation (and larger indexing metadata) for small chunks.

  • Backups and prunes don't need to be locked
    Backups in Duplicacy are lock-free. The only thing a backup command can do is add files to a repository, so parallel backups have no problem running together. A prune runs in two phases: first, it deletes any files that were marked for pruning over a week ago; then it finds files that are no longer needed by the current backups and renames them into a fossil directory. Files have to live in that fossil directory for a week before being deleted, and if a backup needs data from the fossil directory it will use it and move the file back out of that directory. This way, as long as a single backup completes in less than a week, prunes can run without locks (see the sketch below). Very nice.
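In shell terms, the two-phase idea is roughly this (directory names, the chunk list, and the one-week window are illustrative, not Duplicacy's actual layout):

# Phase 1: permanently delete fossils older than the one-week grace period.
find repo/fossils -type f -mtime +7 -delete

# Phase 2: rename unreferenced chunks into the fossil directory instead of
# deleting them; a backup that still needs one can move it back out.
while read -r chunk; do
    mv "repo/chunks/$chunk" "repo/fossils/$chunk"
done < unreferenced_chunks.txt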

Anyway, for most repository sizes the approach used by Duplicacy is a nice tradeoff.


I've been doing the testing as part of a comparison between the backup solutions I currently have and some promising candidates (see below).

https://f000.backblazeb2.com/file/backblaze-b2-public/Backup_Tool_Comparison.xlsx

Duplicacy is a very strong contender. That being said, restic was having a very strong showing except for the prune performance. Previously it was so horrible (10+ hours) that despite its strengths elsewhere I wasn't so hopeful.

Given the amazing improvement, Restic is in contention again. The lock-free part is less relevant for me, since I won't be concurrently accessing the storage from multiple systems.