I run both Duplicacy and Restic (0.9.5) on the same dataset (~615GB). The former sends data to Backblaze B2 for offsite protection, while Restic stores data on a local NAS device for instant recovery. While I know these are different apps, I find the following data points very interesting. As a reminder, this is the same dataset with similar retention policies:
Prune:
Duplicacy (B2): 18 minutes
Restic (local NAS): 4 hours and 12 minutes
I am shocked by the time difference on the prune process. (Note that for restic, I run 'forget' with '--prune'.) Also remember that Duplicacy is dealing with remote object storage on B2 while Restic has full Gigabit Ethernet access to a local NAS device. Is there a problem here? Is there a way for me to optimize this, or should I just expect roughly 14x slower prune speeds?
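For context, my invocation looks roughly like this (the repo path and retention flags here are illustrative, not my exact policy):

# Illustrative forget/prune run; path and retention flags are placeholders
restic -r /mnt/nas/restic-repo forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune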
What is the key bottleneck? Is it using a lot of CPU, a lot of I/O, etc?
If using Windows, please open the Resource Monitor during a prune and check which process(es) are accessing the disk. As I discovered in this thread, antivirus programs can massively increase the time required to do backups/prunes. I found I had to give restic an exception in the virus scanner. In the end I got rid of Avira and used Windows Defender; I can't remember if I had to create an exception for that or not.
Sorry for the delay in responding. I am running Restic on Ubuntu Linux. It is running on a small 64bit single board computer (SBC) with 4GB of RAM. It is not running anti-virus, and is very lightly loaded.
I actively track all CPU and memory usage. During backup, CPU usage peaks at about 60% and typically stays around 50%. Memory is fine and never exceeds 25%. I do not really have good I/O data and will explore tracking that more effectively.
In general, it does not feel like I am bottlenecking on the SBC, but I could be wrong. I am still wondering why the garbage collection process is so much slower on Restic…
Restic has to read each pack header, then crawl every tree that's used by any snapshot. If the repository is stored on an HDD then it's likely that the disk is spending a lot of time seeking. Can you test on an SSD?
It's possible that you'd see similar numbers with restic against B2 if the local storage has slow seek performance; if the requests to B2 are made in parallel then it's entirely possible that B2 can outperform a local disk when it comes to fetching many small files. However, this is untested speculation.
I am seeing similar problems running forget with --prune using Backblaze B2 as storage, with a backup size of ~2TB - it takes over 8 hours to perform this operation. I'm running on Linux, so there is no antivirus involved (nor a local disk).
I can imagine. My speeds are very slow using a local disk which is relatively high-performance. I can only imagine how it would perform with cloud-based object storage. My Duplicacy instance screams with B2, which leaves me scratching my head since it is the same dataset. I guess it is just a difference in the design…
My one concern is that my 4-hour prune is with 600GB of data, which is not that much information. It is likely to get worse over time, and I worry that at some point it will become unmanageable.
I just ran restic prune/forget on my local Windows PC on a 279GB repository, which is sitting on a locally attached SATA 7200RPM spinning HGST disk. This PC is an i7-2600 with 16GB RAM and the OS on an SSD - it's not a new PC, but it's fast enough for everything I do, including video editing.
The repository backs up photos and videos - 40,000 files: about 45% very small XMP metadata files, 45% raw files of about 20MB each, 8% JPEG files of a few MB, and 2% videos that vary between 20MB and 500MB.
The 'forget' command took 10 seconds and the 'prune' command took 6 minutes. My execution log from the console is below. I could see restic.exe was using all the CPU time - the virus scanner wasn't doing anything, and no other process was doing much.
Can you please do the following, showing us the output of each step?
Run 'restic version'
Copy your restic repo onto a local disk (SSD if you have enough capacity, spinning disk fine)
Run the forget/prune across the network, and provide the log similar to below
Run the forget/prune on the copy on the local disk, and provide the log similar to below
Run a disk benchmarking program against your local disk and the NAS. I'd like to see throughput for both large files and random I/O.
When you run restic prune, please include the '-v' flag. (A rough sketch of these commands follows below.)
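Roughly, the commands I have in mind look like this (the paths are placeholders for your setup):

restic version
cp -a /mnt/nas/repo /mnt/local/repo    # copy the repo onto a local disk first
restic -r /mnt/nas/repo prune -v       # prune across the network
restic -r /mnt/local/repo prune -v     # prune the local copy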
Here's my log:
restic version
restic 0.9.5 compiled with go1.12.4 on windows/amd64
restic.exe --repo c:\Photos prune
repository xxx opened successfully, password is correct
counting files in repo
building new index for repo
[5:25] 100.00% 57184 / 57184 packs
repository contains 57184 packs (226553 blobs) with 279.044 GiB
processed 226553 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 22 snapshots
[0:02] 100.00% 22 / 22 snapshots
found 226547 of 226553 data blobs still in use, removing 6 blobs
will remove 0 invalid files
will delete 1 packs and rewrite 0 packs, this frees 78.879 KiB
counting files in repo
[0:02] 100.00% 57183 / 57183 packs
finding old index files
saved new indexes as
remove 21 old index files
[0:00] 100.00% 1 / 1 packs deleted
done
I would love to run these tests, but I do not have enough local disk space to move the repository to a local disk. (I have a 32GB eMMC for the OS and local storage while I am protecting 600GB+.)
I do have the logs that you mention and can share them. I can also run some performance tests. Do you have a preferred disk performance tool in Linux? I can track something down but figured that I would ask if you recommend anything before I start searching. (I have used dd in the past.)
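For reference, the sort of dd run I have used before looks like this (paths are placeholders, and dd only measures sequential throughput, not random I/O):

dd if=/dev/zero of=/mnt/nas/ddtest bs=1M count=1024 oflag=direct   # sequential write
dd if=/mnt/nas/ddtest of=/dev/null bs=1M iflag=direct              # sequential read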
Can you plug in a USB disk? SSD if you have one, but a spinning disk is fine, even if you have to make a smaller repo to test with. The main thing I'm trying to work out is whether there is high latency, low bandwidth, or low transactions per second to the NAS, which could cause a problem.
I don't know any Linux disk testing programs, sorry.
Not being able to run this on both a local disk and the NAS significantly reduces the information we would get from this. Given that my system runs two orders of magnitude faster than yours, I expect it's a problem with your setup rather than with restic.
Can you run this test on a PC against both the NAS and a local disk inside the PC?
I am running an extended multi-day performance test with iozone, which started before the Bonnie++ recommendation. I will share the results when they are complete…
Was there any particular outcome on this? I was doing some testing on a local HDD and the prune times can be really wacky (12 seconds vs 15 minutes). That's on an HDD backend of ~490GB of backups, for testing purposes.
I have started a prune on my 'real' backend of OneDrive (via rclone) and it has been running for 7 hours!
(Borg can do something similar in about 5 minutes.)
If you have quite a bit of RAM, the system I/O cache can hold much of the data restic needs to prune. If you prune multiple times in a row (to test performance) then subsequent prune operations can complete very quickly if the I/O cache can satisfy most of the I/O requests.
You would need to run echo 3 > /proc/sys/vm/drop_caches between invocations to clear the I/O cache.
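For example, between timed runs (sync first so dirty pages don't skew the next run):

sync                                          # flush dirty pages to disk
echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop page cache, dentries, and inodes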
The current implementation of prune has many known issues.
Most of these are cured by PR 2718.
I also expect speed improvements of a factor of 10 or even more - maybe much, much more for remote repositories.
This, however, still needs code reviews and some testing for all kinds of scenarios to make it into master. If you can test it with a test setup or try it out on a copy of your repository, I'm happy to get feedback.
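If it helps, one way to build the PR branch yourself (assuming a Go toolchain is installed; the local branch name here is arbitrary) is roughly:

git clone https://github.com/restic/restic.git
cd restic
git fetch origin pull/2718/head:prune-rework   # fetch the PR branch from GitHub
git checkout prune-rework
go build ./cmd/restic                          # produces a ./restic binary to test with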
@cdhowie the I/O cache explains the wide volatility in the results, so the 'quick' prune result is unrepresentative and I'll probably drop it; thanks for the tip on clearing the cache if I ever need to revisit this.
@alexweiss I am more than happy to help with testing; I am running both Windows and Linux (Ubuntu) if that is relevant, and can test on both an HDD and an rclone (OneDrive) backend - both are testing repos, so no worries there. The only catch is that I am not a coder, so I would need some help with a prebuilt binary or some such. Any suggestions on how to go about that?
Proud to update - the new version with the PR worked flawlessly [tested on Linux] the first time!
Took 10 minutes (stats below on the job size to give context).
I really like the additional reporting as well; the old process just seemed to hang without reporting its status (my previous 12-hour adventure showed an 'error timeout to onedrive' after 8 hours, so I wasn't sure whether it was still going or not). This new version is much better at showing which step is being taken.
Well done; this makes restic much, much stronger!!!
[PS: Windows worked as well - but obviously there was less to do since it was already pruned!]
[I will be doing some more testing and will report back here with any findings.]
I am really looking forward to @alexweiss's fix to prune, but this original thread was comparing prune in Restic to Duplicacy. I see a few ways that Duplicacy is fast. This is just based on casual observation and may be wrong.
It is an index-only operation
In Duplicacy, the remote file names are a hash of the contents and indexes are cached locally. After doing a full directory listing of the remote side, and possibly downloading some 'snapshot' files created by other hosts writing to the same repository, the prune command has everything it needs to determine what to prune. @alexweiss's new prune is similar.
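Conceptually (this is just an illustration, not Duplicacy's actual layout), the prune decision then reduces to a set difference between the chunks present remotely and the chunks referenced by the cached indexes:

# Illustration only: chunks stored remotely minus chunks referenced by any snapshot
comm -23 <(sort remote-chunks.txt) <(sort referenced-chunks.txt) > prunable-chunks.txt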
It is more willing to trade off wasted space for performance
Because of Duplicacy's chunking model, it doesn't have packs like restic and doesn't need to deal with partially populated packs when doing a prune. Instead, the backup operation always creates fully populated chunks, and the 'snapshot' equivalent lists the chunks needed for each backup. Chunks no longer referenced become stale. So if a backup changes a small file in the middle of an existing chunk, a whole new chunk is uploaded and the old chunk becomes stale. The trade-off on chunk size is internal fragmentation for large chunks versus external fragmentation (and larger indexing metadata) for small chunks.
Backups and prunes don't need to be locked
Backups in Duplicacy are lock-free. The only thing a backup command can do is add files to a repository, so parallel backups have no problem running together. A prune runs in two phases: first, it deletes any files that were marked for pruning over a week ago. Then it finds files that are no longer needed by the current backups and renames them into a fossil directory. Files have to live in that fossil directory for a week before being deleted, and if a backup needs data from the fossil directory it will use it and move the file back out of that directory. This way, as long as every backup completes in less than a week, prunes can run without locks. Very nice.
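In shell terms, the two phases look conceptually like this (illustrative paths and file names, not actual Duplicacy code):

# Phase 1: permanently delete fossils that were marked over a week ago
find repo/fossils -type f -mtime +7 -delete
# Phase 2: move newly unreferenced chunks into the fossil area; a backup that
# still needs one of these chunks can move it back out before it ages out
while read -r chunk; do
    mv "repo/chunks/$chunk" "repo/fossils/$chunk"
done < prunable-chunks.txt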
And for most repository sizes, the approach used by Duplicacy is a nice trade-off.
Duplicacy is a very strong contender. That being said, restic was making a very strong showing except for prune performance. Previously it was so horrible (10+ hours) that despite its strengths elsewhere I wasn't so hopeful.
Given the amazing improvement, Restic is in contention again. The lock-free part is less relevant for me since I won't be concurrently accessing the storage from multiple systems.