Prune performance

I also observed this; this is when forget removes your snapshot files.
In the latest commits of the PR I changed forget to use the same deletion mechanism as prune. This parallelizes the deletion (should be much faster now!) and also automatically prints a progress bar.

So I can only encourage you to try out the current version of the PR :wink:
EDIT: Just realized that you aren’t able to build restic yourself; will provide test-binaries soon.

Updated binaries are now available at the location linked above.

Again the same disclaimer: These are test-binaries; only use after intensive testing and not on production data!

No worries dude; a certain amount of experimentation is expected, and things breaking gives me more to play with, so all good!

I can confirm that the behaviour is now different and that it shows the intermediate steps. However, when a large number of snapshots is being deleted, the current approach isn't so helpful: you now have many snapshots to forget, but the listing is just the hashes, so you can see things moving but can’t tell how far along you are. One approach used by a competitor (cough cough borg) is to append a counter like "(120/200)", which makes it easy to see the progress. Perhaps something to consider as a fairly easy improvement (and it may help the prune code that uses the same approach for progress!)
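To illustrate the counter idea from the post above, here is a minimal shell sketch. It is not how restic or borg print progress; `print_with_counter` is a made-up helper and the IDs in the usage comment are only sample values taken from output elsewhere in this thread.

```shell
# Hypothetical helper (not part of restic): append "(i/total)" to each item
# so the reader can tell how far along a long listing is.
print_with_counter() {
    total=$#
    i=0
    for id in "$@"; do
        i=$((i + 1))
        printf '%s (%d/%d)\n' "$id" "$i" "$total"
    done
}

# Usage: print_with_counter 301cc83d fbbcd5de ff442e9d
```

Each line then carries its own position, so even a plain log file shows progress without needing a redrawn progress bar.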

To be honest, the test wasn’t so ‘heavy’ this time because I had just sorted out prune earlier. It will probably be a while before I can give it a real stress test; I will update everyone on the results then.

Can you send me how you called forget and the output?
It should actually print the progress like this:

remove 1 snapshots:
ID        Time                 Host        Tags        Paths
--------------------------------------------------------------------------------
301cc83d  2020-07-08 11:27:22  thinkpad                /home/thinkpad/data-small
--------------------------------------------------------------------------------
1 snapshots

[0:00] 100.00%  1 / 1 files deleted
1 snapshots have been removed, running prune

Hi,

I am launching like this:

restic-newprune \
    -r "$TARGET" \
    --verbose \
    forget \
        --keep-daily 7 \
        --keep-weekly 4 \
        --keep-monthly 3 \
        --prune

I hope that gives some clues about what is going on. The problem occurs when there are a lot of snapshots to deal with, and since I just did a prune there are only a few, so it is hard to replicate.

I can watch the run interactively in a day or two and update this post then.

If you run with --verbose, it will print out each filename that was deleted. Try without --verbose. find and prune are quite verbose by default.

I dropped the --verbose and so don’t get the listing of each snapshot deleted… but then the original problem (deleting lots of snapshots results in a big pause) happens. Not sure whether the progress indicator is working or not…

For what I tested this morning:

328 snapshots

[[[ pause - no progress update]]]

[0:42] 100.00%  328 / 328 files deleted

This was using restic version

restic 0.9.6 (v0.9.6-252-g2c3bf71e-dirty) compiled with go1.14.3 on linux/amd64

The last line indicates that the progress bar was working.

Did you run this in a terminal?

The progress bar only updates if you are on a terminal. If you run this from a service like systemd and check log files or tee’d output, you may not see the progress bar. This is intended, as a progress bar is only of interest when restic is called by a user and not by automation. This should, however, affect all progress bars and not only this one…
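The terminal condition described above can be checked from a script. The following is a minimal sketch using the POSIX `test -t` check on stdout; `stdout_kind` is a hypothetical helper name, and restic's own detection is implemented in Go and may differ in detail.

```shell
# Report whether standard output (file descriptor 1) is attached to a terminal.
# `[ -t 1 ]` is the POSIX test for "fd 1 is a tty".
stdout_kind() {
    if [ -t 1 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

# Called directly in an interactive shell, stdout is the terminal.
# Called as `stdout_kind | tee run.log`, stdout is a pipe, so the check fails,
# which is the same situation a program is in when its output is piped to tee.
```

A script that is used both interactively and from cron could use the same check to decide whether to pipe its output through tee at all.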

Oh that’s it! This is a script that I use both interactively/via-cron so there is a tee that is probably messing things up.

I’ll check again in a few days when there are more snapshots, to make sure the progress bars work without it.

Thanks for all the help!

Confirmed; piping the output to “tee” was what made prune appear not to work. All good with the latest version and updates.

Hi all; did some further testing with some real loads.

Leaving this here as info for later readers on how the progress meter works (particularly with my backend: Onedrive for Business via rclone).

Happy to do more testing if anyone has suggestions.

The tl;dr: there is a step in the middle (data files processed) where the progress seems to get stuck (see details below on the lead-up and output). The whole process did eventually complete without a hitch, but it took a decent while (40 minutes in total, with the no-progress sections accounting for 2/3 of that…)

===

delete snapshots

~1800 snapshots

Got stuck at around 400: very fast at first (1 minute to reach 400), and then the next 9 minutes to get to 402.
Terminated via Ctrl-C, then ran unlock and tried again.

The next run was fine:

[3:10] 100.00%  1429 / 1429 files deleted

collect data files for deletion and repacking

For a very long time it showed:

0% 0 / 2428 

and then suddenly:

data file fbbcd5de is not referenced in any index and not used -> will be removed.
(((many lines)))
data file ff442e9d is not referenced in any index and not used -> will be removed.
[4:46] 100.00%  2428 / 2428 data files processed

repacking data files

Fast at first, but a big slowdown at around 5 minutes; it went from 56 to 60 over the next 7 minutes

[11:36] 74.07%  60 / 81 data files repacked

deleting obsolete data files…

Instant to 69 but then:

[0:37] 51.11%  69 / 135 files deleted

It took another 8 minutes to get to 74 files deleted, but then ran really fast and finished.

Your “slow” deletion is most likely a backend issue, especially in combination with Onedrive. There are known issues with Onedrive, see e.g.

https://jaytuckey.name/2020/07/17/problems-with-onedrive-as-a-backend-to-restic-backup-tool/

To debug this you may want to run restic like this:

RCLONE_LOG_LEVEL=DEBUG restic ...

I’m in the process of reviewing online backup options and have settled on either B2 or Wasabi as my backend. I was all set on restic until testing prune.

I’ve read through the comment history for several of the PRs (and this thread) and @alexweiss has done some incredible work here - looking forward to seeing it merged. I’m still not entirely sure whether these improvements will reduce the data transfer required for pruning when restic is paired with object storage such as B2/Wasabi, or just reduce the processing power required. Is anyone able to confirm?

The main improvement for remote repositories is the reduced data transfer. The prune reimplementation uses the information that is usually locally cached (+ just the list of objects which exist in your remote backend) to determine what to do. Moreover you can trade used space for lower repacking rate (repacking implies reading and re-writing the contents) and fine-tune the pruning using various parameters. E.g. I’m using the new implementation with a cold storage where no file is read from the remote repository during prune (Note that I use another patch for this to fully work but this is about locks and key files and does not change anything about traffic in a measurable way).
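As a rough illustration of the space-versus-repacking trade-off described above, here is a sketch using the prune options as they exist in released restic (0.12.0 and later): `--max-unused`, `--repack-cacheable-only` and `--dry-run`. The option names in the test binaries discussed in this thread may have differed, the 20% value is an arbitrary example, and the function names and the `RESTIC` override variable are hypothetical.

```shell
# Assumption: restic >= 0.12.0. RESTIC can be overridden, e.g. to try the
# functions without touching a real repository.
RESTIC="${RESTIC:-restic}"

# Favor low traffic: tolerate up to 20% unused space in the repository and
# repack only cacheable (metadata) pack files, so data packs are not re-read.
prune_low_traffic() {
    "$RESTIC" -r "$1" prune --max-unused 20% --repack-cacheable-only
}

# Preview what prune would do without changing the repository.
prune_preview() {
    "$RESTIC" -r "$1" prune --dry-run --max-unused 20%
}

# Usage (repository path is a placeholder): prune_preview /path/to/repo
```

A higher `--max-unused` limit keeps more dead data in the repository but means fewer pack files have to be downloaded and rewritten.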

Amazing, thanks so much for your hard work

@ArandoDrive - something to be careful about: Wasabi has a minimum object storage retention period of 90 days, so if you end up pruning too quickly, it will actually cost you.

Wasabi ingress/egress is free, so the only “extra” cost is for storage deleted prior to 90 days (at the same rate as storage, 0.00016243 USD per GB per day). I use a --keep-daily 28 --keep-weekly 14 forget policy, but still end up with some deleted storage because of repacks, etc. The source for the backup involves the addition/deletion of 5 - 20 GB total per week (once per day backup), and the relatively balanced addition/deletion is not intentional, it just works out that way.

For my last billing period I had 2.99 TB total storage and 81.2 GB deleted storage; the costs for each in USD were 14.41 and 0.47.
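The early-deletion charge can be estimated with a little arithmetic. The sketch below assumes a simple model that is not spelled out in the posts above: an object deleted before 90 days is billed for the remaining days at the quoted per-GB-per-day rate. The function name is hypothetical, and the real bill depends on the age of every deleted object, so this will not reproduce the billing figures above exactly.

```shell
# Assumed model: deleting an object at age_days (< 90) is billed as
# size_gb * rate * (90 - age_days). Rate is the figure quoted in the thread.
RATE_PER_GB_DAY=0.00016243   # USD per GB per day
MIN_DAYS=90                  # minimum retention period in days

early_delete_cost() {
    # $1 = size in GB, $2 = age in days at deletion
    awk -v gb="$1" -v age="$2" -v rate="$RATE_PER_GB_DAY" -v min="$MIN_DAYS" 'BEGIN {
        remaining = min - age
        if (remaining < 0) remaining = 0   # older than the minimum: no charge
        printf "%.4f\n", gb * rate * remaining
    }'
}

# early_delete_cost 10 30   -> 0.0975  (10 GB deleted after 30 days, 60 days billed)
# early_delete_cost 10 120  -> 0.0000  (already past the 90-day minimum)
```

Under this model, longer forget policies reduce the charge mainly for snapshot data; pack files rewritten by a repack can still be young when they are deleted.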

Downloads for prunes were 14 - 16 GB each. Uploads for prunes varied between 35 - 92 GB each.

I use the rclone backend, and have had flawless operations with wasabi by dropping the prune from the forget operation; the sequence I use is forget, check, prune, check.
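The forget, check, prune, check sequence described above could be scripted roughly as follows. This is a sketch, not the poster's actual script: `maintain_repo` and the `RESTIC` override variable are hypothetical, the repository argument is a placeholder, and the keep policy is the one quoted earlier in the thread.

```shell
# RESTIC can be overridden, e.g. to point at a test binary.
RESTIC="${RESTIC:-restic}"

# Run forget, check, prune, check in order; stop at the first failing step
# so that prune never runs on a repository that failed its check.
maintain_repo() {
    repo=$1
    "$RESTIC" -r "$repo" forget --keep-daily 28 --keep-weekly 14 &&
    "$RESTIC" -r "$repo" check &&
    "$RESTIC" -r "$repo" prune &&
    "$RESTIC" -r "$repo" check
}

# Usage (placeholder repository): maintain_repo rclone:remote:bucket/path
```

Keeping prune as a separate step, rather than passing --prune to forget, leaves room for the check in between.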

@kellytrinh Thanks - I’d spotted that but appreciate you flagging it as I’m sure many would easily miss it.

@doscott Thanks for your account of things, really useful. My concern with the current prune was around the bandwidth implications (due to my own limitations). My backups are for personal data that is pretty static, so I don’t expect to feel much pain for the 90 day cost. I’ll likely avoid too much pruning until the new changes are merged to master (where bandwidth will be less of an issue)

Thanks for all the great work done here. Looking forward to the faster prune. I have just had a prune complete using restic 0.9.5, which reduced a B2 repository from 46 TB to 23 TB …

The output of my script shows …


Pruning XXXXXXXX restic snapshots … --keep-weekly 2 --keep-monthly 12

real 124864m1.270s
user 0m0.014s
sys 0m0.010s
Thu Oct 15 22:12:00 BST 2020

Pruning XXXXXXXX restic snapshots finished RC=0


124864 minutes is 86 DAYS!!! A long time to wait for the next backup!


As


and

were merged, the improved pruning is now available in the latest beta builds, and I have removed the provided binaries from my private GitHub repo.

I hope the new pruning gets widely used - and if you encounter an issue with pruning in the recent master branch, don’t hesitate to open an issue on GitHub!