Prune performance

@kellytrinh Thank you for testing!

Did you specify something like --max-unused-percent 0.3? Otherwise I cannot explain why exactly those blobs were repacked in your test.

However, the speed results are good, as expected :slight_smile:

About the comparison of restic and duplicity: while storing multiple blobs in packfiles has its own disadvantages, including higher code complexity, here are the disadvantages of storing each blob in a single file:

  • The overhead of handling small files is transferred to the backend. E.g. when using an ext4 filesystem, each file automatically occupies a multiple of the block size (usually 4 KiB). Cloud storage providers often have similar restrictions, which may apply only to billing. Hence comparing repository sizes by summing up file sizes is not really fair, as a repository may in fact use much more storage or cost more than expected (see the sketch after this list).
  • You can get much higher latency when handling lots of files instead of a single packfile.
  • The repository contents can be estimated more easily (e.g. from the number and sizes of the files), even though the files are encrypted.
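To make the block-size point concrete, here is a minimal sketch in Go (restic's language) of how rounding each file up to 4 KiB blocks inflates the space actually used; the blob sizes and block size are illustrative assumptions, not measurements from a real repository:

    package main

    import "fmt"

    // onDiskSize rounds a file's logical size up to the next multiple of the
    // filesystem block size (4 KiB on a typical ext4 setup).
    func onDiskSize(size, blockSize uint64) uint64 {
        if size == 0 {
            return 0
        }
        return ((size + blockSize - 1) / blockSize) * blockSize
    }

    func main() {
        const blockSize = 4096
        blobSizes := []uint64{100, 1500, 4097, 12000} // four small blobs
        var logical, physical uint64
        for _, s := range blobSizes {
            logical += s
            physical += onDiskSize(s, blockSize)
        }
        fmt.Printf("sum of file sizes: %d bytes, space actually used: %d bytes\n",
            logical, physical)
        // With one file per blob the rounding penalty is paid once per blob;
        // packed into a single file it is paid only once.
    }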

However, it depends on the specific use case how advantages and disadvantages trade off.

Back to restic: there are already ideas for how to implement lock-free operations. Working on the needed implementation is one of my top projects for the near future.

@alexweiss I saw the extended discussion about additional command-line options to tweak the prune process, but I wasn't looking for anything fancy and wanted to see performance 'out of the box'. The command line I used was:

/home/username/system/restic/restic-newprune    \
                -r rclone:onedrive:Restic       \
                --verbose                       \
                forget                          \
                        --keep-last 1           \
                        --prune                 

Does that explain why that output occurred?

Oh, and on the file size on storage piece - I do notice that downloading from storage backends is generally quicker with larger files. With lots of little files I see maybe 5 MB/s or so due to overheads and ramp-up of download speeds, but with larger files it is closer to 30 MB/s (which is the max, because my test machine is on a Wi-Fi connection to the router).

Alex, you might want to click the link I have for Duplicacy's chunking model. It doesn't store packs, but it doesn't put individual files on the filesystem either. The normal settings for Duplicacy write chunks from 1 to 8 MB in size. The difference is that if a file in that chunk is changed, Duplicacy will re-upload the entire chunk, whereas restic will add the new file to a new pack along with other changes from the update.

I think you found a minor bug in the part where the packs-to-repack are selected. At the moment, if --max-unused-space is not set to 100, there will always be one partly used pack (if one exists) chosen for repacking. This is only a minor performance bug in the sense that prune does a little more work than it needs to :wink: - I'll change this soon!
Thanks for testing!
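For illustration only, here is a much-simplified sketch (not the actual PR code; the names, the threshold parameter and the selection order are assumptions) of how a packs-to-repack selection along these lines can work, with a comment marking where an "always repack one partly used pack" behaviour like the one described can slip in:

    package main

    import (
        "fmt"
        "sort"
    )

    type pack struct {
        id         string
        usedSize   uint64
        unusedSize uint64
    }

    // selectForRepack picks partly-used packs until the repository-wide unused
    // ratio drops below maxUnusedPercent. Checking the threshold *before* adding
    // a pack avoids repacking when the repository is already below the limit;
    // checking it only afterwards would always select at least one partly used
    // pack - the kind of minor inefficiency mentioned above.
    func selectForRepack(packs []pack, maxUnusedPercent float64) []pack {
        // consider the packs with the largest unused share first
        sort.Slice(packs, func(i, j int) bool {
            return packs[i].unusedSize > packs[j].unusedSize
        })

        var total, unused uint64
        for _, p := range packs {
            total += p.usedSize + p.unusedSize
            unused += p.unusedSize
        }

        var repack []pack
        for _, p := range packs {
            if total == 0 || float64(unused)/float64(total)*100 <= maxUnusedPercent {
                break
            }
            if p.unusedSize == 0 {
                continue // fully used pack, nothing to gain from repacking
            }
            repack = append(repack, p)
            // after repacking, the unused part of this pack is gone
            unused -= p.unusedSize
            total -= p.unusedSize
        }
        return repack
    }

    func main() {
        packs := []pack{
            {id: "a", usedSize: 4 << 20, unusedSize: 1 << 20},
            {id: "b", usedSize: 5 << 20, unusedSize: 0},
        }
        fmt.Printf("%d pack(s) selected for repacking\n", len(selectForRepack(packs, 5)))
    }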

Cool; keep up the good work! This patch was amazing in terms of the performance uplift; I look forward to when it is fully tested and ready for inclusion in mainline.

I just signed up on this forum to like and respond to this. Thank you so much for putting some work into this and providing binary links. This has made pruning a viable option to conduct on a daily or weekly basis without interrupting new backups. My prune time on my 2.2TB backup has dropped from 4-5 hours to under 10 minutes. The non-locking pruning is now a bonus feature given that the actual prune doesn't take too much time. I've been on the fence about moving away from restic, but this has really made the difference for me to continue using it.

Thanks again!

@fmoledina Thanks for your response - this is the kind of feedback that keeps me working on improvements :smiley:

Just be aware that this PR has not been reviewed in detail, and even though it has already been tested successfully quite often, there might be some bugs lurking… I hope we can optimize quality further and bring this into the master release.

Also note that there have been quite a few other improvements (which have already made it into master) that boost performance and reduce memory usage. Looking forward to the next official release!

Understood completely. I'm using the restic-newprune binary only for pruning in my workflow, and continuing to use the mainline restic v0.9.6 for my backups. In my mental model, as long as mainline restic is successfully able to continue creating new snapshots, I should be okay. This is for my home server where restic is used as one of a few backup tools.

Let me know if there's anything else you'd like to test. Thanks again!

This is not entirely true. If a bug in the prune operation generates invalid indexes, backup could deduplicate against data that isn't actually in the repository.

Rats, well, it's still worth the risk to me for off-site backups of my home server. Thanks for the clarification.

@alexweiss, just wanted to connect with you on some feedback; I don't want to be seen as demanding, as your work has been so great already and a vast improvement over the previous version.

What I have observed when running the prune operation is that I get the listing of snapshots, then:

1881 snapshots

[[[ big gap ]]]

1881 snapshots have been removed, running prune
get all snapshots
load indexes
find data that is still in use for 7 snapshots

The big gap in the middle was quite lengthy, and it is unclear whether the process is dead or ongoing. Could I suggest adding some progress meter there?

I also observed this; it happens when forget removes your snapshot files.
In the last commits in the PR I changed forget to use the same deletion mechanism as used in prune. This parallelizes the deletion (it should be much faster now!) and also automatically prints a progress bar.
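A rough sketch in plain Go (not the code from the PR; the paths, worker count and output format are assumptions) of the general idea - delete files with several workers and print a running "(deleted/total)" counter as progress:

    package main

    import (
        "fmt"
        "os"
        "sync"
        "sync/atomic"
    )

    // deleteParallel removes the given files using a fixed number of worker
    // goroutines and prints a simple counter as progress.
    func deleteParallel(paths []string, workers int) {
        jobs := make(chan string)
        var done uint64
        var wg sync.WaitGroup

        for i := 0; i < workers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for p := range jobs {
                    if err := os.Remove(p); err != nil {
                        fmt.Fprintln(os.Stderr, "remove failed:", err)
                    }
                    n := atomic.AddUint64(&done, 1)
                    fmt.Printf("\r(%d/%d) files deleted", n, len(paths))
                }
            }()
        }

        for _, p := range paths {
            jobs <- p
        }
        close(jobs)
        wg.Wait()
        fmt.Println()
    }

    func main() {
        // hypothetical snapshot files to remove
        deleteParallel([]string{"snapshots/aaa", "snapshots/bbb"}, 4)
    }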

So I can only encourage you to try out the current version of the PR :wink:
EDIT: Just realized that you aren't able to build restic yourself; I will provide test binaries soon.

Updated binaries are now available at the destination linked above.

Again the same disclaimer: these are test binaries; only use them after intensive testing and not on production data!

No worries, dude; there is a certain amount of experimentation involved, and things breaking gives me more to play with, so it's all good!

I can confirm that the behaviour is now different and it shows the intermediate steps. However, for large numbers of snapshots being deleted the current approach isn't so helpful: you know you have many snapshots to forget, but the listing is just the hashes, so you can see things moving but can't tell the overall progress. One approach used by a competitor (cough cough, borg) is to append a counter like "(120/200)" so you can easily see how far along it is. Perhaps something to consider as a fairly easy improvement (and it may help the prune code that uses the same approach to progress!)

To be honest, the test wasn't so 'heavy' this time because I had just sorted out prune earlier. I will probably need a bit of time before I can give it a real stress test, and I will update everyone on the results then.

Can you send me how you called forget and the output?
It actually should print the progress like this:

remove 1 snapshots:
ID        Time                 Host        Tags        Paths
--------------------------------------------------------------------------------
301cc83d  2020-07-08 11:27:22  thinkpad                /home/thinkpad/data-small
--------------------------------------------------------------------------------
1 snapshots

[0:00] 100.00%  1 / 1 files deleted
1 snapshots have been removed, running prune

Hi,

I am launching like this:

restic-newprune	\
		-r $TARGET				\
		--verbose 				\
		forget 					\
			--keep-daily 	7		\
			--keep-weekly	4		\
			--keep-monthly	3		\
			--prune				

I hope that gives clues about what is going on. The problem occurs when there are a lot of snapshots to deal with, and since I just did a prune there are only a few, so it is hard to replicate.

I can watch the run interactively in a day or two and update this post then.

If you run with --verbose, it will print out each filename that was deleted. Try it without --verbose; find and prune are quite verbose by default.

I dropped the --verbose and so don't have the listing of each snapshot deleted… but the original problem (deleting lots of snapshots results in a big pause) still happens. Not sure if the progress indicator is working or not…

For what I tested this morning:

328 snapshots

[[[ pause - no progress update]]]

[0:42] 100.00%  328 / 328 files deleted

This was using restic version:

restic 0.9.6 (v0.9.6-252-g2c3bf71e-dirty) compiled with go1.14.3 on linux/amd64

The last line indicates that the progress bar was working.

Did you run this on a terminal?

The progress bar only updates if you are on a terminal. If you run this via some service like systemd and check log files or teed output, you may not see the progress bar. This is intended, as a progress bar is only of interest when restic is called by a user and not by an automated job. This should, however, affect all progress bars and not only this one…
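A minimal sketch of such a terminal check using only the Go standard library (restic's actual detection may differ): when stdout is redirected through a pipe - e.g. by tee - or captured by systemd, the check fails and an updating progress bar would be suppressed.

    package main

    import (
        "fmt"
        "os"
    )

    // stdoutIsTerminal reports whether stdout is attached to a character
    // device (i.e. an interactive terminal) rather than a pipe or file.
    func stdoutIsTerminal() bool {
        fi, err := os.Stdout.Stat()
        return err == nil && fi.Mode()&os.ModeCharDevice != 0
    }

    func main() {
        if stdoutIsTerminal() {
            fmt.Println("terminal: show an updating progress bar")
        } else {
            fmt.Println("not a terminal (pipe/tee/systemd): print plain status lines only")
        }
    }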

Oh, that's it! This is a script that I use both interactively and via cron, so there is a tee that is probably messing things up.

I'll check again in a few days when there are more snapshots, to make sure the progress bars work without it.

Thanks for all the help!