Restic prune is very slow

restic prune
repository 60fa2560 opened (version 2, compression level auto)             
loading indexes...                                                                                                     
[1:00] 100.00%  631 / 631 index files loaded                                                                           
loading all snapshots...                                                                                               
finding data that is still in use for 243 snapshots                                                                    
[36:04] 100.00%  243 / 243 snapshots                                                                                   
searching used packs...                                                                                                
collecting packs for deletion and repacking                                                                            
[1:48] 100.00%  85889 / 85889 packs processed                                                                          
                                                                                                                       
to repack:      12362685 blobs / 622.529 GiB                                                                           
this removes:    3407521 blobs / 177.849 GiB                                                                           
to delete:      12757156 blobs / 843.293 GiB               
total prune:    16164677 blobs / 1021.142 GiB              
not yet compressed:              243.443 GiB               
remaining:      10886739 blobs / 686.428 GiB                                                                                                                                                                                                   
unused size after prune: 0 B (0.00% of remaining size)                                                                 
                                                                                                                                                                                                                                               
deleting unreferenced packs                                
[0:00] 100.00%  19 / 19 files deleted                                                                                  
repacking packs                                                                                                        
[19:35:53] 14.07%  1941 / 13799 packs repacked                   

So my prune is in progress. So far it has taken ~19 hours to get 15% done… The repo is big, but not super huge… Is there anything I can do (in the future)?

Welcome @kmille !

Make sure that you are using a recent restic version. (Some distros ship quite old software.) Prune speed was improved a lot in version 0.12.0.

Which restic version are you using and which backend? How exactly are you calling restic? How fast is the upload to / download from the repository? Without more information it’s impossible to tell whether the speed is reasonable or not (although it seems to be rather slow).
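If in doubt, the exact build (version, Go toolchain, platform) can be printed with the standard command:

restic version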

Hey,
thanks for the fast response. On the client side, I have restic 0.16.5 compiled with go1.22.5 on linux/amd64. The server side is a Hetzner StorageBox [0]. I can only run rclone serve restic --stdio on the server, so I don’t know which version they are running, but I’m trying to find out.

On the client side, I use

export RESTIC_REPOSITORY="sftp:storagebox:backups/restic"
restic prune

I don’t think network speed is a limitation here.

[0] Storage

You’ll definitely want to upgrade to restic 0.17.0, which should solve the sftp upload performance problem.
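If restic was installed from the official release binaries rather than a distro package (an assumption about your setup), it can upgrade itself in place:

restic self-update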

Do you think it makes sense to abort/cancel, update and redo the restic forget?

repacking packs                                                                                                        
[23:02:30] 16.46%  2271 / 13799 packs repacked               

Now using restic 0.17.0-dev (v0.17.0-3-g76d56e24d). Still very slow:

[3:16:20] 2.77%  416 / 14993 packs repacked

I changed the network setup a bit. Now the network is limited to ~20 Mbit/s up and down. Still a bit disappointed.
What’s the limiting factor here?

You have 622 GiB to repack. In an ideal world it would take ≈70 h just to download that amount of data at your network speed, so your results do not look very slow, really. Extrapolating from your current speed, the whole process will take about 120 h, which is not surprising in the real world :)
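As a rough sanity check of that figure: 622 GiB ≈ 668 GB ≈ 5.3 Ɨ 10¹² bits, and at 20 Mbit/s that is about 2.7 Ɨ 10⁵ s, i.e. roughly 74 hours for the download alone.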

IMO it is much better to prune frequently, even if that means limiting the amount of data pruned in a single run (--max-repack-size), than to wait until a single run takes days to finish; see the example below.
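For example (the 50G cap is only an illustrative value, pick whatever fits your time window):

restic prune --max-repack-size 50G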


You never answered my question about how fast your connection to the storage is, so we can only guess.

Dividing the number of packs repacked by the elapsed time suggests the repack step is now about 30% faster.

Hey,

sorry for the late response.

You have 622 GiB to repack. In an ideal world it would take ≈70 h just to download that amount of data at your network speed, so your results do not look very slow, really.

Why do you compare it with downloading? I thought prune goes through every data block, checks whether it is used by at least one snapshot, and if not, deletes it from disk. So I don’t see why downloading matters here; to me it looks like a purely server-side CPU/disk task.

IMO it is much better to prune frequently, even if that means limiting the amount of data pruned in a single run (--max-repack-size), than to wait until a single run takes days to finish.

I understand that pruning more frequently helps. The thing is that we use an append-only repo and we need to prune from a safe laptop. Also, during that time we can’t create new backups. Why do you propose to limit the amount of data pruned per restic prune run? Can you explain that a little more? Thanks!

Regarding the network speed: Hetzner says nothing about the speed, just unlimited traffic (Storage). I think they have 1 Gbit/s or 10 Gbit/s servers there. On the client side (for pruning), it depends… My wifi/setup is not great here and I can’t use Ethernet.

Hetzner responded and they just said that they only have rclone on the server and rclone has its own restic implementation. I still don’t know the version.

I also stopped the second run (the one you said was 30% faster). I think I will set up a small server and run the prune from there overnight.

Blobs (your "blocks") are stored in pack files; those packs have to be downloaded, repacked, and uploaded again.

I am not sure what server side you are talking about. Unless I misunderstand your setup, you do not have any restic server running. All your operations are done by the client(s) and require all data to be shipped over the network. rclone serve restic is only a connectivity "proxy".

Because it allows you to run prune when you have spare time (no backups running). You now have a ballpark figure for how many GB you can prune per hour, so you could, for example, run prune for a few hours every day; a sketch of such a schedule follows.
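A minimal sketch of such a schedule, assuming cron on the machine doing the pruning and an arbitrary 50G cap per run (RESTIC_REPOSITORY and the repository password would still have to be provided to the cron environment):

# run a bounded prune every night at 02:00
0 2 * * * restic prune --max-repack-size 50G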

And if you have a 24/7 busy environment, you could look at alternatives: check out rustic, which supports lock-free operations.

This is an old article, but I believe it may help you understand the general process of pruning:
https://restic.net/blog/2016-08-22/removing-snapshots/

What you refer to as a 'data block' corresponds to a 'blob'. These blobs are assembled into 'pack' files. It is the blobs that become unreferenced by the forget operation. During a prune, if a pack file contains a blob that is no longer referenced, that pack file has to be downloaded, the still-referenced blobs need to be reassembled into new pack files, the new pack files need to be uploaded, and only when all of that has completed successfully are the original pack files deleted.

The --max-repack-size @Kapitainsky suggested, which defaults to 5%, can speed things up when only a small portion of a pack has been forgotten, because such packs are not downloaded, reassembled, and uploaded.

You are mixing up --max-unused (which defaults to 5%) with --max-repack-size (which is unlimited by default) :)

But both flags can be used to limit the amount of data and the time taken by a prune run. All of this is actually well documented: Removing backup snapshots — restic 0.17.0 documentation
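For illustration only (both values are arbitrary), a single bounded run could combine the two flags:

restic prune --max-unused 10% --max-repack-size 100G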

As a side note, it would be nice to have an option to limit a prune run to a given period of time, something like --max-prune-time. I might try to implement it myself as a Go learning exercise and propose a PR :) Unless somebody needs it urgently and gets it done faster.

Thanks for your help. I ended up running prune on a server, using an SSH key with dedicated permissions, and the new release binary from GitHub.
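Roughly how that run was invoked (a reconstruction; the repository is the same sftp one from earlier in the thread, and time only accounts for the totals shown at the end):

export RESTIC_REPOSITORY="sftp:storagebox:backups/restic"
time restic prune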

repository 60fa2560 opened (version 2, compression level auto)
loading indexes...
[0:39] 100.00%  771 / 771 index files loaded
loading all snapshots...
finding data that is still in use for 282 snapshots
[28:15] 100.00%  282 / 282 snapshots
searching used packs...
collecting packs for deletion and repacking
[2:39] 100.00%  90303 / 90303 packs processed

to repack:      12941965 blobs / 650.664 GiB
this removes:    5392874 blobs / 363.675 GiB
to delete:      11890343 blobs / 720.292 GiB
total prune:    17283217 blobs / 1.059 TiB
not yet compressed:              240.331 GiB
remaining:      10763801 blobs / 695.947 GiB
unused size after prune: 0 B (0.00% of remaining size)

deleting unreferenced packs
[0:00] 100.00%  7 / 7 files deleted
repacking packs
[6:22:35] 100.00%  16396 / 16396 packs repacked
rebuilding index
[0:08] 100.00%  908 / 908 indexes processed
[0:13] 100.00%  774 / 774 old indexes deleted
removing 56795 old packs
[20:03] 100.00%  56795 / 56795 files deleted
done

real    437m13.928s
user    479m32.755s
sys     41m50.693s

From an initial estimate of 100+ hours down to about 7 hours. Nice improvement!