+restic prune
repository 60fa2560 opened (version 2, compression level auto)
loading indexes...
[1:00] 100.00% 631 / 631 index files loaded
loading all snapshots...
finding data that is still in use for 243 snapshots
[36:04] 100.00% 243 / 243 snapshots
searching used packs...
collecting packs for deletion and repacking
[1:48] 100.00% 85889 / 85889 packs processed
to repack: 12362685 blobs / 622.529 GiB
this removes: 3407521 blobs / 177.849 GiB
to delete: 12757156 blobs / 843.293 GiB
total prune: 16164677 blobs / 1021.142 GiB
not yet compressed: 243.443 GiB
remaining: 10886739 blobs / 686.428 GiB
unused size after prune: 0 B (0.00% of remaining size)
deleting unreferenced packs
[0:00] 100.00% 19 / 19 files deleted
repacking packs
[19:35:53] 14.07% 1941 / 13799 packs repacked
So my prune is in progress. So far it has taken ~19 hours to get ~15% done… The repo is big, but not super huge… Is there anything I can do (in the future) to speed this up?
Which restic version are you using and which backend? How exactly are you calling restic? How fast is the upload to / download from the repository? Without more information it's impossible to tell whether the speed is reasonable or not (although it seems to be rather slow).
Hey,
thanks for the fast response. On the client side, I have restic 0.16.5 compiled with go1.22.5 on linux/amd64. The server side is a Hetzner StorageBox [0]. I can only run rclone serve restic --stdio on the server, so I don't know which version they are running. But I'm trying to find out.
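For reference, everything runs from the client; the prune is started roughly like this, following restic's documented rclone-over-SSH pattern (the user, host, and repo name below are placeholders for our actual StorageBox details):

# the StorageBox forces the command "rclone serve restic --stdio" for this SSH key,
# so restic only needs to open the SSH pipe
restic -o rclone.program="ssh -p23 uXXXXXX@uXXXXXX.your-storagebox.de" -r rclone:backups prune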
you have 622 GiB to repack. In an ideal world it would take ~70 h just to download that amount of data at your network speed. So your results really do not look very slow. Extrapolating from your current speed, the remaining work will take about 120 h, which is not surprising in the real world :)
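(From your log: 1941 / 13799 packs repacked in ~19.6 h, so 13799 × 19.6 / 1941 ≈ 139 h in total, i.e. roughly 120 h still to go.)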
IMO it is much better to prune frequently, even limiting the amount of data pruned in a single run (--max-repack-size), than to wait until a single run takes days to finish.
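For example, cap each run and repeat on following days until nothing is left to repack (50G is just an illustrative cap):

# repack at most ~50 GiB in this run; rerun until "to repack" reports 0 B
restic prune --max-repack-size 50G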
you have 622 GiB to repack. In an ideal world it would take ~70 h just to download that amount of data at your network speed. So your results really do not look very slow.
Why do you compare it with downloading? I thought prune goes through every data block and checks whether it is used by at least one snapshot, and if not, deletes it from disk. So I don't see why downloading matters here. To me it looks like a purely server-side CPU/disk task.
IMO it is much better to prune frequently, even limiting the amount of data pruned in a single run (--max-repack-size), than to wait until a single run takes days to finish.
I understand that pruning more frequently helps. The thing is that we use an append-only repo, so we need to prune from a safe laptop, and during that time we can't create new backups. Why do you propose to limit the amount of data pruned per restic prune run? Can you explain that a little more? Thanks!
Regarding the network speed: Hetzner says nothing about the speed, just unlimited traffic (storage). I think they have 1 Gbit/s or 10 Gbit/s servers there. On the client side (for pruning), it depends… My Wi-Fi/setup is not great here and I can't use Ethernet.
Hetzner responded and they just said that they only have rclone on the server and rclone has its own restic implementation. I still don't know the version.
I also stopped the second run (where you said it's 30% faster). I think I will set up a small server and run the prune from there overnight.
Blocks (blobs) are stored in pack files; those packs have to be downloaded, repacked, and uploaded back.
I am not sure which server side you are talking about. Unless I misunderstand your setup, you do not have any restic server running. All operations are done by the client(s) and require all data to be shipped over the network; rclone serve restic is only a connectivity "proxy".
Because it allows you to run prune whenever you have spare time (no backups running). You now have a ballpark figure for how many GB you can prune per hour, so you can, for example, run prune for a few hours every day.
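Scheduled, that could look something like this (the repo spec and the cap are placeholders; this assumes the repository credentials are available to cron):

# hypothetical crontab entry: a bounded prune every night at 02:00
0 2 * * * restic -r rclone:storagebox:backups prune --max-repack-size 20G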
And if you have a busy 24/7 environment, you could look at alternatives: check rustic, which supports lock-free operations.
What you refer to as a "data block" corresponds to a "blob". These blobs are assembled into "pack" files. It is the blobs that are marked in the forget operation. During a prune, if a pack file contains blobs that are forgotten, that pack file has to be downloaded, the unforgotten blobs need to be reassembled into new pack files, the new pack files need to be uploaded, and only when all of that has been successfully done are the original pack files deleted.
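If you want to see that plan without touching the repository, prune has a dry-run mode:

# shows what would be repacked and deleted, but transfers and deletes nothing
restic prune --dry-run --verbose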
The --max-repack-size option @Kapitainsky suggested caps how much data a single run repacks. Related to it is --max-unused, which defaults to 5%: a pack file of which only a small portion has been forgotten can then be left alone instead of being downloaded, reassembled, and uploaded.
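Raising that tolerance trades repository space for transfer time (the value below is illustrative):

# allow up to 10% unused data to remain, shrinking the "to repack" set
restic prune --max-unused 10%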
On a side note, it would be a nice option to be able to limit a prune run to a given period of time, something like --max-prune-time. I might try to implement it myself as a Go learning exercise and propose a PR :) Unless somebody needs it urgently and gets it done faster.
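Until something like that exists, a crude stand-in might be a shell-level timeout, since restic shuts down cleanly on an interrupt and a later prune picks up the remaining work (the signal choice and duration are assumptions, not a tested recipe):

# send SIGINT after 4 hours; combine with a repack cap for good measure
timeout --signal=INT 4h restic prune --max-repack-size 50G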