Prune: Release lock

Prune should only need the exclusive lock while it is editing index files; when deleting unindexed packs, there isn’t a reason to block new backups.
Thoughts?

709 snapshots have been removed, running prune
counting files in repo
building new index for repo
[2:26:54] 100.00%  101817 / 101817 packs
incomplete pack file (will be removed): 20d908e7ea827985b91ee6b335c2f72fb1d6a28f2a375f64a34d63f50c8ae9ca
incomplete pack file (will be removed): 400d361dd902cb99e92b7b47f62b00aa6ed79110087a9546c10758c9b578926a
repository contains 101815 packs (2778852 blobs) with 463.580 GiB
processed 2778852 blobs: 734970 duplicate blobs, 91.447 GiB duplicate
load all snapshots
find data that is still in use for 49 snapshots
[3:34] 100.00%  49 / 49 snapshots
found 661747 of 2778852 data blobs still in use, removing 2117105 blobs
will remove 2 invalid files
will delete 56997 packs and rewrite 24516 packs, this frees 400.907 GiB
[23:20:15] 100.00%  24516 / 24516 packs rewritten
counting files in repo
[1:03:49] 100.00%  28013 / 28013 packs
finding old index files
saved new indexes as [49d51ef5 44fbce95 a1abebe4 bce89479 556d327d 23af6a35 bd59120a a5f4d733 8384ba5a afe18968]
remove 1566 old index files
[36:07:28] 94.18%  76766 / 81513 packs deleted
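
Very roughly, the change I have in mind looks like this. None of these types or functions are real restic APIs — they are made up just to show where the exclusive lock could be released:

```go
// Hypothetical sketch only: Repo, Lock, RewriteIndex, DeletePacks etc. are
// invented names, not restic code.
package pruning

type PackID string

type Lock interface{ Unlock() }

type Repo interface {
	LockExclusive() (Lock, error)
	LockNonExclusive() (Lock, error)
	RewriteIndex(obsolete []PackID) error
	DeletePacks(obsolete []PackID) error
}

// pruneRepo rewrites the index while holding the exclusive lock, then
// downgrades to a non-exclusive lock before deleting the now-unreferenced
// packs.
func pruneRepo(repo Repo, obsolete []PackID) error {
	lock, err := repo.LockExclusive()
	if err != nil {
		return err
	}
	// Rewriting the index must not race with other writers.
	if err := repo.RewriteIndex(obsolete); err != nil {
		lock.Unlock()
		return err
	}
	// The obsolete packs are no longer referenced by any index, so deleting
	// them should not need to block new backups.
	lock.Unlock()

	lock, err = repo.LockNonExclusive()
	if err != nil {
		return err
	}
	defer lock.Unlock()
	return repo.DeletePacks(obsolete)
}
```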

I think you’re right!
In your example, the last 36+ hours (deleting the packs) could be done without an exclusive lock.

What exactly is your use case? Do you want to minimize the time the repository is exclusively locked? Reducing the pruning time is actually work in progress (I made a PR that improves it a lot). This might already help in your specific use case, as soon as the PR is reviewed and merged.

See also https://forum.restic.net/t/prune-performance

If a concurrent backup manages to create an identical pack (extremely unlikely, but possible), it could upload that pack only to have a concurrent prune operation come along and delete it.

I don’t think this is safe, even though the probability of data loss is very small.


The only way I see around that is having a ‘prune index’ (rough sketch after the list):

  • pruner, just before deleting: write the IDs of the doomed packs to a prune index
  • new backup: when creating a pack, check whether its name is in the prune index and whether a pack with that name already exists
  • pruner, once deletion is complete: remove the prune index
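
Something like this, very roughly — all of these names are made up, nothing here is a real restic API:

```go
// Hypothetical sketch of the "prune index" idea above. The pruner publishes
// the list of packs it is about to delete so that a concurrent backup never
// re-uses one of those pack names.
package pruneindex

import "errors"

type PackID string

type Backend interface {
	SavePruneIndex(doomed []PackID) error     // pruner: publish doomed packs
	LoadPruneIndex() (map[PackID]bool, error) // backup: read doomed packs
	RemovePruneIndex() error                  // pruner: cleanup after deletion
	PackExists(id PackID) bool
	UploadPack(id PackID, data []byte) error
	DeletePack(id PackID) error
}

// Backup side: before uploading a freshly built pack, check both the prune
// index and the existing pack names.
func uploadNewPack(be Backend, id PackID, data []byte) error {
	doomed, err := be.LoadPruneIndex()
	if err != nil {
		return err
	}
	if doomed[id] {
		// A pack with this name is scheduled for deletion; the caller should
		// rebuild the pack (fresh nonces give it a different name).
		return errors.New("pack name is scheduled for deletion")
	}
	if be.PackExists(id) {
		// An identical pack is already present and not doomed: nothing to upload.
		return nil
	}
	return be.UploadPack(id, data)
}

// Pruner side: publish the prune index, delete the packs, then remove it.
func deleteDoomedPacks(be Backend, doomed []PackID) error {
	if err := be.SavePruneIndex(doomed); err != nil {
		return err
	}
	for _, id := range doomed {
		if err := be.DeletePack(id); err != nil {
			return err
		}
	}
	return be.RemovePruneIndex()
}
```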

I would like to argue against this opinion; IMHO this is safe :smile:

You are completely right, but remember that a random 128-bit nonce is generated for the encryption of each blob in a pack. So yes, it is theoretically possible. But it is extremely unlikely, and we have to put that unlikelihood into perspective: there are far more likely events (e.g. hardware failures) that would destroy your backup, so this probability really can be neglected.
This is a bit like the discussion about hash collisions. If you do get a hash collision (and one is possible), you will lose data in your backup. It is, however, so unlikely that the possibility can be neglected.
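
To put a rough number on it — the workload figures below are made up purely for illustration, not measured on any restic repository:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Union bound: the chance that at least one of k freshly generated
	// 128-bit nonces equals the specific nonce used in an existing pack is
	// at most k / 2^128. Assume an absurdly busy repository: one billion
	// blobs per day, every day, for a hundred years.
	blobs := 1e9 * 365 * 100 // ~3.7e13 encrypted blobs
	p := blobs / math.Pow(2, 128)
	fmt.Printf("upper bound on a nonce collision: %.2g\n", p) // ~1.1e-25
}
```

And reproducing a whole identical pack would require every nonce in it to match, which is far less likely still.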