Two-step prune?

wscott · December 16, 2017, 11:25am

Would restic be able to adapt Duplicacy’s two-step strategy for prune to avoid making prune a locking operation?

As I write this I realize I have a weak grasp of how it really works, so I am trying to puzzle out the details. Understand that I am probably missing something.

As I understand it, the prune process looks sorta like this:

1. Prunes run periodically, at an interval longer than a normal backup should take (not sure this restriction is necessary). Locking is only needed to prevent concurrent prunes.
2. The normal prune operation runs: the objects needed by all snapshots are compared to the object store. Any of those objects found in 'fossils' are moved back to the normal location.
3. The rest of the objects in 'fossils' are deleted.
4. Now any unused objects in the object store are moved to the fossils directory. They sit there until the following week's prune, to make sure they cover any concurrent backups.
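To check my understanding, here is a minimal sketch of one prune pass. It models the repository as in-memory sets of object IDs instead of files, and ignores packs entirely. The names `prune`, `store`, `fossils` and `referenced` are made up for illustration (not restic or Duplicacy APIs), and the `threading.Lock` just stands in for a repository lock that only excludes other prunes.

```python
import threading

# Hypothetical stand-in for a repository lock: only other prunes take it,
# backups never do.
_prune_lock = threading.Lock()


def prune(store: set, fossils: set, referenced: set) -> None:
    """One prune pass over an in-memory model of the repository.

    store:      IDs of objects in the normal location
    fossils:    IDs of objects parked in the 'fossils' directory
    referenced: IDs needed by all current snapshots
    """
    if not _prune_lock.acquire(blocking=False):
        raise RuntimeError("another prune is already running")
    try:
        # Step 2: fossils that some snapshot still needs are moved back.
        resurrected = fossils & referenced
        store |= resurrected
        fossils -= resurrected

        # Step 3: the remaining fossils stayed unreferenced for a whole
        # period, so they are deleted for good.
        fossils.clear()

        # Step 4: objects that are unreferenced right now become fossils and
        # wait until the next prune, covering any backup still in flight.
        newly_unused = store - referenced
        store -= newly_unused
        fossils |= newly_unused
    finally:
        _prune_lock.release()
```

With `store = {"a", "b", "c"}`, `fossils = {"x", "y"}` and `referenced = {"a", "x"}`, one pass leaves `store == {"a", "x"}` and `fossils == {"b", "c"}`: `x` is rescued, `y` is deleted, and `b` and `c` get a full period in which a late snapshot can still claim them.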

Restore, check and mount commands know about the fossils directory and will look for any missing objects in that directory.
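The fallback lookup for those commands could be as simple as the sketch below. `read_object` is a hypothetical helper, and the two dicts stand in for the normal location and the fossils directory.

```python
def read_object(obj_id: str, store: dict, fossils: dict) -> bytes:
    """Return an object's data, falling back to the fossils directory."""
    # Look in the normal location first.
    if obj_id in store:
        return store[obj_id]
    # The object may have been fossilized by a prune that ran while the
    # snapshot being read was still being written.
    if obj_id in fossils:
        return fossils[obj_id]
    raise KeyError(f"object {obj_id} missing from store and fossils")
```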

The backup command doesn't see the fossils directory, so if a needed object happens to be in 'fossils', it will just get re-uploaded.
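In the same toy model, the backup side only deduplicates against the normal location. `backup` is again a made-up name, and it returns the IDs it actually had to upload.

```python
def backup(objects: dict, store: dict) -> list:
    """Upload what a new snapshot needs; return the IDs actually uploaded.

    The fossils directory is deliberately never consulted, so an object
    that only exists as a fossil is simply uploaded again.
    """
    uploaded = []
    for obj_id, data in objects.items():
        if obj_id not in store:  # dedup against the normal location only
            store[obj_id] = data
            uploaded.append(obj_id)
    return uploaded
```

The cost is an occasional duplicate upload, but backup never has to coordinate with a prune that might be moving fossils around at the same moment.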

I realize that objects don't exist as single files but in packs containing a collection of potentially unrelated objects, so some messy repacking would probably have to happen at some point.
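One way to see where the repacking comes in: only a pack whose blobs are all unused could be fossilized as a unit. The sketch below sorts packs into three groups. `classify_packs` and the pack-to-blobs index are hypothetical; it does not show how restic's real index or repack step works.

```python
def classify_packs(packs: dict, referenced: set) -> tuple:
    """Split packs by how many of their blobs are still referenced.

    packs:      hypothetical index, pack ID -> set of blob IDs it contains
    referenced: blob IDs needed by all current snapshots
    """
    keep, fossilize, repack = [], [], []
    for pack_id, blobs in packs.items():
        used = blobs & referenced
        if used == blobs:
            keep.append(pack_id)       # every blob still needed
        elif not used:
            fossilize.append(pack_id)  # nothing needed: move the whole pack
        else:
            repack.append(pack_id)     # mixed: live blobs must be rewritten first
    return keep, fossilize, repack
```

The "mixed" group is the messy part: its live blobs would need to be copied into new packs before the old pack could become a fossil.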

fd0 · December 16, 2017, 2:34pm

We’ve started that discussion already in #1141. It may be possible, but requires larger changes in the repo format.

I also have several ideas on how to improve the prune speed without these changes, we’ll try those first.

The general approach is: Build it, make it correct, make it fast. We're not yet at the third stage.