Should prune depend on valid indexes?

#1

The first step of the prune operation appears to be rebuilding the index, but the rebuilt index does not seem to be used when traversing snapshots to find objects to delete. If I remove all index files from the repository, prune fails:

$ restic prune
enter password for repository:
repository b01bdece opened successfully, password is correct
counting files in repo
building new index for repo
[0:01] 100.00%  41022 / 41022 packs
repository contains 41022 packs (1190738 blobs) with 185.295 GiB
processed 1190738 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 1204 snapshots
tree c64d27507b229b968b1f41dcf5e72f5dcb770716bc6680a6b07f51dd9e077e30 not found in repository
github.com/restic/restic/internal/repository.(*Repository).LoadTree
        /restic/internal/repository/repository.go:653
github.com/restic/restic/internal/restic.FindUsedBlobs
        /restic/internal/restic/find.go:11
main.pruneRepository
        /restic/cmd/restic/cmd_prune.go:191
main.runPrune
        /restic/cmd/restic/cmd_prune.go:85
main.glob..func18
        /restic/cmd/restic/cmd_prune.go:25
github.com/spf13/cobra.(*Command).execute
        /restic/vendor/github.com/spf13/cobra/command.go:762
github.com/spf13/cobra.(*Command).ExecuteC
        /restic/vendor/github.com/spf13/cobra/command.go:852
github.com/spf13/cobra.(*Command).Execute
        /restic/vendor/github.com/spf13/cobra/command.go:800
main.main
        /restic/cmd/restic/main.go:86
runtime.main
        /usr/local/go/src/runtime/proc.go:201
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1333

After running rebuild-index, prune works again:

$ restic rebuild-index
enter password for repository:
repository b01bdece opened successfully, password is correct
counting files in repo
[0:01] 100.00%  41022 / 41022 packs
finding old index files
saved new indexes as [41a81729 c8249f7d 70201955 52d70e30 b384994c fd369cb4 ff465472 bb186880 ae112ea2 7a053fc8 e5075ba1 3810386e a2d34da7 8a4915a2]
remove 0 old index files

$ restic prune
enter password for repository:
repository b01bdece opened successfully, password is correct
counting files in repo
building new index for repo
[1:07] 100.00%  41022 / 41022 packs
repository contains 41022 packs (1190738 blobs) with 185.295 GiB
processed 1190738 blobs: 0 duplicate blobs, 0 B duplicate
load all snapshots
find data that is still in use for 1204 snapshots
[2:40] 100.00%  1204 / 1204 snapshots
found 1190738 of 1190738 data blobs still in use, removing 0 blobs
will remove 0 invalid files
will delete 0 packs and rewrite 0 packs, this frees 0 B
counting files in repo
[0:01] 100.00%  41022 / 41022 packs
finding old index files
saved new indexes as [e54741fe 7d5aae91 14648424 f796412b 05fe0d53 936c62d2 b87aca4c 861e9fd4 3b9facda 8f7366ff 6e1e9279 c700cffb 902c329a 22c7c034]
remove 14 old index files
done

What is the point of prune rebuilding the index as its very first step if it uses the repo’s own index files anyway?


#2

@fd0 Was wondering if you had a comment on this. It seems like a bug to me.


#3

Ah, you’re right: prune rebuilds the index, but the old one is still used for lookups, when the new one should be! That’s a limitation we still need to resolve. I’m not sure it’s tracked somewhere on GitHub (I’ve lost track a bit…), so please create a new issue if you don’t find one. Thanks!


#4

So what does the initial rebuild do currently? Prune rebuilds the index, looks for duplicate objects, crawls snapshots to find used objects, removes old objects (delete or repack), then rebuilds the index a second time.

Does the first rebuild actually do anything right now?
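
To make the sequence above concrete, here is a rough Go sketch of the "build index, find used blobs, work out what to remove" part. It is a toy model under loud assumptions: blob and pack IDs are plain strings, and `packs`, `buildIndex`, and `unusedBlobs` are hypothetical names for illustration, not restic's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// packs maps a pack ID to the blob IDs stored inside it.
// Hypothetical type for illustration only; not restic's data model.
type packs map[string][]string

// buildIndex scans every pack and records which pack holds each blob.
// This stands in for the "building new index for repo" phase.
func buildIndex(p packs) map[string]string {
	idx := make(map[string]string)
	for packID, blobs := range p {
		for _, b := range blobs {
			idx[b] = packID
		}
	}
	return idx
}

// unusedBlobs returns, in sorted order, the blobs that are in the index
// but are not referenced by any snapshot.
func unusedBlobs(idx map[string]string, used map[string]bool) []string {
	var out []string
	for b := range idx {
		if !used[b] {
			out = append(out, b)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	repo := packs{"p1": {"a", "b"}, "p2": {"c"}}
	idx := buildIndex(repo)

	// In the real command this set comes from crawling the snapshots.
	used := map[string]bool{"a": true, "c": true}

	fmt.Println("blobs to remove:", unusedBlobs(idx, used))
}
```

In this model the index from the first phase is only needed to answer "which blobs exist?", which is why the question of what else it is used for matters.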


#5

#6

Yes, the index built in the first stage of prune is used for all lookups that check whether blobs are present or missing. Only when data needs to be fetched from the repo (e.g. when walking the snapshots and a new tree needs to be loaded) is the data from the original index files consulted. So prune uses both, kinda.

The problem is that the index is tightly coupled with the Repository object and it’s not easy to untangle. I need to finish that sometime, but it’s hard for me at the moment to find ~6-10 hours without interruptions :wink:
