How to cause a superseded index file?

David · December 31, 2019, 9:34pm

Under what conditions can an index file become superseded? I’ve been trying for days by forgetting old snapshots, running prunes, etc.

And although I see indices with “supersedes” entries, none of the superseded indices actually exist.

How can I create an index that both exists and is superseded?
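
For reference, the check I mean looks roughly like the sketch below. It assumes the index files have already been decrypted and parsed into Python dicts with the "supersedes" field from the design document. The function name `existing_superseded` and the `indexes` mapping are made up for illustration and are not part of restic.

```python
def existing_superseded(indexes):
    """Return the IDs of index files that are present in the repository
    and are also named in another index's "supersedes" list.

    `indexes` is a hypothetical mapping: index file ID -> parsed index JSON.
    """
    superseded = set()
    for doc in indexes.values():
        # "supersedes" is optional; treat a missing field as an empty list.
        superseded.update(doc.get("supersedes", []))
    # Keep only the superseded IDs that still exist as index files.
    return sorted(superseded.intersection(indexes))


# Made-up example data: "aaa" supersedes "bbb" (still present) and "ccc" (gone).
example = {
    "aaa": {"supersedes": ["bbb", "ccc"], "packs": []},
    "bbb": {"packs": []},
}
still_present = existing_superseded(example)
```

On my repository the equivalent of `still_present` always comes back empty.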

fd0 · January 1, 2020, 11:20am

Hey, good question! When writing the design document, I thought it was a good idea to add this, but in practice it turned out that having superseded index files does not happen often. In regular operation, you should not have an index file that is superseded by another file.

What are you trying to do?

David · January 1, 2020, 6:05pm

Thanks for the response, and happy new year!

What am I doing? Mostly experimenting and getting to know the repository format better. I’ve spent the last few days writing some tools that directly access the repository (thank you for the excellent description of the repository format!), and that’s been a lot of fun.

I became interested in this because I noticed unexpected behavior when running restic stats --mode files-by-content against the entire repository (without specifying a snapshot):

- It’s really slow on my machine (over an hour to run against a 30GB repo on local storage).
- It uses very little memory, and surprisingly little disk IO.
- But it maxes out a single CPU core on my machine for the entire duration (and leaves all other cores unused).

I started wondering “why is the client working so hard to calculate these stats? Is it decrypting only what is needed for the statistic? Are there some shortcuts available to generate the stats faster? Could it be improved with multithreading? Could I increase speed if I allow it to use more memory?”

So I decided to write a little tool to test those questions.

It’s a work-in-progress. Currently, I can open the repo, get the keys, read and parse the index files (and create an in-memory map of the repo, combining all indices, honoring all “supersedes” entries and mapping blobs to packs), read and parse the snapshots, and locate and read blobs.
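
The in-memory map can be sketched roughly as follows. This is an illustration in Python, not the actual tool: it assumes the index files are already decrypted and parsed into dicts with the "packs" and "supersedes" fields described in the design document. The name `build_blob_map` and the shape of the returned map are hypothetical.

```python
def build_blob_map(indexes):
    """Combine all index files into one lookup table.

    `indexes` maps index file ID -> parsed index JSON (hypothetical input).
    Returns a dict: (blob type, blob ID) -> (pack ID, offset, length).
    """
    # First pass: collect every index ID that another index supersedes.
    superseded = set()
    for doc in indexes.values():
        superseded.update(doc.get("supersedes", []))

    # Second pass: read only the indices that are not superseded.
    blob_map = {}
    for index_id, doc in indexes.items():
        if index_id in superseded:
            continue
        for pack in doc.get("packs", []):
            for blob in pack.get("blobs", []):
                key = (blob["type"], blob["id"])
                # A blob may be listed in several packs; keep the first one seen.
                blob_map.setdefault(
                    key, (pack["id"], blob["offset"], blob["length"])
                )
    return blob_map
```

Looking up a blob is then a single dict access, for example `blob_map[("tree", tree_id)]`, which gives the pack to open plus the offset and length to read.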

Next up: parse trees. I think that should be sufficient to begin experimenting with calculating statistics.

fd0 · January 1, 2020, 8:12pm

Cool, please keep us posted! Especially if you notice any things missing or unclear in the design document!

preranaprabhu02 · February 6, 2023, 11:28am

Hey, do you have a GitHub repo so that everyone can use these tools?