Snapshot Design Question

Zigsaz · September 25, 2018, 3:16am

Hello,

I had a question about restic’s snapshots that keeps unnerving me whenever I come back to try it again. I’ve searched here and on GitHub but couldn’t find anything.

I’ll forgo any complicated example and just give this boiled-down version. Let’s say I have the following 3 snapshots:

9/1/2018 Files: A.txt, B.txt, C.txt
9/2/2018 Files: A.txt, B.txt, C.txt, D.txt
9/3/2018 Files: A.txt, B.txt, C.txt

Now let’s say I decide to prune snapshots 1 and 2 (a bit of an aggressive strategy) before noticing that file D.txt existed only in that single snapshot. This now means that D.txt is gone forever, correct?

I think this logically makes sense, since restic can’t just keep every file forever. But I’m wondering what measures can be taken to prevent such a scenario.

With restic diff, it’s only comparing the 2 snapshots given and nothing in between, right? So if I did restic diff 1 3 I’d get an empty diff? If so, how feasible would it be to implement a diff command that is cumulative across all snapshots between the two given ones? I think such a feature could be very beneficial in catching anything that might slip through the cracks, but I’m unsure whether it clashes with restic’s design philosophy or would be very expensive to compute, etc.
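To make the idea concrete, here is a minimal sketch of what such a cumulative diff could compute. It is not an existing restic feature: snapshots are modelled as plain sets of file names, the function name `cumulative_diff` is hypothetical, and only file names are compared, not content.

```python
def cumulative_diff(snapshots):
    """Compare the first and last snapshot, plus everything seen in between.

    `snapshots` is a chronologically ordered list of sets of file names
    (a simplification; a real tool would also compare file content).
    """
    first, last = snapshots[0], snapshots[-1]
    seen_anywhere = set().union(*snapshots)
    return {
        "added": sorted(last - first),
        "removed": sorted(first - last),
        # Files that existed at some point but are in neither endpoint
        "transient": sorted(seen_anywhere - first - last),
    }


# The three snapshots from the example above
snapshots = [
    {"A.txt", "B.txt", "C.txt"},
    {"A.txt", "B.txt", "C.txt", "D.txt"},
    {"A.txt", "B.txt", "C.txt"},
]
result = cumulative_diff(snapshots)
print(result)
```

In this toy model the plain diff between snapshots 1 and 3 is empty, while the "transient" entry is where D.txt would show up.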

Hopefully that makes sense!

Thanks for your time and an awesome tool!

764287 · September 25, 2018, 7:29am

The diff shouldn’t be empty because file D.txt was added. restic will report any file which was added, modified or removed. Did you mean to diff 1 and 3?

I’m wondering what would be gained with such a feature. If you want to make sure that you are not deleting any new (or modified) files, why would you want to prune in the first place? Keep in mind that with restic’s design, snapshots which do not contain any changed files are using almost no storage space (unless you have a lot of files which are changing frequently).

fd0 · September 25, 2018, 10:48am

Hi, and welcome to the forum!

Correct. Running restic prune will remove all unreferenced data from the repository, which includes the metadata and content from file D.txt.
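As a rough illustration of why D.txt disappears, here is a toy model of a content-addressed repository. It is only a sketch under simplifying assumptions: one blob per file (restic actually splits files into chunks), a dict standing in for the repository, and hypothetical helper names (`blob_id`, `snapshot`, `prune`).

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Content-addressed storage: identical content yields the same ID
    return hashlib.sha256(content).hexdigest()


def prune(repo_blobs, snapshots):
    """Return only the blobs that at least one remaining snapshot references."""
    referenced = set()
    for snap in snapshots.values():
        referenced.update(snap.values())
    return {bid: data for bid, data in repo_blobs.items() if bid in referenced}


# Toy file contents (made up for illustration)
files = {
    name: f"content of {name}".encode()
    for name in ("A.txt", "B.txt", "C.txt", "D.txt")
}
repo_blobs = {blob_id(data): data for data in files.values()}


def snapshot(names):
    # A snapshot maps file names to blob IDs; it stores no content itself
    return {name: blob_id(files[name]) for name in names}


snapshots = {
    1: snapshot(["A.txt", "B.txt", "C.txt"]),
    2: snapshot(["A.txt", "B.txt", "C.txt", "D.txt"]),
    3: snapshot(["A.txt", "B.txt", "C.txt"]),
}

# Forget snapshots 1 and 2, then prune
remaining = {3: snapshots[3]}
pruned = prune(repo_blobs, remaining)
d_survives = blob_id(files["D.txt"]) in pruned
print(len(repo_blobs), len(pruned), d_survives)
```

The same model shows why unchanged snapshots are cheap: three snapshots share four blobs in total, and only the blob that nothing references any more is dropped.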

Can you clarify what “such a scenario” means, exactly? I can think of several interpretations:

“Make sure that no snapshots are removed which reference a file that is not contained in any other snapshots”, so that snapshot 2 is not removed in the first place?
“Make sure that all snapshots which add new files stay in the repo forever”
“Make sure that there’s at least one snapshot which references D.txt”

Correct, the content is the same.

It clashes with the philosophy of only adding an option/command if we’re convinced that it is important and/or helps most users.

What would such a hypothetical command print? To which snapshots would it apply? After all, snapshots are grouped by different attributes (e.g. the paths, for details see the documentation on forget)…

Vartkat · September 25, 2018, 7:30pm

That overlaps a bit with my idea of implementing (for my own use only) an SQLite database that keeps a history of the diff for each snapshot. That way, at least, @Zigsaz would be able to tell when he lost his D.txt file.
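A minimal sketch of that idea, using Python's built-in sqlite3 module. The table layout, the snapshot labels and the function names are all hypothetical, and the file lists are assumed to have been collected elsewhere. Only additions and removals are tracked here, not modifications.

```python
import sqlite3


def record_diffs(conn, snapshots):
    """Store per-snapshot additions and removals.

    `snapshots` is a chronologically ordered list of (snapshot_id, paths) pairs.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changes ("
        "snapshot_id TEXT, path TEXT, change TEXT)"
    )
    previous = set()
    for snapshot_id, paths in snapshots:
        for path in sorted(paths - previous):
            conn.execute(
                "INSERT INTO changes VALUES (?, ?, ?)",
                (snapshot_id, path, "added"),
            )
        for path in sorted(previous - paths):
            conn.execute(
                "INSERT INTO changes VALUES (?, ?, ?)",
                (snapshot_id, path, "removed"),
            )
        previous = paths
    conn.commit()


def history(conn, path):
    """Return the recorded (snapshot_id, change) events for one path, oldest first."""
    rows = conn.execute(
        "SELECT snapshot_id, change FROM changes WHERE path = ? ORDER BY rowid",
        (path,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")  # a real setup would use a file on disk
record_diffs(
    conn,
    [
        ("snap1", {"A.txt", "B.txt", "C.txt"}),
        ("snap2", {"A.txt", "B.txt", "C.txt", "D.txt"}),
        ("snap3", {"A.txt", "B.txt", "C.txt"}),
    ],
)
d_history = history(conn, "D.txt")
print(d_history)
```

Because the database lives outside the repository, the record that D.txt once existed would remain even after the snapshots themselves are forgotten and pruned.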

Zigsaz · September 26, 2018, 2:08am

Gah, yes diff 1 3. Sorry, that was a pretty significant mistake to make.

would be what I’m getting at in this case.

I mulled over this some more and realized that I think I’m getting myself confused. I think I’m thrown off by restic having only snapshots, whereas something like duplicity has snapshots + deltas. But I ran through some scenarios in my head and realized you can get into the same situations with that as well.

I still think it’d be very useful to get a brief summary of each snapshot in chronological order, sort of like git log -p. Similarly, it’d be useful to be able to “squash” snapshots that are exactly the same. But I can understand if those would be difficult computationally.
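For the “squash” half of that idea, a rough sketch of how runs of identical consecutive snapshots could be detected. Nothing here is a restic command: the function name `squash_runs` is hypothetical, snapshots are compared by file names only, and the fourth snapshot is invented purely to give the example something to squash.

```python
from itertools import groupby


def squash_runs(snapshots):
    """Group consecutive snapshots whose file sets are identical.

    `snapshots` is a chronologically ordered list of (label, paths) pairs.
    Returns a list of runs, each run being a list of labels.
    """
    runs = []
    for _, group in groupby(snapshots, key=lambda snap: frozenset(snap[1])):
        runs.append([label for label, _ in group])
    return runs


snapshots = [
    ("9/1/2018", {"A.txt", "B.txt", "C.txt"}),
    ("9/2/2018", {"A.txt", "B.txt", "C.txt", "D.txt"}),
    ("9/3/2018", {"A.txt", "B.txt", "C.txt"}),
    ("9/4/2018", {"A.txt", "B.txt", "C.txt"}),  # hypothetical extra snapshot
]
runs = squash_runs(snapshots)
print(runs)
```

Any run with more than one label is a candidate for squashing. Note that 9/1 and 9/3 are not merged even though they match, because the differing 9/2 snapshot sits between them.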

Sorry to waste your time