Per-file version history


#1

Hello everyone,

A common use case of backup software for me is to restore an individual file that I found has been messed up. This is straightforward to do with restic, provided the problem becomes apparent before any further snapshots are taken - just restore the latest version. Yet, when the messed-up file has been sitting there for some time, with multiple snapshots taken in between, it becomes cumbersome to identify the exact snapshot that contains the file in its previous, healthy state.
So what I would like to see is a history of a particular file across snapshots that clearly identifies when in the past that file has changed. This would allow me to select a snapshot from which I can then restore the file. As far as I can tell, this is currently possible only by manually browsing the mounted repository and checking the file in each snapshot, going back in time one by one.
This feature can be thought of as an extension of the find command, adding the following information/processing:

  • in addition to the “snapshot contains file” information the output should include the file’s hash in each containing snapshot
  • the snapshot’s timestamp should be given
  • the resulting list of hash-per-snapshot records should be sorted by timestamp and grouped by hash, so that only the youngest snapshot containing the file with a particular hash is shown

Does this make sense to others? Would that be hard to implement?
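
To make the three bullet points above concrete, here is a minimal sketch of the grouping step only. The `Record` type, its field names, and the sample snapshot IDs and hashes are hypothetical and exist purely for illustration; nothing here is an existing restic API.

```python
from datetime import datetime
from typing import NamedTuple


class Record(NamedTuple):
    """One hypothetical (snapshot, file) observation; not a restic type."""
    snapshot: str
    time: datetime
    content_hash: str


def file_history(records):
    """Return one record per distinct content hash, newest first.

    For each hash, only the youngest snapshot containing it is kept.
    """
    youngest = {}
    for rec in records:
        seen = youngest.get(rec.content_hash)
        if seen is None or rec.time > seen.time:
            youngest[rec.content_hash] = rec
    return sorted(youngest.values(), key=lambda r: r.time, reverse=True)


# Made-up sample data: the file is unchanged in the first two snapshots.
records = [
    Record("aaaa1111", datetime(2018, 8, 25, 9, 0), "h1"),
    Record("bbbb2222", datetime(2018, 8, 26, 9, 0), "h1"),
    Record("cccc3333", datetime(2018, 8, 27, 9, 0), "h2"),
]

for rec in file_history(records):
    print(rec.snapshot, rec.time.isoformat(sep=" "), rec.content_hash)
```

With this sample data the history collapses to two entries, one per file state, each pointing at the youngest snapshot that still holds that state.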


#2

For a backup solution, restoring files is of course just as important as saving them, so the restore process should be easy and comfortable. While restic could be improved in this regard by providing more information, it should also stay as simple as possible.

Is this information really necessary? Hardly anybody knows the hash of the file they are searching for. Most people will use restic find --oldest DATE --newest DATE.

While I agree that this information would be nice to have I don’t think it is necessary. When using restic find --long the modification time of the file is displayed, which usually is all you are looking for.

You are right. Currently there doesn’t seem to be any kind of sorting, which is kind of confusing.


#3

I believe this is more about knowing when the file actually changed. Sometimes mod-time can be unreliable.

Same as above, mod-time may be unreliable.


#4

Thanks, @764287, for pointing out the --long option to the find command; I was not aware of it, and it helps.

Yet, instead of the current output, which seems to be organized by snapshot first, then by file match

$ restic find --long /home/marcus/.xsession-errors
repository 2a7fabeb opened successfully, password is correct
Found matching entries in snapshot d95b4a13
-rw-------  1000  1000 581436 2018-08-26 17:49:27 /home/marcus/.xsession-errors

Found matching entries in snapshot ded778cf
-rw-------  1000  1000 557607 2018-08-26 12:55:07 /home/marcus/.xsession-errors

Found matching entries in snapshot f6814a59
-rw-------  1000  1000 545425 2018-08-27 20:27:41 /home/marcus/.xsession-errors

I’d find the following, grouped by file match first, then by snapshot, much more useful:

repository 2a7fabeb opened successfully, password is correct
Found matching entries for file /home/marcus/.xsession-errors
snapshot f6814a59: 2018-08-27 20:27:41 545425 -rw-------  1000  1000
snapshot d95b4a13: 2018-08-26 17:49:27 581436 -rw-------  1000  1000
snapshot ded778cf: 2018-08-26 12:55:07 557607 -rw-------  1000  1000

If sorted, this represents a useful file history. Then, to make it more concise, an extra option (like --unique) could drop records that apparently represent the same file state, judged by modtime. I am with @cfbao in that I’d rather not rely on modtime and would prefer the content hash instead, but as restic IIRC does select files for backup by modtime before computing their hashes, it would be pointless to be stricter for the find command.

How does that sound?
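
As a stop-gap, the regrouping can be sketched outside restic by post-processing the text output. The snippet below assumes the `find --long` layout is exactly as in the sample above (it may differ between restic versions); `history_by_path` and its `unique` parameter are hypothetical names that mimic the proposed --unique option, not anything restic provides.

```python
from collections import defaultdict
from datetime import datetime

# Output copied from the sample earlier in this post.
SAMPLE = """\
repository 2a7fabeb opened successfully, password is correct
Found matching entries in snapshot d95b4a13
-rw-------  1000  1000 581436 2018-08-26 17:49:27 /home/marcus/.xsession-errors

Found matching entries in snapshot ded778cf
-rw-------  1000  1000 557607 2018-08-26 12:55:07 /home/marcus/.xsession-errors

Found matching entries in snapshot f6814a59
-rw-------  1000  1000 545425 2018-08-27 20:27:41 /home/marcus/.xsession-errors
"""

PREFIX = "Found matching entries in snapshot "


def history_by_path(text, unique=False):
    """Regroup 'find --long' text output by file path, newest mtime first."""
    history = defaultdict(list)
    snapshot = None
    for line in text.splitlines():
        if line.startswith(PREFIX):
            snapshot = line[len(PREFIX):].strip()
            continue
        # mode, uid, gid, size, date, time, path (path may contain spaces)
        fields = line.split(None, 6)
        if snapshot is None or len(fields) != 7:
            continue
        mode, uid, gid, size, date, clock, path = fields
        try:
            mtime = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
            size = int(size)
        except ValueError:
            continue  # not a file entry line
        history[path].append((mtime, snapshot, size, mode, uid, gid))

    result = {}
    for path, entries in history.items():
        entries.sort(key=lambda e: e[0], reverse=True)
        if unique:
            # Keep only the first snapshot listed for each distinct mtime.
            seen, deduped = set(), []
            for entry in entries:
                if entry[0] not in seen:
                    seen.add(entry[0])
                    deduped.append(entry)
            entries = deduped
        result[path] = entries
    return result


for path, entries in history_by_path(SAMPLE).items():
    print(f"Found matching entries for file {path}")
    for mtime, snapshot, size, mode, uid, gid in entries:
        print(f"snapshot {snapshot}: {mtime} {size} {mode}  {uid}  {gid}")
```

Parsing human-readable output is fragile, so this is only meant to illustrate the sort-and-group idea, not to be relied on.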


#5

@BenBipod find takes a pattern, so the file path may be different for each match, and there may be multiple matching files in each snapshot. So your example is an edge case for find output.

But there could be a compact one-line output for find. Did you test find with the --json option? If it produces JSON output, you could format it onto one line with jq.
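
A rough sketch of that flattening, in Python rather than jq. It assumes (unverified here) that the JSON is an array with one object per snapshot, each carrying a `snapshot` ID and a `matches` list whose items have `path`, `size` and `mtime` fields; the sample document and the `one_line_per_match` helper are hypothetical, so check the real output of your restic version first.

```python
import json

# Hypothetical sample in the ASSUMED shape; values are illustrative only.
SAMPLE_JSON = """
[
  {"snapshot": "d95b4a13", "hits": 1, "matches": [
    {"path": "/home/marcus/.xsession-errors", "size": 581436,
     "mtime": "2018-08-26T17:49:27Z"}]},
  {"snapshot": "ded778cf", "hits": 1, "matches": [
    {"path": "/home/marcus/.xsession-errors", "size": 557607,
     "mtime": "2018-08-26T12:55:07Z"}]}
]
"""


def one_line_per_match(json_text):
    """Flatten the per-snapshot structure into one text line per match."""
    lines = []
    for snap in json.loads(json_text):
        short_id = snap["snapshot"][:8]
        for match in snap.get("matches", []):
            lines.append(
                f"snapshot {short_id}: {match['mtime']} "
                f"{match['size']} {match['path']}"
            )
    return lines


for line in one_line_per_match(SAMPLE_JSON):
    print(line)
```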


#6

@whereisaaron, I’m aware of my example being an edge case as, typically, I’d want the history of a specific file and not of a file pattern.
Thanks for pointing out the JSON-Option and jq, that might be the way to go for a work-around.


#7

If you don’t mind the file path repeating, then a compact find output would be close to your ideal and still work for multiple matches, e.g.

repository 2a7fabeb opened successfully, password is correct
Found matching entries
snapshot f6814a59: 2018-08-27 20:27:41 545425 -rw-------  1000  1000  /home/marcus/.xsession-errors
snapshot d95b4a13: 2018-08-26 17:49:27 581436 -rw-------  1000  1000  /home/marcus/.xsession-errors
snapshot ded778cf: 2018-08-26 12:55:07 557607 -rw-------  1000  1000  /home/marcus/.xsession-errors

Next question is why you might need to restore your xsession error log :slight_smile:


#8

Well, my idea was that pattern matching still applies, but that snapshot and file path change places. Currently, the output has pattern matches per snapshot, and I would like this to be snapshots per pattern match (not really caring about the pattern match myself, as in my use case I would use it with an absolute path). Having a more compact output than the current one is a step in the right direction, but doesn’t do the trick; sorting is essential.
And you’re right, my example would certainly be more convincing with .ssh/id_rsa :wink:


#9

I can see why you would like that output sorting.

There may be a pragmatic reason for snapshot order rather than file match order. My guess is that find loops through the snapshots in date/time order and outputs matches as soon as it finds them.

The find command doesn’t know which file matches it will find until it has looped through all snapshots. So to achieve your proposed file-path-sorted output, it would have to delay output and remember all matches in all snapshots, then sort and output those results after it has finished. A very broad pattern (e.g. *.jpg) might require caching a lot of data to sort and display at the end of the process.
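
The trade-off can be illustrated with a toy model. This is only a sketch of my guess above, not restic's actual implementation: the `SNAPSHOTS` list, `stream_matches` and `grouped_matches` are all hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical stand-in for a repository: (snapshot id, paths), in date order.
SNAPSHOTS = [
    ("ded778cf", ["/home/marcus/a.jpg", "/home/marcus/notes.txt"]),
    ("d95b4a13", ["/home/marcus/a.jpg", "/home/marcus/b.jpg"]),
]


def stream_matches(snapshots, pattern):
    """Yield each match as soon as it is found; needs no extra memory."""
    for snapshot_id, paths in snapshots:
        for path in paths:
            if fnmatchcase(path, pattern):
                yield snapshot_id, path


def grouped_matches(snapshots, pattern):
    """Group matches by path; must hold every match until the loop ends."""
    by_path = defaultdict(list)
    for snapshot_id, path in stream_matches(snapshots, pattern):
        by_path[path].append(snapshot_id)
    return dict(sorted(by_path.items()))


# Snapshot-first output can be printed immediately ...
for snapshot_id, path in stream_matches(SNAPSHOTS, "*.jpg"):
    print(snapshot_id, path)

# ... while path-first output is only available after the full scan.
for path, snapshot_ids in grouped_matches(SNAPSHOTS, "*.jpg").items():
    print(path, snapshot_ids)
```

The streaming version keeps nothing in memory, whereas the grouped version's dictionary grows with the number of matches, which is the cost a broad pattern would incur.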


#10

Your reasoning seems totally applicable to me. While I tried to be unobtrusive by asking for a change to an existing command, the history use case and the find command don’t appear to be the ideal match. So have a new history command, similar to find but for unique, full-path files only?


#11

That’s exactly the reason why it is the way it is right now. :slight_smile: