Per-file version history

BenBipod · August 26, 2018, 5:05pm

Hello everyone,

A common use case of backup software for me is to restore an individual file that I found has been messed up. This is straightforward to do with restic, provided the problem becomes apparent before any further snapshots are taken: just restore the latest version. Yet, when the messed-up file has been sitting there for some time, with multiple snapshots taken in between, it becomes cumbersome to identify the exact snapshot that contains the file in its previous, healthy state.
So what I would like to see is a history of a particular file across snapshots that clearly identifies when in the past that file has changed. This would allow me to select a snapshot from which I can then restore the file. I think this is currently possible only by manually browsing the mounted repository and checking the file in each snapshot, going back in time one by one.
This feature can be thought of as an extension of the find command, adding the following information/processing:

in addition to the “snapshot contains file” information the output should include the file’s hash in each containing snapshot
the snapshot’s timestamp should be given
the resulting list of hash-per-snapshot records should be sorted by timestamp and grouped by hash, so that only the youngest snapshot containing the file with a particular hash is shown
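
A minimal Python sketch of that processing, assuming one record per snapshot that contains the file is already available. The `Record` layout, the `file_history` name, and all IDs, dates and hash values below are made up for illustration; restic does not produce such records today.

```python
from datetime import datetime
from typing import NamedTuple


class Record(NamedTuple):
    """One hypothetical find result: the file as stored in one snapshot."""
    snapshot: str       # short snapshot ID
    taken: datetime     # snapshot timestamp
    content_hash: str   # hash of the file content in that snapshot


def file_history(records):
    """Return one record per distinct content hash, newest first.

    For every hash, only the youngest snapshot containing it is kept.
    """
    youngest = {}
    for rec in records:
        best = youngest.get(rec.content_hash)
        if best is None or rec.taken > best.taken:
            youngest[rec.content_hash] = rec
    return sorted(youngest.values(), key=lambda r: r.taken, reverse=True)


# Illustrative data only: IDs, dates and hashes are invented.
records = [
    Record("aaaa1111", datetime(2018, 8, 20, 7, 0), "h1"),
    Record("bbbb2222", datetime(2018, 8, 21, 7, 0), "h1"),
    Record("cccc3333", datetime(2018, 8, 22, 7, 0), "h2"),
    Record("dddd4444", datetime(2018, 8, 23, 7, 0), "h2"),
]

for rec in file_history(records):
    print(rec.taken, rec.snapshot, rec.content_hash)
```

Note that grouping purely by hash merges non-adjacent snapshots too: if a file is changed and later reverted, both states with the same hash collapse into one entry.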

Does this make sense to others? Would that be hard to implement?

764287 · August 27, 2018, 8:16am

For a backup solution, restoring files is of course as important as saving them, hence the restore process should be easy and comfortable. While restic could be improved in this regard by providing more information, it should also remain as simple as possible.

Is this information really necessary? Hardly anybody knows the hash of the file they are searching for. Most people will use restic find --oldest DATE --newest DATE.

While I agree that this information would be nice to have I don’t think it is necessary. When using restic find --long the modification time of the file is displayed, which usually is all you are looking for.

You are right. Currently there doesn’t seem to be any kind of sorting, which is kind of confusing.

cfbao · August 27, 2018, 3:56pm

I believe this is more about knowing when the file actually changed. Sometimes mod-time can be unreliable.

Same as above, mod-time may be unreliable.

BenBipod · August 27, 2018, 6:51pm

Thanks, @764287, for pointing out the --long option to the find command; I was not aware of it, and it helps.

Yet, instead of the current output, which seems to be organized by snapshot first, then by file match

$ restic find --long /home/marcus/.xsession-errors
repository 2a7fabeb opened successfully, password is correct
Found matching entries in snapshot d95b4a13
-rw-------  1000  1000 581436 2018-08-26 17:49:27 /home/marcus/.xsession-errors

Found matching entries in snapshot ded778cf
-rw-------  1000  1000 557607 2018-08-26 12:55:07 /home/marcus/.xsession-errors

Found matching entries in snapshot f6814a59
-rw-------  1000  1000 545425 2018-08-27 20:27:41 /home/marcus/.xsession-errors

I’d find the following, grouped by file match first, then by snapshot, much more useful:

repository 2a7fabeb opened successfully, password is correct
Found matching entries for file /home/marcus/.xsession-errors
snapshot f6814a59: 2018-08-27 20:27:41 545425 -rw-------  1000  1000
snapshot d95b4a13: 2018-08-26 17:49:27 581436 -rw-------  1000  1000
snapshot ded778cf: 2018-08-26 12:55:07 557607 -rw-------  1000  1000

If sorted, this represents a useful file history. Then, to make it more concise, an extra option (like --unique) could drop records that apparently represent the same file state, judged by modtime. I am with @cfbao in that I’d rather not rely on modtime and would prefer the content hash instead, but since restic IIRC selects files for backup by modtime before computing their hashes, it would be pointless to be stricter for the find command.
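
As a rough workaround sketch, the regrouping can be done outside restic by post-processing the text that `restic find --long` prints. The Python below assumes the output looks exactly like the listing quoted earlier in this post; `regroup` and its `unique` parameter are hypothetical names, not restic features, and the format may differ between restic versions.

```python
import re
from collections import defaultdict

SNAPSHOT_RE = re.compile(r"^Found matching entries in snapshot (\S+)")
# Fields: mode, uid, gid, size, "date time", path (layout assumed from the post)
ENTRY_RE = re.compile(
    r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(.+)$"
)


def regroup(find_output, unique=False):
    """Regroup `restic find --long` text output by file path.

    Returns {path: [(mtime, size, mode, uid, gid, snapshot), ...]} with each
    list sorted newest mtime first. With unique=True, entries repeating an
    already seen (mtime, size) pair for the same path are dropped.
    """
    by_path = defaultdict(list)
    snapshot = None
    for line in find_output.splitlines():
        m = SNAPSHOT_RE.match(line)
        if m:
            snapshot = m.group(1)
            continue
        m = ENTRY_RE.match(line)
        if m and snapshot is not None:
            mode, uid, gid, size, mtime, path = m.groups()
            by_path[path].append((mtime, int(size), mode, uid, gid, snapshot))

    result = {}
    for path, entries in by_path.items():
        # "YYYY-MM-DD HH:MM:SS" strings sort chronologically (same time zone).
        entries.sort(reverse=True)
        if unique:
            seen, kept = set(), []
            for entry in entries:
                if entry[:2] not in seen:
                    seen.add(entry[:2])
                    kept.append(entry)
            entries = kept
        result[path] = entries
    return result


SAMPLE = """\
repository 2a7fabeb opened successfully, password is correct
Found matching entries in snapshot d95b4a13
-rw-------  1000  1000 581436 2018-08-26 17:49:27 /home/marcus/.xsession-errors

Found matching entries in snapshot ded778cf
-rw-------  1000  1000 557607 2018-08-26 12:55:07 /home/marcus/.xsession-errors

Found matching entries in snapshot f6814a59
-rw-------  1000  1000 545425 2018-08-27 20:27:41 /home/marcus/.xsession-errors
"""

for path, entries in regroup(SAMPLE).items():
    print(f"Found matching entries for file {path}")
    for mtime, size, mode, uid, gid, snap in entries:
        print(f"snapshot {snap}: {mtime} {size} {mode}  {uid}  {gid}")
```

The whole output has to be read before anything is printed, which is acceptable for a single absolute path but could get large for broad patterns.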

How does that sound?

whereisaaron · August 31, 2018, 6:41pm

@BenBipod find takes a pattern, so the file path may be different for each match, and there may be multiple matching files in each snapshot. So your example is an edge case for find output.

But there could be a compact one-line output for find. Did you test find with the --json option? If it produces JSON output, you could format it onto one line with jq.
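
For illustration, here is the same flattening idea as a small Python sketch instead of jq. The JSON shape used in `SAMPLE_JSON` (a list of objects with `snapshot` and `matches`, each match carrying `path`, `size`, `mtime` and `permissions`) is an assumption and should be checked against the real output of your restic version; `one_line_per_match` is a hypothetical helper name, and the sample values are illustrative.

```python
import json

# Assumed shape of `restic find --json` output; field names may differ
# between restic versions. Values are illustrative only.
SAMPLE_JSON = """
[
  {"snapshot": "d95b4a13", "hits": 1, "matches": [
    {"path": "/home/marcus/.xsession-errors", "size": 581436,
     "mtime": "2018-08-26T17:49:27+02:00", "permissions": "-rw-------"}]},
  {"snapshot": "f6814a59", "hits": 1, "matches": [
    {"path": "/home/marcus/.xsession-errors", "size": 545425,
     "mtime": "2018-08-27T20:27:41+02:00", "permissions": "-rw-------"}]}
]
"""


def one_line_per_match(find_json):
    """Flatten find results into one text line per (snapshot, match) pair."""
    lines = []
    for result in json.loads(find_json):
        for match in result.get("matches", []):
            lines.append(
                f"snapshot {result['snapshot'][:8]}: {match.get('mtime', '?')} "
                f"{match.get('size', '?')} {match.get('permissions', '?')} "
                f"{match['path']}"
            )
    return lines


for line in one_line_per_match(SAMPLE_JSON):
    print(line)
```

In practice the input would come from the real command rather than from an embedded sample string.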

BenBipod · September 1, 2018, 7:35pm

@whereisaaron, I’m aware of my example being an edge case as, typically, I’d want the history of a specific file and not of a file pattern.
Thanks for pointing out the --json option and jq; that might be the way to go for a workaround.

whereisaaron · September 1, 2018, 7:44pm

If you don’t mind the file path repeating, then a compact find output would be close to your ideal and still work for multiple matches, e.g.

repository 2a7fabeb opened successfully, password is correct
Found matching entries
snapshot f6814a59: 2018-08-27 20:27:41 545425 -rw-------  1000  1000  /home/marcus/.xsession-errors
snapshot d95b4a13: 2018-08-26 17:49:27 581436 -rw-------  1000  1000  /home/marcus/.xsession-errors
snapshot ded778cf: 2018-08-26 12:55:07 557607 -rw-------  1000  1000  /home/marcus/.xsession-errors

Next question is why you might need to restore your xsession error log

BenBipod · September 1, 2018, 8:21pm

Well, my idea was that pattern matching still applies, but that snapshot and file path change places. Currently, the output has pattern matches per snapshot, and I would like this to be snapshots per pattern match (I don’t really care about the pattern matching myself, as in my use case I would use it with an absolute path). Having a more compact output than the current one is a step in the right direction, but doesn’t do the trick; sorting is essential.
And you’re right, my example would certainly be more convincing with .ssh/id_rsa

whereisaaron · September 2, 2018, 7:39pm

I can see why you would like that output sorting.

There may be a pragmatic reason for snapshot order rather than file match order. My guess is that find loops through the snapshots in date/time order and outputs matches as soon as it finds them.

find doesn’t know which files will match until it has looped through all snapshots. So to achieve your proposed file-path-sorted output, it would have to delay output and remember all matches in all snapshots, then sort and output those results after it has finished. A very broad pattern (e.g. *.jpg) might require caching a lot of data to sort and display at the end of the process.
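
The difference between the two approaches can be sketched in a few lines of Python. This is only a toy model of the trade-off, not restic's actual implementation; `find_streaming`, `find_grouped_by_path` and the `REPO` dictionary are hypothetical.

```python
def find_streaming(snapshots, matches_in):
    """Yield (snapshot, path) pairs snapshot by snapshot, as soon as found."""
    for snap in snapshots:
        for path in matches_in(snap):
            yield snap, path


def find_grouped_by_path(snapshots, matches_in):
    """Collect every match first, then return (path, snapshot) sorted by path.

    Nothing can be returned before the last snapshot has been searched,
    and all matches are held in memory in the meantime.
    """
    collected = [(path, snap) for snap, path in find_streaming(snapshots, matches_in)]
    collected.sort()
    return collected


# Toy data standing in for a repository: snapshot ID -> matching paths.
REPO = {
    "snap1": ["/b.jpg", "/a.jpg"],
    "snap2": ["/a.jpg"],
}

stream = find_streaming(REPO, REPO.__getitem__)
print(next(stream))  # available before snap2 has been looked at
print(find_grouped_by_path(REPO, REPO.__getitem__))
```

The streaming variant needs constant memory and shows results immediately; the grouped variant needs memory proportional to the number of matches.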

BenBipod · September 4, 2018, 5:54pm

Your reasoning seems totally applicable to me. While I tried to be unobtrusive by asking for a change to an existing command, the history use case and the find command don’t appear to be the ideal match. So have a new history command, similar to find but for unique, full-path files only?

fd0 · September 6, 2018, 9:46am

That’s exactly the reason why it is the way it is right now.

cfbao · May 24, 2019, 1:46am

I vaguely remember that the inclusion of the snapshot date in restic find --long was discussed (and agreed upon?) in a GitHub issue.
Do I remember correctly? If someone knows what I’m talking about, could you post a link?

If snapshot date is included, I think file hash is the only thing missing for restic find to work as a simple file version history tracker.

764287 · May 24, 2019, 7:23am

This one?

github.com/restic/restic

Improve output for `restic find`

opened 07:33AM - 05 Nov 18 UTC

closed 09:36AM - 17 Dec 19 UTC

fbarbeira

type: feature suggestion help: good first issue

Output of `restic version`
--------------------------

restic 0.9.3 compiled …with go1.11.1 on linux/amd64

What should restic do differently? Which functionality do you think we should add?
----------------------------------------------------------------------------------

Every time I try to find a particular file, I execute the command:

```
# restic find myfile.php
repository 285d013e opened successfully, password is correct
Found matching entries in snapshot 08364c47
/usr/home/myfile.php

Found matching entries in snapshot 11cb608b
/usr/home/myfile.php

Found matching entries in snapshot 1a1e12d9
/usr/home/myfile.php
[...]
#
```

After that, I have to run the command `restic snapshots` to know when the snapshot 11cb608b was taken. So I have to run two commands to know exactly the snapshot I need to restore. It would be very useful if the `restic find` command output the snapshot date as well as snapshot id, something like this for example:

```
# restic find myfile.php
repository 285d013e opened successfully, password is correct
Found matching entries in snapshot 08364c47 - 2018-10-23 04:11:05
/usr/home/myfile.php

Found matching entries in snapshot 11cb608b - 2018-10-24 04:11:04
/usr/home/myfile.php

Found matching entries in snapshot 1a1e12d9 - 2018-10-28 04:11:19
/usr/home/myfile.php
[...]
#
```

Another option is to combine in some way the output of `snapshots --compact` with the output of `find`.

```
# bin/restic_v0.9.3-57-gc0572ca1_linux_amd64 snapshots --compact
repository 35bee826 opened successfully, password is correct
ID        Time                 Host          Tags
-----------------------------------------------------------
8945cf19  2018-08-15 22:50:03  xxxxxxxxxxxx
01f5b1cf  2018-08-16 07:48:06  xxxxxxxxxxxx
1cf4ec94  2018-08-18 10:28:54  xxxxxxxxxxxx
701c2b2f  2018-08-20 07:11:26  xxxxxxxxxxxx
18d3396d  2018-08-21 07:39:31  xxxxxxxxxxxx
8457abc4  2018-08-22 07:22:04  xxxxxxxxxxxx
bf1074f8  2018-08-23 07:21:35  xxxxxxxxxxxx
9fd3c14f  2018-08-24 07:49:50  xxxxxxxxxxxx
e93b0a12  2018-08-27 07:24:22  xxxxxxxxxxxx
b1bc35db  2018-08-28 07:41:10  xxxxxxxxxxxx
```

Did restic help you or made you happy in any way?
-------------------------------------------------

Very happy!! really good piece of software!

cfbao · May 24, 2019, 3:48pm

Yes, exactly! Thank you!

martin_w · May 25, 2021, 6:42am

I just ran a restic find --no-lock "/home/myusername/sites/backups/mysqldb/wp*_5000.sql.gz" command. Files were found in many different snapshots, but the snapshots are listed in no discernable order. The first snapshot listed was from 2021-05-18. The next one was from 2021-04-30. The final one (of many others) was from 2021-05-23. And somewhere in between, there was one from today (2021-05-24)!

The snapshots are not listed in the order in which they were created, nor in the order of their hashes. Seems to be no logic to the sorting.

In addition, the find command took quite a long time – several minutes. I’m not sure what performance to expect, but I was thinking it would be speedier, like an indexed database search. (restic stats: Snapshots processed: 33. Total File Count: 3222114. Total Size: 14.050 GiB.)

I would much prefer it if restic find listed matches in descending order of the snapshot date. That has the advantage that (a) the newest matches are listed first, so I can quit if I don’t want to wait for more, and (b) it effectively gives me a version history if I want to look for versions of a specific file.
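
Until find does this itself, a newest-first order can be approximated from outside. The Python sketch below only sorts snapshot records by time; it assumes each record has `time` and `short_id` fields in the style of `restic snapshots --json`, which should be verified against your restic version. The IDs in `SNAPSHOTS` are invented, and `parse_time` and `newest_first` are hypothetical helper names.

```python
import re
from datetime import datetime

_TS_RE = re.compile(r"(.*?)(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})")


def parse_time(ts):
    """Parse an RFC 3339 timestamp, truncating fractions beyond microseconds."""
    m = _TS_RE.fullmatch(ts)
    if m is None:
        raise ValueError(f"unsupported timestamp: {ts!r}")
    base, frac, zone = m.groups()
    micro = (frac or "0")[:6].ljust(6, "0")
    zone = "+00:00" if zone == "Z" else zone
    return datetime.fromisoformat(f"{base}.{micro}{zone}")


def newest_first(snapshots):
    """Return the snapshot records sorted by descending snapshot time."""
    return sorted(snapshots, key=lambda s: parse_time(s["time"]), reverse=True)


# Illustrative records only; the IDs are invented.
SNAPSHOTS = [
    {"short_id": "aaaa0001", "time": "2021-05-18T03:00:01.123456789+02:00"},
    {"short_id": "aaaa0002", "time": "2021-04-30T03:00:02Z"},
    {"short_id": "aaaa0003", "time": "2021-05-23T03:00:03.5+02:00"},
    {"short_id": "aaaa0004", "time": "2021-05-24T03:00:04+02:00"},
]

for snap in newest_first(SNAPSHOTS):
    print(snap["short_id"], snap["time"])
```

The sorted IDs could then be searched one snapshot at a time, newest first, so that the search can be interrupted once the wanted version has turned up.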

Thanks for considering this!

alexweiss · May 25, 2021, 10:59am

See

github.com/restic/restic

Show history of file

opened 02:19PM - 08 Nov 20 UTC

aawsome

category: user interface type: feature suggestion

Output of `restic version`
--------------------------

restic 0.11.0 (v0.11.0…-42-g9e4e0077) compiled with go1.14.7 on linux/amd64

What should restic do differently? Which functionality do you think we should add?
----------------------------------------------------------------------------------

Add a possibility to show some kind of "history" for one or more file(s). An option would be to add an option to `restic find`. E.g. `restic find --history --long /data/my_file.txt` might produce something like:

```
Found matching entries in snapshot 3e8ff4a9 from 2020-02-04 03:53:08 (+3 subsequent snapshots)
-rw-r--r-- 1000 1000 6 2020-02-04 03:41:48 /data/my_file.txt

Found matching entries in snapshot b41a0aa2 from 2020-02-04 04:10:27
-rw-r--r-- 1000 1000 6 2020-02-04 03:58:10 /data/my_file.txt

Found matching entries in snapshot c3d8da1e from 2020-02-04 04:17:20 (+1 subsequent snapshots)
-rw-r--r-- 1000 1000 6 2020-02-04 04:15:23 /data/my_file.txt
```

Of course, if using `find`, we should also sort/group the snapshots by paths and date. Just realized that this actually is not the case.

What are you trying to do? What problem would this solve?
---------------------------------------------------------

If a file is backed up by an automated procedure, it will usually be contained in many snapshots. Now imagine you need this file and just realize it has been "damaged" (e.g. by a user trying to work on it). You may want to get the last "undamaged" version from your backup. However, restic so far can only produce a list of snapshots where the file is contained, and you have to manually go through all of those to find the version. This may even apply if the file was just changed a few times. So it might be handy to have restic determine how many different "versions" of this file really exist in the backup and which snapshots can be used to access those.

Did restic help you today? Did it make you happy in any way?
------------------------------------------------------------

Backing up shared directories (where many users can write files) with restic makes me feel much more relaxed. Having many users with write access increases the risk of errors by mistake a lot. I'm happy to have a very good backup utility with restic here which simply works!

and the suggestions therein for some workarounds with current restic versions.