Unforgettable snapshot


#1

Hi,

Is there a way to make a snapshot unforgettable? For now I put a specific tag (such as KEEP) on the ones I want to keep and use --keep-tag=KEEP, but accidental deletion may still happen.

My use case:

I have years of photos organized by year. Since only the latest year's files are ever modified, I do not see the point of re-running backups on the past years. Skipping them speeds up the backups.

The idea is to have one snapshot per year and make them unforgettable to avoid losing everything.

TIA.

S.


#2

Maybe for your case you could do a couple of things. For example:

  1. Override the hostname using the --host flag. That way you could have “photos” as the hostname instead of the same hostname you use for other possible backups in the same repo.
  2. I guess your directories look like /Pictures/2017, /Pictures/2018, and so on. That’s great, because when you run restic forget --keep-policies "keep" --keep-last 1, restic groups snapshots by path first (the default). As a safety feature, restic will not forget a snapshot that is the only one in its group unless you explicitly run restic forget latest or specify the snapshot ID. So, in this scenario, say you have two snapshots: the first has the path /Pictures/2017 and the second /Pictures/2018. If you run restic forget --keep-last 1, you end up with the same two snapshots, because they belong to different paths. Basically, those snapshots are unforgettable unless you explicitly tell restic to forget one of them.

I mentioned the host because it would also make it easier for you to list them, for example with snapshots --host Photos. You could use a tag instead, but this way the tag stays free for another purpose, such as the year or whatever else you want. Just my humble opinion.
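
The grouping behaviour described in this post can be sketched with a small toy model. This is not restic's implementation: the Snapshot type, the forget_keep_last function, and the snapshot IDs are all hypothetical, and the model only assumes that snapshots are grouped by host and paths and that --keep-last N keeps the N newest per group.

```python
from collections import defaultdict
from typing import NamedTuple


class Snapshot(NamedTuple):
    id: str       # hypothetical short ID
    time: str     # ISO timestamp, so string order equals time order
    host: str
    paths: tuple  # the path set the snapshot was taken of


def forget_keep_last(snapshots, n=1):
    """Toy model: keep the n newest snapshots of every (host, paths) group."""
    groups = defaultdict(list)
    for snap in snapshots:
        groups[(snap.host, snap.paths)].append(snap)

    kept, removed = [], []
    for members in groups.values():
        members.sort(key=lambda s: s.time, reverse=True)  # newest first
        kept.extend(members[:n])
        removed.extend(members[n:])
    return kept, removed


snaps = [
    Snapshot("a1", "2017-06-01T10:00:00", "photos", ("/Pictures/2017",)),
    Snapshot("a2", "2017-12-31T23:00:00", "photos", ("/Pictures/2017",)),
    Snapshot("b1", "2018-03-01T10:00:00", "photos", ("/Pictures/2018",)),
]

kept, removed = forget_keep_last(snaps, n=1)
print(sorted(s.id for s in kept))     # ['a2', 'b1']
print(sorted(s.id for s in removed))  # ['a1']
```

Running the same policy again on the kept snapshots removes nothing, because each path group is already down to a single snapshot.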


#3

Thanks for the response.

I guess you meant --keep-tag instead of --keep-policies. The latter is not a valid option.


#4

In fact, what I meant was “keep policies” in general, as if you were running several policies; I don’t really know why I added the “keep”, because it does seem like I was trying to say --keep-tag. Anyway, what I meant to say was that if you run forget with your keep policies (--keep-daily, --keep-monthly, etc.) and “2017” and “2018” are different paths, restic applies the policies per path by default. Say you have two snapshots for each path, so your snapshot list shows two for /Pictures/2017 and two for /Pictures/2018, and your only policy is --keep-last 1 because you want just one snapshot per path. In this situation restic keeps one snapshot per path, so you end up with two snapshots: one for /Pictures/2017 and one for /Pictures/2018. If you then run forget again, it will not forget anything, because there are no snapshots other than the latest one for each path. If I were to do this, I would do the following:

  1. Create sub-directories for years (as in /Pictures/2017, /Pictures/2018, etc.).
  2. Start backing up the path /Pictures/2017 as usual.
  3. Apply policies from time to time, keeping just the latest snapshot (--keep-last 1).
  4. On Dec 31, 2017, make sure to take one last snapshot.
  5. On Jan 1, 2018, change the path you’re backing up to /Pictures/2018 and apply forget --keep-last 1. There will be no more new snapshots for /Pictures/2017, so this keeps just its latest snapshot, meaning the Dec 31, 2017 snapshot will always be available.

To make it completely “unforgettable” you should tag the snapshot and use --keep-tag keep for example, as you said in your original post, so it would be “more difficult” to forget that snapshot.

You should know that forget also has filter flags, such as --host and --tag, that restrict which snapshots it considers before any grouping happens. For example, if you run restic forget --host "YOURHOSTNAME" --keep-last 1, restic forgets snapshots ONLY for YOURHOSTNAME. The same goes for the --tag flag: say you have “TAG1”, “TAG2” and “TAG3” in your repository. If you run restic forget --tag "TAG1" --keep-last 1, restic considers only the snapshots tagged “TAG1” and forgets all of them except the latest one in each group. I don’t really remember where I was going with this, but if you didn’t know it, maybe you will find it useful for your use case.

Bottom line, --keep-tag is one of the best solutions for keeping snapshots you don’t want to delete. It will give you the opportunity to apply other policies at the same time for other snapshots not tagged with the tag you want to keep.
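
As a rough illustration of how --keep-tag combines with another policy, here is a toy model in which a snapshot survives if any keep rule matches it. The forget function, the snapshot IDs, and the single /Pictures path are made up for the example; only the "kept if any rule matches" behaviour is taken from how restic's keep options are documented to work.

```python
def forget(snapshots, keep_last=0, keep_tag=None):
    """Toy model: return the IDs that survive; ANY matching rule keeps a snapshot."""
    groups = {}
    for snap in snapshots:
        groups.setdefault(snap["paths"], []).append(snap)

    kept = set()
    for members in groups.values():
        newest_first = sorted(members, key=lambda s: s["time"], reverse=True)
        # Rule 1: the keep_last newest snapshots of each path group.
        kept.update(s["id"] for s in newest_first[:keep_last])
        # Rule 2: every snapshot carrying the protected tag.
        kept.update(s["id"] for s in members if keep_tag in s["tags"])
    return sorted(kept)


snaps = [
    {"id": "y2017", "time": "2017-12-31", "paths": "/Pictures", "tags": ["KEEP"]},
    {"id": "jun18", "time": "2018-06-01", "paths": "/Pictures", "tags": []},
    {"id": "jul18", "time": "2018-07-01", "paths": "/Pictures", "tags": []},
]

print(forget(snaps, keep_last=1, keep_tag="KEEP"))  # ['jul18', 'y2017']
print(forget(snaps, keep_last=1))                   # ['jul18']
```

The second call shows the weak spot raised in the first post: the tag protects a snapshot only while the keep-tag rule is actually passed, so leaving it out once is enough to lose the tagged snapshot.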


#5

What’s the impact on backup speed? Because it seems to me like it needs to be pretty substantial before this kind of solution starts making sense.


#6

@odin: indeed, you’re right. I guess this was my mistake when I first started using restic.

I changed paths from /Photos/2010, /Photos/2011, /Photos/2013, etc. to /Photos and it took time to recompute the files. I guess this is because of how the cache works.

For example:

time restic -r repo backup --verbose /Volumes/ExtraStorage02/Pictures
open repository
repository 6f50c184 opened successfully, password is correct
lock repository
load index files
using parent snapshot 875e33df
start scan on [/Volumes/ExtraStorage02/Pictures]
start backup on [/Volumes/ExtraStorage02/Pictures]
scan finished in 11.651s: 119616 files, 1.361 TiB

Files:           0 new,     0 changed, 119616 unmodified
Dirs:            0 new,     0 changed,     2 unmodified
Data Blobs:      0 new
Tree Blobs:      0 new
Added to the repo: 0 B

processed 119616 files, 1.361 TiB in 0:17
snapshot 55061248 saved

real	0m18.280s
user	0m31.138s
sys	0m8.909s

vs.:

time restic -r repo backup --verbose /Volumes/ExtraStorage02/Pictures/2016/12
open repository
repository 6f50c184 opened successfully, password is correct
lock repository
load index files
start scan on [/Volumes/ExtraStorage02/Pictures/2016/12/]
start backup on [/Volumes/ExtraStorage02/Pictures/2016/12/]
scan finished in 3.560s: 2236 files, 23.366 GiB

Files:        2236 new,     0 changed,     0 unmodified
Dirs:            4 new,     0 changed,     0 unmodified
Data Blobs:      0 new
Tree Blobs:      5 new
Added to the repo: 1.799 KiB

processed 2236 files, 23.366 GiB in 4:58
snapshot 06c4b763 saved

real	4m59.386s
user	2m20.903s
sys	0m17.524s

As you can see it took 5 minutes to add nothing.

I got this from the doc:

Please be aware that when you backup different directories (or the directories to be saved have a variable name component like a time/date), restic always needs to read all files and only afterwards can compute which parts of the files need to be saved. When you backup the same directory again (maybe with new or changed files) restic will find the old snapshot in the repo and by default only reads those files that are new or have been modified since the last snapshot

https://restic.readthedocs.io/en/latest/040_backup.html

Maybe there is a good reason not to reuse previously computed values?


#7

The problem is that restic simply doesn’t know whether the files have been changed if it doesn’t have some reference to compare them to.

If you back up the same path set twice, restic finds a prior snapshot and can use the metadata (mtime, size, permissions, etc.) to determine if a file has changed. If the metadata hasn’t changed then restic assumes that the file’s contents are unchanged without even looking at them. This is the fast path.

If there is no parent snapshot, restic doesn’t have a set of metadata to look at and must chunk every file and hash every chunk, and at the end of that process it finds that the chunks already exist and so are deduplicated. This is the slow path.

How would you expect restic to locate a suitable set of metadata for the files you’re backing up if the particular path you’re backing up doesn’t appear anywhere in the repository? Basically, restic can’t magically determine that information and it’s a bit unrealistic to expect it to do so.

There is --parent, which you can use to force a specific parent snapshot, but if the paths don’t match then it’s not useful. Restic simply doesn’t know that the two different paths contain mostly the same files.
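
To make the fast path and slow path concrete, here is a minimal sketch. It is a toy model, not restic's code: the run_backup function, the dictionary used as a "repo", and the file data are invented, and it assumes the parent snapshot is looked up by the exact path set and that matching mtime and size are enough to skip reading a file.

```python
import hashlib


def run_backup(repo, target, files):
    """Back up files ({path: (mtime, size, content)}) under the path set `target`.

    `repo` maps a path set to the metadata of its latest snapshot.
    Returns how many files had to be read and hashed.
    """
    parent = repo.get(target)  # parent lookup is by exact path set
    snapshot, files_read = {}, 0
    for path, (mtime, size, content) in files.items():
        old = parent.get(path) if parent is not None else None
        if old is not None and old[:2] == (mtime, size):
            snapshot[path] = old  # fast path: metadata unchanged, reuse it
        else:
            digest = hashlib.sha256(content).hexdigest()  # slow path: read and hash
            snapshot[path] = (mtime, size, digest)
            files_read += 1
    repo[target] = snapshot
    return files_read


files = {
    "/Pictures/2016/12/a.jpg": (1482000000, 3, b"aaa"),
    "/Pictures/2016/12/b.jpg": (1482000100, 3, b"bbb"),
}
repo = {}
print(run_backup(repo, "/Pictures", files))          # 2 (first run, no parent)
print(run_backup(repo, "/Pictures", files))          # 0 (parent found, fast path)
print(run_backup(repo, "/Pictures/2016/12", files))  # 2 (new path set, no parent)
```

The third call mirrors the transcript above: the files are identical, but the path set is new, so there is no parent snapshot and every file is read again even though nothing new ends up being stored.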

What could make sense is some manual option that says “path x previously appeared as path y, so use y as the prior directory for the purposes of a metadata check.” However, this is such a niche case that it may not be very widely useful.

Consider instead using something like --group-by host,tags when running restic forget and make sure you tag each snapshot with the year (--tag pictures-2019 for example). Note that the default option is --group-by host,paths (which is why using different paths solves this problem) so make sure that host,tags won’t mess up any of your other use cases.
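
The difference between the two grouping choices can be sketched as follows. Again this is only a toy model with hypothetical names (keep_last_one, the snapshot IDs, the host "mac"), and it simplifies each snapshot to a single tag; it assumes every backup uses the same /Photos path so that the fast path stays available.

```python
from itertools import groupby


def keep_last_one(snapshots, group_by):
    """Toy model: IDs kept by a keep-last-1 policy, grouping on the given fields."""
    def key(snap):
        return tuple(snap[field] for field in group_by)

    kept = []
    for _, members in groupby(sorted(snapshots, key=key), key=key):
        kept.append(max(members, key=lambda s: s["time"])["id"])
    return sorted(kept)


snaps = [
    {"id": "s1", "time": "2018-06-01", "host": "mac", "paths": "/Photos", "tags": "pictures-2018"},
    {"id": "s2", "time": "2018-12-31", "host": "mac", "paths": "/Photos", "tags": "pictures-2018"},
    {"id": "s3", "time": "2019-02-01", "host": "mac", "paths": "/Photos", "tags": "pictures-2019"},
]

print(keep_last_one(snaps, ("host", "paths")))  # ['s3']
print(keep_last_one(snaps, ("host", "tags")))   # ['s2', 's3']
```

With the default grouping all three snapshots fall into one group and only the newest survives; grouping by tags keeps the newest snapshot of each year.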


#8

Thanks for the details.

I naively thought the local cache would have helped with that.

I think the best way is to keep backing up the whole /Photos directory, since it’s pretty fast once the chunks are cached. This makes my original question obsolete.

Sorry if the question seems naive. I am new to restic and I really like to understand how things work, especially regarding backup / restoration ;-).


#9

The local cache is only a copy of the indexes (which objects are in which packs) and I think tree objects might also be cached, since these contain the metadata that’s used to take the fast path in subsequent backups.

Of course, this can’t really help when paths change and restic doesn’t know where to look for the metadata.

No problem!

I realize now some parts of my reply may sound aggressive or antagonistic but this was not my intent. I was just trying to show the reasoning for why restic can’t take the fast path in this situation.


#10

No offence taken.

BTW, I had a look at some (old) talks about restic, which also helped a lot.
Don’t think this one is in the docs: https://www.youtube.com/watch?v=Vc0URl-GWrg.