Understanding parent directories, subdirectories, and other questions

Hi all!

I have a few questions about how restic works; maybe you can help me get them answered?

a) How are parent dirs and subdirs handled?
Let’s assume I start my backups with a subdirectory containing a bunch of data (let’s say “/home/me/videos”), and on another day I choose to back up another subdirectory (let’s say “/home/me/pictures”). Later I choose to back up my whole home directory (“/home/me”). What will happen? I understand that deduplication ensures that videos and pictures are not stored twice, but will they be transferred to the repo again?
Is it wise to always back up the parent directory, or can I, let’s say, back up just my mail file hourly and my whole home directory daily?
Can I find and restore files without having to think about which directory “value” I used for the backup?

b) How much data is really transferred?
During the backup command I can see how much data restic found in my directory that is still pending, the amount already backed up, the transfer speed and ETA, and in the summary the average transfer speed and the time needed. But when I create a new snapshot of the same directory, I can’t see the “delta” that was transferred/changed. Did I miss an option?

c) Is restic block-based?
I am pretty sure it is, but I couldn’t find it in the user docs. That is, if I have a 4 GB mail file (from Thunderbird, for example) that has been backed up once, is restic smart enough to transfer only the changed blocks on the next backup?

d) How do forget and prune work in relation to a)?
Let’s say I backed up my data as described in a) and then run restic forget --keep-last 1 --prune. Will I then have the last backed-up state of every file for the full home directory (/home/me), or will I have the last state of every file for every folder I used for backups (/home/me, /home/me/videos, /home/me/pictures)?

EDIT: I forgot one question:
e) Is it possible to remove a parent directory and add a very similar other parent directory without having to transfer everything again?
I have a pretty small uplink and I already transferred a big directory (let’s say /abc). Now I will have a very similar directory (let’s say /def) with a lot of the same data, and I back up both. Is the amount of data transferred when backing up /def minimal (only the different parts)?
Now I want to get rid of /abc in the backup to save space (because I never want to back it up again) and keep /def (which in effect would mean removing only the parts of /abc that differ from /def). Is that even possible? And if so, how?
(Background of that crazy question: I am backing up my Windows machine. I started without VSS and now want to introduce it, which will result in a different parent directory, but I don’t want to transfer everything again.)

Thanks a lot, restic is really a great tool!

Hey Tim, welcome to the forum!

I’ll answer the questions for the latest released version (0.8.1), as there is a change in the works that improves some situations a lot (but it’s not released yet; for details see #1491). I’m currently rewriting the archiver part of restic, which is the oldest code there is.

The program will re-read all files locally, but then detect that the data (assuming the files haven’t changed) is already stored in the repo. So only new metadata will be transferred and added to the repo. The process of reading (and hashing) all local files may take some time, depending on the size of the files and how powerful your hardware is.
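A rough model of this behaviour, as a simplification and not restic’s code: each file is read and hashed locally on every run, but only blobs whose ID is not yet in the repository are uploaded. For brevity the sketch treats each file as a single blob (restic actually splits files into smaller blobs first) and uses SHA-256, which restic also uses for blob IDs; the `Repo` class and the file contents are hypothetical.

```python
import hashlib


class Repo:
    """Toy content-addressed store: blob ID (SHA-256 hex) -> data."""

    def __init__(self):
        self.blobs = {}
        self.snapshots = []

    def backup(self, files):
        """Back up a {path: bytes} mapping; return the number of bytes uploaded."""
        uploaded = 0
        tree = {}
        for path, data in files.items():
            # Every file is read and hashed locally, even if it is unchanged...
            blob_id = hashlib.sha256(data).hexdigest()
            # ...but only blobs the repository does not know yet are transferred.
            if blob_id not in self.blobs:
                self.blobs[blob_id] = data
                uploaded += len(data)
            tree[path] = blob_id
        # The new snapshot (metadata only) is always written.
        self.snapshots.append(tree)
        return uploaded


repo = Repo()
videos = {"/home/me/videos/a.mp4": b"V" * 1000}
pictures = {"/home/me/pictures/b.jpg": b"P" * 500}
home = {**videos, **pictures, "/home/me/mail": b"M" * 200}

print(repo.backup(videos))    # 1000
print(repo.backup(pictures))  # 500
print(repo.backup(home))      # 200: only the mail file is new
```

Backing up the parent directory therefore costs local reading and hashing time, but the upload is limited to what the repository has not seen before plus the new snapshot metadata.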

Both work fine; restic records which paths were saved in each snapshot. You can get an overview by running restic snapshots.

Yes, there’s the restic find command, which you can use to find files; restic ls simply lists the contents of a particular snapshot; and restic restore can be used with --include to restore only a subset of the files (or even a single file).
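Because restic find searches across snapshots, the directory originally passed to restic backup does not matter when looking for a file. A toy model of that idea, using Python’s `fnmatch` on file names; the snapshot structure, IDs and the `find` function are hypothetical, and restic’s real matching rules may differ in detail:

```python
from fnmatch import fnmatch


def find(snapshots, pattern):
    """Return (snapshot_id, path) pairs whose file name matches pattern.

    Toy model: each snapshot is {"id": ..., "paths": [...], "files": [...]}.
    The backup root ("paths") plays no role in the search.
    """
    hits = []
    for snap in snapshots:
        for path in snap["files"]:
            name = path.rsplit("/", 1)[-1]   # match on the file name only
            if fnmatch(name, pattern):
                hits.append((snap["id"], path))
    return hits


snapshots = [
    {"id": "aaa1", "paths": ["/home/me/videos"],
     "files": ["/home/me/videos/trip.mp4"]},
    {"id": "bbb2", "paths": ["/home/me"],
     "files": ["/home/me/videos/trip.mp4", "/home/me/notes.txt"]},
]

print(find(snapshots, "*.mp4"))
# [('aaa1', '/home/me/videos/trip.mp4'), ('bbb2', '/home/me/videos/trip.mp4')]
```

The same file shows up in both snapshots, regardless of whether it was backed up via /home/me/videos or via /home/me.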

If you run something other than Windows, you can also use restic mount to mount the repository locally and seamlessly browse the snapshots you have. All data is loaded from the repo only on demand.

Be sure to try that out before you need it!

No, you did not; that’s simply not implemented yet. I’m working on it, as the feature is very useful and often requested.

It is: restic uses a technique called “Content Defined Chunking” (CDC) to split files into blocks (which restic calls blobs). The deduplication happens at the block level, which means that each block is saved only once in the repo. So if only a few blocks of your mail file have changed, a new snapshot will not take much space. But please don’t take my word for it: try it out! :slight_smile:

You can find more information on the design of the restic repository here: https://restic.readthedocs.io/en/latest/100_references.html#design
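To illustrate content-defined chunking, here is a minimal sketch. restic itself uses Rabin fingerprints over a sliding 64-byte window; this sketch swaps in a simpler “gear” rolling hash and tiny, hypothetical size limits (`MIN_SIZE`, `MAX_SIZE`, `MASK`) so it runs instantly. It shows the property that matters: flipping one byte in the middle of a 20,000-byte buffer changes only the blob(s) around the edit, while all other blobs keep their IDs and would not be uploaded again.

```python
import hashlib
import random

# Toy parameters (hypothetical): far smaller than restic's real chunk sizes.
MIN_SIZE = 64            # never cut before this many bytes
MAX_SIZE = 1024          # always cut at this many bytes
MASK = 0xFF000000        # cut when the top 8 bits are zero: ~1 in 256 positions

# Gear table: one fixed pseudo-random 32-bit value per byte value.
_rng = random.Random(1)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a gear rolling hash.

    Each step shifts the hash left by one bit (mod 2**32), so the hash
    depends only on the last 32 bytes: boundaries follow the content,
    not fixed offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])   # trailing remainder
    return chunks


def blob_ids(data: bytes) -> set[str]:
    """Content-addressed IDs of all chunks, like blob IDs in a repository."""
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}


data = random.Random(2).randbytes(20_000)   # stand-in for a large mail file
edited = bytearray(data)
edited[10_000] ^= 0xFF                      # change a single byte in the middle
edited = bytes(edited)

old, new = blob_ids(data), blob_ids(edited)
print(f"{len(old)} blobs before, {len(new - old)} new blobs after the edit")
```

Because boundaries depend on the content rather than on fixed offsets, inserting bytes (not just overwriting them) also disturbs only nearby chunks; with fixed-size blocks, every later block would shift and look new.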

Restic is a bit more intelligent: it first groups all snapshots by the paths they contain (among other criteria), then applies the forget policy to each group separately. So for every group of snapshots you’ll end up with only the last one. In your situation, this means the last snapshot of /home/me, the last snapshot of /home/me/videos and the last snapshot of /home/me/pictures will be kept. There’s a bit more explanation in the manual here: https://restic.readthedocs.io/en/latest/060_forget.html#removing-snapshots-according-to-a-policy
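The grouping can be sketched as follows. This is a simplified model, not restic’s code: it assumes the grouping key is the host plus the set of backup paths (the post only says “paths … among others”), implements only a keep-last style policy, and the `keep_last` function and snapshot IDs are hypothetical.

```python
from collections import defaultdict


def keep_last(snapshots, n=1):
    """Apply a 'keep last n' policy separately to each (host, paths) group.

    Each snapshot is a dict with 'id', 'time', 'host' and 'paths'.
    Returns (keep, forget) as lists of snapshot IDs.
    """
    groups = defaultdict(list)
    for snap in snapshots:
        key = (snap["host"], tuple(sorted(snap["paths"])))
        groups[key].append(snap)

    keep, forget = [], []
    for group in groups.values():
        group.sort(key=lambda s: s["time"], reverse=True)   # newest first
        keep += [s["id"] for s in group[:n]]
        forget += [s["id"] for s in group[n:]]
    return keep, forget


snaps = [
    {"id": "v1", "time": 1, "host": "pc", "paths": ["/home/me/videos"]},
    {"id": "p1", "time": 2, "host": "pc", "paths": ["/home/me/pictures"]},
    {"id": "h1", "time": 3, "host": "pc", "paths": ["/home/me"]},
    {"id": "h2", "time": 4, "host": "pc", "paths": ["/home/me"]},
]
keep, forget = keep_last(snaps)
print(sorted(keep), forget)   # ['h2', 'p1', 'v1'] ['h1']
```

Even with a keep-last-1 policy, the videos and pictures snapshots survive because each one is the newest member of its own group.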

Yes, only blobs not already present in the repo will be transferred and stored there. It’s not strictly minimal, because a single changed byte in a blob leads to the whole blob being stored again (as a new blob), and blobs can be up to 8 MiB in size. But it’s a good trade-off, I think.

That’s possible; it’s exactly what restic forget (with the snapshot ID of /abc) followed by restic prune does. However, (at least for now) running prune may download and re-upload a lot of data. There’s a lot of optimization potential in prune; we just haven’t gotten around to it yet.
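Conceptually, prune is a mark-and-sweep over blobs: every blob still referenced by a remaining snapshot is kept, everything else is deleted. The sketch below (hypothetical function name and blob IDs, not restic’s implementation) shows why forgetting the /abc snapshot frees only the data that /def does not share. The real prune works on pack files that bundle many blobs, which is why it may have to download and re-upload packs that mix used and unused blobs.

```python
def prune(repo_blobs, snapshots):
    """Delete blobs that no remaining snapshot references.

    repo_blobs: set of blob IDs stored in the repository (modified in place).
    snapshots: {snapshot_id: set of blob IDs that snapshot references}.
    Returns the set of deleted blob IDs.
    """
    used = set().union(*snapshots.values())   # mark: everything still referenced
    unused = repo_blobs - used                # sweep: everything else
    repo_blobs -= unused
    return unused


blobs = {"a", "b", "c", "x", "y"}
snapshots = {
    "snap-abc": {"a", "b", "c", "x"},   # old backup of /abc
    "snap-def": {"a", "b", "c", "y"},   # new, mostly identical backup of /def
}
del snapshots["snap-abc"]               # the effect of forgetting the /abc snapshot
print(prune(blobs, snapshots))          # {'x'}
print(sorted(blobs))                    # ['a', 'b', 'c', 'y']
```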

Restic won’t transfer the data again; in the worst case it will re-read everything locally. But that shouldn’t be an issue for your use case.

Please give it a try and report back :slight_smile:

Ah, I forgot: we recently added a diff command that compares two snapshots. It also prints a nice summary, like this:

```
$ restic -r sftp:server:/srv/data/backup/fd0 diff 562c d52f
[...]
M    /work/go/src/github.com/restic/restic/internal/archiver/new_test.go
+    /work/go/src/github.com/restic/restic/internal/fs/const.go
+    /work/go/src/github.com/restic/restic/internal/fs/const_unix.go
[...]
M    /hosts/cubicle/.zshhistory
[...]

Files:         201 new,     3 removed,    77 changed
Dirs:           78 new,     0 removed
Others:          0 new,     0 removed
Data Blobs:    282 new,    79 removed
Tree Blobs:    313 new,   240 removed
  Added:   23.546 MiB
  Removed: 21.275 MiB
```
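The “new/removed” blob counts and the “Added”/“Removed” totals are essentially set differences over the blobs each snapshot references. A minimal sketch of that bookkeeping, assuming each snapshot has already been reduced to a hypothetical {blob_id: size} map (how restic collects these lists, and its exact counting rules, are not covered here):

```python
def diff_blobs(old, new):
    """Count blobs added and removed between two snapshots.

    old, new: {blob_id: size_in_bytes} for all blobs each snapshot references.
    """
    added = new.keys() - old.keys()      # referenced only by the new snapshot
    removed = old.keys() - new.keys()    # referenced only by the old snapshot
    return {
        "blobs_new": len(added),
        "blobs_removed": len(removed),
        "bytes_added": sum(new[b] for b in added),
        "bytes_removed": sum(old[b] for b in removed),
    }


old = {"a": 100, "b": 200, "c": 300}
new = {"a": 100, "c": 300, "d": 50, "e": 25}
print(diff_blobs(old, new))
# {'blobs_new': 2, 'blobs_removed': 1, 'bytes_added': 75, 'bytes_removed': 200}
```

Blobs shared by both snapshots (here `a` and `c`) appear in neither total, which is why the summary is a good proxy for the “delta” asked about in question b).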

Wow Alexander, that was light speed! :smiley:
Thanks a lot for already answering so many of the questions.

I started on Linux and back then reported a problem with the FUSE mount in #313, which is solved now. :+1:
Now I’m starting with Windows and just saw the various plans for mounting there. That is very cool; I’m looking forward to it. I will definitely try a direct restore once I’ve backed up a few parts.

You already answered that with your second answer, awesome! (Having that info shown during the backup itself would still be great, but from your answer it sounds like that is already planned. Excellent!)

I will, I will, but my uplink is small; the initial backup of that file will take some, ehm, weeks (and I hope resuming in restic works smoothly…) :smiley:
I recommend mentioning that (in simpler words) on the website (https://restic.net/); people might be looking for it (including me). I just found the GitHub repo for the website, so if I find good wording I might even send in a pull request.

I saw that, but I didn’t understand it with regard to using different directory “values” for backups. Thanks for explaining it to me.
But that also means that I need to choose wisely which “directory” value I use. Or could I forget and prune all /home/me/videos and all /home/me/pictures snapshots and keep only the last /home/me snapshot? That would still leave me with the last backed-up state of my home directory in the repo, correct?

Oh, I didn’t know that (about prune transferring a lot of data); thanks for telling me. That brings up another question:
Since the remote SFTP location is my own Linux server, can I also install restic there and do the forgetting and pruning there (which should then be fast, because it runs locally)? (Keep in mind that I back up Windows and want to do that on Linux.) From a security point of view it would be questionable, because I would need to provide the password there as well, but would it work? (Let me guess: “Try it out?” :smiley:)

Yes, you can manually remove the last snapshots for /home/me/videos and /home/me/pictures and keep only the backup of /home/me. But don’t take my word for it: please try it out (with a local repo or a smaller data set).

Yes, that works. And you’ve already discovered that you need to enter the password on the server :wink:

Thanks again.

Feedback on some tests:

  • Working on the repo directly on the remote server with the Linux version of restic (forget, prune, and find):
    Works, though the find output does not look Windows-like (paths). However, it looks like this might have broken things; see the bug below.
  • Getting rid of previous subdir snapshots and keep the parent folder snapshot:
    Works.
  • Having nearly same data under different parent subfolder (due to VSS):
    Works, took just seconds.
  • Find files in backup:
    Does not work; I opened #1527 for that. (However, finding files directly on the Linux server works; see the issue.)