Issues with folders rescanning

So I’m not sure if I’m doing something wrong, or if this is just how Restic works, but I’m hoping someone can help me.

I’m backing up about 63TB of data to S3, and so far it’s going ok, but I am trying to see if I can do it in segments to avoid losing too much progress if the backup fails (internet goes out, server reboot, etc.). It’s going to take almost a month total.

One idea I had was to back up one directory at a time. I’ve got 7 directories, ranging in size from 100GB to 30TB. I started with the 100GB folder and got it uploaded. I then added the second folder, about 600GB, and started it.

What I don’t understand is that if I start with just folder1, the backup finishes in 30 seconds with accurate stats. As soon as I add folder2, it lists the files in folder1 as new and rescans them. It doesn’t upload again, so it’s faster than starting from scratch, but I want it to just mark those files as unchanged and start with folder2. Is there a way to do that?

TL;DR: Add the --parent <snapshot-ID-from-first-backup-run-with-folder1-only> option to your second backup run with folder1 and folder2 in it.

Assuming your first backup run covered just folder1 and your second run added folder2, I’m guessing the output of the second run contains the text “no parent snapshot found, will read all files”. You do have a first snapshot, which references metadata for the files in folder1, but restic does not pick it up as the parent. Restic uses a parent snapshot to compare file metadata and skip scanning files that haven’t changed, and when it looks for one it only considers snapshots with the same paths as the ones you’re backing up now. Since the paths you gave it differ between the two runs, it finds no parent.
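To make the matching rule above concrete, here is a minimal Python sketch of it. This is a simplified model and not restic's actual code: the function name find_parent, the /data/... paths and the timestamp are made up for illustration, and real restic takes more than the paths into account (the hostname, for instance).

```python
from __future__ import annotations


def find_parent(snapshots: list[dict], paths: list[str]) -> dict | None:
    """Return the newest snapshot whose path set equals `paths`, else None.

    Simplified model of the lookup described above; hypothetical helper.
    """
    wanted = sorted(paths)
    candidates = [s for s in snapshots if sorted(s["paths"]) == wanted]
    # ISO 8601 timestamps in the same format sort correctly as strings.
    return max(candidates, key=lambda s: s["time"], default=None)


# One existing snapshot that covers only folder1 (illustrative values).
snapshots = [
    {"short_id": "7e64a9cf", "time": "2021-03-01T10:00:00Z", "paths": ["/data/folder1"]},
]

# Same paths as the first run: the snapshot is found.
print(find_parent(snapshots, ["/data/folder1"]))
# folder2 added: the path set differs, so no parent is found.
print(find_parent(snapshots, ["/data/folder1", "/data/folder2"]))
```

In this model the second lookup returns nothing, which corresponds to the “no parent snapshot found” message.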

That was a lot of text, so putting it another way: grab the snapshot ID from the first run (e.g. from that run’s output, or with the snapshots command) and add --parent <snapshotID> to your second backup run (right after the backup command, so you don’t accidentally misplace the option). Restic can then compare the files in folder1 against the metadata of the files already backed up, and should recognize that it does not need to scan those files again.
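As a rough sketch of what that second run could look like when scripted, the Python helper below only builds the argument list. The helper name backup_command, the repository string and the /data/... paths are assumptions for illustration; -r, backup and --parent are the restic options discussed here.

```python
from __future__ import annotations


def backup_command(repo: str, paths: list[str], parent: str | None = None) -> list[str]:
    """Build a `restic backup` argument list (hypothetical helper).

    The --parent option goes right after the backup command, before the paths.
    """
    cmd = ["restic", "-r", repo, "backup"]
    if parent:
        cmd += ["--parent", parent]
    return cmd + paths


# Placeholder repository and paths; substitute your own.
cmd = backup_command(
    "s3:s3.amazonaws.com/my-bucket",
    ["/data/folder1", "/data/folder2"],
    parent="7e64a9cf",
)
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the list separately from running it makes it easy to print and check the command before starting a multi-day upload.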

In short, what you want to see in the output of your second backup run is something like “using parent snapshot 7e64a9cf”. You can read a little bit more here: FAQ — restic 0.12.0 documentation

On a related note, the --ignore-ctime and --ignore-inode options might be good to know about at some point. That’s just a sidenote though, probably not relevant for what you described.

Thank you very much, I believe this is what I need. Since the issue is that the previous snapshot isn’t recognized as a parent because the source paths have changed, would I also be able to do this the opposite way, with excludes?

For example, since everything I want is in a single top-level directory, could I just set that directory as the source, then exclude all but one folder? Once that backup completes, if I remove another folder from the exclusions, would it still see the existing snapshot as the parent? I’m going to try this and find out.

Yes, of course.
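A minimal sketch of that exclude-based staging, again in Python and only building the command: the backup path stays the same top-level directory on every run, and each run drops one more --exclude. The helper name staged_backup_command, the repository string and the folder names are hypothetical.

```python
from __future__ import annotations


def staged_backup_command(
    repo: str, root: str, all_folders: list[str], include: list[str]
) -> list[str]:
    """Back up `root`, excluding every folder not yet listed in `include`."""
    cmd = ["restic", "-r", repo, "backup"]
    for name in all_folders:
        if name not in include:
            cmd += ["--exclude", f"{root}/{name}"]
    # The only backup path is always the same top-level directory.
    return cmd + [root]


folders = ["folder1", "folder2", "folder3"]
# Run 1: only folder1 is included.
print(staged_backup_command("s3:s3.amazonaws.com/my-bucket", "/data", folders, ["folder1"]))
# Run 2: folder2 is no longer excluded; the backup path is still /data.
print(staged_backup_command("s3:s3.amazonaws.com/my-bucket", "/data", folders, ["folder1", "folder2"]))
```

Because the path given to backup never changes between runs, the earlier snapshot should be picked up as the parent without needing --parent.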

Also note that there are some open PRs:
