How to exclude directories already in source control?

Restic is fantastic! Still, I have only just learned about it, and some details are hard to understand.

So what is the best way to exclude directories that are already in source control?
What I tried is: --exclude-if-present .git
However, although restic excludes all the other files in a directory that contains a .git, it still backs up everything “inside” the .git directory itself. Do I also need to add something like /**/.git to the exclude patterns?


How you’d achieve that depends on what exactly “exclude directories that are already in source control” means for you. From what you wrote, I’m guessing you mean “all files and sub-directories contained in a directory which has a .git directory”. That’s only one common case of a “directory in source control”; there are many others. For Git, for example, the .git dir can be anywhere on the local file system.

So you have already discovered the easiest way: --exclude-if-present .git takes care of the directory content other than the .git directory, and --exclude .git excludes the .git dir itself. But this is also not exact, because it means that a directory containing a file called .git will also be excluded (together with all other files in that directory).
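Put together, the two flags from this answer look like the sketch below, wrapped in a shell function so they are easy to reuse. The function name backup_skip_git is made up for illustration; only the two flags come from restic. The caveat above still applies: anything named .git, not just a directory, triggers the exclusion.

```shell
# Hypothetical helper name; the flags are the ones discussed in this thread.
backup_skip_git() {
  # --exclude-if-present .git : skip every other entry of a directory that
  #                             contains an entry named .git (the .git entry
  #                             itself is not skipped by this flag)
  # --exclude .git            : skip every entry named .git, and therefore
  #                             everything inside a .git directory
  restic backup --exclude-if-present .git --exclude .git "$@"
}

# Usage (backs up /srv, leaving out Git checkouts):
#   backup_skip_git /srv
```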

If you can identify “directories that are already in source control” in another way, it’d be better to write these directories to a file (or to stdout) and pass that to --exclude-file, like this:

restic backup --exclude-file <(find /srv -type d -name .git -prune -exec dirname {} \;) /srv

The <(...) is a fancy shell way (called process substitution, supported by bash and zsh but not by plain POSIX sh) of saying “take the output of the command, make it available as something that resembles a file, and return that file’s name, so the process can read from it”. This way you don’t have to write an actual file.
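If process substitution feels too cryptic, the same list can be written to an ordinary file first, which also lets you inspect it before running the backup. This is a sketch: the function name list_git_worktrees and the file name git-excludes.txt are only examples. It prints the parent of every .git directory under the given root, and -prune keeps find from descending into the .git directories themselves.

```shell
# Hypothetical helper: print the working-tree directory (the parent of .git)
# of every Git checkout below the directory given as $1.
list_git_worktrees() {
  # -prune: don't descend into .git; -exec dirname handles spaces in paths
  find "$1" -type d -name .git -prune -exec dirname {} \;
}

# Usage:
#   list_git_worktrees /srv > git-excludes.txt    # inspect this file first
#   restic backup --exclude-file git-excludes.txt /srv
```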

Yes, I realize my question was not very precise; thanks for addressing it nonetheless.

I understand that for you, as a developer of restic, it is important to consider all corner cases while at the same time trying to make the major use cases very easy. These things are usually hard to balance.

Anyway, the pattern of using --exclude-if-present and --exclude together seems easy enough to remember and might be just enough for a lot of use cases. The shell trick is cool, but for those like me who are not very proficient in the UNIX ways it looks more cryptic, and thus is harder to read and remember.

Maybe it would be nice to mention such patterns in the documentation (with proper warnings about the cases where they may break down).

If you are willing to contribute to the documentation, feel free to open a PR on GitHub and we can see whether and where this can find a place, especially if you think the current documentation is lacking critical information or doesn’t go into enough detail.

Would a command like

restic backup --exclude-if-present .git/ --exclude .git/ /srv

get around problems with a file named .git while achieving the intended result?

I would also like to know the syntax to use on a Windows client when excluding repositories. My first guess (untested) is:

restic backup --exclude-if-present .git\ --exclude .git\ d:\Users\myuser\repos\

I’m guessing you mean: “Does restic test for the presence of a directory if the exclude path ends with a slash?” It doesn’t; restic looks for something with that name in each directory (file, dir, symlink, …), so you can leave out the trailing slash.
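Because any entry named .git counts, it can help to check whether your tree contains .git entries that are not directories; Git worktrees and submodules, for instance, use a plain .git file. A minimal sketch with a hypothetical function name; it prints every .git that is a file or a symlink and skips (and does not descend into) real .git directories.

```shell
# Hypothetical helper: list .git entries under $1 that are NOT directories
# (plain files or symlinks). These would also trigger both exclude flags.
list_non_dir_git_entries() {
  # First branch: a real .git directory is pruned and not printed.
  # Second branch: any other entry named .git is printed.
  find "$1" -name .git -type d -prune -o -name .git -print
}

# Usage:
#   list_non_dir_git_entries /srv
```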

The syntax looks good, although I’d leave out the trailing backslashes.

I would like to offer a counter-argument to your stated goal of “don’t back up anything that’s in source control.”

How important is your time? If you had to restore from backup, how long is it going to take you to figure out all of the stuff you had copies of and put them back where they were? What about local git config such as remotes?

How important is your data? What if you lost local modifications that you had not yet pushed? What if you have a repository you have created but haven’t even pushed anything anywhere else? If you blindly exclude anything containing .git then you might accidentally exclude something you don’t have a copy of.

On my system I have 163 Git clones using a combined 8.4GB. Some of these have non-standard configurations (multiple remotes, local scripts). Rebuilding all of this would mean I have to figure out the origin of 163 repositories. I might even have unpushed work here.
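To get the equivalent numbers for your own machine before deciding, a rough sketch like the one below counts the Git checkouts under a directory and sums their on-disk size with du. The function name git_clone_stats is hypothetical. Nested checkouts (a repository inside another repository’s working tree) are counted separately and their size is added twice, so treat the total as an upper bound.

```shell
# Hypothetical helper: count Git checkouts below $1 and sum their size.
git_clone_stats() {
  find "$1" -type d -name .git -prune -exec dirname {} \; |
    while IFS= read -r dir; do
      du -sk "$dir"                     # size in KiB, then a tab, then the path
    done |
    awk '{ n++; kib += $1 }
         END { printf "clones: %d, total: %.1f MiB\n", n, kib / 1024 }'
}

# Usage:
#   git_clone_stats "$HOME"
```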

I would save 8.4GB in my Restic repository by not backing these up. That amount of cloud storage ($0.042/mo on B2) is not even close to the value of the amount of time it would take to restore all of those repositories from the origins after a catastrophic disk failure. I’ll give Backblaze a nickel a month not to have to do that work.

Disk space is cheap; your time is not. Back them up!
