Skipping git-ignored files

Hi!

I’m trying to trim down the set of files that restic backs up from my machine. I have a lot of git repositories cloned on my machine from various current and past projects (around 1600 of them…), and I would like to prevent restic from backing up any files that I’ve set to be ignored by git. These files tend to be intermediate files that I could regenerate if I needed to, so there’s no need to back them up.

Has anyone come up with a workable solution for this kind of problem?

I can probably get git to generate a list of all of the files it’s ignoring for each repository, and then put that list into an exclude file that I feed into restic. However, this seems like a somewhat cumbersome way of doing it – if restic knew how to interpret git ignore files then it would be really handy!
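For reference, the "get git to generate a list" idea could look roughly like the sketch below. It is only an illustration under some assumptions: list_ignored is a made-up helper name, the repository path is passed in by hand, and file names are assumed to contain no newlines or characters that git would quote. restic also treats each line of an exclude file as a pattern, so names containing glob characters would need escaping.

```shell
#!/bin/sh
# Sketch: print the paths that git ignores in one repository, prefixed with
# the repository path. list_ignored is a hypothetical helper name, not part
# of git or restic.
list_ignored() {
    repo=$1
    # --others --ignored --exclude-standard lists untracked files matched by
    # .gitignore, .git/info/exclude and the global excludes file;
    # --directory reports a fully ignored directory as a single entry.
    git -C "$repo" ls-files --others --ignored --exclude-standard --directory |
        while IFS= read -r path; do
            # Drop the trailing slash git adds to directory entries.
            printf '%s/%s\n' "$repo" "${path%/}"
        done
}

# Example use (paths are illustrative):
#   list_ignored ~/src/project >> ~/.restic-git-excludes
#   restic backup --exclude-file ~/.restic-git-excludes "$HOME"
```

With around 1600 repositories the list would have to be regenerated before every backup, which is the cumbersome part.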

Sorry, but you’ll have to do this with additional tooling. There are excludes, as you already mentioned, and there are also the --files-from* options you can use.

You might want to take a look at the discussions in "Use .gitignore (or other VCS) for excluding files" (Issue #1514 in restic/restic on GitHub).

Thanks! Your comments pointed me in the right direction, to #2246, which is linked from #1514 that @MichaelEischer directed me to. Using fd to do the gitignoring results in a working arrangement.

The only complexity I found was that using the --files-from-raw argument results in restic’s cache directory exclusion no longer being applied, so its functionality has to be hacked back in again. Here’s what I ended up with:

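# Find the ID of the latest snapshot for this host
parent=$(restic --json snapshots -H "$(hostname -s)" | jq -r 'max_by(.time) | .id')

# Back up the regular files and symlinks that fd does not ignore. The inner fd
# lists every directory containing a CACHEDIR.TAG so that it is ignored as well.
restic backup -x --exclude-caches \
        --parent "$parent" \
        --files-from-raw \
        <(fd -H0 --one-file-system \
              -t f -t l \
              --ignore-file <(fd -H --one-file-system CACHEDIR.TAG "$HOME" -x echo {//}/) \
              --ignore-file ~/.restic-exclude \
              . "$HOME")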

I think the -x and --exclude-caches arguments to restic are probably entirely superfluous.
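To make the cache-directory part of the command above easier to follow, here is the same idea in isolation. This is a sketch rather than what the command actually runs: it swaps fd for plain find, and cache_dirs is a hypothetical helper name. It prints each directory that contains a CACHEDIR.TAG file, with a trailing slash, which is the form the inner fd call feeds to --ignore-file.

```shell
#!/bin/sh
# Sketch: print every directory under a root that contains a CACHEDIR.TAG
# file, one per line with a trailing slash. cache_dirs is a hypothetical
# helper name; find stands in for fd here.
cache_dirs() {
    root=$1
    # -xdev keeps find on one file system, like fd's --one-file-system.
    find "$root" -xdev -type f -name CACHEDIR.TAG |
        while IFS= read -r tag; do
            # Strip the file name to get the directory that holds the tag.
            printf '%s/\n' "${tag%/CACHEDIR.TAG}"
        done
}

# Example use (assumes directory names without newlines):
#   cache_dirs "$HOME"
```

One difference to be aware of: find -name matches the file name exactly, whereas fd treats CACHEDIR.TAG as a regular expression, so fd may match slightly more names.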

… and after playing with that a bit I find I run into the same problem reported in #3451, which will hopefully be resolvable when pull request 3200 is merged :crossed_fingers:

One recommendation where you could improve this a bit: You can use the flag --latest 1 to only output the latest snapshot available. Depending on where your repository lives, this could save some time.
Like so:

parent=$(restic --json snapshots --latest 1 -H "$(hostname -s)" | jq -r 'max_by(.time) | .id')