Restore include and exclude are mutually exclusive


#1

I noticed recently that a restore’s list of includes and its excludes are mutually exclusive. I believe this is not the case with backup, which allows you to specify both files/folders to include and exclude.

At a high level, what would be required to make restore accept both a list of includes and excludes like backup does?

I know there is some discussion here about unifying and standardizing the include/exclude UI: https://github.com/restic/restic/issues/233 - however, that seems like a slightly different issue…


#2

The difference between backup and restore is that backup does not have --include 🙂

The big problem with allowing exclude and include filters at the same time is the user interface. From an algorithmic/implementation point of view it’s not hard to do that. I’ve described how I can imagine a possible user interface for that in the issue.

I’m not against adding this to restic, but the prerequisite would be to take all the select (exclude) filter functions from package cmd/restic and move them to e.g. internal/select, adding proper tests. For historical reasons, these functions still live in the main package. Afterwards we can build an implementation for mixing include and exclude rules.

Before anyone proposes an implementation, I’d expect a write-up of exactly how the user interface would work.

This is also blocked by 2032, which should be merged first.

Interesting observation: There are two things in restic that surprised me a lot in terms of complexity: backup retention policies and include/exclude stuff. I would never have imagined these two areas to be so complex…


#3

True, but backup does accept a list of files to include:

Usage:
  restic backup [flags] FILE/DIR [FILE/DIR] ...

Which feels like being able to specify multiple --include flags.

So does that work differently than the --include flag does for restore? (Apologies if the thread on GitHub already covers this; if it does, it wasn’t clear to me, though I only skimmed it.)

Ha, yeah, it’s the unexpected stuff that’ll get ya! I’m still discovering the little nuances of all this…


#4

Indeed, backup has implicit support for including files via the list of targets to save. The difference to --include (as I see it) is that this is just one level and the user interface is clear: restic takes the list of files/dirs and applies the excludes. That’s it.

When mixing the patterns it gets messy fast:

$ restic backup \
   --exclude '*.go' \
   --exclude ~/work/secret \
   --include ~/work/secret/project \
   ~/work

Should restic archive ~/work/secret/project/foo.go? What takes precedence?


#5

That’s indeed a difficult topic. I’d assume some want it as simple as possible and others want it to be as powerful as possible. I could imagine 2 approaches.

  1. Allow mixing include and exclude and set precedence by reversing the order they are entered. While this is obviously quite powerful, it can become very complicated as well.
  2. include always takes precedence over exclude. This approach isn’t as powerful as the first approach but is easy to understand (and debug).
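
To make the two approaches concrete, here is a minimal sketch in Go. It is not restic code: the rule type and both functions are hypothetical, a pattern containing a slash is assumed to be a path prefix and anything else a glob on the file name, and approach 1 is read as "the matching rule entered last decides".

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// rule is a hypothetical type for this sketch; restic has no such type.
type rule struct {
	include bool
	pattern string // path prefix if it contains "/", else a glob on the base name
}

// matches reports whether the rule applies to the slash-separated path p.
func (r rule) matches(p string) bool {
	if strings.Contains(r.pattern, "/") {
		return p == r.pattern || strings.HasPrefix(p, r.pattern+"/")
	}
	ok, _ := path.Match(r.pattern, path.Base(p))
	return ok
}

// lastMatchWins sketches approach 1: rules are checked in reverse order of
// entry and the first match found that way decides.
func lastMatchWins(rules []rule, p string) bool {
	for i := len(rules) - 1; i >= 0; i-- {
		if rules[i].matches(p) {
			return rules[i].include
		}
	}
	return true // no rule matched: keep the file (an assumption)
}

// includeWins sketches approach 2: any matching include beats every exclude.
func includeWins(rules []rule, p string) bool {
	excluded := false
	for _, r := range rules {
		if !r.matches(p) {
			continue
		}
		if r.include {
			return true
		}
		excluded = true
	}
	return !excluded
}

func main() {
	// Same rules as the example in post #4, but with the include entered first.
	rules := []rule{
		{include: true, pattern: "/home/user/work/secret/project"},
		{include: false, pattern: "*.go"},
	}
	p := "/home/user/work/secret/project/foo.go"
	fmt.Println("last match wins:", lastMatchWins(rules, p))
	fmt.Println("include wins:   ", includeWins(rules, p))
}
```

With the include entered before the '*.go' exclude, the two approaches disagree about foo.go: approach 1 skips it, approach 2 keeps it. That order dependence is the extra power (and the extra complexity) of the first approach.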

#6

For the record: My approach here is to prefer simplicity over power of expression. Making a user interface simpler is a good thing, even if that means limiting what users can do 🙂


#7

I see what you’re saying. My initial thought was to NOT have both a list of files at the end (positional arguments) and an --include flag. Thus you simply choose the files to back up, minus any on the exclude list. Kind of like how it works now.

I am wondering if it would be desirable to have it work the same way for restore.

However, to answer your question, I would expect the more specific (longer) paths to take precedence, no matter which list they appear in or what order they are in. Wildcards, especially in short patterns, should have the lowest precedence. I can see how it’s confusing, but again, what I really want to get at is having restore work similarly to how backup does, since that appears to have already been figured out.


#8

Not sure if of interest, but…

Over the weekend I implemented a prototype web-based restore for Relica. Since it does not use restic restore, I had to implement my own file traversal logic, including deciding which files to include and exclude, using both an include and exclude list. I wonder if this would be a helpful demonstration of what I was thinking restic restore could do, too, to utilize both lists.

// combine both include and exclude lists into a map
// of the file path to whether it is INCLUDED or not
fileSelection := make(map[string]bool)
for _, f := range includeList {
	fileSelection[f] = true
}
for _, f := range excludeList {
	fileSelection[f] = false
}

// for every included file, walk that tree
for _, includedFile := range includeList {
	// ... begin walk
	if !shouldWalkFile(fpath, fileSelection) {
		return // skip this file/folder
	}
}

// ...

// shouldWalkFile sees if the include or exclude list for the
// file at fpath matches more closely. It tries to find the
// longest matching path in fileSelection by iteratively
// truncating segments off of fpath.
func shouldWalkFile(fpath string, fileSelection map[string]bool) bool {
	for {
		// the longest prefix found in either list decides
		if keep, ok := fileSelection[fpath]; ok {
			return keep
		}
		// truncate the last segment of the path; stop once there
		// are no segments left (this also covers paths without "/")
		i := strings.LastIndex(fpath, "/")
		if i < 0 {
			break
		}
		fpath = fpath[:i]
	}
	return false // weird; no matches in either list 🤔
}

This kind of logic allows both include and exclude lists to function in a predictable way, IMO. Simply, the longest matching path (per segment, not per-character) wins. If the longest matching path is in the include list, the file is included. If the longest matching path is in the exclude list, the file is skipped.
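
As a worked example of the longest-matching-path rule, the sketch below applies it to the directories from the example in post #4. The decide function is a hypothetical variant of shouldWalkFile that also returns the list entry that decided the outcome, which helps when debugging a selection; the absolute paths are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// decide is a hypothetical variant of shouldWalkFile. It returns whether the
// path is kept and which list entry decided it ("" if nothing matched).
func decide(fpath string, selection map[string]bool) (keep bool, matched string) {
	p := fpath
	for {
		if k, ok := selection[p]; ok {
			return k, p
		}
		// drop the last path segment and try the parent
		i := strings.LastIndex(p, "/")
		if i < 0 {
			return false, ""
		}
		p = p[:i]
	}
}

func main() {
	// true = on the include list, false = on the exclude list
	selection := map[string]bool{
		"/home/user/work":                true,
		"/home/user/work/secret":         false,
		"/home/user/work/secret/project": true,
	}
	for _, f := range []string{
		"/home/user/work/main.go",
		"/home/user/work/secret/keys.txt",
		"/home/user/work/secret/project/foo.go",
		"/etc/passwd",
	} {
		keep, by := decide(f, selection)
		fmt.Printf("%-40s keep=%-5v decided by %q\n", f, keep, by)
	}
}
```

Here keys.txt is skipped because /home/user/work/secret is its longest matching entry, while foo.go is kept because the longer /home/user/work/secret/project entry is on the include list.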

I don’t yet know how to apply that logic to patterns like '*.go' but I would imagine if '*.go' is on the exclude list and foo.go is on the include list, that foo.go would be included.
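
One possible way to fold patterns into this, sketched under my own assumptions rather than taken from Relica or restic: an exact path entry is the most specific rule, then the longest glob that matches the file name, and only then the longest matching parent directory. The selectFile function and the glob type are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// glob is a hypothetical pattern rule matched against the base name only.
type glob struct {
	pattern string
	include bool
}

// selectFile decides whether fpath is kept, trying rules from most to least
// specific: exact path, longest matching glob, longest parent directory.
func selectFile(fpath string, paths map[string]bool, globs []glob) bool {
	// 1. An exact path entry always decides.
	if keep, ok := paths[fpath]; ok {
		return keep
	}
	// 2. Otherwise the longest glob that matches the base name decides.
	best := -1
	for i, g := range globs {
		ok, err := path.Match(g.pattern, path.Base(fpath))
		if err != nil || !ok {
			continue
		}
		if best < 0 || len(g.pattern) > len(globs[best].pattern) {
			best = i
		}
	}
	if best >= 0 {
		return globs[best].include
	}
	// 3. Otherwise fall back to the longest matching parent directory.
	p := fpath
	for {
		i := strings.LastIndex(p, "/")
		if i < 0 {
			return false // no rule matched at all
		}
		p = p[:i]
		if keep, ok := paths[p]; ok {
			return keep
		}
	}
}

func main() {
	paths := map[string]bool{
		"/home/user/work":        true, // included directory
		"/home/user/work/foo.go": true, // explicitly included file
	}
	globs := []glob{{pattern: "*.go", include: false}}

	for _, f := range []string{
		"/home/user/work/foo.go",
		"/home/user/work/bar.go",
		"/home/user/work/README.md",
	} {
		fmt.Println(f, selectFile(f, paths, globs))
	}
}
```

In this sketch foo.go is kept because it is listed explicitly, bar.go is dropped by '*.go', and README.md is kept through its parent directory. Note the design choice: globs rank above parent directories here, since otherwise an exclude like '*.go' could never take effect beneath an included directory; ranking wildcards lowest of all would need a different tie-break.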

Just my $0.02.