Sorry about all the questions lately! I've just started using restic for backups and am trying to understand some of the subtleties of its behavior so I can optimize my usage.
As I understand it (see here), backup only uses the --parent flag while scanning the files to be backed up. Depending on whether a file is in the parent, one of two things happens:
- file in parent: its metadata (mtime, inode, etc.?) is compared to decide whether the file has to be fully read and deduplicated, which avoids reading the entire file just to detect changes
- file not in parent: no metadata is available for comparison, so the file is always read in full and deduplicated
The data stored in the repo is the same either way, but the backup scan is much faster when metadata comparison is available. So ideally the parent would be very large and contain metadata for every pending file.
I'm trying to figure out why only a single snapshot can be used as a parent. Presumably it's to limit complexity and memory usage, but are there cases where you wouldn't want the parent to be as large as possible?
I started looking into this after running a few backups of different directories, where the first two together roughly made up the third:
- backup /home/foo
- backup /home/bar
- backup /home
I expected the third backup to be quick, since most of its files had already been added to the repo. But it took as long as a fresh scan of /home, because no existing snapshot had a matching directory to serve as parent. I could have manually selected the first or second snapshot as the parent, but I didn't think to do that beforehand.
Perhaps there could be a parent object that doesn't correspond directly to a single snapshot? Snapshots could then be merged according to some policy (last N, matching dir/host/tag, merged size, etc.) to provide more metadata and faster backup scans.
…or maybe my usage is broken and isn't expected to work efficiently? My mental model was that backup speed depends on what has been added to the repo, not on the order in which snapshots were created or on each snapshot's directory spec.