Restic slow to snapshot new hard links of large files

Can you give us an example of the scripts that perform these operations, including invoking restic? In particular, when you perform a backup, are the paths you give to restic always the same, or do they change every backup? My suspicion is that the path differs each time.

When performing a backup, restic always looks at absolute paths and never relative. If there is a snapshot with the same hostname and exactly the same set of paths, restic uses this as the “parent” and will look at file metadata (at least mtime and size). If the metadata is identical in the parent snapshot, restic does not scan the file contents at all and assumes the file did not change.
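That metadata shortcut can be sketched like this — a minimal illustration, not restic's actual code; the names `ParentEntry` and `needs_rescan` are made up for the example:

```python
# Hypothetical sketch of restic's parent-snapshot shortcut: if a file's
# absolute path exists in the parent snapshot with the same mtime and
# size, its contents are assumed unchanged and are not re-read.
from dataclasses import dataclass

@dataclass(frozen=True)
class ParentEntry:
    mtime: float  # modification time recorded in the parent snapshot
    size: int     # file size recorded in the parent snapshot

def needs_rescan(parent: dict[str, ParentEntry], path: str,
                 mtime: float, size: int) -> bool:
    """Return True if the file's contents must be read and chunked again."""
    entry = parent.get(path)   # lookup is by absolute path
    if entry is None:          # path not present in parent -> full scan
        return True
    return entry.mtime != mtime or entry.size != size

parent = {"/data/db.dump": ParentEntry(mtime=1000.0, size=4096)}
print(needs_rescan(parent, "/data/db.dump", 1000.0, 4096))    # False: skip
print(needs_rescan(parent, "/backup/db.dump", 1000.0, 4096))  # True: new path
```

Note the lookup key is the absolute path: the same bytes under a different path (or a changed mtime) forces a rescan.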

If restic can’t find another snapshot with the same absolute paths then restic has no parent snapshot to use (and the absolute paths wouldn’t line up anyway, so restic wouldn’t know which file corresponds to which) and so it has to process the contents of all files. Obviously it will be able to deduplicate the data, but it doesn’t know it can until it hashes each chunk of each file.
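To see why this is slow but still space-efficient, here's a toy sketch of hash-based chunk deduplication. It uses fixed-size chunks to stay short (restic actually uses content-defined chunking), and the `store` function is purely illustrative:

```python
# Illustrative: every chunk must be hashed, but chunks whose hash is
# already in the repository are not stored again. So an un-parented
# backup reads and hashes everything, yet adds almost no new data.
import hashlib

def store(data: bytes, repo: dict[str, bytes], chunk_size: int = 4) -> int:
    """Hash every chunk of data; store only unseen chunks in repo.
    Returns the number of newly stored chunks."""
    new = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in repo:
            repo[digest] = chunk
            new += 1
    return new

repo: dict[str, bytes] = {}
print(store(b"aaaabbbb", repo))  # 2 new chunks stored
print(store(b"aaaabbbb", repo))  # 0 -- same content seen again, nothing stored
```

The second call stores nothing, but it still had to hash every chunk to find that out — which is exactly the cost you pay when there is no parent snapshot.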

Note that this doesn’t necessarily have anything to do with hard links. Restic can only skip over unchanged files efficiently if the absolute path is the same as in the last snapshot and the file metadata hasn’t changed.
