Randomly Needs to Rescan All Data

The totals are incorrect though. I didn't let it fully scan; it takes forever.

If you add --ignore-inode, does it still scan everything? In addition to ignoring inode numbers, this option also ignores ctimes.

Trying now. Will this affect other things, like not catching other changes? Is it a good permanent fix?

Ctime is updated when the inode's metadata is updated, so file ownership and permission changes won't be noticed unless the file's data also changes (causing an mtime change).
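To illustrate this, here's a quick sketch (assuming GNU coreutils `stat`) showing that a metadata-only change such as `chmod` bumps the ctime while leaving the mtime untouched:

```shell
#!/bin/sh
f=$(mktemp)
echo data > "$f"
mtime1=$(stat -c %Y "$f")   # mtime in seconds
ctime1=$(stat -c %Z "$f")   # ctime in seconds
sleep 1
chmod 600 "$f"              # metadata-only change: no data written
mtime2=$(stat -c %Y "$f")
ctime2=$(stat -c %Z "$f")
[ "$mtime1" -eq "$mtime2" ] && echo "mtime unchanged"
[ "$ctime1" -ne "$ctime2" ] && echo "ctime changed"
rm "$f"
```

With --ignore-ctime (and no mtime or size change), a backup tool would consider this file unmodified.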

I guess that's not a big issue. I have all of the permission-sensitive data on my SSD, and those backups don't seem to have this issue. It's just the HDDs connected through the HBA card. I don't have any other HDDs.

So far it has skipped files that it was previously re-reading. So this is a potential fix.

So the question now is why the ctime changed for these files. :slight_smile:

Access: 2020-11-18 02:03:58.694997083 -0500
Modify: 2020-09-22 13:31:47.941259070 -0400
Change: 2020-11-13 12:27:50.813345859 -0500

Here are the dates for one file that it was re-reading. Could something change the ctime without setting it to the current time? My last good backup was yesterday.

After all of this, it's looking like some of my hard links got screwed up yesterday. Neat. So that sounds like it's most likely the main issue. Thanks for working on this with me; I figured out what it was. restic works great!


Glad you were able to figure out the issue!


Now to look through and see if I can fix all of the issues. And figure out what caused it…

I do have a question, however. If I attempt to restore a folder with hard links, will it restore the hard links, or just restore the files themselves, unlinked?

Hard links should be restored, but I think only if the restore operation actually includes multiple links to the same inode. For example, if /a/b and /c/d refer to the same inode and you only restore /a/b then it will not be made a hard link to whatever inode /c/d on the restore target actually refers to; only if /a/b and /c/d are restored in the same restic restore invocation would I expect them to point at the same inode at the end of the restore process.
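The effect of restoring only one name can be sketched with plain coreutils (no restic involved): copying a single name out, as a partial restore effectively does, produces a new, independent inode rather than re-establishing the link.

```shell
#!/bin/sh
d=$(mktemp -d)
echo data > "$d/a"
ln "$d/a" "$d/b"          # a and b now share one inode
cp "$d/a" "$d/restored"   # "restoring" only a: the copy gets its own inode
ino_a=$(stat -c %i "$d/a")
ino_b=$(stat -c %i "$d/b")
ino_r=$(stat -c %i "$d/restored")
[ "$ino_a" -eq "$ino_b" ] && echo "a and b share an inode"
[ "$ino_a" -ne "$ino_r" ] && echo "restored copy has its own inode"
rm -r "$d"
```

Only when both names are written out in the same operation can the restorer know to link them to one inode.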

Well, that makes it sound like this will last a bit longer. Thanks!

Thanks a lot @cdhowie for debugging this!

I just learned: Adding a new hard link to a file changes its ctime:

$ stat testfile
[...]
Access: 2020-12-17 22:31:08.318146994 +0100
Modify: 2020-12-17 22:31:08.318146994 +0100
Change: 2020-12-17 22:31:08.318146994 +0100

$ ln testfile testfile2

$ stat testfile
[...]
Access: 2020-12-17 22:31:08.318146994 +0100
Modify: 2020-12-17 22:31:08.318146994 +0100
Change: 2020-12-17 22:31:13.926046483 +0100
[...]

That'd make restic re-read the file.

By default, restic's algorithm for detecting changes is pragmatic: if in doubt, re-read the data. That's the most conservative approach, and I'm convinced it's the right one. We're still tuning the algorithm though, and there's a PR for adding the option --ignore-ctime to cover exactly this use case. :slight_smile:

Maybe that's what happened to you. Do you by chance use something like rsnapshot?

Have a nice weekend!


@fd0 Thanks for the follow-up! This sounds like it would solve the rescan issue. I am curious, however: while this will prevent rescanning when a new hard link is created pointing at the same data, will the backup still save all hard links that exist and restore them as hard links? Also, I'm not sure if there is a way to restore a hard link without having to restore the original file. That would save a significant amount of time and bandwidth for this admittedly small use case. Thanks for being so involved!

Yes, it will, but please be aware that restic only restores hardlinks within the same restic restore run (as @cdhowie already pointed out).

When you have a hard link, the file system has several names for the same content. In that regard, every file is an "original" file; there's no difference.

To clarify what @fd0 is saying here, a file on most Unix filesystems has two pieces: the name in the directory tree, and an inode number.

The name entry contains the name of the file (obviously) and the inode number.

The inode itself (which has no name beyond its number) references the file data, and other metadata such as ownership and permissions.

A ā€œhard linkā€ isnā€™t really a thing as much as itā€™s a process. You can hard link an inode under a new name. Once this process is done, you now have two different names that refer to the same inode, meaning that content and metadata is shared between the two.

Note that one name does not refer to the other, meaning there is no concept of which name is the "original." Once the hard link process is complete, it's impossible to distinguish which name came first. They are peers, and neither has any more claim to the inode than the other. Saying that one is a hard link of the other is kind of true, but misleading, since neither name points at the other. It would be more correct to say that they are hard linked, a state that applies to all names equally.
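A short demonstration (assuming GNU coreutils `stat`) of this shared-inode behavior: after linking, the link count rises to 2, and metadata changed through one name is immediately visible through the other.

```shell
#!/bin/sh
d=$(mktemp -d)
echo hello > "$d/first"
ln "$d/first" "$d/second"
links=$(stat -c %h "$d/first")   # hard link count is now 2
chmod 640 "$d/second"            # change permissions via one name...
mode=$(stat -c %a "$d/first")    # ...and observe them via the other
echo "link count: $links, mode: $mode"
rm -r "$d"
```

Neither name is special: deleting either one simply decrements the link count, and the inode's data survives until the count reaches zero.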

Note that none of this is true of symlinks; a symlink is a name in the filesystem that contains arbitrary text. This text is interpreted as a file name at the moment the symlink needs to be resolved. The target need not exist at the time the symlink is created, and the final inode that is resolved from a symlink can change over the lifetime of the symlink without any change to the symlink itself. You can do strange things with symlinks, such as having a symlink point to an ancestor directory (creating a loop) or having a symlink point at itself.

Notably, symlinks can point at directories, and can also point at filesystem objects outside of their own filesystem. You cannot hard link a directory, nor can you hard link between different filesystems.
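The symlink behavior described above can also be sketched in a few lines: the stored text need not name an existing file, and a symlink can happily point at a directory (even an ancestor, creating a loop).

```shell
#!/bin/sh
d=$(mktemp -d)
ln -s missing-target "$d/dangling"   # target doesn't exist; creation still succeeds
target=$(readlink "$d/dangling")     # a symlink is just this stored text
ln -s "$d" "$d/loop"                 # symlink to its own parent directory
[ -L "$d/dangling" ] && echo "dangling symlink stores: $target"
[ -d "$d/loop" ] && echo "loop resolves to a directory"
rm -r "$d"
```

Trying the same with `ln` (no `-s`) would fail in both cases: hard links to directories or to nonexistent inodes aren't possible.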

Thank you for this explanation. For years I had not understood hard links vs. symlinks. Part of trusting a backup is understanding how the computer stores files, or at least knowing that the backup program understands how files are stored. Your comments are one of the reasons that I use restic.
Perhaps this explanation could be placed in a technical area of the restic documentation so a casual user does not need to read it.

I was aware of how hard links work; I was just using "original" to mean the first name created. I have tested with the --ignore-ctime fork and it works quite well. I used it to create a new snapshot with updated ctimes, and subsequent backups without --ignore-ctime work as expected. I have also tested restoring hard links, and in some cases restoring two hard links to the same inode results in deleting the existing link and restoring the missing one.