[PSA] ATTENTION anyone not using a recent master (ie, > 0.9.5) version: restic will NOT detect changed files if mtime is reset by the application (eg: MS Excel and .xls/.xlsx files on macOS)

Hello everyone,

This is a follow-up to my “corrupted files after restic restore” issue.

So, I woke up at 3AM with the idea of trying to track down this problem by looking for differences in those files in the ZFS snapshots I store here (I have all of them saved). What I found, perhaps predictably, is that the ZFS snapshot made at the moment restic backup managed to save its first snapshot contained these two files with EXACTLY THE SAME CONTENT as restic restore restored them (from a much later restic snapshot).

So, what was happening was, obviously, that these files’ contents had changed and their mtimes had not (causing restic backup to miss the changes). I checked them on the snapshots with stat(1) and saw that, although mtime hadn’t changed, ctime had!

I managed to reproduce the problem by opening a .xls file (like the ones involved in my problem) with MS Excel, moving around the file a bit without changing anything, and then exiting MS Excel without saving. Apparently Excel saves something inside the file (which doesn’t change its “visible” content) and rolls back mtime to “hide” it… Anyway, I bet this is a very common problem!
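The same effect can be imitated without Excel. Below is a minimal Python sketch, assuming a POSIX system where `st_ctime` is the inode change time (on Windows it means something else). The helper name `simulate_hidden_write` and the file name `demo.xls` are made up for illustration, and the code only mimics what Excel appears to do (rewrite some bytes, then restore mtime); it is not a description of Excel's internals.

```python
import os
import tempfile
import time


def simulate_hidden_write(path):
    """Flip one byte of a non-empty file, then roll mtime back.

    Returns the os.stat() results taken before and after the change.
    Hypothetical helper: it only imitates the behaviour described above.
    """
    before = os.stat(path)
    time.sleep(0.05)  # give the filesystem clock a chance to tick

    # Overwrite the first byte in place so the file size stays the same.
    with open(path, "r+b") as f:
        first = f.read(1)
        if not first:
            raise ValueError("file must not be empty")
        f.seek(0)
        f.write(bytes([first[0] ^ 0xFF]))

    # Restore atime and mtime. There is no equivalent call for ctime.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return before, os.stat(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.xls")
        with open(path, "wb") as f:
            f.write(b"original content")
        before, after = simulate_hidden_write(path)
        print("mtime unchanged:", before.st_mtime_ns == after.st_mtime_ns)
        print("size unchanged: ", before.st_size == after.st_size)
        print("ctime changed:  ", before.st_ctime_ns != after.st_ctime_ns)
```

A backup tool that looks only at mtime and size would see nothing to do here, even though the bytes on disk differ. Whether the ctime difference is visible depends on the timestamp granularity of the filesystem, hence the short sleep.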

I then googled for “restic ctime mtime” and found this and then this; to sum it up: it’s already a known problem with a fix in master… which was committed after v0.9.5 was released, so it is NOT present in any released restic version yet!

What I did was manually patch the respective commit into my own restic source tree (which already implements other modifications and workarounds; that’s why I didn’t simply download master) and compile it to produce a new binary. I expect that, on its next backup run, this patched restic will detect the changed files and happily upload their contents, fixing the issue (apart from the previous snapshots, where they will still show up with the wrong content, of course).

So, if you or your users run MS Excel (and possibly other programs playing stupid tricks with mtime), be very afraid… and apply the above fix ASAP, and STRONGLY suspect that all your backups made with restic <= 0.9.5 are ‘corrupted’ in the sense of not accurately containing changed files whose mtime was played with.

Hope this helps someone.

Cheers,
– Durval.

PS: I think the seriousness of this goes beyond just driving someone crazy who checks everything (ie, SHA checksum for restored files) like me, as changes that should have been backed up are being missed – if anyone needs to recover one of these files from backup (ie, operator error, disaster recovery, etc) he/she will, without any warning, just get an older, outdated file :-/ and in such a scenario, ie after the original file is lost, it is lost forever: there is obviously no way to recover it from a restic backup :frowning:

Any update on this? How critical is this bug? Will there be a hot fix pushed to restic?

If you are worried that this bug affects you, you can use the binaries from beta.restic.net, which are automatically built from master and contain the latest commits.

Personally I think that this isn’t too critical, as the cases in which programs mess around with mtime should be rare (and in the case of @durval the content of the files wasn’t even changed). I can live with changes in metadata not being picked up as well. This isn’t critical for me but YMMV.

Hello @Den, @764287:

[1] Only yours so far, as per this topic’s history :slight_smile:
[2] Depends on your level of paran^H^H^H^H^Hdata integrity concerns :wink: Mine is high, so it’s critical for me; but anyone who considers that mtime should be obeyed (and in the specific case I’ve detected, that applications resetting mtime to “hide” file changes should have their way) could perhaps find it not so critical.
[3] @764287’s suggestion to use a later beta is not bad, but it will certainly incorporate more changes than strictly necessary; a ‘hotfix’ with exactly 0.9.5 plus #2212 would be best IMHO.

I beg to differ: as I explained further in the linked topic, the files’ contents changed: “one’s data had 13 bytes changed to apparently random values, while the other had 8 bytes zeroed out”. What didn’t change was its “visible” content, ie what MS Excel showed on the screen.

And that was in a small subset (less than 0.5% of the total) of the entire backup, which was what I managed to restore and thoroughly verify so far – and just one application; who knows what else could be lurking?

The fact is that the correct metadata to check in order to detect changes to data, on *ix systems at least, is ctime – I’m really happy restic is now doing it that way after #2212, and I reinforce my recommendation to everyone with a high level of concern regarding data integrity to start using a version incorporating that ASAP, and to be wary of old backups done with older versions.
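To make the difference concrete, here is a minimal Python sketch of the two change-detection strategies. It is NOT restic's actual code (restic is written in Go and compares additional metadata); the names `Fingerprint`, `fingerprint`, `changed_mtime_only` and `changed_with_ctime` are hypothetical, and it assumes a *ix system where `st_ctime` is the inode change time.

```python
import os
from typing import NamedTuple


class Fingerprint(NamedTuple):
    """Metadata remembered about a file from the previous backup run."""
    size: int
    mtime_ns: int
    ctime_ns: int


def fingerprint(path):
    """Collect the metadata used to decide whether a file must be re-read."""
    st = os.stat(path)
    return Fingerprint(st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def changed_mtime_only(old, new):
    """Old-style check: can be fooled by an application resetting mtime."""
    return (old.size, old.mtime_ns) != (new.size, new.mtime_ns)


def changed_with_ctime(old, new):
    """Check that also compares ctime, which userland cannot set directly."""
    return old != new


if __name__ == "__main__":
    # Same size and mtime, but ctime moved: the Excel-like case.
    old = Fingerprint(size=4096, mtime_ns=1_000, ctime_ns=1_000)
    new = Fingerprint(size=4096, mtime_ns=1_000, ctime_ns=2_000)
    print("mtime-only check sees a change:", changed_mtime_only(old, new))
    print("ctime-aware check sees a change:", changed_with_ctime(old, new))
```

The price of the ctime-aware check is some extra re-reading: a plain chmod or rename also bumps ctime, so files whose data did not change will occasionally be read again.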

Cheers,
– Durval.

I really don’t think it’s a critical bug, if it can be called a bug at all.
The thing is, no amount of metadata check can truly track the actual data.

Software can mess with all metadata if it wants to, including all of mtime, ctime, atime.
In all cases I know of, (non-malicious) software that messes with mtime does it for good reasons.

  • In your case, because the actual user-facing data of MS docs wasn’t changed, and the changed bits don’t affect users.
  • Some sync tools do it to avoid unnecessary/superfluous syncing, which restic can also do without.
  • VeraCrypt has an explicit option that makes it keep all metadata unchanged, specifically to hide data changes. If you’re using this option, you know what’s going on, and tracking ctime doesn’t help either.

If you’re paranoid, or worry about software modifying metadata maliciously, then you should use the --force flag for backups. It’s very slow, that’s why it’s not the default, but it’s the only option that can guarantee all changes will be caught.
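As an illustration of what a content-based check amounts to, here is a minimal Python sketch that ignores metadata entirely and compares a SHA-256 digest of the file's bytes. This is not how restic implements --force; `sha256_of` and `content_changed` are hypothetical helper names, and the stored digest is assumed to come from an earlier run.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_changed(path, known_digest):
    """True if the file's bytes no longer match a previously stored digest."""
    return sha256_of(path) != known_digest
```

Every byte of every file has to be read on every run, which is why this kind of check is slow compared with a metadata comparison.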

As I said, it all depends on your level of concern for data integrity; if your own needs, or of the place you work at, can accommodate applications changing data unnoticed (and restic failing to back them up), then I say, more power to you. But this is definitely not the case here: defending such a view where I work would probably get you fired on the spot.

Software can mess with all metadata if it wants to, including all of mtime, ctime, atime.

This is definitely not the case with ctime in *ix OSes: ctime is off limits to userland code, so it cannot be set by any application, only by the OS whenever the file’s data or metadata changes. Please see here for Linux (notice how only mtime and atime are referenced, and ctime is simply ignored) and here for the POSIX standard (I quote: “The utimes() function shall set the access and modification times of the file”; please note that ctime is not mentioned).

If you’re paranoid, or worry about software modifying metadata maliciously, then you should use the --force flag for backups.

Nope, please see above. Unless you suspect your kernel has been hacked, there’s no way any software can mess with ctime.

Let me say it again: the correct procedure to detect changed data is checking ctime. This is consensus, and part not only of the actual OS implementations but also of the standards as I demonstrated above.

If you disagree, fine by me – you are entitled to your own opinion and to have your own way when dealing with your own data. But I want to get the record straight here so that whoever comes in the future a-googling will not be misled.

Cheers,
– Durval.

[PSA] ATTENTION anyone not using a recent master (ie, > 0.9.5) version…

The fix for this (#2212) is not included in the official release 0.9.6, as far as I understand.

Unless I’m missing something, I’d suggest modifying the title of this topic if possible. Otherwise “> 0.9.5” could be misleading from my point of view, even though it might be correct in conjunction with the term “master”. (I am not really sure about the term “master”.)

What makes you think it isn’t? The issue is tracked as #2179, #2212 was only the pull request which resolved it.

At first I was very sure you’re wrong: code that has been committed to master since 0.9.5 was surely included in 0.9.6. I just tag the new version from master, and there’s no process to selectively include/exclude commits from there. So it must be included.

But then I checked the changelog: It does mention #2179, but in the section for 0.9.5. But @durval wrote in this thread and the title that the fix is not included in 0.9.5. Hm. Looking at the history for CHANGELOG.md shows the commit for 0.9.6 inserted the fix into the section for 0.9.5. That’s odd for sure!

So, let’s dig in. The pull request ID was #2212, so we can use git log -p and grep for 2212, finding the merge commit (a6481b37072dc4878bf845de77ed67406c59905d) which added the code to master:

Indeed we can verify that (only) 0.9.6 contains this commit:

$ git tag -l --contains a6481b37072dc4878bf845de77ed67406c59905d
v0.9.6

It looks like git’s merge resolve code decided that the directory changelog/unreleased was renamed to changelog/0.9.5_2019-04-23 (which it was!) and put the file into that directory instead of the unreleased one.

So, long story short: the fix is included in 0.9.6, and I’ll fix the changelog. Sigh. This will hopefully not happen again, with https://github.com/restic/restic/issues/2485 we ensure the unreleased folder exists and is never empty.

This was my conclusion after browsing through the changelog on GitHub as it showed about an hour before I posted my reply.

That is good news. I’m quite happy about that! Thanks for digging in, clarifying the release process and fixing the changelog.

As far as I’ve read ctime on Windows means creation time (as opposed to change time = ctime on Linux). What timestamp(s) is/are used by restic for Windows now to detect file changes?

Based on the code, only the modification time and metadata change time are used (as well as several other non-time pieces of metadata):

@cdhowie: Thanks for the information.

@durval: What do you mean by “moving around the file”? I’d like to examine if there’s a similar issue when using Windows and NTFS.