Restore missing files after filesystem failure

ygoe · June 18, 2021, 8:04am

My server had issues with a corrupt filesystem. I’m still investigating this but as I see it now, several files are missing. They’re spread across the entire filesystem. Now I want to restore exactly those files, nothing else. How would I do that?

I can only restore an entire snapshot or certain subtrees of it, or single files. None of those solves my problem in acceptable time and effort. Anything that already exists on the server should not be touched.

The computer is running Windows 10.

If restic cannot do this, I have to consider deleting everything and doing a full restore from the latest snapshot (including manually restoring all ACLs from existing documentation). This will however lose all changes made since the last snapshot a week ago. It also cannot be done remotely (internet is too slow) so I’ll have to go to the server and do that locally.

Update: I managed to create a list of all files to restore. The list contains full paths and has 130 lines. Can I pass that file to restic to restore exactly these files?

nicnab · June 18, 2021, 9:43am

Technically this would be very simple: you can use restic mount to make the snapshot available in your system and then rsync to the corrupt filesystem. The real problem is that this would likely overwrite files that have a newer current version and would probably leave the system in an inconsistent state. However, with restic mount you can, of course, use your list of files to restore as well!

betatester77 · June 18, 2021, 10:48am

I thought about giving that answer too, but he is using Windows. It looks like at the moment it’s not that easy to do a restic mount on Windows, is it?

ygoe · June 18, 2021, 10:55am

The documentation says it’s not possible. I’ve seen that section, too, but then discarded that option.

nicnab · June 18, 2021, 11:20am

Oh. My bad, I’m sorry! What about using WSL to do it? I might have a slight know-how deficit regarding Windows, but I hear WSL can do great things!

ygoe · June 18, 2021, 1:39pm

Sorry, I know next to nothing about WSL. It seems to be more like a VM than a real part of Windows. But then again, it can’t even access Linux filesystems. If I run a restic mount in WSL, I’m not even sure I could access it from Windows.

Meanwhile I helped myself by comparing the full file lists (created with Windows and Restic, resp.) and manually restoring the affected directories to my local machine, then using a compare tool (Beyond Compare) to copy the missing files back to the server.
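The list comparison described above could be sketched roughly like this. It is only an illustration: the helper names are made up, and it assumes that the snapshot records Windows paths in the form /C/dir/file, which should be checked against your own restic ls output first.

```python
from pathlib import PureWindowsPath


def to_snapshot_path(windows_path: str) -> str:
    """Convert e.g. C:\\Data\\a.txt to /C/Data/a.txt (assumed snapshot form).

    Expects an absolute path with a drive letter.
    """
    p = PureWindowsPath(windows_path)
    drive = p.drive.rstrip(":")
    # p.parts[0] is the anchor ("C:\\"); the rest are the path components.
    return "/" + "/".join([drive, *p.parts[1:]])


def missing_on_disk(snapshot_paths, local_windows_paths):
    """Return snapshot paths that have no counterpart in the local listing.

    The comparison ignores case because NTFS normally does too.
    """
    local = {to_snapshot_path(p).lower() for p in local_windows_paths}
    return sorted(p for p in snapshot_paths if p.lower() not in local)
```

The first argument would come from the snapshot listing and the second from a recursive directory listing of the server. The result is the list of candidates to restore.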

Of course it would be nice if Restic could implement the mount feature on Windows, too. It sure is possible, look at VeraCrypt (f.k.a. TrueCrypt) for example. Or even support some kind of “my filesystem had certain files for dinner” restore mode. A GUI would probably help with that so I could preview all missing files and select which of them to restore.

rawtaz · June 18, 2021, 1:54pm

You could have just taken the list of paths and run a few restic restore invocations, passing a number of --include options to each of them (-i, --include pattern: include a pattern, exclude everything else; can be specified multiple times). Whatever is easiest.

Indeed, a working mount on Windows would be easier for you, but for now the KISS principle applies.
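One way to turn a list of paths into a handful of restore invocations is sketched below. The function name and the batch size are arbitrary choices for illustration. The paths are assumed to already be in the form the snapshot uses, and the repository and password are assumed to be configured through the environment.

```python
def restore_commands(snapshot, target, paths, batch_size=25):
    """Build restic restore command lines, one per batch of --include options."""
    commands = []
    for start in range(0, len(paths), batch_size):
        cmd = ["restic", "restore", snapshot, "--target", target]
        for path in paths[start:start + batch_size]:
            # --include takes a pattern, so glob characters in a path
            # (for example [ or *) may match more than the one file.
            cmd += ["--include", path]
        commands.append(cmd)
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True). Pointing the target at an empty staging directory rather than the live filesystem means nothing that already exists on the server is touched, and the restored files can be reviewed before they are copied back.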

ecki · June 18, 2021, 5:51pm

What would be nice would be a “shasum export” from a snapshot (or even better from the cache?). Then you can run a local checksum comparison (or existence checks) to get a list of files. Those can then be scripted to be restored (to a staging folder). But more importantly, it allows reviewing what would be restored before doing so.

torfason · June 22, 2021, 4:42pm

For corrupt files, one could run:

restic backup <working_directory> --parent=<parent> -vvv | grep ^modified

This would give a list of all files present in the working directory that are different from the file in the last known good parent snapshot (one could then script a restore to restore just those files). So that would be close to the check-sum restore.

restic backup -vvv does not seem to report files that are present in the snapshot but missing in the working directory. That could be useful to have, but I’m not sure if that would be complicated in terms of implementing it.

ygoe · June 23, 2021, 8:38am

I’m not sure how a backup command would help me in restoring files.

Anyway, I’ve already done a full restore of the last good backup in a separate directory and compared all files (including content) with Beyond Compare. It detected a few differences, mostly in large MP4 files but also in a few XLSX/PPTX files, where several bytes were different. The file modification time and file size were identical, so if Restic only considers this data to find modified files, it wouldn’t have seen anything.
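For readers without a compare tool, the same kind of content comparison between a restored tree and the live tree could be scripted. The following is a minimal sketch with made-up function names, not a replacement for a proper review of the results.

```python
import hashlib
from pathlib import Path


def sha256_of(path, block_size=1 << 20):
    """Hash a file block by block so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def compare_trees(restored_root, live_root):
    """Return (missing, altered) relative paths, with the restored tree as reference."""
    restored_root, live_root = Path(restored_root), Path(live_root)
    missing, altered = [], []
    for ref in sorted(restored_root.rglob("*")):
        if not ref.is_file():
            continue
        rel = ref.relative_to(restored_root)
        live = live_root / rel
        if not live.is_file():
            missing.append(rel.as_posix())
        elif sha256_of(ref) != sha256_of(live):
            altered.append(rel.as_posix())
    return missing, altered
```

Files that were legitimately edited after the snapshot also end up in the altered list, so that list still needs a manual look before anything is copied back.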

There were about 130 files and directories (counted together) missing on the disk and about 10 or 20 files altered. I’ve done a chkdsk (which had deleted many files), extensive SMART self-tests (which found no errors) and a RAM test (also no errors), and I’m now watching the Windows event log for filesystem errors. So far, nothing more has happened.

torfason · June 23, 2021, 10:40am

I’m not sure how a backup command would help me in restoring files.

Well, the idea was to use it to get a list of mismatches which would then be restored. But you are right that with a parent, detecting a modification relies on there being differences in time, file size, inode or something like that, so bit corruption would probably not have been detected.

Sounds like you found the best available solution.

ecki · June 25, 2021, 3:35pm

I think it would be really good if there is a restic command to “export” a snapshot (without reading all remote files) into a local sha2sum file.

Is a traditional sha2 content checksum part of the snapshot meta data and/or the local cache or only rolling chunk checksums?

torfason · June 25, 2021, 4:48pm

Probably the best solution for getting checksums would be if a checksum could be added to the output of:

restic ls <snapshot_id> --long and/or
restic ls <snapshot_id> --json

Those already contain quite a bit of info about files. If the checksum can be retrieved without reading the data itself it would make sense to return it.

rawtaz · June 25, 2021, 5:17pm

What @torfason said, that’s the place to add the SHA256 sums if that is what @ecki means (it’s a bit unclear what he wants out of that “export”, and what he means with “sha2”). Here’s an example of what’s currently output by restic --json ls 37b159d6:

{"name":"blah.txt","type":"file","path":"/foo2/blah.txt","uid":502,"gid":20,"mode":420,"mtime":"2021-04-25T22:12:19.918905+02:00","atime":"2021-04-25T22:12:19.918905+02:00","ctime":"2021-04-25T22:12:19.918936892+02:00","struct_type":"node"}
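A listing like this has one JSON object per line, so it is easy to process with a script. The sketch below (function name invented for the example) pulls out the regular files and relies only on the fields visible in the sample line above.

```python
import json


def file_nodes(ls_json_lines):
    """Yield (path, mode) for regular files in `restic ls --json` style output."""
    for line in ls_json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Skip anything that is not a file node, for example directories.
        if record.get("struct_type") == "node" and record.get("type") == "file":
            yield record["path"], record.get("mode")
```

Note that the mode is printed as a decimal number: 420 in the sample is 0o644 in octal.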

ecki · June 25, 2021, 5:30pm

I mean whatever general-purpose, cryptographically strong checksum restic has available over the complete file content (to be checked with sha256sum or similar tools against the existing files).

MichaelEischer · June 26, 2021, 7:38pm

The repository data only contains the sha256 hash of file chunks, but no sha256 hash of the whole file. Calculating that hash won’t work without reading the whole file from the repository (in which case you could just restore to a temporary location and run shasum afterwards). See restic issue #1620 on GitHub (“capture and validate backed up files checksums”) for a feature request regarding a full file hash.

But I wonder whether it’s really necessary to provide full file hashes. It would probably be sufficient to let restic do the checks itself, using some sort of diff against local files.

That said, what would also work is to create a new backup with backup --force and then use the diff command to look for unexpected changes.

ygoe · July 10, 2021, 8:16am

So the restic metadata only has sha2 sums for chunks, but it also should know what regions of a file a chunk covers, right? So it should be possible to compare the sha2 sums with the existing local files. I mean if the file size differs, a change is obvious, but for the same file size, the chunks can be compared individually.

The then available information about which of the chunks of a file differs is probably not of interest, just that there is a difference somewhere in a file.
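To make the idea concrete: if the length and sha256 hash of each chunk of a file were available as a list, the local file could be checked region by region without fetching any file data from the repository. The input format below is purely hypothetical; restic does not export chunk information in this form.

```python
import hashlib


def file_matches_chunks(path, chunks):
    """Check a local file against a list of (length, sha256_hex) chunks.

    `chunks` is a hypothetical input listing the chunks in file order.
    """
    with open(path, "rb") as fh:
        for length, expected in chunks:
            data = fh.read(length)
            if len(data) != length or hashlib.sha256(data).hexdigest() != expected:
                return False
        # Any trailing bytes mean the local file is longer than the backed-up one.
        return fh.read(1) == b""
```

The function only answers whether there is a difference somewhere in the file, which matches the point that the exact location of the differing chunk is usually not of interest.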

torfason · July 11, 2021, 9:04am

Yes, that’s a good point. It would not allow comparisons with local files without a full implementation of restic chunking logic (thinking of using the hashes in an external script). But across snapshots that comparison should work well.

torfason · August 23, 2021, 2:06pm

Looking at this thread again it seems like it was never fully resolved, since all the proposed solutions were dead ends for generating a list of the missing/corrupt files without downloading the full repository contents. The solution that does work was suggested by @alexweiss in another thread:

Essentially, the solution is to run:

restic backup --force <working_directory>
restic diff -v <known_good_snapshot> <last_snapshot>

Any missing or changed files can then be restored (using a series of --include flags with restic restore).
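The diff output can be reduced to a list of paths with a small script. The sketch below assumes that each change line starts with a short modifier column (+ for added, - for removed, M for modified content, U for metadata only, T for type change) followed by the path. Please verify that layout against the output of your restic version. The function name is made up.

```python
import re

# Assumed layout of a change line: a short modifier column, then the path.
# Some restic versions may also print "?" for content that changed while
# all metadata stayed the same; it is treated like "M" here.
CHANGE_LINE = re.compile(r"^([+\-MUT?]{1,4})\s+(/.*)$")


def paths_to_restore(diff_output):
    """Collect paths that were removed or had content changes in the newer snapshot."""
    removed, modified = [], []
    for line in diff_output.splitlines():
        match = CHANGE_LINE.match(line)
        if not match:
            continue  # header, blank and summary lines
        flags, path = match.groups()
        if "-" in flags:
            removed.append(path)
        elif "M" in flags or "?" in flags:
            modified.append(path)
    return removed, modified
```

With the known good snapshot given first, files that have vanished from disk show up as removed and corrupted files as modified. Both lists still need a review, since intentional deletions and edits made after the good snapshot look exactly the same.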

jatwell · March 3, 2024, 11:38pm

What does restic use to determine if a file is modified/changed?

It seems to only be modified date and/or size, correct? Actually, I just found this: the “Backing up” page of the restic 0.16.4 documentation. (This is a Windows box.)

I have a use case and recently found that changes to an encrypted VeraCrypt file were not getting backed up. It seems that if you mount a file with VeraCrypt, make changes and then unmount the file, neither the modified date nor the file size changes, but the file hash definitely changes.

I did test using --force, and a diff against a previous backup then shows 1 changed file, but it seems like my only option is to use --force every time I back up to catch changes to this one VeraCrypt file.
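The effect can be reproduced without VeraCrypt or restic. The sketch below overwrites a byte inside a file, puts the old timestamps back, and shows that a check based only on size and modification time sees nothing while the content hash differs. The file name and helper names are invented for the demonstration, and the metadata check is a simplification, not restic's actual code.

```python
import hashlib
import os
import tempfile


def looks_unchanged(before: os.stat_result, after: os.stat_result) -> bool:
    """Metadata-only check: same size and same modification time."""
    return (before.st_size == after.st_size
            and before.st_mtime_ns == after.st_mtime_ns)


def content_hash(path):
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def demo():
    """Return (metadata says unchanged, content hash differs)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "container.hc")
        with open(path, "wb") as fh:
            fh.write(b"A" * 4096)
        before = os.stat(path)
        hash_before = content_hash(path)

        with open(path, "r+b") as fh:  # same size, different content
            fh.seek(100)
            fh.write(b"B")
        # Put the original timestamps back, as apparently happens to the container.
        os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

        after = os.stat(path)
        return looks_unchanged(before, after), hash_before != content_hash(path)
```

Calling demo() should return (True, True): the metadata looks unchanged even though the content is different, which is why only a forced re-read of the file picks up the change.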