I’m testing my disaster recovery plan, and I found something rather terrifying.
First, how my backup is organized: I create a separate snapshot for each of the following paths:
/home/me/Clouds/Onedrive (Mounted with rclone) (A Cloud is not a backup.)
For each partition I want to back up, I create its own snapshot.
So today I wanted to test restoring my OneDrive backup stored on B2 to a hard drive (NTFS). I changed into the folder there and ran source b2_login.sh, followed by restic restore SNAPSHOTID --target Onedrive_Test_Restore. It started working and restored many files successfully, but then the following errors appeared:
ignoring error for ORIGINALPATH1: mkdir /run/media/me/Volume/OneDrive_Test_restore/ORIGINALPATH1 : invalid argument
ignoring error for ORIGINALPATH2: open /run/media/me/Volume/OneDrive_Test_restore/ORIGINALPATH2 : invalid argument
ignoring error for ORIGINALPATH: UtimesNano: no such file or directory.
So panic started to rise, and I checked the files mentioned: they were NOT restored. I then tried the mentioned command (mkdir ...) by hand, and it works if I use mkdir with quotes (there are spaces in the file path). During the shortened time ([...])
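For anyone reproducing the quoting part: a path containing spaces must be quoted, otherwise the shell word-splits it into several arguments. A minimal sketch in a temporary directory (the directory names are made up):

```shell
# A path with a space must be quoted, otherwise the shell
# splits it into separate arguments.
demo=$(mktemp -d)
cd "$demo"
mkdir My Folder        # two arguments: creates "My" and "Folder"
mkdir "My Folder"      # one argument: creates a single "My Folder"
ls -1
```

This is only about shell quoting, though; it doesn’t explain why restic itself hit “invalid argument”, since restic passes the path to mkdir(2) directly without a shell in between.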
Doing a bit more research, I found out that I can restore some of those files using mount. (I tested two of 258, so the assumption is that I can probably restore all of them.)
So I checked some ideas: all affected files have an “ä”, “ö”, or “ü” (or other special characters) in their filename or directory name. But other files with the same attributes did restore. So then I thought that maybe the full filename is too long, but I can copy those files to where they should be.
The final component (“basename”) of the new directory’s pathname is invalid (e.g., it contains characters not permitted by the underlying filesystem).
That sounds like a problem with Unicode conversion. Are you able to restore these files to a different filesystem (like ext4/btrfs/zfs)? restic just uses the filenames exactly as they were stored on the source filesystem.
AFAIK NTFS is only able to store filenames that are valid Unicode (probably UTF-16, as that’s what the Windows API uses). Filesystems on Linux, on the other hand, don’t have such a restriction; they’ll store just about any filename. From the filesystem’s point of view, a filename is just a bunch of bytes. It’s only the userspace programs that interpret these bytes as characters, and nowadays most software defaults to UTF-8 (hopefully).
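To illustrate the “just a bunch of bytes” point: on a Linux filesystem you can create a filename that is not valid UTF-8 at all, and the kernel stores the raw bytes without complaint. A small sketch using a temporary directory:

```shell
# Create a filename containing the raw byte 0xE4 (octal \344).
# That byte is "ä" in Latin-1, but it is NOT valid UTF-8 on its own.
dir=$(mktemp -d)
touch "$dir/$(printf 'b\344d')"
# The filesystem stores the bytes as-is; decoding them into
# characters is entirely up to userspace tools.
ls "$dir" | xxd
```

A filename like this restores fine to ext4, but an NTFS driver that expects UTF-8 input has nothing sensible to convert it to.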
When storing such a filename on an NTFS-formatted disk, the filesystem driver has to handle the charset conversion from UTF-8 to whatever NTFS requires. I’d hope the driver expects UTF-8 as input and converts it to the proper NTFS format; if some configuration is wrong there, that could cause these problems.
Or the filenames in the backup are not valid UTF-8 to begin with. You could check what the filename encoding looks like using ls filename | xxd; if it’s UTF-8, the umlauts should be encoded as two bytes.