Input/output error when reading from an external hard drive: fail-safe mode?

KillianKemps · November 25, 2022, 7:46pm

Hello,

I use Restic 0.13.1 to make backups from a laptop to an external hard drive. For some days now, backups have been impossible because of an input/output error:

List(index) returned error, retrying after 4.506218855s: lstat /run/media/myuser/MyHarddrive/restic-repository/index/39da33a8b6f28929b3c3f5d3e3e5d2de171a37bf44e5950e4839f4201f8ae621: input/output error

The restic snapshots command is the only one that works: it successfully displays the list of snapshots. However, the backup, restore, check and rebuild-index commands all fail, printing the input/output error above in a loop.

The external hard disk on which the backups are made has no S.M.A.R.T. statistics, but the filesystem seemed okay according to Gnome’s disk utility. Nevertheless, when I tried to check the files myself, I got a similar error:

$ ls "/run/media/myuser/MyHarddrive/restic-repository/index/"
ls: cannot access '/run/media/myuser/MyHarddrive/restic-repository/index/39da33a8b6f28929b3c3f5d3e3e5d2de171a37bf44e5950e4839f4201f8ae621': Input/output error
ls: cannot access '/run/media/myuser/MyHarddrive/restic-repository/index/a6d9abf60d471a91108d75f49c41d04597b902caeea108ac821b8561927b69ab': Input/output error
ls: cannot access '/run/media/myuser/MyHarddrive/restic-repository/index/d2fa0ff862a42838ba2abfd7a2d19a8153022073643617b300f08910bae1aae5': Input/output error
ls: cannot access '/run/media/myuser/MyHarddrive/restic-repository/index/dc2fb6ada2bd3f692566183b22a8ce4ac4dc25c5f1156fcd25b370960bb0cca1': No such file or directory

So maybe there is an issue with the hard drive after all.

My question now is: is there no way for Restic to offer a “fail-safe mode” that retrieves at least some of the data? Fortunately my system is still okay, so I started backing up to another medium, but if I had to retrieve the backups from this external hard drive, it seems I would have lost everything.

rawtaz · November 25, 2022, 8:06pm

Restic uses standard disk I/O operations and methods, so if it asks to read something and the operating system reports that the read isn’t possible, then that is pretty much what it is. If restic can’t read the files it needs to access your backups, I’m not sure how a fail-safe mode could help with that. It sure sounds like your disk, or something else along the way, is acting up.

Can you make a completely new backup of your system to another place and be back on track? Or are you saying that there is something in this particular repository that you need to restore?

fd0 · November 26, 2022, 8:12am

I’m quite sure there’s something wrong with the drive and/or the file system. You can check using sha256sum (without involving restic at all):

sha256sum /run/media/myuser/MyHarddrive/restic-repository/index/39da33a8b6f28929b3c3f5d3e3e5d2de171a37bf44e5950e4839f4201f8ae621

It should give you back a hash that is equal to the file name.
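To avoid comparing two long hex strings by eye, that check can be wrapped in a small shell function. This is a minimal sketch assuming bash and GNU coreutils sha256sum; the function name verify_repo_file is hypothetical and not part of restic.

```shell
#!/usr/bin/env bash
# Sketch: check whether one restic repository file still hashes to its own name.
# verify_repo_file is a hypothetical helper, not a restic command.
# Return status: 0 = hash matches, 1 = hash differs, 2 = file could not be read.
verify_repo_file() {
    local file="$1" out actual expected
    expected=$(basename -- "$file")
    # sha256sum exits non-zero on read errors such as "Input/output error".
    out=$(sha256sum -- "$file") || { printf '%-10s %s\n' UNREADABLE "$file"; return 2; }
    actual=${out%% *}    # sha256sum prints "<hash>  <path>"; keep only the hash
    if [ "$actual" = "$expected" ]; then
        printf '%-10s %s\n' OK "$file"
        return 0
    fi
    printf '%-10s %s\n' MISMATCH "$file"
    return 1
}
```

For example: verify_repo_file /run/media/myuser/MyHarddrive/restic-repository/index/39da33a8b6f28929b3c3f5d3e3e5d2de171a37bf44e5950e4839f4201f8ae621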

KillianKemps · November 26, 2022, 11:17am

Yes, it seems there is an issue with the drive, as this command also fails:

sha256sum: /run/media/myuser/MyHarddrive/restic-repository/index/39da33a8b6f28929b3c3f5d3e3e5d2de171a37bf44e5950e4839f4201f8ae621: Input/output error

But the same command works as intended on the other index files.

However, my point is that the drive has only partially failed, and I hoped I could recover at least some files with Restic.

The snapshots command works successfully:

$ ./restic_0.13.1_linux_amd64 snapshots
repository fd7fed28 opened successfully, password is correct
ID        Time                 Host        Tags        Paths
-------------------------------------------------------------------------------------------
ffdf7a9d  2019-12-22 18:11:34  myhost                 /home/myuser/.config/aliases
[...]
-------------------------------------------------------------------------------------------
16 snapshots

The same drive also holds regular files, which I can access without any issue; none of the files I tested manually were corrupted. (It may be worth mentioning that the external hard drive is formatted as NTFS but accessed from my Linux computer.)

My impression is that with Restic, a problem in one or a few index files blocks me from accessing all of my backed-up files. Had I simply copied the files to the hard drive, I might have lost some corrupted files but would still have been able to recover many of the others.

In my current case this is not a problem, as I could continue my backups on another medium and did not need to recover anything from this hard drive. However, I also use Restic for my servers, and I’m wondering whether it is a good idea to keep using Restic if a single disk failure can break the whole repository. I monitor the servers’ disks daily for failures, but if a single failure is already enough to prevent recovering data with Restic, I will never have time to migrate the data to another drive.

fd0 · November 26, 2022, 4:43pm

So, there’s an issue with the drive, most likely.

The index files are not that critical for restic: you can recreate them from the files in data/ by running restic rebuild-index. When other files (which contain the data or metadata of backups) are damaged, however, you will not be able to restore that data.

At the moment, restic is built to detect this kind of corruption, but it cannot always recover from such issues by itself.

As for whether it’s a good idea to keep using restic: yes, it’s a great idea! After all, you just uncovered faulty hardware that you might not have found otherwise. And you even discovered it before you needed your backups!

To get an idea of how much data is corrupted, you can run sha256sum on all the files in the repo. Apart from the config file, every file has its SHA-256 hash as its filename.
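A minimal bash sketch of such a scan, assuming a local repository with restic’s usual layout (data/XX/<hash>, index/, snapshots/, keys/). The function name scan_repo is hypothetical; locks/ is skipped because restic creates and removes lock files while it runs.

```shell
#!/usr/bin/env bash
# Sketch: hash every file of a local restic repository and report the ones that
# are unreadable or whose SHA-256 does not match their file name.
# scan_repo is a hypothetical helper, not a restic command. "config" is skipped
# because it is not named after its hash; locks/ because lock files come and go.
scan_repo() {
    local repo="$1" bad=0 total=0 file out
    # Globs are expanded from directory listings, so entries that ls could list
    # but not lstat should still be visited (and reported as unreadable).
    # Note: this enables nullglob for the rest of the calling shell.
    shopt -s nullglob
    for file in "$repo"/data/*/* "$repo"/index/* "$repo"/snapshots/* "$repo"/keys/*; do
        total=$((total + 1))
        if ! out=$(sha256sum -- "$file" 2>/dev/null); then
            printf '%-10s %s\n' UNREADABLE "$file"
            bad=$((bad + 1))
        elif [ "${out%% *}" != "${file##*/}" ]; then
            printf '%-10s %s\n' MISMATCH "$file"
            bad=$((bad + 1))
        fi
    done
    printf '%d of %d file(s) damaged\n' "$bad" "$total"
    [ "$bad" -eq 0 ]    # non-zero return status if anything was damaged
}
```

For example: scan_repo /run/media/myuser/MyHarddrive/restic-repository. It reads every pack file in full, so on a large repository it takes roughly as long as copying the data.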

Good luck!

bazinga · November 26, 2022, 10:37pm

Also, regardless of the tool you choose, you should not count on a single backup repository. A common approach is to keep at least one copy on a different device, plus another one offsite and offline.