Restic success story 👍

Haha, so here’s another hardware problem that Restic both helped detect and helped recover from.

With the new local-cache code, I decided to put my XDG_CACHE_HOME in a folder on a USB drive, because my Mac’s boot drive is a fairly small SSD.

Having had a few aborted remote backups over the past weeks, I decided to do a reindex, check and prune of my remote with the spiffy new cache code. I expected prune to be fast, but it was running slower than ever (18+ hours for ~19% completion). And worse, it seemed to be making the USB drive thrash like mad! Like, audibly vibrate.

I thought maybe restic was thrashing the disk because the cache was on the same drive as some of my backed-up data. But thinking it through, “prune” probably doesn’t need to read data from the backed-up paths. It seemed unlikely that the cache code could thrash the disk by itself, though I guess that’s possible if the OS did a bad job of spreading files around the platters.

I started checking around with lsof, Activity Monitor, etc. to see what else was hitting the disk. Finally I checked my logs and found them filled with disk3s2: I/O error. Sure enough, the drive enclosure now has a sad blinking light on it, and the disk won’t even mount. The (1 month old!) drive has failed.
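
For anyone who wants to check their own logs for the same symptom, here is a minimal Python sketch that counts I/O error lines per device. The line format is an assumption based only on the message quoted above (disk3s2: I/O error), and the function name and sample lines are made up for illustration; real system logs may look different.

```python
import re
from collections import Counter

# Matches lines containing something like "disk3s2: I/O error".
# The exact log format is an assumption based on the message quoted in the post.
IO_ERROR = re.compile(r"\b(disk\d+(?:s\d+)?): I/O error")


def count_io_errors(lines):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not copied from a real log.
    sample = [
        "disk3s2: I/O error",
        "some unrelated message",
        "disk3s2: I/O error",
    ]
    for device, n in count_io_errors(sample).most_common():
        print(f"{device}: {n} I/O error line(s)")
```

To scan a real file, pass an open file handle instead of the sample list, since the function only needs an iterable of lines.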

Apparently putting the cache data on the disk was enough to exercise some bad component in the drive to the point of total failure. I’m glad it happened under these circumstances (rather than, say, slow and silent bit rot).

In addition to the XDG_CACHE_HOME dir, I also had some really important data on that drive. Since it was regularly backed up by restic (to another local disk, as well as remotely), I was able to restore it successfully.

Thank you, Alexander & contributors!


Very nice, that makes about four cases where restic detected broken and/or defective hardware 🙂


By brute-forcing the system into submission 😉
