I am using some old hard drives that are already reporting SMART errors. When one of them hits a bad sector, I run into two issues.
- When the storage node tries to read the bad sector, my system becomes unresponsive with 100% IO wait. I cannot cancel the read, open an SSH connection, or restart the machine. Is there a way to tell the system to just unmount the bad drive instead of retrying the read over and over? Or can I at least make sure that a remote SSH connection and some kind of emergency reboot remain possible? I want to be able to resolve the issue without having to be at home.
- If I am lucky, I can restart the machine shortly before the IO wait gets out of hand. The next issue is that on startup the machine tries to mount all ZFS drives, including the bad one. If it can’t mount the bad drive, the system never finishes booting. The only solution is a trial-and-error session of hard resets and unplugging different SATA cables to figure out which hard drive has the issue. Is there a trick to tell ZFS to mount only the drives it can and let the bad drives error out? I want to always be able to restart my machine and worry about the bad hard drives later.