Linux box crashes when plugging in SATA drive
I have a somewhat interesting problem that's manifested in the past few weeks. I have a Linux box with an Intel D975XBX motherboard and a Conroe C2D. It has a Marvell-based Highpoint PCI-e RAID controller and an old NVIDIA GeForce PCI card for console viewing when needed. It's running Ubuntu 12.04 with stock 32-bit kernel version 3.2.0-35.
I've always been able to hot-plug SATA drives into the motherboard's SATA ports (not the onboard SiI ports, which I've disabled, but the actual chipset SATA ports), but I've been having problems recently. Some drives completely crash the system when I plug them in. The console blanks, networking becomes unresponsive, and the HDD access light stays solid; SysRq+REISUB doesn't work, and the only solution is to pull the plug.
I tried to trigger the error with a SATA drive today, and received an error similar to this one in
Dec 30 11:10:15 panorama kernel: [97383.660485] Clocksource tsc unstable (delta = 4686838547 ns)
A few more lines about the ATA subsystem recognizing the drive and adding the block device scroll by, then the system completely locks up. Interestingly, not all drives cause this issue, and I can't find a pattern. It's definitely per-drive, though: some drives never crash the system and some always do, and a given drive never changes its behavior.
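To check whether the "unstable" message really lines up with each hotplug, a small script can pull those lines out of the kernel log. This is only a rough sketch: the regular expression is inferred from the single log line quoted above, and the function name find_unstable_events is made up for illustration.

```python
import re

# Pattern inferred from the one log line quoted in the post (an assumption);
# other kernel versions may word the message differently.
UNSTABLE_RE = re.compile(r"Clocksource (\w+) unstable \(delta = (-?\d+) ns\)")


def find_unstable_events(lines):
    """Return (clocksource, delta_in_seconds, raw_line) for each matching line."""
    events = []
    for line in lines:
        match = UNSTABLE_RE.search(line)
        if match:
            name = match.group(1)
            delta_seconds = int(match.group(2)) / 1e9  # ns -> s
            events.append((name, delta_seconds, line.rstrip("\n")))
    return events


# Example: feed it the line from the post.
sample = [
    "Dec 30 11:10:15 panorama kernel: [97383.660485] "
    "Clocksource tsc unstable (delta = 4686838547 ns)"
]
for name, delta, raw in find_unstable_events(sample):
    print(f"{name}: delta of {delta:.3f} s")
```

For the line above, the delta works out to roughly 4.69 seconds. The same function can be given an open log file instead of the sample list, so the timestamps can be compared with when each drive was plugged in.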
After some preliminary Googling, I've added notsc clocksource=acpi_pm to the kernel command line in GRUB (acpi_pm is the only other supported clocksource on my system), but I haven't yet tested whether I can still reproduce the problem, since all my spare drives are tied up at the moment. What I find interesting is that this "tsc unstable" message only ever appears when I hotplug a SATA drive, and even then I can't get it to appear reliably.
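Before retesting, it's worth confirming after a reboot that the kernel actually switched to acpi_pm. The kernel exposes the active and available clocksources under /sys/devices/system/clocksource/clocksource0; the sketch below just reads those two files. The helper names are hypothetical, and the example input is illustrative rather than output from my machine.

```python
from pathlib import Path

# Standard sysfs location for clocksource information on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(current_text, available_text):
    """Turn the raw sysfs file contents into (current, [available, ...])."""
    return current_text.strip(), available_text.split()


def read_clocksources(base=SYSFS_DIR):
    """Read the active and available clocksources from sysfs."""
    base = Path(base)
    return parse_clocksources(
        (base / "current_clocksource").read_text(),
        (base / "available_clocksource").read_text(),
    )


def report(current, available, wanted="acpi_pm"):
    """Build a one-line summary of whether the wanted clocksource is active."""
    status = "active" if current == wanted else "NOT active"
    return f"{wanted} is {status} (current: {current}; available: {', '.join(available)})"


# Illustrative input only, not captured from the affected system.
print(report(*parse_clocksources("acpi_pm\n", "tsc acpi_pm\n")))
```

On the real system, report(*read_clocksources()) gives the same summary from the live sysfs files. If tsc is still listed as current, the GRUB change did not take effect.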
My question is this: is this some bug in the Linux kernel, has something on my motherboard become unstable, or is it something totally different? My system doesn't seem to have any trouble keeping time (then again, I have ntpd running so it could be masking the problem), so I don't even know if these two issues are related. I'd appreciate any insight anybody has.
Edit: Something I should probably add is that one drive in particular had the same effect on the system even though I didn't plug it into a SATA port, just power. Yes. Just power. I sent the drive back to WD for RMA, but I found it as puzzling as you probably do right now. Sometimes I wonder if it's a problem with voltage drops or spikes and an inadequate power supply. I used to have 12 drives (2 WD Blue, 8 WD Black, 2 Seagate IDE), some running off SATA power and some running off Molex-to-SATA power adapters. I only have 10 now (2 WD Blacks have died on me). The power supply, pulled from an old Alienware computer, doesn't have an obvious wattage rating on it, but it is fairly large and has plenty of connectors, so I assume it's adequate. Again, I don't know for sure; I'm just searching for theories.
Well, hot-plugging has worked reliably on this system since I bought it, so I'm asking about the sudden loss of this ability. I never experienced a single hot-plugging problem until a couple of weeks ago, so I'm wondering why problems would have started so suddenly.
There are so many things that can go wrong during hot-plugging -- starting at the physical level -- that I won't even bother to go into details.
There is a very good reason why hot-plug trays and enclosures were invented.
BTW, until last week, I had no problem hot-plugging a debug connector into the device I work on, but then one time I managed to burn up both the male and female connectors: several neighboring traces disappeared without a trace...