
rockisland
Premium
join:2008-12-15
Friday Harbor, WA
reply to koitsu

Re: Bad Hard Drive(s) / Raid Array

OK - gave up altogether on the USB connection - the results were no better when hooked to a USB port on the motherboard.

Dug out the eSATA cable and attached the enclosure that way and got what looks to me like similar results. Perhaps the images will tell you something useful.


koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23
Cute. That indicates to me the underlying drive is going completely catatonic during I/O operations (with a 30 second timeout), resulting in AHCI (SATA) controller timeouts. The AHCI driver tries to reset the AHCI port -- which works, except the underlying device attached to the port never responds ("device not ready"). The drive then later falls off the bus completely, which causes dd to fail.

I believe there were some AHCI-related bugs in FreeBSD 9.0 with certain models of controllers, but I don't think these are responsible for this problem (in fact I'm about 98% certain they're not).

Something is just downright buggered with the drive -- it's significantly worse than a single bad sector. My initial guess is that the drive firmware itself is wedging/locking up while dealing with something internally and not responding to ATA requests.

At this point, you've got one choice (nice and easy): RMA the drive.
--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.

rockisland
Premium
join:2008-12-15
Friday Harbor, WA
Well, thank you for making the effort to solve the issue - I appreciate it.

Drive is out of warranty so I guess I'll be looking for another one.


koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23
If I had a 1.5TB to send you, I'd offer to swap it, just so I could get my hands on that sucker. I always like getting my hands on bad drives, they make interesting test cases + educational material.

Sorry I couldn't be of more help -- had this been a simple single LBA which needed reanalysing, everything would have gone as planned.


koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23

Attachment: wd-wmap41573589.mp3 (203,651 bytes)
So this is an old thread, but I wanted to follow up on the matter.

rockisland sent me the drive in question and I received it today. It didn't take me long to encounter issues.

SMART shows the drive in the state shown here:


The first thing I did was attempt to zero the drive, which is the same thing rockisland attempted to do earlier.
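
For anyone who wants to reproduce this kind of zeroing pass while also seeing where the drive stalls, here is a rough sketch in Python. It is not what was run in this thread (that was plain dd); the function name, its parameters, and the one-second "slow" threshold are all made up for illustration.

```python
import os
import time


def zero_fill(path, total_bytes, block_size=64 * 1024, slow_after=1.0):
    """Overwrite the first total_bytes of path with zeros.

    Returns a list of (offset, seconds) for every block whose write plus
    flush took longer than slow_after seconds. All names and defaults here
    are hypothetical; nothing in the thread specifies them.
    """
    zeros = bytes(block_size)
    slow = []
    # Open without O_TRUNC so an existing file or device is overwritten in place.
    fd = os.open(path, os.O_WRONLY)
    try:
        offset = 0
        while offset < total_bytes:
            chunk = zeros[: min(block_size, total_bytes - offset)]
            start = time.monotonic()
            written = os.write(fd, chunk)
            os.fsync(fd)  # push the block to the device so the timing means something
            elapsed = time.monotonic() - start
            if elapsed > slow_after:
                slow.append((offset, elapsed))
            offset += written
    finally:
        os.close(fd)
    return slow
```

Run against an ordinary scratch file, this is harmless. Pointed at a raw disk device it destroys whatever is on the disk, and on a drive in the state described here a single write can block for the full controller timeout before the call returns or fails.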

Within a few moments of the zeroing beginning, I started hearing repetitive clicking coming from the drive itself. The clicking sounds like a head that's stuck repeatedly trying to re-read a sector, and not the actuator arms resetting back to track 0. I doubt rockisland could hear this if the drive was mounted in a chassis or enclosure, but I do all of my testing with bare drives externally attached to a system (literally SATA power and SATA data cable hanging out of the case).

I Ctrl-C'd dd, which of course was blocked for quite some time by the kernel CAM and underlying AHCI layer since it was waiting for an I/O transaction, and the CAM timeout is a full 30 seconds. During this time, I started seeing this on the console (which is to be expected):


A general end-user probably can't decode any of this, but it reads quite clearly to me. The drive began experiencing a physical problem of some sort (and is still experiencing it), causing the entire SATA bus (well, this port) to lock up hard. The reason is that the firmware on this model of drive is apparently designed very poorly -- it does not handle error conditions correctly.

The end result is drive firmware that is stuck in an infinite loop trying to deal with the underlying physical problem (whatever that may be). The SATA controller on the drive appears to be entirely driven by the firmware as well, which explains the deadlock followed by the drive falling off the bus entirely -- the AHCI/SATA protocol layer should still work even when the underlying drive goes catatonic.

As I type this, the drive is still clicking away, and refuses to reappear on the bus because the firmware is downright wedged. It's been literally a full 10 minutes now, which is longer than the total amount of CAM retries and timeouts (5 retries, 30 seconds each).
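
To put numbers on that: a quick sketch of the arithmetic, using the figures quoted above (30 seconds per timeout, 5 retries). The constant and function names are made up for illustration, and whether the first attempt counts as one of the five is an assumption; the sketch lets you try either reading.

```python
CAM_TIMEOUT_SECONDS = 30  # per-command timeout quoted in the post
CAM_ATTEMPTS = 5          # retry count quoted in the post, read here as total attempts


def worst_case_seconds(timeout=CAM_TIMEOUT_SECONDS, attempts=CAM_ATTEMPTS):
    """Longest the kernel should keep waiting on one command before giving up."""
    return timeout * attempts


def exceeds_retry_window(observed_seconds, timeout=CAM_TIMEOUT_SECONDS,
                         attempts=CAM_ATTEMPTS):
    """True if the device has been silent longer than every retry combined."""
    return observed_seconds > worst_case_seconds(timeout, attempts)


print(worst_case_seconds())               # 150 seconds, i.e. 2.5 minutes
print(worst_case_seconds(attempts=6))     # 180 seconds if the first try is counted separately
print(exceeds_retry_window(10 * 60))      # True: 10 minutes is well past either figure
```

Either way the retry window is at most three minutes, so ten minutes of silence means the kernel gave up on the drive long ago and the drive still has not recovered by itself.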

I tried the god-awful trick of smacking the drive against a flat surface while operational -- this is not something I normally do, but when I hear a drive clicking like this, it's sometimes worth it to see if you can jostle the arm enough that it might unwedge or trip some other condition in the underlying firmware code. Sadly, to no avail -- I even heard the drive (mechanically) reset the actuator arm, but it went right back to clicking. It's hell-bent on trying to read that naughty LBA.

So the bottom line is that there's no way to reset this drive without power-cycling it. It flat out refuses to respond to any ATA CDBs once it gets into this infinite loop, and as such, also stops responding at the SATA protocol level. Pretty awesome; nicely designed firmware! *cough* :P

There isn't much I can do with the drive other than use it as a real-world example of how technology has evolved and improved since the days of this drive (circa 2006). What's extra amusing is that the WD1500ADFD is a 10,000rpm Raptor drive, which is what WD touted as "reliable and fast and awesome" -- it just goes to show that no matter how much money you pay, no matter what "classification" of drive you buy, what ultimately matters is whether or not the programmers of the underlying device firmware actually designed their code sanely, so that it never gets stuck in deadlock/infinite loop situations like this one.

Attached is an .mp3 file recorded from my digital camera of the clicking in question, amplified by 8dB. :-) I do plan on opening this drive while it's in operation to see if there's some visual defect or reason I can detect for its issue.



norwegian
Premium
join:2005-02-15
Outback
kudos:1
Quite interesting. I have 2 x Raptors here of the same ilk (WD1500ADFD), and one in particular sounds very close to that problem. I used to have them in RAID too.

Lovely. At the time I didn't chase up the warranty enough, and I wish I had now; at AU$300 a pop they weren't cheap either.

These were reportedly the best Raptors on the market at the time before the 'Velo' came about.
--
The only thing necessary for the triumph of evil is for good men to do nothing - Edmund Burke