
ds5v50

join:2003-01-22
Fremont, OH


Hard drives or raid card? Updated!

I have recently started getting some drive errors on my Dell PowerEdge 1600.

WARNING:  Kernel Errors Present
            res 51/0c:4f:b1:39:1d/00:00:00:00:00/e0 Emask 0x10 (ATA bus error) ...:  1 Time(s)
    ata2.00: error: { ABRT } ...:  1 Time(s)
    ata2: SError: { UnrecovData D ...:  1 Time(s)
 
 1 Time(s): ata2.00: cmd c8/00:80:80:39:1d/00:00:00:00:00/e0 tag 0 dma 65536 in
 1 Time(s): ata2.00: configured for UDMA/133
 1 Time(s): ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100100 action 0x6
 1 Time(s): ata2.00: failed command: READ DMA
 1 Time(s): ata2.00: port_status 0x20200000
 1 Time(s): ata2.00: status: { DRDY ERR }
 1 Time(s): ata2: EH complete
 1 Time(s): ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 1 Time(s): ata2: hard resetting link
 

I have two 80 GB WD SATA drives hooked to a Promise FastTrak PCI controller:
00:04.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) (rev 02)
 
 

I'm not versed in hardware jargon, so can anyone give me a quick diagnosis of whether it is the controller or one of the drives spewing errors? As a side note, this is on CentOS 6.3 x64, and it has taken the machine down once.

TIA

Edit: I should add that these errors can be reproduced by doing large file transfers or otherwise putting a load on the card/drives.
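For anyone reading the log above: the `SErr 0x100100` value on the `exception Emask` line is the SATA SError register as libata saw it. Below is a minimal Python sketch to decode it, assuming the bit names libata uses when it prints `SError: { ... }` (mirroring the `SERR_*` constants in the kernel's `include/linux/ata.h`; worth double-checking against your kernel's source). `decode_serror` is a hypothetical helper. For this log it prints `['UnrecovData', 'Dispar']`. A disparity error comes from the link's 8b/10b encoding layer, which leans toward a cable or signal problem rather than bad media, but treat that as a hint, not proof.

```python
# Names libata prints inside "SError: { ... }", keyed by bit position
# (assumed to mirror the SERR_* constants in the kernel's include/linux/ata.h).
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}
KNOWN_MASK = sum(1 << bit for bit in SERROR_BITS)


def decode_serror(value):
    """Return the libata names of the SError bits set in value, lowest bit first."""
    names = [SERROR_BITS[bit] for bit in sorted(SERROR_BITS) if value & (1 << bit)]
    # Report any bits this table does not know about instead of dropping them.
    leftover = value & ~KNOWN_MASK
    if leftover:
        names.append(f"unknown(0x{leftover:x})")
    return names


if __name__ == "__main__":
    # SErr value from the "exception Emask" line in the log above.
    print(decode_serror(0x100100))
```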

--
Fremont, Ohio Weather
»www.Fremont-OH-weather.com


DeHackEd
Bill Ate Tux's Rocket

join:2000-12-07

Re: Hard drives or raid card?

I'm not versed in that model of Dell, but I'm a bit suspicious of the kernel output. It's not a perfect solution, but you could try this:

smartctl -a /dev/sda

(Also try /dev/sdb if it exists)

If it returns SMART data and can name the actual physical drive, you have a physical hard drive problem, and possibly a diagnosis as well from the SMART output.

The bad news is that failing to get SMART data doesn't necessarily mean the RAID card sitting in the way is what's causing the issues.
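The attribute table is the part of the `smartctl -a` output worth reading closely. Here is a minimal Python sketch that pulls the raw values out of it, assuming the usual smartctl column layout: it locates the `RAW_VALUE` column from the `ID#` header line, so it should cope with both the classic table printed by `-a`/`-A` and the shorter brief table newer smartmontools prints. `parse_smart_attributes` is a hypothetical helper, not part of smartmontools.

```python
import re


def parse_smart_attributes(text):
    """Map SMART attribute ID -> raw value (leading integer) from smartctl output.

    The RAW_VALUE column index is taken from the "ID#" header line, so the
    same code reads the classic and the brief attribute tables.
    """
    raw_col = None
    values = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "ID#" and "RAW_VALUE" in tokens:
            raw_col = tokens.index("RAW_VALUE")
            continue
        # Skip anything that is not a numbered attribute row under a header.
        if raw_col is None or not tokens[0].isdigit() or len(tokens) <= raw_col:
            continue
        match = re.match(r"\d+", tokens[raw_col])
        if match:
            values[int(tokens[0])] = int(match.group())
    return values
```

For example, run it as root on `subprocess.run(["smartctl", "-a", "/dev/sda"], capture_output=True, text=True).stdout`. Only the leading integer of each raw value is kept, so an entry such as `33 (Min/Max 20/45)` becomes `33`.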
--
That's odd...

mich

join:2008-08-30
reply to ds5v50
I've seen similar link resets caused simply by a poorly attached cable. If SMART shows a high count of "UDMA CRC errors", the cable is the likely problem.
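That rule of thumb can be written down as a small triage step over SMART raw values (a `{attribute_id: raw_value}` mapping). A minimal sketch, assuming attribute IDs 5, 197 and 198 (reallocated, pending and offline-uncorrectable sectors) stand for media problems and ID 199 (`UDMA_CRC_Error_Count`) for link problems; that split is a common heuristic rather than something the SMART standard defines, and `triage` is a hypothetical name.

```python
# Heuristic split assumed by this sketch: media counters point at the drive,
# interface CRC errors point at the link (cable, connector, noise).
MEDIA_IDS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}
CRC_ID = 199  # UDMA_CRC_Error_Count


def triage(raw_values, previous_crc=0):
    """Return human-readable hints from a {attribute_id: raw_value} mapping.

    The CRC counter is cumulative on most drives, so pass an earlier reading
    as previous_crc to look at growth rather than the lifetime total.
    """
    hints = []
    for attr_id, name in MEDIA_IDS.items():
        if raw_values.get(attr_id, 0) > 0:
            hints.append(f"{name}={raw_values[attr_id]}: suspect the drive itself")
    crc_delta = raw_values.get(CRC_ID, 0) - previous_crc
    if crc_delta > 0:
        hints.append(f"UDMA_CRC_Error_Count grew by {crc_delta}: suspect cable, connector, or noise")
    return hints
```

An empty list only means none of these particular counters moved; it does not clear the drive or the cable.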


CQCQCQ

@pnap.net
reply to ds5v50
Yep, check cabling or the introduction of an RF source. If I key up my 2-meter rig, I can cause the exact same error with less than 35 W of RF around 145 MHz.


ds5v50

join:2003-01-22
Fremont, OH
reply to ds5v50
Thanks, everyone, for the suggestions. SMART checks report no errors on the drives. I will check the cables and report back. This might be a sign telling me not to buy used controller cards at Hamfests. Thanks.


Brano
I hate Vogons
Premium,MVM
join:2002-06-25
Burlington, ON
kudos:14
It's rarely the controller. It's the HDD or the cable most of the time.


ds5v50

join:2003-01-22
Fremont, OH
reply to ds5v50

Re: Hard drives or raid card? Update!

As a follow-up: it is indeed one of the 80 GB WD drives that is spewing errors. Looks like it is time to replace some storage.

The error pointed to ata2, and looking through the boot log I find that ata2 is indeed one of the 80 GB drives and not the controller. Thanks for all the ideas.
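Matching `ata2` to a drive from the boot log can be scripted. A sketch assuming the identify line libata usually logs while probing each device, of the form `ataN.DD: ATA-R: model, firmware, max mode`; the regex and `map_ata_ports` are illustrative assumptions. This only tells you which port and drive are involved, not whether the drive, the cable or the link is at fault.

```python
import re

# Assumed libata identify-line format: "ata2.00: ATA-7: <model>, <firmware>, max UDMA/133".
# search() is used below so syslog or dmesg timestamp prefixes are skipped.
IDENTIFY_RE = re.compile(r"(ata\d+)\.(\d+): ATA-\d+: (.+?), (\S+), max (\S+)")


def map_ata_ports(log_text):
    """Return {"ataN.DD": (model, firmware)} from dmesg or /var/log/messages text."""
    ports = {}
    for line in log_text.splitlines():
        match = IDENTIFY_RE.search(line)
        if match:
            port, dev, model, firmware, _mode = match.groups()
            ports[f"{port}.{dev}"] = (model, firmware)
    return ports
```

For example, feed it the output of `dmesg` captured right after boot, or the matching part of /var/log/messages.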

mich

join:2008-08-30
Of course ata2 is one of the HDDs, yet this still doesn't tell you what exactly is wrong.

If you are already looking for some reason to upgrade your HDDs and/or SATA controller, go ahead. However, your problem may well be caused by a bent cable (SATA hates this), a loose connector, RF interference or similar issues.


ds5v50

join:2003-01-22
Fremont, OH
I have swapped out the cables and have not seen the error since. But the error doesn't show up regularly; I might go two weeks without seeing it. I have my fingers crossed that the cable swap fixes it. If not, I'll just have to find another remedy.

Thanks.
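Because the error can go two weeks without showing up, a per-day tally from the system log is an easy way to see whether the cable swap actually helped. A sketch assuming classic syslog lines (`Mon DD HH:MM:SS host kernel: ...`, with no year) as in /var/log/messages, and counting each `exception Emask` and `hard resetting link` message, both seen in the first post's log, as one event; the regex and `count_link_events` are hypothetical.

```python
import re
from collections import Counter

# Assumed classic syslog layout, e.g.
# "Jan  5 03:12:44 host kernel: ata2.00: exception Emask 0x10 SAct 0x0 ..."
EVENT_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2}) \S+ \S+ kernel: (?:\[\s*\d+\.\d+\]\s*)?"
    r"(ata\d+)(?:\.\d+)?: (exception Emask|hard resetting link)"
)


def count_link_events(lines):
    """Count libata error/reset messages keyed by (month, day, port, event)."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line)
        if match:
            month, day, port, event = match.groups()
            counts[(month, int(day), port, event)] += 1
    return counts
```

For example, as root: `with open("/var/log/messages") as f: print(sorted(count_link_events(f).items()))`. To cover a two-week window you would also need to feed in the rotated copies of the log.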


koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23
I would strongly recommend doing what DeHackEd recommended (preferably with smartmontools 6.0, not something older, and using -x not -a). Getting SMART attributes from the drive will allow me to do a quick review/analysis of the drive and tell you what to look for.

If you're using the FastTrak 378's native RAID functionality, that may explain why you don't have a /dev/sdX entry for the drive (that controller is extremely old and probably doesn't offer passthrough capability, so you would have to pull the disk off the controller and hook it up to a non-RAID controller temporarily).

"Swapping cables" sometimes helps only temporarily, when there is things like dust/debris in the SATA connectors or bad plating contacts on the SATA data port connector on the drive PCB itself -- but that problem manifests itself in a specific way (in SMART), so you have to have familarity with how to read the attributes to know what the problem may be. I've seen people report cable-swapping "solves their problem!!!" only to see a follow-up a week later complaining that it's still happening, where the root cause turned out to be excessive interference/noise inside of a case combined with not-so-well-shielded cables.

--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.