DSLReports Forums

ds5v50
join:2003-01-22
Fremont, OH

Member

Hard drives or raid card? Updated!

I have recently started getting some drive errors on my Dell PowerEdge 1600:

WARNING:  Kernel Errors Present
            res 51/0c:4f:b1:39:1d/00:00:00:00:00/e0 Emask 0x10 (ATA bus error) ...:  1 Time(s)
    ata2.00: error: { ABRT } ...:  1 Time(s)
    ata2: SError: { UnrecovData D ...:  1 Time(s)
 
 1 Time(s): ata2.00: cmd c8/00:80:80:39:1d/00:00:00:00:00/e0 tag 0 dma 65536 in
 1 Time(s): ata2.00: configured for UDMA/133
 1 Time(s): ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100100 action 0x6
 1 Time(s): ata2.00: failed command: READ DMA
 1 Time(s): ata2.00: port_status 0x20200000
 1 Time(s): ata2.00: status: { DRDY ERR }
 1 Time(s): ata2: EH complete
 1 Time(s): ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
 1 Time(s): ata2: hard resetting link
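The `SErr 0x100100` value in these messages is a bitmask of the SATA SError register. As a rough aid to reading it, here is a minimal Python 3.9+ sketch that decodes such a value. The bit names follow the abbreviations libata prints in its `SError: { ... }` line; the `LINK_LEVEL_BITS` grouping is an assumption of this sketch (a heuristic), not something libata reports. For this log it yields UnrecovData plus Dispar (a running-disparity error on the wire), which appears consistent with the truncated `{ UnrecovData D ...` line.

```python
# Sketch: decode the SError bitmask from a libata line such as
# "ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100100 action 0x6".
# Bit positions follow the SATA SError register; names mirror libata's
# "SError: { ... }" abbreviations.
SERROR_BITS = {
    0: "RecovData",    # recovered data integrity error
    1: "RecovComm",    # recovered communication error
    8: "UnrecovData",  # data integrity error that was not recovered
    9: "Persist",      # persistent communication or data integrity error
    10: "Proto",       # protocol error
    11: "HostInt",     # internal error in the host controller
    16: "PHYRdyChg",   # PHY ready state changed
    17: "PHYInt",      # PHY internal error
    18: "CommWake",    # COMWAKE signal detected
    19: "10B8B",       # 10b-to-8b decode error
    20: "Dispar",      # running disparity error
    21: "BadCRC",      # CRC error in a received frame
    22: "Handshk",     # handshake error
    23: "LinkSeq",     # link sequence error
    24: "TrStaTrns",   # transport state transition error
    25: "UnrecFIS",    # unrecognized FIS type
    26: "DevExch",     # device presence changed
}

# Assumption (heuristic, not from libata): these bits describe trouble on the
# wire itself, which points toward cables, connectors or electrical noise.
LINK_LEVEL_BITS = {19, 20, 21, 22, 23}


def decode_serror(value: int) -> list[str]:
    """Return the names of the bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


def looks_link_level(value: int) -> bool:
    """True if any of the (heuristically) link-level bits are set."""
    return any(value & (1 << bit) for bit in LINK_LEVEL_BITS)


if __name__ == "__main__":
    serr = 0x100100  # value from the log above
    print(decode_serror(serr))     # ['UnrecovData', 'Dispar']
    print(looks_link_level(serr))  # True
```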
 

I have two 80 GB WD SATA drives hooked to a FastTrak PCI controller:
00:04.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) (rev 02)
 
 

I'm not versed in hardware jargon, so can anyone give me a quick diagnosis of whether it is the controller or one of the drives spewing errors? As a side note, this is on CentOS 6.3 x64, and it has taken the machine down once.

TIA

Edit: I should add that these errors can be reproduced by doing large file transfers or otherwise putting a load on the card/drives.

DeHackEd
Bill Ate Tux's Rocket
join:2000-12-07

Member

Re: Hard drives or raid card?

I'm not versed in that model of Dell, but I'm a bit suspicious of the kernel output. It's not a perfect solution, but you could try this:

smartctl -a /dev/sda

(Also try /dev/sdb if it exists)

If it returns SMART data and names the actual physical drive, you have a physical hard drive problem, and the SMART output may give you a diagnosis as well.

The bad news is that failing to get SMART data doesn't necessarily mean a RAID card is in the way causing the issues.
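A minimal Python 3.9+ sketch of that check, assuming smartctl is installed and the script runs as root: it runs `smartctl -i` (identity information only) against each candidate device node and reports whether a drive model and serial number come back. The device list and the helper names `parse_identity` and `probe` are illustrative, not part of smartmontools. As noted above, an empty result is not conclusive on its own.

```python
import subprocess

# Identity lines smartctl prints for ATA drives, e.g. "Device Model:  WDC ...".
IDENTITY_KEYS = ("Model Family", "Device Model", "Serial Number")


def parse_identity(text: str) -> dict[str, str]:
    """Collect drive identity fields from `smartctl -i` (or -a / -x) output."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in IDENTITY_KEYS:
            info.setdefault(key.strip(), value.strip())  # keep first occurrence
    return info


def probe(device: str) -> dict[str, str]:
    """Run `smartctl -i` on one device node (needs root) and parse its output.

    smartctl's exit status is a bit mask that may be non-zero even when useful
    data was printed, so only stdout is inspected, not the return code.
    """
    result = subprocess.run(["smartctl", "-i", device],
                            capture_output=True, text=True, check=False)
    return parse_identity(result.stdout)


if __name__ == "__main__":
    for dev in ("/dev/sda", "/dev/sdb"):  # illustrative device list
        info = probe(dev)
        if "Serial Number" in info:
            model = info.get("Device Model", "unknown model")
            print(f"{dev}: physical drive {model}, serial {info['Serial Number']}")
        else:
            print(f"{dev}: no drive identity returned (not conclusive on its own)")
```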
mich64
join:2008-08-30

Member

I've seen similar link resets caused simply by a poorly attached cable. If SMART shows a high count of "UDMA CRC errors", it's likely a problem with the cable.
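To check the count mentioned here, the raw value of SMART attribute 199 (which smartctl normally labels UDMA_CRC_Error_Count) can be pulled out of smartctl's attribute table. Below is a minimal Python 3.9+ sketch, assuming you have saved the output of `smartctl -a` (or `smartctl -x`) as text; it locates the RAW_VALUE column from the table header, so it should cope with both the classic and the brief table layouts. The function names are hypothetical.

```python
import re
from typing import Optional

LEADING_NUMBER = re.compile(r"\d+")


def parse_attributes(text: str) -> dict[int, tuple[str, int]]:
    """Map SMART attribute ID -> (name, raw value) from smartctl output.

    The RAW_VALUE column is located from the table header, so both the classic
    layout (`smartctl -a`) and the brief layout (`smartctl -x`) are handled.
    Only the leading number of a raw value is kept: "34 (Min/Max 20/45)" -> 34.
    """
    attrs: dict[int, tuple[str, int]] = {}
    raw_col = None  # index of RAW_VALUE while inside the table, otherwise None
    for line in text.splitlines():
        fields = line.split()
        if fields[:2] == ["ID#", "ATTRIBUTE_NAME"] and "RAW_VALUE" in fields:
            raw_col = fields.index("RAW_VALUE")
            continue
        if raw_col is None:
            continue
        if not fields:  # a blank line ends the attribute table
            raw_col = None
            continue
        if fields[0].isdigit() and len(fields) > raw_col:
            match = LEADING_NUMBER.match(fields[raw_col])
            if match:
                attrs[int(fields[0])] = (fields[1], int(match.group()))
    return attrs


def udma_crc_errors(text: str) -> Optional[int]:
    """Raw value of attribute 199 (UDMA_CRC_Error_Count), or None if absent."""
    entry = parse_attributes(text).get(199)
    return entry[1] if entry else None
```

For example, save `smartctl -a /dev/sda > sda.txt` and call `udma_crc_errors(open("sda.txt").read())`. Taking a reading before and after a heavy file transfer shows whether the count is still climbing.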

CQCQCQ
@pnap.net

Anon

Yep, check the cabling, or check for a newly introduced RF source. If I key up my 2-meter rig, I can cause the exact same error with less than 35 W of RF around 145 MHz.

ds5v50
join:2003-01-22
Fremont, OH

Member

Thanks, everyone, for the suggestions. SMART checks are telling me there are no errors on the drives. I will check the cables and report back. This might be a sign telling me not to buy used controller cards at hamfests. Thanks.

Brano
I hate Vogons
MVM
join:2002-06-25
Burlington, ON

It's rarely the controller. It's the HDD or the cable most of the time.

ds5v50
join:2003-01-22
Fremont, OH

Member

Re: Hard drives or raid card? Update!

As a follow-up: it is indeed one of the 80 GB WD drives that is spewing errors. Looks like it's time to replace some storage.

The error pointed to ata2, and looking through the boot log I found that ata2 is indeed one of the 80 GB drives and not the controller. Thanks for all the ideas.
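The lookup described here, finding which disk sits behind ata2 in the boot log, can be scripted. A minimal Python 3.9+ sketch, assuming libata's probe-time announcement has the form `ata2.00: ATA-7: <model>, <firmware>, max UDMA/133` (the ATA revision number varies by drive): it maps each port name to the model and firmware announced on it. The function name is hypothetical. This only identifies the drive on that port; it says nothing about whether the drive, the cable or the connector is at fault.

```python
import re
import sys

# libata announces each ATA disk at probe time with a line such as
#   ata2.00: ATA-7: <model>, <firmware>, max UDMA/133
PROBE_LINE = re.compile(r"\b(ata\d+)\.\d+: ATA-\d+: ([^,]+), ([^,]+), max (\S+)")


def ports_to_drives(log_text: str) -> dict[str, tuple[str, str]]:
    """Map libata port name (e.g. 'ata2') -> (model, firmware) from a boot log."""
    drives: dict[str, tuple[str, str]] = {}
    for m in PROBE_LINE.finditer(log_text):
        drives[m.group(1)] = (m.group(2).strip(), m.group(3).strip())
    return drives


if __name__ == "__main__":
    # Usage: dmesg | python3 this_script.py   (or feed a saved boot log on stdin)
    for port, (model, firmware) in sorted(ports_to_drives(sys.stdin.read()).items()):
        print(f"{port}: {model} (firmware {firmware})")
```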
mich64
join:2008-08-30

Member

Of course ata2 is one of the HDDs (ataN names the port/link a drive is attached to), yet this still doesn't tell you what exactly is wrong.

If you are already looking for a reason to upgrade your HDDs and/or SATA controller, go ahead. However, your problem may well be caused by a bent cable (SATA hates this), a loose connector, RF interference or similar issues.

ds5v50
join:2003-01-22
Fremont, OH

Member

I have swapped out the cables and have not seen the error since, but the error is intermittent: I might go two weeks without seeing it. I have my fingers crossed that the cable swap fixes it. If not, I'll just have to find another remedy.
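Because the error is intermittent, one way to judge whether the cable swap helped is to compare how often errors appear per day before and after the swap. A minimal Python 3.9+ sketch, assuming kernel messages go to `/var/log/messages` with the usual `Mon DD HH:MM:SS` syslog prefix: it counts libata `exception Emask` lines for one port, each of which roughly corresponds to one error-handling episode. The port name and the helper `episodes_per_day` are illustrative.

```python
import re
import sys
from collections import Counter

# One libata error-handling episode typically starts with a syslog line like
#   Mar  3 14:02:11 host kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x100100 action 0x6
EXCEPTION_LINE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) \d\d:\d\d:\d\d .*?"
    r"\b(?P<port>ata\d+)(?:\.\d+)?: exception Emask"
)


def episodes_per_day(lines, port: str = "ata2") -> Counter:
    """Count 'exception Emask' lines for one libata port, keyed by (month, day)."""
    counts: Counter = Counter()
    for line in lines:
        m = EXCEPTION_LINE.match(line)
        if m and m.group("port") == port:
            counts[(m.group("month"), int(m.group("day")))] += 1
    return counts


if __name__ == "__main__":
    # Assumption: CentOS-style syslog in /var/log/messages; pass rotated
    # copies explicitly (oldest first) to look further back.
    port = "ata2"
    paths = sys.argv[1:] or ["/var/log/messages"]
    total: Counter = Counter()
    for path in paths:
        with open(path, errors="replace") as fh:
            total.update(episodes_per_day(fh, port))
    # Keys appear in the order they were first seen in the files.
    for (month, day), n in total.items():
        print(f"{month} {day:2d}: {n} exception(s) on {port}")
```

A day count that drops to zero and stays there over a few weeks of normal load is better evidence than a single quiet day.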

Thanks.

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

I would strongly recommend doing what DeHackEd recommended (preferably with smartmontools 6.0, not something older, and using -x rather than -a). Getting the SMART attributes from the drive will allow me to do a quick review/analysis of the drive and tell you what to look for.

If you're using the FastTrak 378's native RAID functionality, that may explain why you don't have a /dev/sdX entry for the drive (that controller is extremely old and probably doesn't offer passthrough capability, so you would have to pull the disk off the controller and hook it up to a non-RAID controller temporarily).

"Swapping cables" sometimes helps only temporarily, when there are things like dust/debris in the SATA connectors or bad contact plating on the SATA data port connector on the drive PCB itself. That problem manifests itself in a specific way in SMART, though, so you need some familiarity with reading the attributes to know what the problem may be. I've seen people report that cable swapping "solves their problem!!!", only to post a follow-up a week later complaining that it's still happening, where the root cause turned out to be excessive interference/noise inside the case combined with not-so-well-shielded cables.