koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu to rockisland


Re: Bad Hard Drive(s) / Raid Array

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
...
  9 Power_On_Hours          -O--CK   078   078   000    -    16284
...
197 Current_Pending_Sector  -O--C-   200   200   000    -    1
198 Offline_Uncorrectable   -O--C-   200   200   000    -    1
...
SMART Error Log Version: 1
ATA Error Count: 1
...
Error 1 occurred at disk power-on lifetime: 16270 hours (677 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.
  
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 97 6b 9f 40  Error: UNC 8 sectors at LBA = 0x009f6b97 = 10447767
  
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 01 08 97 6b 9f 41 00      00:00:32.250  READ DMA EXT
  25 01 08 87 dc 9e 41 00      00:00:32.250  READ DMA EXT
  25 01 01 4e ea a7 46 00      00:00:32.250  READ DMA EXT
  25 01 01 4e ea a7 46 00      00:00:32.250  READ DMA EXT
  61 01 00 ee 89 77 41 00      00:00:32.250  WRITE FPDMA QUEUED
 

This clearly indicates a failed read at LBA 10447767. The drive itself detected this condition during a series of 48-bit LBA reads (READ DMA EXT). The I/O error happened roughly 14 power-on hours ago (16284 - 16270), and what you see in attributes 197 and 198 is a result of this.
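For anyone who wants to check the numbers by hand, here is a minimal sketch of how the LBA and the "14 hours" figure fall out of the log above. It assumes the register row follows the usual 28-bit task-file layout (SN = bits 0-7, CL = bits 8-15, CH = bits 16-23, low nibble of DH = bits 24-27); the function name is made up for illustration.

```python
# Register values copied from the "After command completion" row:
# ER ST SC SN CL CH DH = 40 51 08 97 6b 9f 40
SN, CL, CH, DH = 0x97, 0x6B, 0x9F, 0x40


def lba28_from_registers(sn, cl, ch, dh):
    """Combine the task-file registers into a 28-bit LBA (assumed layout)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


lba = lba28_from_registers(SN, CL, CH, DH)

# Attribute 9 (Power_On_Hours) minus the timestamp in the error log entry.
hours_since_error = 16284 - 16270

print(hex(lba), lba, hours_since_error)
```

The result agrees with what smartctl printed on the error line (0x009f6b97 = 10447767).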

So how do you want to proceed? (See second paragraph)
rockisland
Premium Member
join:2008-12-15
Friday Harbor, WA


My question is whether you think the drive is salvageable or will it always be suspect and I'd be better off replacing it. If it's worth a shot I'd give writing to it a try. It can't hurt anything at this point.

Then what to do with the drive with 30 Ultra DMA CRC Errors in 94 hours of use? That seems like too much especially when compared to the other drives with many times the hours of use. That one may actually be under warranty because it was a replacement last year.

koitsu

said by rockisland:

My question is whether you think the drive is salvageable or will it always be suspect and I'd be better off replacing it. If it's worth a shot I'd give writing to it a try. It can't hurt anything at this point. :)

From my perspective there's absolutely nothing anomalous about the drive aside from at least 1 sector that may or may not be bad (won't know until a write is issued to the LBA). Your choices here:

1. Zero the entire drive (writing zeros to every LBA). HD Tune Pro can do this via the Erase tab, or you can use whatever other utility you want (CCleaner for example has this feature too). FORMAT will not do this (at least not on XP), nor will Disk Management. Take a screenshot/snapshot of the SMART attributes before and after the drive is erased. I can do the post-analysis from there.

This choice has the advantage of detecting and dealing with any other LBAs/sectors that may cause issues. Meaning: right now you only know of one, but there may be others (on other areas of the drive you haven't used yet).

On the downside, zeroing the entire drive takes a while.

I tend to recommend this method because it's easiest and can also reveal other sectors that may have issues.

I also tend to recommend that after zeroing, you issue an Error Scan (if using HD Tune Pro) of every LBA on the disk (i.e. un-check the Quick checkbox). This takes a while too, but ensures that every LBA is readable before you put the drive back into the array.

2. Issue a write to the individual LBA that the drive has issues with (LBA 10447767). The drive will re-analyse the individual sector and either remap the LBA to a spare or decide the sector is fine and keep the existing mapping.

This has the advantage of being very quick to do (a single write takes milliseconds), and does not require you to back up any data from the drive to begin with (the latter doesn't apply in your case since it's used for RAID).

On the downside, doing this is tricky and requires familiarity with tools such as dd (I don't trust any other utility) and exactly what arguments to use (messing these up or omitting one can result in the entire drive being zeroed). You also have to read from that individual LBA first -- why? Because I have seen cases where the drive firmware says LBA X is bad while the OS insists LBA X is perfectly fine and it's LBA X+1 which has the issue (don't ask; this is not an off-by-one mistake, this is just something downright bizarre that I've seen reported here).

In general, on RAID arrays where a checksumming filesystem is not used (i.e. the filesystem is NTFS, FAT, ext2, ext3, ext4, etc.), I do not recommend this method unless after doing so you immediately tell the RAID management software to nuke the metadata on the disk and rebuild the array entirely with that drive (i.e. treat the now-repaired drive as a new disk). Failure to do this can/will result in one of your files, when read, returning 512 bytes of zeros where there was previously data. Which file is affected is also unknown/undetermined. There's nothing you can do about this situation, sadly (think about the situation if it were a standalone, non-RAID disk).

3. RMA the drive (preferably an Advanced RMA, since it ensures you get a replacement drive first, which you can test fully before sending the other drive back).

This has the advantage of being the simplest choice and usually the least painful, i.e. box the drive up and ship it off.

On the downside, Advanced RMA requires that you have a credit card handy (in case they don't receive the bad drive you get charged for the new one, at a significantly increased price), that you have proper shipping materials (anti-static peanuts/foam, anti-static bags, sturdy box, etc.) for the bad drive, and that it takes about a week to get the replacement drive. The other downside is that if you do this over the phone (please try to avoid that) you have to "prove" to the person you speak to that the drive is bad. They also ask you the question "is this drive in a RAID array?" to which you should answer NO. I've ranted about this sneaky/tricky question in a DSLR/BBR post in the past; I can dig it up if you want. Just answer no and move on. Their website, AFAIK, does not ask this question. For the RMA reason, just say "bad sectors".
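Going back to choice 2 for a moment: the arithmetic and the read-then-write order can be sketched as below. This is only an illustration on an ordinary scratch file standing in for a disk, assuming 512-byte sectors; it is not a dd command line and is not meant to be pointed at a real device. The function names and the scratch data are made up.

```python
import os
import tempfile

SECTOR_SIZE = 512        # assumed block size
SUSPECT_LBA = 10447767   # LBA reported in the SMART error log


def byte_offset(lba, sector_size=SECTOR_SIZE):
    """Byte position at which the given LBA starts."""
    return lba * sector_size


def read_then_zero_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read one sector first, then overwrite only that sector with zeros.

    Returns the bytes that were stored there before the write.
    """
    with open(path, "r+b") as dev:
        dev.seek(byte_offset(lba, sector_size))
        before = dev.read(sector_size)       # the read-first step
        dev.seek(byte_offset(lba, sector_size))
        dev.write(b"\x00" * sector_size)     # exactly one sector, no more
    return before


print(byte_offset(SUSPECT_LBA))

# Demo on a 4-sector scratch file filled with 0xAA (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\xAA" * SECTOR_SIZE * 4)
demo_path = scratch.name

old = read_then_zero_sector(demo_path, lba=2)
with open(demo_path, "rb") as f:
    image = f.read()
os.remove(demo_path)
```

The point of the sketch is the bounded write: one seek to `lba * 512`, one write of 512 bytes, and nothing else on the "disk" changes. Getting the offset or the length wrong is exactly the mistake that zeroes more than you intended.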
said by rockisland:

Then what to do with the drive with 30 Ultra DMA CRC Errors in 94 hours of use? That seems like too much especially when compared to the other drives with many times the hours of use. That one may actually be under warranty because it was a replacement last year.

I already answered this. Quote:
said by koitsu:

... If you really did replace the drive 92 hours ago, I recommend waiting until the next array degradation event happens and then see if the CRC error count [has] increased. ...
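Both this "wait and see if the count increased" check and the before/after snapshots from choice 1 come down to comparing two sets of SMART attributes. A minimal sketch of that comparison follows, assuming the attribute rows look like the smartctl table quoted earlier in the thread (ID, name, flags, value, worst, thresh, fail, raw). The BEFORE rows are copied from that table; the AFTER rows are made-up values purely for illustration and say nothing about what this drive will actually report.

```python
def parse_attributes(text):
    """Map attribute name -> raw value for each attribute row in the text."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 8 columns;
        # header lines and "..." separators are skipped.
        if len(fields) >= 8 and fields[0].isdigit() and fields[7].isdigit():
            raw[fields[1]] = int(fields[7])
    return raw


def changed(before, after):
    """Return {name: (old, new)} for attributes whose raw value differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


BEFORE = """
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  9 Power_On_Hours          -O--CK   078   078   000    -    16284
197 Current_Pending_Sector  -O--C-   200   200   000    -    1
198 Offline_Uncorrectable   -O--C-   200   200   000    -    1
"""

# Hypothetical "after" snapshot, invented for this example only.
AFTER = """
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  9 Power_On_Hours          -O--CK   078   078   000    -    16290
197 Current_Pending_Sector  -O--C-   200   200   000    -    0
198 Offline_Uncorrectable   -O--C-   200   200   000    -    1
"""

diff = changed(parse_attributes(BEFORE), parse_attributes(AFTER))
print(diff)
```

Raw values that are not plain integers are ignored by this sketch, so it is only a rough aid; the full smartctl output is still what should be posted for analysis.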

rockisland

Not sure what HD Tune did but the drive seems to be really toasted now. The Erase function didn't take very long at all and filled the entire screen with red segments.
The full error scan took seconds and likewise filled the screen with red segments and now the drive no longer shows up in HD Tune.

CCleaner is unusable because the drive doesn't have a drive letter assigned to it.

C:\Users\Martye>smartctl -x -d usbjmicron /dev/sde
smartctl 6.0 2012-10-10 r3643 [x86_64-w64-mingw32-win7-sp1] (sf-6.0-1)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org
 
Smartctl open device: /dev/sde [USB JMicron] failed: \\.\PhysicalDrive4: Open failed, Error=2
 

I think we killed it. :)

koitsu

HD Tune Pro didn't "do anything" in that case. The drive is not dead. The drive is in the same condition as before.

Many people have experienced the problem you saw, myself included. To me, this further indicates it's more of a Windows I/O subsystem problem than an HD Tune Pro problem: I experienced it with two separate utilities and then, without disconnecting/reconnecting the CF card, used a completely different utility which worked fine.

Proof (read, do not skim): »Re: Repair or Replace Disk Warning on Brand New WD Caviar Black.

I can step you through using dd on Windows (as shown in my post, it does work -- link to software) if you'd like. Be aware if you screw this up you can completely destroy all contents of a drive, so you need to be cautious. Start with dd --list and provide the full output here. If the output is multiple pages, please use dd --list > C:\list.txt then open C:\list.txt in Notepad and copy/paste the contents here.

If you aren't sure which drive is the correct one, disconnect the drive (unplug the USB connector), wait 15 seconds, then re-run dd --list and compare the new output to the old. It should become fairly obvious which device is relevant. If it isn't, please attach both outputs (from when the drive is attached, and from when the drive is not attached).

Again: I can help you through this, but you need to be patient.

In general, blame Windows for its nonsense/bugs/whatever, and the fact that there are not any good utilities of this sort. (I have some others I could recommend but they do stupid things like require you to unplug/replug the device for absolutely no good reason). Using Windows for forensics/repair -- serious PITA.
rockisland

I'm assuming that destroying drive contents is not an issue as this disk is a member of a RAID array. If the disk is erased it should be no different than replacing it with a new drive and letting the array rebuild. We already tried to erase it with HD Tune.

dd software: 0.6 beta or 0.5?

I will need you to walk me through (and thank you for offering) as I am absolutely horrible with command prompts.

Addendum: I got 0.5.zip, extracted it, hit run on dd.exe and got a command prompt type window. I typed in dd --list, hit enter, and got nothing except another copy of the text dd --list.

koitsu

First paragraph: correct. The whole premise here is to get the drive to either remap the LBA to a new sector (if the sector is determined as bad) or to clear the "suspect" state (i.e. existing sector is fine). That's all we're effectively trying to do.

0.6 beta is fine.

You need to extract dd.exe from the .zip file and place it somewhere (wherever you want, but C:\ makes it easier). Then launch Command Prompt, and navigate to that path by selecting the drive letter and changing into the directory, e.g. for C:\ :

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
 
C:\Documents and Settings\jdc>C:
 
C:\Documents and Settings\jdc>cd \
 
C:\>
 

From there run dd --list. You should get output that roughly resembles this:

C:\>dd --list
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.
 
Win32 Available Volume Information
\\.\Volume{03ad2fc1-3b45-11e2-bf5c-806d6172696f}\
  link to \\?\Device\HarddiskVolume1
  fixed media
  Mounted on \\.\c:
 
\\.\Volume{03ad2fc2-3b45-11e2-bf5c-806d6172696f}\
  link to \\?\Device\HarddiskVolume2
  fixed media
  Mounted on \\.\d:
 
\\.\Volume{03ad2fc0-3b45-11e2-bf5c-806d6172696f}\
  link to \\?\Device\CdRom0
  CD-ROM
  Mounted on \\.\e:
 
\\.\Volume{fa47b5c0-3b8c-11e2-a637-806d6172696f}\
  link to \\?\Device\CdRom1
  CD-ROM
  Mounted on \\.\f:
 
NT Block Device Objects
\\?\Device\CdRom0
  size is 6682574848 bytes
\\?\Device\CdRom1
  size is 4347138048 bytes
\\?\Device\Harddisk0\Partition0
  link to \\?\Device\Harddisk0\DR0
  Fixed hard disk media. Block size = 512
  size is 120034123776 bytes
\\?\Device\Harddisk0\Partition1
  link to \\?\Device\HarddiskVolume1
\\?\Device\Harddisk1\Partition0
  link to \\?\Device\Harddisk1\DR1
  Fixed hard disk media. Block size = 512
  size is 1000204886016 bytes
\\?\Device\Harddisk1\Partition1
  link to \\?\Device\HarddiskVolume2
 
Virtual input devices
 /dev/zero   (null data)
 /dev/random (pseudo-random data)
 -           (standard input)
 
Virtual output devices
 -           (standard output)
 /dev/null   (discard the data)
 

This is the output I'm looking for, specifically one for when the drive is attached to the system, and one for when it isn't (to determine what the correct \\?\Device\xxx entry is).

To resize the Command Prompt window, please follow this guide:

»physiology.med.unc.edu/w ··· mpt.html

I see that dd doesn't output to stdout (he must be writing to the buffer directly, for no good reason), so redirecting to a file doesn't work. Sigh. I'll have to mail the author about that -- that is just downright stupid, especially for a utility that's supposed to emulate a *IX utility, and I'm going to have choice words with him about that.

For now, to copy the contents of the Command Prompt window, please follow one of these guides:

»www.microsoft.com/resour ··· mfr=true
»www.megaleecher.net/Copy ··· s_Window

Then paste the output into a Notepad window (Edit, then Paste) and save the results somewhere (doesn't matter where). Do this once with the drive attached, and once with the drive detached, so you'll have 2 files (duh). Then upload each file here using the Preview/Attach button and let me review the rest.
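If eyeballing the two files gets tedious, the comparison itself is simple enough to sketch: collect the `\\?\Device\...` entries from each listing and see which ones exist only when the drive is attached. The sample listings below are trimmed from the dd --list output shown above, and the `Harddisk2` entry is hypothetical (invented to stand in for the USB-attached drive); the function names are made up as well.

```python
PREFIX = "\\\\?\\Device\\"   # prefix of NT device object names in the listing


def device_entries(listing):
    """Collect the unindented device object lines from a dd --list dump."""
    return {
        line.strip()
        for line in listing.splitlines()
        if line.startswith(PREFIX)
    }


def only_when_attached(attached, not_attached):
    """Device entries present with the drive plugged in but not without it."""
    return sorted(device_entries(attached) - device_entries(not_attached))


NOT_ATTACHED = r"""
NT Block Device Objects
\\?\Device\Harddisk0\Partition0
  link to \\?\Device\Harddisk0\DR0
\\?\Device\Harddisk1\Partition0
  link to \\?\Device\Harddisk1\DR1
"""

# Hypothetical extra entry standing in for the drive under test.
ATTACHED = NOT_ATTACHED + r"""\\?\Device\Harddisk2\Partition0
  link to \\?\Device\Harddisk2\DR2
"""

new_devices = only_when_attached(ATTACHED, NOT_ATTACHED)
print(new_devices)
```

With real data you would read the two saved text files into strings instead of using the inline samples. Whatever this reports should still be sanity-checked against the "size is ... bytes" line for that device before anything is written to it.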

If all of this is too complex/too annoying/doesn't work, I have another alternative program (GUI-based) that I could step you through, but I've never used it for erasing drives (though I do use some of the author's other software) so I don't know if it would have the same issue as HD Tune Pro or Active@ Kill Disk.
rockisland

Attachments: not_attached.txt (3,157 bytes), attached.txt (3,322 bytes)
I had dd 0.5 so that is what I used.
Almost nothing is too complex if I have instructions; I'm pretty good at following directions.

txt files attached.