
Krisnatharok
PC Builder, Gamer
Premium Member
join:2009-02-11
Earth Orbit

Krisnatharok to Ghastlyone


Re: Repair or Replace Disk Warning on Brand New WD Caviar Black.

I saw all that red and thought it was bad then saw it was the Erase tab. Does error scan pull anything up?

Ghastlyone
Premium Member
join:2009-01-07
Nashville, TN


said by Krisnatharok:

I saw all that red and thought it was bad then saw it was the Erase tab. Does error scan pull anything up?

Nothing at all, luckily.

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu to Ghastlyone

Can you please resize the HD Tune Pro window so I can see all the SMART attributes under the Health tab, rather than just the ones at the top?

For some reason I can't explain, the Erase method did not work. This is confirmed not only by the 2500 error count but also by the erase speed (4GBytes/second -- impossible). My guess is that the OS has the drive locked and write requests are being denied. This would also explain why an Error Scan in HD Tune Pro worked fine (those are read requests).
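A rough sanity check on why 4GBytes/second cannot be a real write rate: SATA uses 8b/10b encoding, so a 6Gb/s link carries at most about 600 MB/s of payload, and a mechanical drive sustains far less than that. The sketch below assumes the drive sits on a SATA 6Gb/s port, which this thread does not state, and the function name is made up for illustration.

```python
def sata_payload_ceiling_mb_s(line_rate_gbit: float) -> float:
    """Upper bound on SATA payload throughput in MB/s.

    SATA uses 8b/10b encoding, so every payload byte costs 10 bits on the wire.
    """
    return line_rate_gbit * 1000 / 10


reported_mb_s = 4000                        # the ~4GBytes/second shown on the Erase tab
ceiling = sata_payload_ceiling_mb_s(6.0)    # assumption: SATA 6Gb/s port

print(f"link ceiling: {ceiling:.0f} MB/s, reported: {reported_mb_s} MB/s")
if reported_mb_s > ceiling:
    # A rate above the link ceiling suggests the writes never reached the disk.
    print("Reported rate exceeds what the link can carry.")
```

A rate several times above the link ceiling fits the theory that the write requests were rejected immediately rather than serviced by the drive.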

Does this new drive already have a partition on it or something of that nature? Did you have Disk Management or a partition manager also running at the same time you did the erase? Anything like that?

Alternatively, you can try Active@ KillDisk (version 7, for Windows) and see if that works better for you. But if something has locked access to the disk device (partition/volume manager, WD utilities, etc.), then writes aren't going to work there either.

Ghastlyone
Premium Member
join:2009-01-07
Nashville, TN


I'll re-run the scans tonight. As far as I know, there is no partition on the drive yet, so I'm not certain why the Erase didn't work.

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500



[Image: HD Tune Pro acting stupid]

[Image: HD Tune Pro acting even more stupid]

Me neither, unless HD Tune Pro was already running when you plugged in the drive (e.g. you "hot-swapped" it in), in which case re-launching HD Tune Pro should solve that. But somehow I don't think that's what happened.

I can assure you that the Erase feature in HD Tune Pro works, since I've used it myself many times over, but Active@ KillDisk provides more granularity -- and if it fails, it might actually shed some light on what the error reason is (volume locked, etc.). Windows does so much nonsense with disk drives under the hood (ranging from services to stuff the actual kernel does) that it's infuriating at times.

Edit: I was able to reproduce the oddity with HD Tune Pro when trying to erase an 8GB CF drive I have. The first attempt actually erased successfully for about the first 1/40th of the drive, then spit out red the rest of the time (claiming something like 2428 errors). A subsequent Erase resulted in all errors (also a 2500 error count, just like yours). Possibly the author of this software busted something in the 5.00 release; I'm not sure.

I gave Active@ KillDisk a shot. When erasing the same device, I decided to uncheck "Ignore all errors" just to see what transpired. The CF drive did erase, but intermittently I'd get "device is unable to erase sector xxx" where the sectors seemed somewhat arbitrary:

Error (the device is not ready) writing sector 104580 on Removable Disk 2.
Error (the device is not ready) writing sector 306306 on Removable Disk 2.
 

I re-ran the erase with "Ignore all errors" checked, and it did complete, but with this log message:

Bad (unwritable) sectors detected from 394758 to 15523839 on Removable Disk 2.
 

Complete bullshit. I'm not really sure what these programs are doing, because this is not rocket science to accomplish.

Finally, I resorted to using dd for Windows, which I know interacts with the underlying device using the proper methodology and is no-nonsense:

D:\Util\dd for Win32>dd --list
...
\\?\Device\Harddisk2\Partition0
  link to \\?\Device\Harddisk2\DR18
  Removable media other than floppy. Block size = 512
  size is 7948206080 bytes
...
 
D:\Util\dd for Win32>dd if=/dev/zero of=\\?\Device\Harddisk2\Partition0 bs=64k --progress
rawwrite dd for windows version 0.6beta3.
Written by John Newbigin <jn@it.swin.edu.au>
This program is covered by terms of the GPL Version 2.
 
733,760k
 

...and is still going, with no errors so far. So like I said, I don't know what these other programs are doing, but it's utter nonsense.

Edit #2: dd finished. Not a single REAL I/O error (keep reading and I'll explain what's shown):

7,761,920k Error writing file: 27 The drive cannot find the sector requested
7,761,920k
121281+0 records in
121280+0 records out
 

The "error" shown is actually normal given the flags I gave dd. The 8GB CF drive is 7948206080 bytes, and I requested a read/write size of 64k (65536 bytes). 7948206080 / 65536 = exactly 121280.

The term "record" above isn't magical -- it's simply a counter of how many reads (from the if device, in this case /dev/zero, which is just a pseudo-device that returns zeros) and writes (to the of device) were issued.

I did not specify a count directive to say "only read/write this many records". dd tried to go past the end of the drive by 1 record (an extra 65536 bytes), and the underlying drive said "yeah right, bye", hence the records in vs. records out delta of 1. So the drive was fully erased successfully up until that last "bogus" write.
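The arithmetic behind those record counts can be checked in a few lines. This is only a sketch of the numbers quoted above, not of dd itself; the variable names are made up.

```python
device_bytes = 7948206080    # size reported by "dd --list"
block_bytes = 64 * 1024      # bs=64k

# How many whole 64k blocks fit on the device, and what is left over
full_blocks, leftover = divmod(device_bytes, block_bytes)
print(full_blocks, leftover)

# dd's --progress counter is in units of 1024 bytes
print(device_bytes // 1024)

# With no count or --size given, one extra block is read from /dev/zero
# and the write of that block is what fails at the end of the device.
records_in = full_blocks + 1
records_out = full_blocks
print(f"{records_in}+0 records in")
print(f"{records_out}+0 records out")
```

The device size divides evenly by the block size, which is why the only failed write is the one that starts exactly at the end of the device.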

The question is why dd tried to go past the end of the device if it could have calculated its size. Well, this behaviour is documented on the dd for Windows web site:
quote:
On many usb devices this is not reliable so you should use --size to guess the size of the device, see below.

Traditionally when using dd, if you wanted to copy an entire device, you did not specify a block count and dd would read until it reached the end of the device. If you tried to read past the end of the device, the data up to the end of the device would be returned and if you kept reading you would get an error message. Windows however does not always do this so --size will tell dd to figure out the size of the device and make sure it does not read past that point. This is important for USB sticks which stop working if you read past the end of them. This is not on by default because getting the correct size of the device is not always possible. Some devices also keep returning bogus data past the end of the device without returning a suitable error code.

I didn't use the --size parameter, as you can see, so there's the explanation for the final error shown in dd. But the drive did get erased. I even verified using HxD which can "open a raw disk" and let you look at all its bytes/sectors.
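For anyone curious what "stop at the known size, then check the result" looks like in code, here is a minimal sketch. It runs against an in-memory buffer standing in for the device, so it is an illustration of the idea only: it is not how dd for Windows or HxD are implemented, the function names are made up, and real raw devices add requirements (administrator rights, sector-aligned I/O) that this ignores.

```python
import io


def zero_fill(target, size_bytes, block_bytes=64 * 1024):
    """Write zeros from offset 0 and stop exactly at size_bytes.

    Returns the number of write calls issued (what dd calls "records out").
    """
    zeros = bytes(block_bytes)
    target.seek(0)
    written = 0
    records = 0
    while written < size_bytes:
        # The final chunk is shortened so we never write past the end
        chunk = min(block_bytes, size_bytes - written)
        target.write(zeros[:chunk])
        written += chunk
        records += 1
    return records


def first_nonzero_offset(source, size_bytes, block_bytes=64 * 1024):
    """Return the offset of the first non-zero byte, or None if all are zero."""
    source.seek(0)
    offset = 0
    while offset < size_bytes:
        data = source.read(min(block_bytes, size_bytes - offset))
        if not data:
            break
        stripped = data.lstrip(b"\x00")
        if stripped:
            return offset + (len(data) - len(stripped))
        offset += len(data)
    return None


# Stand-in for a raw device: a buffer pre-filled with 0xFF bytes
size = 1_000_000
fake_disk = io.BytesIO(b"\xff" * size)

records = zero_fill(fake_disk, size)
print(records, first_nonzero_offset(fake_disk, size))
```

Because the loop is bounded by the size, there is no trailing "cannot find the sector requested" error; the trade-off, as the quoted documentation says, is that the reported size has to be trustworthy.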

So yep, HD Tune Pro's Erase feature, and Active@ KillDisk, are doing something stupid in their code or are just buggy. See what you get for relying on GUI tools? :-) Sigh.

Krisnatharok
PC Builder, Gamer
Premium Member
join:2009-02-11
Earth Orbit

Krisnatharok to koitsu

said by koitsu:

I should explain to you the proper procedure for doing a thorough/correct test, however:

1. Plug in the drive + run HD Tune Pro
2. Pick the drive + go to the Health tab
3. Take a screenshot of the SMART statistics
4. Run an Error Scan -- it's okay to check the "Quick scan" checkbox for this run -- and let it run completely.
5. Back to the Health tab: take a screenshot of the SMART statistics
6. Go to the Erase tab and choose to erase the drive. DO NOT check "Verify". The fill mode should be "Zero fill" (I cannot stress this enough; do not pick any other mode!) -- this will write zeros across the entire drive
7. Back to the Health tab: take a screenshot of the SMART statistics

I can explain the reason behind this method if folks want to know.

If certain SMART statistics between steps 5 and 7 are different (I mainly focus on 0x05, 0xC4, 0xC5, 0xC6, and 0xC7 for WD drives), then you can post the before-and-after shots here and I can provide an analysis for you. There are other attributes to focus on as well, but not all of them. Some will vary with use and are acceptable.

It's important to understand that when reading SMART attributes, you don't just look at the "Data" column (also known as RAW_VALUE) and assume something is wrong if it's non-zero. So many people do that and it's just completely incorrect/invalid. I can explain why if need be, or refer you to past posts of mine where I explain it in further detail.
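The before-and-after comparison from steps 5 and 7 can be sketched as a simple diff over the attributes named above. The readings here are made up for illustration, and the function name is hypothetical; a changed value is a prompt for a closer look, not a verdict.

```python
# Attribute IDs named in the quoted post
WATCHED = (
    0x05,  # Reallocated Sectors Count
    0xC4,  # Reallocation Event Count
    0xC5,  # Current Pending Sector Count
    0xC6,  # Offline Uncorrectable
    0xC7,  # UltraDMA CRC Error Count
)


def changed_attributes(before, after, watched=WATCHED):
    """Return {attribute_id: (before_raw, after_raw)} for watched IDs that changed."""
    changes = {}
    for attr_id in watched:
        old, new = before.get(attr_id), after.get(attr_id)
        if old != new:
            changes[attr_id] = (old, new)
    return changes


# Made-up readings: attribute ID -> raw value
step5 = {0x05: 0, 0x09: 12, 0xC4: 0, 0xC5: 0, 0xC6: 0, 0xC7: 0}
step7 = {0x05: 0, 0x09: 15, 0xC4: 0, 0xC5: 8, 0xC6: 0, 0xC7: 0}

for attr_id, (old, new) in changed_attributes(step5, step7).items():
    print(f"0x{attr_id:02X}: {old} -> {new}")
```

Attribute 0x09 (power-on hours) changes in the sample data but is ignored, which matches the point that some attributes vary with normal use.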

Koitsu, you should really sticky this post as a "Just bought a new HDD, first steps" thread.

I'm finishing up zeroing a new WD1002FAEX-00Y9A0 (1TB) for my stepson -- HD Tune Pro 5.00 is doing fine with zeroing the HDD, and it has returned no errors yet.

bbear2
Premium Member
join:2003-10-06
dot.earth

bbear2 to koitsu


said by koitsu:

If certain SMART statistics between steps 5 and 7 are different (I mainly focus on 0x05, 0xC4, 0xC5, 0xC6, and 0xC7 for WD drives), then you can post the before-and-after shots here and I can provide an analysis for you. There are other attributes to focus on as well, but not all of them. Some will vary with use and are acceptable.

Which ones do you focus on for Seagate drives?

neonhomer
Dearborn 5-2750
Premium Member
join:2004-01-27
Edgewater, FL

neonhomer to Krisnatharok

said by Krisnatharok:

Koitsu, you should really sticky this post as a "Just bought a new HDD, first steps" thread.

Now I'm worried... I have about two weeks or so on my new system, but I installed my OS on a 750GB Seagate that was from my old system. I didn't format it, I just pulled my "My Documents" folder off, and then let Win7 install.

I also have a new (3 weeks old) Seagate 2TB that I am offloading a lot of data to. I'm wondering if I should run HD Tune Pro on it to make sure the drive is going to cooperate.

With that said, I have a couple of smaller, older drives that I might move the data off of and run HD Tune Pro on them for a checkup. (One is an 80GB WD that holds all of my MP3s.)

Krisnatharok
PC Builder, Gamer
Premium Member
join:2009-02-11
Earth Orbit


Always have redundancy regardless of what a program says about a current drive. You never know what may happen tomorrow.

neonhomer
Dearborn 5-2750
Premium Member
join:2004-01-27
Edgewater, FL

[Image attachment]
Thought I would add this... This is my OS drive I am running on right now...

I'd have to check, but I am sure this drive is out of warranty. (EDIT: Yup, out of warranty) So I guess I am going to go shopping for a new drive tomorrow after work, and just clone this one over to the new drive.

I am running SMART tests on my other drives now...

aurgathor
join:2002-12-01
Lynnwood, WA

aurgathor to koitsu

said by koitsu:

So yep, HD Tune Pro's Erase feature, and Active@ KillDisk, are doing something stupid in their code or are just buggy. See what you get for relying on GUI tools? Sigh.

Did you take a look at the event log? I'd say there is at least a 66.6% chance that it's the OS that's doing something funny.

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500


I would say the chance is pretty slim; the programs themselves have bugs. The issue with Active@ is completely different from the one with HD Tune Pro. I was able to take the exact same system (without rebooting, making any changes, etc.) and use dd if=/dev/zero of={device string} bs=64k without a single problem. So the issue was definitely not with the OS or underlying storage subsystem/drivers.

Besides, the Windows Event Log rarely tells me anything useful. It's always dumbed-down. It's very, very rare to find anything in Windows which discloses useful information that an engineer can use. Even the MCA/MCE handler in Windows doesn't give you enough information to decode an MCE; sad panda.

aurgathor
join:2002-12-01
Lynnwood, WA


While I do agree that the Windows Event Log is very dumbed-down (or cryptic), when I have issues, more often than not there's a corresponding "Critical", "Error", or "Warning" entry in the log, even if the only thing I can do is scratch my head and stare at the error message in disbelief.
quote:
The following fatal alert was generated: 40. The internal error state is 107.
(an actual entry from my just-cleared event log)

If there's nothing in the event log, that would definitely be indicative of a bug in the program.

Anonymous_
Anonymous
Premium Member
join:2004-06-21
127.0.0.1

Anonymous_ to koitsu

said by koitsu:

I need to see an actual screenshot from the drive in question, not something you found off the Internet. I'm less interested in "what shows up as failed" as I am in the other statistics that can explain the situation. TL;DR version: I, nor anyone else, can help you unless you provide actual screenshots/data for the drive in question. I will be happy to assist (see my many other posts assisting people with hard disks gone bad) once I can get my hands on that.

Secondly, WD Lifeguard Diagnostic is for a couple specific purposes only -- it does not do the same kind of forensics/analysis as a human being. I can read/interpret SMART statistics better than that software, as pompous as that sounds.

My OCZ SSD Boot drive is connected into the same 6Gb/s Sata port as this Western Digital HDD running in AHCI and SMART scan checked the SSD with no errors.

That cannot be the case, since with SATA there is a 1:1 ratio between port and device. E.g. one hard disk/SSD/whatever per port. Unless, of course, you're using an SATA port multiplier, but I strongly doubt that.

If you meant to say "I use port number X with my SSD and it works fine, and I disconnected my SSD and hooked the same port up to the WD MHDD", then that makes more sense, except I don't know why you're doing that -- it means you don't have an OS/debugging environment to do tests in.

Next, I'd love for you to explain what a "SMART scan" means in this context. See my comment above -- I can read/interpret SMART statistics better than software. If you used WD's software to analyse an OCZ SSD, then that was a very, very bad choice on your part.

SMART is not a "binary thing", meaning the situations where an actual SMART attribute trips the overall SMART health status are fairly extreme; hard disk vendors tend to pick absurd threshold points. There can absolutely be a catastrophic problem happening long before the overall SMART health status goes from "OK" to "FAIL". A great example is the screenshot you did provide: SMART attribute 0xC7 indicates something bad going on with that individual's SATA cables or SATA port, yet the attribute is "OK".
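A minimal sketch of why the overall status can read "OK" while an attribute is clearly signalling trouble. It models only the usual rule that an attribute counts as failed once its normalized value drops to or below the vendor threshold. The class, the function and all the numbers are made up for illustration, and a real drive reports its own overall status rather than having software derive it this way.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    value: int       # normalized "current" value
    threshold: int   # vendor-chosen failure threshold
    raw: int         # the "Data" / RAW_VALUE column

    def tripped(self) -> bool:
        # Failed only when the normalized value is at or below the threshold
        return self.value <= self.threshold


def overall_status(attributes):
    """Simplified overall health: FAIL if any attribute has tripped."""
    return "FAIL" if any(a.tripped() for a in attributes) else "OK"


# Made-up readings: 0xC7 has a large raw count but a threshold of 0,
# so its normalized value can never reach the failure point.
attributes = [
    SmartAttribute(attr_id=0x05, value=200, threshold=140, raw=0),
    SmartAttribute(attr_id=0xC7, value=200, threshold=0, raw=3500),
]

print(overall_status(attributes))
for a in attributes:
    print(f"0x{a.attr_id:02X} raw={a.raw} tripped={a.tripped()}")
```

In this sample the overall status stays "OK" even though 0xC7 carries a raw count in the thousands, which is the kind of case where a person reading the individual attributes sees more than the pass/fail flag does.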

Learning how to read SMART attributes takes time and experience. It's not something you can learn quickly overnight, as every drive model and every vendor implements their stuff differently. Some attributes may make you think something is catastrophically wrong (especially with Seagate disks), when in fact everything is 100% normal. People almost constantly interpret SMART attributes wrong.

So as I said, I need to see actual screenshots from tools like HD Tune Pro (trial is fine), or smartmontools' smartctl -a or smartctl -x output for the Western Digital disk to be able to provide you with a good analysis of what might be causing the problem. I understand you're going to RMA the disk anyway -- and that's fine -- but there may be something going on that's specific to your motherboard or underlying system. You're welcome to provide me the same data for the SSD if you want me to review that too.

Sorry for sounding argumentative, but you've shown up asking for help with a MHDD, yet did things like "run SMART tests on the SSD" and haven't provided any actual data that folks here like myself can use to help you.

SATA port multipliers have been standard since SATA Gen2