Mountain View, CA
reply to Subaru
Re: [hard drive] WD Caviar Blue failure rates? Some key points here:
1. Drive started out looking fine -- with a start/stop count of 1540, a power-cycle count of 1327, and an "unsafe shutdown count" (this is mislabelled in HD Tune Pro; it should be Power-off Retract Count) of 39. Read error rate is 0 as well (that will matter momentarily). Load cycle count is not relevant so please do not bring it into the discussion.
It's important to note the drive at this phase has no indications of sector-level issues nor remapped LBAs. The drive also shows no CRC errors, indicating physical cabling and related bits are just fine.
HD Tune Pro's error scan suddenly turning up thousands of errors cannot be substantiated by the SMART data -- the problem is likely elsewhere, and this has been seen by others (including me). Keep reading.
2. Next phase shows a drive that passed an error scan without any issues (every LBA was readable), with a start/stop count of 1551, power-cycle count of 1338, and an unsafe shutdown count of 50. Read error rate is still 0.
Comparing these numbers to the previous set shows that the drive started a total of 11 times (1551-1540), power-cycled a total of 11 times (1338-1327), and heads were retracted a total of 11 times (50-39).
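The arithmetic above can be sketched in Python for illustration (a hypothetical helper, not anything HD Tune Pro or the drive itself exposes):

```python
# Hypothetical helper: diff two SMART snapshots (attribute name -> raw value)
# to confirm the counters moved in lockstep, as described above.
def smart_delta(before, after):
    """Return the per-attribute change between two SMART snapshots."""
    return {attr: after[attr] - before[attr] for attr in before}

phase1 = {"Start/Stop Count": 1540, "Power Cycle Count": 1327,
          "Power-off Retract Count": 39}
phase2 = {"Start/Stop Count": 1551, "Power Cycle Count": 1338,
          "Power-off Retract Count": 50}

print(smart_delta(phase1, phase2))
# All three deltas come out to 11, matching the analysis above.
```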
Drive at this phase also shows no sector-level issues or remapped LBAs, and did pass a full (non-quick, i.e. every LBA was read successfully) read scan in HD Tune Pro.
Has this system been power-cycled a total of 11 times between those two phases? Pressing Reset does not constitute a power-cycle, nor does a standard system shutdown or reboot.
If the system has not been power-cycled this number of times, then it is very likely your drive is losing power. Bad power circuitry on the drive's PCB is possible (it's the most likely thing to go bad on a hard disk), but bad or excessively rippling voltages from the PSU are another possibility, as is a faulty PSU outright or a faulty power connector. If you're using a 4-pin-Molex-to-SATA-power adapter, please cease and use a native SATA power connector from the PSU (if the PSU does not have one, buy one which does).
If the system has been power-cycled this number of times, then onward ho:
3. Next phase shows the drive having been power-cycled another 8 times (1346-1338).
The drive suddenly begins showing 2 suspect LBAs -- meaning attempts to read from those LBAs will result in an I/O error (unreadable), so you've lost, at bare minimum, 1024 bytes of data at this point (2 sectors x 512 bytes each).
The read error rate now jumps to an arbitrary value of 502, indicating the drive has had some issues reading data from the platters. This is absolutely related to the aforementioned 2 suspect LBAs.
Owner of drive now states the drive "shows up in Windows but has no drive letter", indicating the suspect LBAs could be ones holding critical NTFS metadata (e.g. the boot sector or MFT), or could be sector 0 where the partition table lives (in which case you're kind of screwed).
So, things for you to figure out:
1. You need to start keeping track of every time the system is rebooted (just in case), reset is pressed, or power-cycled. Keep a pad of paper handy for this task.
2. You also need to keep track of situations where the drive begins showing problems/issues. If you really think it's falling off the bus (and it may be -- see above analysis model), then the cause would be loss of power, and SMART tracks this. Meaning: if the drive is operating fine, then suddenly starts showing problems, look at the relevant counters before and after the issue. No offence (honest), but use your brain on this one -- for example, if power-cycling the drive is required to get it to show back up on the bus, then if the drive abruptly lost power this counter may end up having incremented by 2 rather than just 1. That pad of paper comes in handy again...
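That pad-of-paper bookkeeping can be sketched like so (hypothetical function and numbers, purely illustrative of the increment-by-2 point above):

```python
# Compare the power-cycle counter delta against the number of power-cycles
# you deliberately performed and logged on paper. If you had to power-cycle
# the drive to recover it after it dropped off the bus, one unexpected power
# loss plus your recovery cycle shows up as +2, not +1.
def unexpected_power_losses(counter_before, counter_after, logged_cycles):
    """Counter increments not accounted for by deliberate power-cycles."""
    return (counter_after - counter_before) - logged_cycles

# Example: counter went from 1338 to 1346 but you only power-cycled 6 times;
# the other 2 increments point at the drive losing power on its own.
print(unexpected_power_losses(1338, 1346, 6))  # 2
```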
3. I strongly suggest zeroing the entire drive at this point. You can do this in HD Tune Pro, but you will need to delete all partitions from the drive using Disk Management or a similar tool (HD Tune Pro won't let you erase the drive if there are partitions on it).
The reason for doing this is pretty simple: it's the only way to get the drive to determine if those 2 suspect LBAs are actually bad and if they need to be remapped or not -- the only way to do that is to issue writes to the LBAs. If the sectors which the LBAs point to are actually fine, the "Current Pending Sector" count will decrement. If the sectors are determined as actually bad, "Current Pending Sector" will decrement, and "Reallocated Sector Count" will increment; "Reallocated Event Count" might also increment.
Furthermore (and just as important), it will issue writes to every LBA on the drive, so any sector which is acting wonky/strange whose effects you haven't yet seen can be dealt with "pre-emptively", so to speak (e.g. remapped if need be).
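As a toy model of the counter behaviour just described (this is not a drive API, just the pending/reallocated bookkeeping in miniature):

```python
# Each pending (suspect) sector, once written to, either verifies OK
# (Current Pending Sector decrements) or is remapped to a spare
# (Current Pending Sector decrements AND Reallocated Sector Count increments).
def write_pending_sectors(pending, reallocated, results):
    """results: one bool per pending sector; True = sector really is bad."""
    for is_bad in results:
        pending -= 1            # the write resolves the suspect LBA either way
        if is_bad:
            reallocated += 1    # sector remapped to a spare
    return pending, reallocated

# The 2 suspect LBAs from above: suppose one verifies fine, one is truly bad.
print(write_pending_sectors(2, 0, [False, True]))  # (0, 1)
```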
Be sure to take SMART health snapshots/screenshots before and after the erase so you can see what's transpired; I can do an analysis of that post-mortem if you wish.
Finally, if you don't want to deal with any of this, leave the drive in its current condition (2 suspect LBAs shown) and do an Advanced RMA with Western Digital and get a new/replacement drive first, then send them the one that's driving you bonkers. The reason is "Bad sectors" or "Bad blocks" (even though that's not necessarily true at this point, they're not going to care).
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.
Would love to try this, but after boot Windows thinks it's a great time to do Check Disk... Well, it's been running for like 2 hours and it's only at 5113 of 258560 -- seems a bit slow, no?
Ended up just rebooting and scanning the drive in Windows instead -- much faster now...
Not looking very good on the scan: 11 minutes in and almost 9% damaged blocks.