

Jovi21

@comcast.net

reply to Jovi21

Re: Problems with laptop

Koitsu, I did read what you wrote- I was just curious to know what would cause sectors to go bad, if they really did go bad.

Do you think it would be cheaper to get Dell to replace the drive, or for me to buy a new one and replace it myself? I doubt Dell would replace it for free since I've had it since 2007.

It would make sense for I/O errors to occur. It sounds like my best solution would be to transfer the most important files to another computer (they seem to be working fine at the moment) and try the procedure you suggested. I might try this in a few days, after I make sure I have gotten all of the files I need.


koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:14

I'll try to find some time to answer this later today or later this week. Sorry, things at work right now are insane (our entire division just got sold to another company), so I don't have as many free cycles as I normally would.
--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.



Jovi21

@comcast.net

reply to Jovi21
It's okay- there is no hurry.



koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:14


reply to Jovi21

said by Jovi21 :

Koitsu, I did read what you wrote- I was just curious to know what would cause sectors to go bad, if they really did go bad.

I took some time tonight to write up this response, and re-wrote it a good 4 times before feeling comfortable with what I'd written. I realise what you're asking are simple questions that warrant simple/logical answers, but with technology nothing is ever simple. Hard disks are quite complex (no matter what anyone tells you).

Please keep in mind most of this pertains to ATA/SATA disks only, and only non-RAID scenarios. RAID makes a huge mess of all of this. Likewise, SCSI is a whole different game, but I do have a small note in one of the descriptions covering a bit of SCSI (specifically use of low-level formatting).

Terminology

Sector -- an area of a hard disk that consists of a header section, a data section (usually 512 bytes, or 4096 bytes on new Advanced Format drives), and an ECC section. You can read the 3rd paragraph here to get an idea. The header defines information about what's in the data section, as well as some information about what's in the ECC section. The data section holds your actual data. The ECC section contains some parity information and other "neat stuff" that is used to auto-correct certain kinds of errors (if encountered) and ensure that you always get the same content back when reading the sector. The algorithm used to auto-correct errors varies greatly from drive to drive, but most drives these days use some form of forward error correction.
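To give a feel for what "forward error correction" means, here is a toy sketch using the classic Hamming(7,4) code. This is an illustration only: real drives use much stronger codes over whole sectors, and nothing here reflects any particular vendor's firmware. The function names are made up for the example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword: [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit
    if error_position:
        c[error_position - 1] ^= 1  # correct the single-bit error in place
    return [c[2], c[4], c[5], c[6]], error_position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one bit going bad on the platter
recovered, position = hamming74_decode(stored)
```

One flipped bit is repaired silently. Two flipped bits exceed what this small code can fix, which is loosely analogous to a sector whose errors the drive cannot auto-correct and therefore marks as suspect.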

LBA -- stands for Logical Block Addressing, and is a way to refer to ("address") the sectors on a hard disk. Back in the olden days, we used the old C/H/S (cylinder/head/sector) model of addressing specific sectors, but LBA supersedes that. An LBA is just an integer number that ranges from 0 to N, where N is the last LBA on the drive. Larger capacity drives usually have more LBAs, etc. However, the thing to note is that there is not always a 1:1 ratio between LBA number and physical sector (see "remapping" below).
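For anyone curious how the old C/H/S scheme relates to LBA, the standard conversion is LBA = (C x heads_per_cylinder + H) x sectors_per_track + (S - 1). A small sketch follows; the 16-head, 63-sector geometry in the usage lines is just an assumed example, not the geometry of any specific drive.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a C/H/S address to an LBA. Sector numbers are 1-based in C/H/S."""
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector must be in 1..sectors_per_track")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head must be in 0..heads_per_cylinder-1")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse of chs_to_lba for the same (assumed) geometry."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


# Assumed geometry for illustration: 16 heads, 63 sectors per track.
first = chs_to_lba(0, 0, 1, 16, 63)
next_cylinder = chs_to_lba(1, 0, 1, 16, 63)
```

The first sector of the disk (C=0, H=0, S=1) is LBA 0, and each full cylinder in this geometry spans 16 x 63 = 1008 LBAs.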

LBAs/sectors -- a term I use to describe either. They are technically different things, but if I talk to people who aren't familiar with SCSI or ATA protocols or hard disk technology and use the terms LBA and sector separately, this often confuses people. Most of the time (barring remaps -- see below), there is a 1:1 ratio between LBA and sector number.

Conditions

Suspect LBAs/sectors -- often the result of a drive experiencing problems when reading a sector -- either the header, data, or ECC section -- or, having read them all successfully, being unable to automatically correct the errors it found (magnetic substrates are far from perfect!).

These types of sectors are marked unreadable from that point forward. The drive does this internally. There is no way for the OS to read them -- no matter what you hear/read, and that includes utilities like SpinRite. All that can be done with these going forward is re-analysis, which determines if the anomaly witnessed was transient or was permanent.

Suspect LBAs/sectors, when read from, should return a read error (I/O error) to the OS. Suspect LBAs/sectors, when written to, result in re-analysis. More on that below.

Bad LBAs/sectors -- sectors which the drive has attempted writes to and determined that there is some kind of anomaly where data cannot be reliably written and re-read even with ECC. These are permanent -- once a sector is deemed bad, it cannot be used ever again. Note: ATA/SATA disks do not offer true low-level format capability, while SCSI disks still do. SCSI disks also handle suspect and bad LBAs/sectors differently than ATA/SATA disks do. Defects on SCSI disks come in two types -- grown and physical. A true low-level format of a SCSI disk re-detects all sector anomalies and re-creates the physical defect list. This cannot be done on ATA/SATA unless the vendor provides a proprietary tool (I have yet to see one which does). A format (in Windows, a DOS utility, etc.) or "erase" on an ATA/SATA disk is not the same thing as a low-level format -- do not let anyone tell you otherwise.

Remapped LBAs/sectors -- use of spare sectors to effectively replace ones which are bad. LBA number N, instead of referring to sector N (i.e. a 1:1 LBA-to-sector ratio), now refers to sector X. Every drive has spare sectors which it can use for remapping, sometimes thousands.

For example, let's say sector 12345 is determined to be bad, thus LBA 12345 would always return an I/O error. Assuming there are spare sectors available, the drive internally maps LBA 12345 to a new spare sector -- say, sector 84939282 -- so that going forward every time you read/write to/from LBA 12345 it actually reads/writes to/from sector 84939282.

The tricky part about remapping that most people do not realise is that no data is transferred from the bad sector to the new (spare) sector. A filesystem will have no knowledge of this happening unless the filesystem itself has some kind of checksumming method implemented (such as ZFS or Btrfs). NTFS, FAT, FAT32, ext2, ext3, UFS/FFS, etc. do not have any such thing. Meaning: a remap absolutely results in lost data.
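The remapping behaviour described above can be sketched as a toy model. Everything here is hypothetical (the `ToyDrive` class, the tiny sector counts, the layout with spares placed after the user area); real firmware keeps this table internally and does not expose it. The point of the sketch is only that a remap redirects the LBA and copies nothing.

```python
SECTOR_SIZE = 512


class ToyDrive:
    """Toy model: each LBA maps 1:1 to a physical sector until it is remapped."""

    def __init__(self, num_lbas, num_spares):
        self.num_lbas = num_lbas
        # Physical sectors: the user area first, then the spare pool.
        self.sectors = [bytes(SECTOR_SIZE) for _ in range(num_lbas + num_spares)]
        self.free_spares = list(range(num_lbas, num_lbas + num_spares))
        self.remap_table = {}  # LBA -> physical spare sector

    def _physical(self, lba):
        if not 0 <= lba < self.num_lbas:
            raise IndexError("LBA out of range")
        return self.remap_table.get(lba, lba)

    def read(self, lba):
        return self.sectors[self._physical(lba)]

    def write(self, lba, data):
        if len(data) != SECTOR_SIZE:
            raise ValueError("writes must be exactly one sector")
        self.sectors[self._physical(lba)] = bytes(data)

    def remap(self, lba):
        """Redirect lba to a fresh spare. The old contents are NOT copied."""
        if not self.free_spares:
            raise RuntimeError("no spare sectors left")
        self.remap_table[lba] = self.free_spares.pop(0)


drive = ToyDrive(num_lbas=100, num_spares=4)
drive.write(42, b"A" * SECTOR_SIZE)
drive.remap(42)  # LBA 42 now points at physical sector 100
```

After the remap, reading LBA 42 returns whatever is in the spare (zeros in this model), not the 512 bytes of "A" that were written earlier. A filesystem without checksumming has no way to notice the difference.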

Q&A

What can cause a bad sector? -- So many things that it's almost impossible to list them all off. The most common that I've seen are magnetic substrate problems (e.g. the magnetic surface area of the drive becomes worn or isn't effective any longer), specks of dust or physical manifestations on the platter (consider that read/write heads on a hard disk sit literally 3 nanometres above the platter surfaces. See this for analogies of how small a nanometre is; 1 nanometre = 1 billionth of a metre), and mechanical problems that only occur within certain areas of a platter. Things like head crashes affect a large number of sectors, so in your case, if those 2 LBAs/sectors turn out to be bad, it definitely is not a head crash. :-)

One of my favourite stories regarding magnetic substrate problems is the famous IBM DeskStar issue (resulting in the drives being called "Deathstars"), which many of us dealt with in the early 2000s. I absolutely love this page which shows you what happened with these drives -- the magnetic substrate is completely gone from the platters! The reason is that IBM chose to cover glass-based platters with magnetic substrate, and over time the substrate came off the platters, collecting inside of the drive itself. That is why in the 5th and 6th pictures the platters are transparent on the outside edges but not on the inside -- literally the substrate is gone, along with all your data. The last picture makes this quite apparent. It's a little hard to write a 1 or a 0 to a non-magnetic surface (glass)! ;-) As for the blue goo, I imagine it's some kind of lubricant leaking out, which scares the hell out of me.

How do I get a drive to re-analyse a suspect LBA/sector? -- simple: you issue a single write to that LBA. When the drive detects a write to a suspect LBA/sector, it performs a series of internal operations (these take longer than a normal write) to determine if the sector is actually usable (i.e. whether the condition that resulted in it being marked suspect was transient or not). When only a very small number of LBAs/sectors are suspect, the cause is quite often transient.
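As a rough sketch of the suspect-sector lifecycle just described: reads of a suspect LBA fail, and a write triggers re-analysis that either clears the sector or remaps it. The class and its `media_is_healthy` argument are invented for illustration; in a real drive that decision is made by internal tests you cannot see. The `pending` and `reallocated` counters are meant to mirror the kind of counts SMART reports.

```python
import enum
import errno


class SectorState(enum.Enum):
    OK = "ok"
    SUSPECT = "suspect"    # what SMART would count as "pending"
    REMAPPED = "remapped"  # now served from a spare sector


class ToyDriveState:
    def __init__(self, spares):
        self.states = {}  # LBA -> SectorState; missing means OK
        self.spares = spares
        self.pending = 0
        self.reallocated = 0

    def state_of(self, lba):
        return self.states.get(lba, SectorState.OK)

    def mark_suspect(self, lba):
        """Called when a read fails and ECC cannot repair the data."""
        if self.state_of(lba) is not SectorState.SUSPECT:
            self.states[lba] = SectorState.SUSPECT
            self.pending += 1

    def read(self, lba):
        if self.state_of(lba) is SectorState.SUSPECT:
            raise OSError(errno.EIO, f"LBA {lba} is unreadable")
        return bytes(512)

    def write(self, lba, media_is_healthy=True):
        """A write to a suspect LBA triggers re-analysis of that sector."""
        if self.state_of(lba) is not SectorState.SUSPECT:
            return  # ordinary write, nothing special happens
        if media_is_healthy:
            self.states[lba] = SectorState.OK  # the problem was transient
        else:
            if self.spares == 0:
                raise OSError(errno.EIO, "no spare sectors left")
            self.spares -= 1
            self.reallocated += 1
            self.states[lba] = SectorState.REMAPPED
        self.pending -= 1


disk = ToyDriveState(spares=1)
disk.mark_suspect(83913)
```

In this model, `disk.read(83913)` raises an I/O error until something writes to that LBA; after `disk.write(83913)` the pending count drops back to zero.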

How do I determine what LBAs are suspect? -- this is a tricky one. There are a multitude of methods, including ones which can use the hard disk itself (via SMART surface tests) to issue reads of all LBAs, or a select range of LBAs, to try and determine which ones are unreadable.

The tricky part is that sometimes what an application/program reports as an unreadable LBA is actually incorrect. There are all sorts of reasons for this (buggy drivers, weird controllers, strange NCQ implementations, etc.).

As such, I usually recommend to folks here on the forum that they engage me for assistance in this process. If possible I tend to recommend they use 3 tools: smartmontools, HD Tune Pro (trial version is fine, but do not use the free version), and dd for Win32. Using all 3 tools in combination it's possible to determine what the suspect (unreadable) LBAs are and then issue writes to them.
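The basic idea behind a read scan is simple enough to sketch: read every LBA in turn and note which ones return an I/O error. The sketch below is not a substitute for the tools named above. The function names are made up, it has only been reasoned about against ordinary files and in-memory buffers, and on a real disk OS caching, drivers, and controllers can all distort the results (which is exactly why cross-checking with several tools matters).

```python
import errno
import os


def scan_for_unreadable(read_sector, total_lbas):
    """Call read_sector(lba) for every LBA; return those that raise OSError."""
    unreadable = []
    for lba in range(total_lbas):
        try:
            read_sector(lba)
        except OSError:
            unreadable.append(lba)
    return unreadable


def make_sector_reader(f, sector_size=512):
    """Wrap a seekable binary file object; return (read_sector, total_lbas)."""
    f.seek(0, os.SEEK_END)
    total_lbas = f.tell() // sector_size

    def read_sector(lba):
        f.seek(lba * sector_size)
        data = f.read(sector_size)
        if len(data) != sector_size:
            raise OSError(errno.EIO, f"short read at LBA {lba}")
        return data

    return read_sector, total_lbas
```

Usage would look something like `reader, total = make_sector_reader(open(image_path, "rb"))` followed by `scan_for_unreadable(reader, total)`, where `image_path` is a disk image or device path you supply.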

What if I don't want to bother with any of that? Can I just ignore the situation? -- absolutely. In this case, any reads from those suspect LBAs will result in read errors (I/O errors). This can be acceptable depending on what files or what data refers to those LBAs. Eventually, assuming the drive gets a lot of writes, those LBAs will be written to naturally (you download a large file, you delete some old files, etc.), and they will be re-analysed on their own.

What if I use CHKDSK or CHKDSK /F /R? -- this is not guaranteed to work. CHKDSK, like fsck on *IX operating systems, only verifies the integrity of the allocation tables within the filesystem; it does not verify the integrity of the data sections of files. The /R flag is somewhat mysterious and badly documented; the flag may result in a read of every LBA on the partition, however it won't issue a write (which is ultimately what re-analyses suspect LBAs). The only situation where CHKDSK will help you is if the suspect LBAs are within the file allocation sections of the partition (e.g. the MFT or FAT).

What if I use Windows FORMAT, specifically with the /U flag, to erase the drive? -- it's been determined that FORMAT's behaviour greatly depends upon what version of Windows you use, and whether or not the suspect LBA is within the partition you plan on formatting. It's confirmed that Windows XP FORMAT /U does not erase every LBA in the partition -- instead, all it does is a quick erase, after which it reads every LBA in the partition, and any which are unreadable it makes note of in a hidden file called $BadClus (well, in the case of NTFS). Windows Vista has some undocumented flags that may be able to help, while Windows 7 has some flags which can help. I can speak more about this at a later date, but given the variance in behaviour/OS, I recommend not using this method.

What if I erase the drive using HD Tune Pro, which would write zeros to all the LBAs on the drive? -- Yup, this works fine. For people who are willing to go this far, I recommend it. It truly issues a write (of zeros) to every LBA on the drive. Please note that this has the added advantage of detecting and dealing with suspect (or bad) LBAs/sectors which one hasn't encountered yet. You can accomplish the same thing in an *IX operating system using dd if=/dev/zero of=/dev/somedisk bs=64k or similar.
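For illustration, here is roughly what that dd invocation does, written out as a small Python function. Treat it as a sketch: `zero_fill` is a made-up name, it is DESTRUCTIVE to whatever path you give it, and it has only been reasoned about against regular files. Pointing it at a real device needs the right privileges and a lot of care.

```python
import os


def zero_fill(path, block_size=64 * 1024):
    """Overwrite every byte of an existing file or device with zeros.

    DESTRUCTIVE: all existing contents of `path` are lost.
    Returns the number of bytes written.
    """
    zeros = bytes(block_size)
    with open(path, "r+b", buffering=0) as f:
        total = f.seek(0, os.SEEK_END)  # size of the target in bytes
        f.seek(0)
        written = 0
        while written < total:
            chunk = min(block_size, total - written)
            n = f.write(zeros[:chunk])
            if not n:
                raise OSError("write made no progress")
            written += n
        os.fsync(f.fileno())  # push the zeros out of the OS cache
    return written
```

The 64 KiB block size mirrors the bs=64k in the dd example; larger blocks mostly just reduce the number of system calls.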

Can you explain the "remapping" thing a bit more? -- sure. Let's say you have a 1197 byte file on a 512-byte sector drive. This file is spread across 3 sectors (1197 / 512 = 2.34, thus we round up). Let's say the LBAs that correlate with those sectors are in linear order, so: LBA 83912, 83913, and 83914.

One day you read this file and receive an I/O error. You check SMART and find that you have 1 "pending" sector (a sector marked "suspect"). You run a surface scan of the drive and find that LBA 83913 is unreadable. You then use some other tools to try and read that LBA and confirm that it is in fact the LBA which you can't read. At this point you know for a fact you have lost 512 bytes of that file (such is the case with suspect LBAs), regardless of whether the sector turns out to be usable or not.

So you decide to issue a 512-byte write command to LBA 83913 to see what happens. After 5-6 seconds, the system comes back and says it worked (no error). You check SMART stats again and find that there are no longer any pending sectors -- turns out LBA 83913 had some kind of transient error and is fine now. That LBA now has zeros in it, however, so you've still lost 512 bytes of data.

For getting your data back you have one option: restore the file from backups.
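The arithmetic in that example can be written out as a short sketch. It assumes, as the example does, that the file's sectors are contiguous and in linear order; the function names are made up for illustration.

```python
def sectors_needed(file_size, sector_size=512):
    """Number of sectors a file occupies (rounded up to a whole sector)."""
    return -(-file_size // sector_size)  # ceiling division with integers


def lost_byte_range(file_size, first_lba, bad_lba, sector_size=512):
    """Byte offsets (start, end) within the file that live in bad_lba.

    `end` is exclusive. Returns None if bad_lba is not part of the file.
    Assumes the file occupies consecutive LBAs starting at first_lba.
    """
    index = bad_lba - first_lba
    if not 0 <= index < sectors_needed(file_size, sector_size):
        return None
    start = index * sector_size
    end = min(start + sector_size, file_size)
    return start, end


count = sectors_needed(1197)
lost = lost_byte_range(1197, first_lba=83912, bad_lba=83913)
```

For the 1197-byte file, `count` is 3 and the unreadable LBA 83913 holds bytes 512 through 1023 of the file. Had the last LBA (83914) been the bad one, only 173 bytes of file data would have been affected, since the rest of that sector is slack.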

Anyway, training session over for now -- I'm tired. Back to the questions:

said by Jovi21 :

Do you think it would be cheaper to get Dell to replace the drive, or for me to buy a new one and replace it myself? I doubt Dell would replace it for free since I've had it since 2007.

The laptop drive is out of warranty (both from Dell and probably from the laptop drive manufacturer itself). Yes, you can replace the drive by yourself if you want, however the process of copying all the data over (Windows, programs, data, etc.) is a bit tricky. It's easier if you simply stick a new drive in and reinstall Windows from scratch.

If you don't have Windows CDs/DVDs (on many Dell machines these aren't provided, instead there's a "hidden partition" that has the equivalent of your Windows CDs/DVDs), then this gets even trickier.

If you feel comfortable doing this by yourself or with help on this forum, that's fine. Please be aware I won't help you with this procedure; others here can help you with it, but I choose not to (it's more of a "how to copy/restore data" situation, which I know well, but there are others here who can step you through that more easily. I'm more of a low-level technician).

Just be aware that if you copy data from the old drive to the new, there may be a point where you get some read errors. If you read my above explanation, you'll realise that you've lost some data in the process, so you and/or whoever helps you will need to deal with that situation (restoring from backups, etc.) so you get a good copy of whatever that file is. This is especially important if it's a Windows driver file or something within, say, C:\Windows\System32. You get the idea I hope.

said by Jovi21 :

It would make sense for I/O errors to occur. It sounds like my best solution would be to transfer the most important files to another computer (they seem to be working fine at the moment) and try the procedure you suggested. I might try this in a few days, after I make sure I have gotten all of the files I need.

Let me know, time permitting. Please be aware that I can't watch this thread constantly and provide real-time support; it may be days between when you ask for instructions and when I reply. If this is your main computer, this may be a problem, in which case you may want to consider simply taking the machine to a local repair shop and have them migrate the data for you. Those sorts of outfits, though, are often hit-or-miss.

This is why you should do backups (of bare-metal type) regularly. You could then simply buy a brand new drive, stick it in, and restore from backups + resize partitions to make use of the new space.



Jovi21

@comcast.net

Thank you for taking the time to provide such detailed information- it is a lot to take in, but very interesting! I think I am going to just replace the drive. I do have a Windows installation cd, so that shouldn't be a problem. My most important data seems fine now, but is there a possibility of errors existing that won't show up until the data is transferred? If not, I should be okay. I do have one last (hopefully) quick question. I know you said computer beeps need to be diagnosed in person, but can you think of any possible reason why a computer would beep when a dialog box pops up? My laptop just started doing that within the past week. If there are too many possibilities to list, that's okay- I was just curious.


Sunday, 03-Jun 18:41:34 Terms of Use & Privacy | feedback | contact | Hosting by nac.net - DSL,Hosting & Co-lo
over 12.5 years online © 1999-2012 dslreports.com.