

squircle

join:2009-06-23
Oakville, ON

Motherboard suddenly stopped booting?

A couple of years ago, I repurposed an old Alienware machine with an Intel D975XBX motherboard into a home Linux server. It's been running solidly since then, but this weekend, I needed to swap out the boot drive from a 200GB to a 320GB Seagate Barracuda IDE drive. I duplicated the old drive to the new drive and swapped it in. Now, for some reason, the computer refuses to boot anymore. Let me take you through a timeline.

--> BIOS/POST screen briefly flashes (as always) a few seconds after turning it on.
--> Highpoint PCI-e RAID card screen displays attached drives (as normal)
--> Generic (Silicon Image-based) PCI IDE controller (to which the boot drive is not connected) displays its information for 30-40 seconds (up from the usual 2-3)
--> Final BIOS screen comes on after running option ROMs and attempts to boot the system, but all that's left is a blinking cursor in the top-left corner

The BIOS sees the drive fine, and the drive itself is fine and set as the first boot device. I've tried clearing the BIOS settings to no avail; the machine is now unable to boot from any IDE drive, even though it can read and write to them perfectly. I've tried it with and without the add-in cards, but that hasn't changed anything. I'm really at a loss here.

I should mention that I can boot my USB CD drive fine and have used Finnix to verify that I can indeed access the boot drive. No problems there. I don't have a spare drive to see if it will boot from SATA, but I don't see why it wouldn't (unless it's experiencing the same problem as this drive). I've re-installed GRUB on the drive in question and it has a valid MBR partition table, so that's definitely not it.

Now this could be coincidence, but the first time this happened, I had plugged 4 3TB WD Red drives into the onboard SATA ports; it hasn't booted since. If the 2.2TB limit thing screwed up something (it is an old motherboard)... wouldn't that be strange.
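
For reference, the 2.2TB figure falls out of 32-bit sector numbers combined with 512-byte sectors. Here is a rough sketch of the arithmetic, assuming 512-byte logical sectors and a nominal 3 TB (3 x 10^12 bytes) drive; I haven't confirmed either value for these exact disks:

```shell
# Largest capacity addressable with 32-bit sector numbers,
# assuming 512-byte logical sectors.
sector_bytes=512
max_sectors=$(( 1 << 32 ))
limit_bytes=$(( max_sectors * sector_bytes ))

# Nominal 3 TB drive: an assumption (decimal terabytes), not a measured value.
drive_bytes=3000000000000
drive_sectors=$(( drive_bytes / sector_bytes ))

echo "32-bit LBA limit: $limit_bytes bytes"
echo "3 TB drive: $drive_sectors sectors (limit is $max_sectors)"
if [ "$drive_sectors" -gt "$max_sectors" ]; then
    echo "drive needs more than 32 bits of sector addressing"
fi
```

This only shows that a 3TB drive is past the limit; it says nothing about how this particular BIOS reacts to one.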

Any help is appreciated!



Dissembled

join:2008-01-23
Indianapolis, IN

You sound fairly accomplished in this area and seem like you know what you're talking about.

But let me get this straight, you swapped a drive and all of a sudden it won't boot? The first thing you did was put the old drive in and same result? Not sure if I caught that part or not.

Did you zap something while you were doing this swap?



squircle

join:2009-06-23
Oakville, ON

said by Dissembled:

You sound fairly accomplished in this area and seem like you know what you're talking about.

But let me get this straight, you swapped a drive and all of a sudden it won't boot? The first thing you did was put the old drive in and same result? Not sure if I caught that part or not.

Did you zap something while you were doing this swap?

I like to think so

That's correct; I cloned one drive to the other, reinstalled the bootloader for good measure, and now it won't boot. With the original drive back in, it won't boot either.

I didn't zap anything; I was in contact with the case the whole time, didn't touch the motherboard... I don't know how I could've zapped anything.


berserken

join:2011-03-27
Oakland, CA
kudos:1
reply to squircle

Did you have a drive connected to the generic (Silicon Image-based) PCI IDE controller before the swap, and/or do you have one connected now? I've seen cases where whether or not a controller is enabled in the BIOS affects grub's identification of disks, resulting in a blinking cursor.

Another diagnostic is to put an empty file as a marker at the root of your drive by booting some removable device and doing something like:

# mount /dev/sda1 /mnt
# touch /mnt/2011-ssd
# umount /mnt

Then, after calling the grub shell on this or some other live bootable device that has grub on it:

# grub
Probing devices to guess BIOS drives. This may take a long time.
 
    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)
 
 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename. ]
grub> find /2011-ssd
find /2011-ssd
 (hd0,1)
grub> quit
quit
#
 

That can help get the grub syntax correct or give some other clue but it's not always straightforward. I've found that grub sees a different disk order at the grub boot screen vs. after the OS is booted, sometimes. I've been running five or six drives and have installed grub to the MBR of every drive, at times, to get it loaded at boot. The BIOS can re-enumerate disks at times, especially after booting a removable drive for an occasional run of something or other. In the BIOS on this machine, there is a place to set the boot order and a place to set HDD priority, which affects what happens in the boot order. I'm thinking it's a matter of how you have the BIOS and/or grub configured. :)


squircle

join:2009-06-23
Oakville, ON

Thanks for the post. Neither drive was ever connected to the PCI IDE controller, and I've never booted from the controller. I have the boot drive set as #1 to boot right now, and none of the other connected drives (all to the Highpoint RAID controller) have bootloaders on them. The trouble is, the system isn't even booting into GRUB; if it was, I'd know how to repair it. I've tried reinstalling the bootloader to the drive but it hasn't helped.

That's what I can't make sense of: I have it set first in boot order, I've verified that GRUB is intact, but the machine refuses to boot from that (or any other) IDE drive. I know it's not the drive, so I figured it was the motherboard crapping out.

To your last point, I wish it was a GRUB or BIOS config issue; those are easy to solve



koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23
reply to squircle

The system is almost certainly POSTing. You're insistent that the problem is with the motherboard, while I would argue heavily against it. I realise you're very insistent that GRUB is not being run at all, but you really don't know that for certain -- there's a lot of x86 code that runs in the MBR and subsequent bootstraps before anything is displayed on-screen. The same goes for any x86 bootloader (and this is where things like Sparc's PROM come in handy, too bad PC architecture sucks :P).

Three possibilities as I see them:

1) When you swapped drives, the previous drive was operating in either CHS, Large (ECHS), or possibly LBA addressing mode -- while the new drive is operating in a different mode. This will throw all sorts of things for a loop, particularly bootloaders (as the drive geometry now appears different). The default which most BIOSes use -- "Auto" -- picks whichever the BIOS feels is best. This may be different for the 200GB drive than the 320GB drive. This is one of the many reasons IDE/PATA sucks.

I would suggest playing around with this setting in the BIOS. I would also suggest putting the 200GB drive back into the system and seeing what the kernel reports as the drive geometry; if the BIOS shows a summary screen, it might also disclose which addressing mode is used for that model of drive (this is pretty common). If not, try changing it in the BIOS (with the 200GB drive) and see if you can find the non-Auto mode which works reliably, then try that on the 320GB drive.

2) A problem with the bootloader installation on the new 320GB drive. Whatever you used to "duplicate the old to the new" may not have duplicated all the necessary bits. I am not extensively familiar with GRUB, but as I understand it, changing any part of the underlying disk hardware requires rewriting the bootloader (that's the MBR and all subsequent bootstraps). This is especially important if the drive addressing mode changes.

If you're using an MBR scheme then this would be LBA 0 and whatever boot stages come after that (GRUB is a multi-stage bootloader and lives both within the MBR and in subsequent LBAs).

If you're using a GPT scheme, GPT bootstraps are somewhat scattered -- usually LBA 0 is used for booting into the next bootstrap portion (which usually lives within the first 512KBytes of the drive) -- but the GPT table itself lives both near the start of the drive and at the end of the drive. See Wikipedia's article on GPT.

I would recommend you boot some CD/DVD-based distro (whichever you used to get into GRUB to begin with) and rewrite the GRUB bits from there. You may find that's all it takes -- and if that's the case, it's either (1) or (2) which occurred.

3) A combination of (1) and (2).
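
For possibility (2) with an MBR scheme, a quick sanity check of LBA 0 can be sketched as below. This is a minimal sketch, not a full verification: the function names are made up for illustration, and looking for the text "GRUB" in the first sector is only a heuristic I'm assuming is useful, since a match does not prove the later stages are where the first stage expects them.

```shell
# Print the two-byte boot signature at offset 510 of a disk or image file.
# A bootable MBR is expected to end in 55 aa.
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \t\n'
}

# Heuristic only: report whether the first sector contains the text "GRUB".
mbr_mentions_grub() {
    dd if="$1" bs=512 count=1 2>/dev/null | LC_ALL=C grep -q GRUB
}

# Hypothetical helper combining both checks for one disk or image.
check_mbr() {
    sig=$(mbr_signature "$1")
    if [ "$sig" = "55aa" ]; then
        echo "$1: boot signature present"
    else
        echo "$1: boot signature missing (got '$sig')"
        return 1
    fi
    if mbr_mentions_grub "$1"; then
        echo "$1: first sector mentions GRUB"
    else
        echo "$1: no GRUB string in first sector"
    fi
}
```

From a live CD this would be run as something like `check_mbr /dev/sda`, where the device name is an assumption and reading the raw device needs root.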

Remember: LBA-to-LBA copies do not guarantee that the system will be usable/bootable afterwards, just that you copied every LBA from X to Y. PATA's addressing mode stuff makes it difficult. This is why when upgrading boot drives, I often tell people to just reinstall the OS.
--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.



squircle

join:2009-06-23
Oakville, ON

Thanks for your reply!

When I cloned the drives, I copied, bit-for-bit, from the beginning of the drive until the beginning of the first partition (which copied the MBR table and stuff), so I figured that would work. When it didn't, I nuked all those bytes, reconstructed the partition table and re-installed the bootloader from scratch, so I was somewhat surprised when even that didn't work. The part I find odd is that it's having trouble booting but I can access every LBA on the drive without issue, and everything seems to be in the proper place. I suppose I could give the boot repair CD a try and see if that works, but if it doesn't, I guess I'll have to take a look at the BIOS.
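
For clarity, the copy of the pre-partition area described above amounts to something like the following sketch. The function name is just for illustration, and the device names and the sector count of 63 in the usage comment are assumptions; the real start of the first partition should be read from `fdisk -l` first.

```shell
# Copy the first N sectors (MBR plus the gap before the first partition)
# from one disk or image to another without truncating the target.
# Arguments: source, target, number of 512-byte sectors to copy.
copy_boot_area() {
    dd if="$1" of="$2" bs=512 count="$3" conv=notrunc 2>/dev/null
}

# Hypothetical usage (device names and sector count are assumptions):
# copy_boot_area /dev/sda /dev/sdb 63
```

Note that this also carries the old partition table across along with the boot code, which is why the table has to be adjusted afterwards when the target drive is larger.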

FWIW, they're both Seagate Barracuda 7200.10 drives, they just differ in capacity. Of course, the logical part of me is thinking they can't be that different, but I may be wrong and it may be #1 after all.

And I totally agree with you that IDE/PATA sucks; I just wish I had more ports (and drives) so I could boot from SATA and be done with it.

I'll investigate further and let you know!



koitsu
Premium,MVM
join:2002-07-16
Mountain View, CA
kudos:23

The drives having the same series number (7200.10) does not mean "they're pretty much identical". The increased capacity means the platter count may have changed (though the platter itself definitely has), which also means the total LBA count has increased. LBA count directly affects a BIOS's decision to use CHS vs. Large (ECHS) vs. LBA mode. That addressing mode choice also affects how a bootloader makes its decisions; i.e. when you wrote the boot blocks on the old drive, it may have been using CHS addressing, so the bootloader may have decided to write things to C=839, H=254, S=1, while on the new drive it may be using LBA, which would use a totally different location. On the bright side, at least sector size didn't change.
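
To illustrate why the geometry matters, here is a small sketch of the usual CHS-to-LBA conversion, LBA = (C x heads + H) x sectors-per-track + (S - 1), with sectors numbered from 1. The two geometries below (255 heads and 16 heads, both with 63 sectors per track) are illustrative assumptions, not values read from either of your drives.

```shell
# Convert a CHS address to an LBA for a given geometry.
# Arguments: cylinder, head, sector, heads-per-cylinder, sectors-per-track.
chs_to_lba() {
    echo $(( ($1 * $4 + $2) * $5 + ($3 - 1) ))
}

# The same CHS tuple resolved under two assumed geometries.
chs_to_lba 839 15 63 255 63   # translated geometry: 255 heads, 63 sectors/track
chs_to_lba 839 15 63 16 63    # physical-style geometry: 16 heads, 63 sectors/track
```

The two calls print different sector numbers for the same tuple, which is the failure mode in question: boot code that stored a CHS location under one translation lands on a different sector under another.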

I would suggest examining the addressing modes in the BIOS first, as that's the most likely explanation for a non-bootable PATA system (in your situation) that I can think of.