pflog
Bueller? Bueller?
MVM
join:2001-09-01
El Dorado Hills, CA

Warning to Intel e1000e owners

I thought I'd pass this on:

»www.heise-online.co.uk/n ··· -/111583

It appears the Intel e1000e card can be bricked by the 2.6.27-rc1 kernel. So be careful if you have one of these cards and are planning on installing a newer distro that may be using a 2.6.27 kernel!

I admit I haven't fully read the details yet. Reading now, but just a heads up!
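
For anyone unsure whether their NIC is actually bound to e1000e, here is a minimal sketch. It assumes a Linux box with sysfs mounted at /sys and an interface called eth0 (both assumptions), and `nic_driver` is just a made-up helper name. All it does is resolve the `device/driver` symlink that sysfs exposes for a network interface.

```shell
#!/bin/sh
# nic_driver IFACE [SYSFS_ROOT]
# Print the kernel driver bound to a network interface by resolving the
# sysfs "device/driver" symlink. SYSFS_ROOT defaults to /sys; it is a
# parameter only so the helper can be tried against a fake tree.
nic_driver() {
    iface=$1
    sysfs_root=${2:-/sys}
    link="$sysfs_root/class/net/$iface/device/driver"
    # No symlink means the interface is missing or has no bound driver.
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
```

Running `nic_driver eth0` prints the driver name, which would be `e1000e` for the cards discussed here. It returns non-zero if the interface does not exist.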
pflog

Re: New OS feature - brick your hardware!

Here's someone from the e1000 driver team commenting on the issue:
"I work on the e1000 team (including the e1000e driver) and here is what we know. A panic in another driver (believed to be the gfx driver, but uncertain) scribbles over the NIC/LOM non-volatile memory (NVM). This is only happening with the 2.6.27-rc kernels on ICHx systems. Since the NIC/LOM NVM is part of the whole BIOS image, other things in the system could be affected by this driver panic as well. An update of the system BIOS will restore the NIC/LOM to operation. We have some patches under test right now that we will be releasing later today to protect the NIC/LOM NVM. That should help narrow down who is scribbling over NVM."
And here's the link to the post on the openSUSE mailing list from an Intel dev:

»lists.opensuse.org/opens ··· 017.html

jdong
Eat A Beaver, Save A Tree.
Premium Member
join:2002-07-09
Rochester, MI

jdong to pflog

Re: Warning to Intel e1000e owners

Nice heads-up. I posted a warning on the Ubuntu Forums earlier this morning, with a more complete writeup: »ubuntuforums.org/showthr ··· t=927943

While it appears to be a random event, the consequences are pretty serious. As I commented at the end, it puts a whole new perspective on what vendors mean when they warn about testing prerelease software.

joako
Premium Member
join:2000-09-07
/dev/null

joako to pflog

FWIW the SLED 11.0 beta has e1000e on blacklist....

jdong

said by joako:

FWIW the SLED 11.0 beta has e1000e on blacklist....
We've taken the same approach at Ubuntu until this is resolved. The Intel folks are working on an EEPROM reflasher to reverse this damage, though I must caution people not to go around looking for random tools like IBAUTIL to fix this, as you may cause more damage than you already have.

kleeman
Reduce blood pressure. Ignore trolls
join:2000-07-29
Nyack, NY
884.6 923.7

kleeman to pflog

In reading through several threads on this issue I was unclear about which was the first kernel version with this issue. The e1000e driver was first introduced in the 2.6.26 version kernel.

. »lwn.net/Articles/278016/

Ubuntu Intrepid uses the 2.6.27 kernel which is where many reports started but I use the 2.6.26 kernel on hardy......

Any info on this issue jdong?

jdong

said by kleeman:

In reading through several threads on this issue I was unclear about which was the first kernel version with this issue. The e1000e driver was first introduced in the 2.6.26 version kernel.

. »lwn.net/Articles/278016/

Ubuntu Intrepid uses the 2.6.27 kernel which is where many reports started but I use the 2.6.26 kernel on hardy......

Any info on this issue jdong?
Technically, the fundamental problem exists in 2.6.26's e1000e driver too. The issue is that e1000e maps the registers controlling the flashing of the chipset's NVRAM and LOM into memory space. It surfaced in 2.6.27 because some unknown component (most likely a graphics driver) spews random garbage into memory space when crashing, which just happens to flip the NVRAM registers in the right way to write some nonsense into there.

Technically, if you used some syscalls to zero all of RAM on 2.6.26 in Hardy, you could hurt your NIC the same way, but I don't think anyone sane would do that. So the practical answer is that this is a 2.6.27 problem as far as the likelihood of "bricking" the NIC goes, but a 2.6.26+ problem as far as the fundamental design flaw of the driver goes.

kleeman

Thanks. I am still a bit unclear about this, though. Couldn't this (unknown) driver also be acting the same way in 2.6.26, or is there other info that rules that possibility out?

BTW, I got a bit paranoid and manually blacklisted the driver.
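
For reference, manual blacklisting usually comes down to a one-line modprobe config entry. Here is a minimal sketch, assuming your distro reads module blacklists from a file under /etc/modprobe.d/. The file name in the usage note is an assumption, and `blacklist_module` is just an illustrative helper. Note that a `blacklist` line only stops automatic loading via aliases; an explicit `modprobe e1000e` would still load the driver.

```shell
#!/bin/sh
# blacklist_module MODULE FILE
# Append "blacklist MODULE" to FILE unless that exact line is already there.
blacklist_module() {
    module=$1
    file=$2
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "blacklist $module" "$file" 2>/dev/null; then
        printf 'blacklist %s\n' "$module" >> "$file"
    fi
}
```

As root, something like `blacklist_module e1000e /etc/modprobe.d/blacklist-e1000e.conf` would add the entry once and do nothing on later runs. Check which path your distro actually uses.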

jdong

said by kleeman:

Thanks. I am still a bit unclear about this, though. Couldn't this (unknown) driver also be acting the same way in 2.6.26, or is there other info that rules that possibility out?

BTW, I got a bit paranoid and manually blacklisted the driver.
Well, the kernel devs are pretty confident the crashy driver was introduced in 2.6.27. But yeah, it is somewhat possible that your worries are justified. Given that so many people using 2.6.26 reported no issues until moving to 2.6.27, I'm more inclined to believe it's a 2.6.27 problem.

kleeman

Thanks for the additional info.

I am assuming Intel will fix this before too long and, hopefully, provide a script to repair any damage.

Edit: Here is the kernel bug thread.

»bugzilla.kernel.org/show ··· id=11382

Reading through it, it does appear that an Intel employee is on the case.

jdong to pflog
Yeah, Intel has been on the ball since the original report. An EEPROM reflasher utility is apparently in the works.

rolfp5
join:2001-09-12
Oakland, CA

rolfp5 to pflog

Intel devs have created an interim kernel patch that allows the driver to be re-enabled:
»lkml.org/lkml/2008/10/1/368

Mandriva has implemented it:
[root@localhost /]# rpm -q --changelog kernel-desktop-2.6.27-0.rc8.2mnb-1-1mnb2 | head
* Wed Oct 01 2008 Pascal Terjan 2.6.27-0.rc8.2mnb
o Herton Ronaldo Krzesinski
- Add fix for e1000e corruption bug and re-enable it
(lkml.org/lkml/2008/10/1/368). Closes #44147

* Wed Oct 01 2008 Pascal Terjan 2.6.27-0.rc8.1mnb
o Herton Ronaldo Krzesinski
- Fix sis190 ethernet device support on Asus P5SD2-VM motherboard
(kernel.org bug #11073).
- Add fix for sata_nv regression in latest 2.6.27 rcs (kernel.org

jdong

Ubuntu's latest (post-beta) kernel upload adds this upstream fix too.
salahx
join:2001-12-03
Saint Louis, MO

salahx to pflog

The culprit has probably been found. It turns out it was due to a bug in part of ftrace (CONFIG_DYNAMIC_FTRACE). ftrace wasn't added until the 2.6.27 merge window (which is why no one with an earlier kernel saw it). There is already a fix for it, but it's been held off until 2.6.28 since quite a few changes are involved. So, as a workaround, CONFIG_DYNAMIC_FTRACE is now marked BROKEN in 2.6.27.1 to prevent any further unintentional foot bullets.

rolfp5

That's some interesting, if largely incomprehensible to me, reading.

In that thread, I see CONFIG_DYNAMIC_FTRACE repeatedly; however,
[rolf@localhost ~]$ grep CONFIG_DYNAMIC_FTRACE /boot/config
[rolf@localhost ~]$ 
 
while,
[rolf@localhost ~]$ grep -i ftrace /boot/config
CONFIG_HAVE_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
# CONFIG_FTRACE is not set
[rolf@localhost ~]$ uname -r
2.6.27-desktop-0.rc8.2mnb
[rolf@localhost ~]$
 
so, I wonder, additionally, about the disparity...
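
The disparity is probably down to how Kconfig writes the config file. As far as I know, CONFIG_DYNAMIC_FTRACE depends on CONFIG_FTRACE in 2.6.27, and an option whose dependencies are unmet is left out of the file entirely rather than listed as "is not set". Here is a rough sketch of a check that treats a missing line as disabled. The helper name `kconfig_state` is made up, and the config path is whatever your distro uses.

```shell
#!/bin/sh
# kconfig_state OPTION FILE
# Print "y", "m", or "unset" for a kernel config option. An option that is
# absent from the file is reported the same as "# OPTION is not set".
kconfig_state() {
    option=$1
    file=$2
    # Anchoring at ^ keeps CONFIG_FTRACE from matching CONFIG_HAVE_FTRACE.
    value=$(sed -n "s/^$option=\([ym]\)\$/\1/p" "$file" | head -n 1)
    echo "${value:-unset}"
}
```

Against the config shown above, `kconfig_state CONFIG_DYNAMIC_FTRACE /boot/config` prints `unset`, while `kconfig_state CONFIG_HAVE_DYNAMIC_FTRACE /boot/config` prints `y`. The HAVE_ options only say the architecture supports the feature, not that it is turned on.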
SUMware
Premium Member
join:2002-05-21

SUMware to pflog

openSUSE Fix

Intel e1000e Corruption Fixed - Already in openSUSE 11.1 Beta2 (with exception of Debug, Vanilla Kernels)
October 16th, 2008
by Andreas Jaeger

The patches we did for the Intel e1000e network card for Beta2 protect the chip so that the NVRAM can no longer get corrupted, and indeed we have not received any new bug reports and could no longer reproduce the bug on our systems.

Further investigation by Intel has found the root cause of the problem, as Steven Rostedt wrote on the Linux kernel mailing list: the dynamic ftrace code contained some fragile code that could write to ioremap-ed memory and thus corrupt the NVRAM. The issue could happen “when the init functions of a module are freed and the nvram is vmapped there as well”. The full story can be found on LKML.

Since the 24th of September, we have disabled the dynamic ftrace code in our kernel of the day for all flavors except the debug and vanilla kernels (on x86 and x86-64; it was not enabled on other architectures). We have also added the NVRAM protection patches to all kernel flavors. Therefore, by pure luck, Beta2 already has not only the NVRAM protection but also the broken code disabled.

Beta3 will contain the same fixes, and the kernel of the day has just been updated with the dynamic ftrace code disabled for the debug and vanilla kernels as well (with the update to 2.6.27.1).

So, if you’re running a debug or vanilla kernel, I advise you, to be on the safe side, to update to the 2.6.27.1 kernel of the day. For everybody else: the Beta2 and Beta3 kernels should not corrupt your Intel e1000e NVRAM.

I’d like to thank all that were involved in debugging and fixing the issues around this, including our kernel developers Karsten Keil and Jiri Kosina who debugged and worked on a solution, testers that fried their machine and helped debugging like Stephan Binner and Vladimir Botka, and the team at Intel for developing protection code and finding and fixing the root cause.
MTB
join:2007-06-29
Newport Beach, CA

MTB to pflog

Re: Warning to Intel e1000e owners

Is this a 2.6.27 problem, or did it start in 2.6.17? Some posts indicate trouble with some sort of merge at this point.

I am not sure that this is only related to the e1000 card, since I have one that works fine with openSUSE 11.0 (2.6.25), but the ipw2200 cards are now stinking up the place.

If the e1000 card is handled the way the ipw2200 is, I would not doubt what is going on, and it is possible that unstable drivers are getting into the mix, as they are in openSUSE 11.0.

Intel and the distros need to post what is really going on and how to get and/or install any Intel cards. At the least, package some of the drivers in a tested/recommended fashion, not the random free-for-all that is currently in place.

I personally do not plan on buying any more Intel products. These guys claim to offer support for the ipw2200 but not for the hardware. Sounds like more of a Windows standard than a Linux standard, and just a bad direction to go.

jdong

This has nothing at all to do with ipw2200. The "problem" was introduced with the e1000e driver (NOT to be confused with e1000), but it couldn't be triggered until CONFIG_DYNAMIC_FTRACE arrived in 2.6.27.
MTB

Thanks for the info.

Was the "problem" triggered by CONFIG_DYNAMIC_FTRACE, the use of unstable drivers in distros, or both?

and

The fact that Intel does not appear to make hardware information available for open source projects.
MTB to jdong
The problem appears to extend further than stated.
said by openSUSE comment:
Comment by bonux
2008-10-17 14:58:08

Where can I get the fix for 11.0? I need to download it and fix my Lenovo T60. It isn’t working at the moment. I am desperate, please help… I know this might be the wrong place for this, but I can’t help myself. The eth0 does not even appear in my list of interfaces.

I stand by what I have said.

pflog to MTB
said by MTB:

The fact that Intel does not appear to make hardware information available for open source projects.
Huh? Their ethernet drivers are pretty much fully open source.

Cabal
Premium Member
join:2007-01-21

Cabal to MTB

said by MTB:

The fact that Intel does not appear to make hardware information available for open source projects.
Not only are Intel's specs (ethernet, video, chipset) open, but they fund their OSS driver development.
MTB

I will have to read up on the e1000e card, but here is how other projects go down: "no hardware documentation".
said by »ipw2200.sourceforge.net/ :
This project was created by Intel to enable support for the Intel PRO/Wireless 2915ABG Network Connection and Intel PRO/Wireless 2200BG Network Connection mini PCI adapters. This project (IPW2200) is intended to be a community effort as much as is possible given some working constraints (mainly, no HW documentation is available)

It should also be noted that the e1000 and e1000e may in fact be the same. I could only find an e1000 project.
said by »https://bugs.launchpad.n ··· ug/42572 :
Citing Ben Collins from #256555:
"The 2.6.26 kernel and 2.6.27 kernel have the exact same e1000e driver (one which we downloaded from Intel's e1000 sf.net project)."

So, although this problem was fixed months ago (patch posted by an Intel employee in Oct 07, patch applied upstream Jan 08, released with Linux 2.6.25), the fix obviously hasn't been incorporated into the version of e1000e which was downloaded from sf.net and integrated into Ubuntu.

Why is Ubuntu not using the upstream version at all?

The point here is that this may be a bigger issue than it looks and there appears to be room for improvement by all parties involved.

I am not a driver expert, but it seems to me that drivers would be extremely hard to write without hardware documentation, and hence random behavior should be expected.

jdong to pflog
The project is called e1000, but e1000e and e1000 are separate drivers.

The problem is inherently in e1000e, but nothing triggered it until CONFIG_DYNAMIC_FTRACE, which contained a bug that allowed accidental memory writes to certain arbitrary locations. This could also have introduced other kinds of nasty corruption that go away with a reboot (or, more scarily, corrupted files, if the writes landed in the page cache)...

pflog to MTB
I found this page in about 10 seconds.

»www.intel.com/design/net ··· docs.htm

KodiacZiller
Premium Member
join:2008-09-04
73368

Apparently Gentoo's kernel devs have applied the patch to this "bug" in the 2.6.27 gentoo-sources. According to portage, the bug still exists but it will no longer damage the hardware.

jdong

said by KodiacZiller:

Apparently Gentoo's kernel devs have applied the patch to this "bug" in the 2.6.27 gentoo-sources. According to portage, the bug still exists but it will no longer damage the hardware.
Yes, the dynamic ftrace bug is still present, and the fix for it will be nontrivial, but now that it's marked BROKEN, competent kernel configurers won't be enabling it by accident.

In addition, the e1000e driver now locks the registers shortly after initializing so this shouldn't ever be a problem.