
JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5

Dell PowerConnect 2748 Issue

[6 images attached]
I have a Dell PowerConnect 2748 switch that was bought new in 2007 (warranty ended in 2008). It lived its life tucked away in a server rack in an air-conditioned room kept at 70ish 24/7. In 2010 it quit passing traffic and I swapped it out with a spare I had. I've been playing with it off and on ever since.

Here's what it does. If you let it sit OFF and unplugged for 4+ hours it will boot just fine. If you warm boot it, or don't let it sit for 4+ hours, it will turn on but the "fan" and "managed" LEDs will flash green indefinitely. While it's doing this, the ports will indicate a link but no traffic will pass and I can't get to the management GUI.

According to the manual, a flashing managed LED indicates a boot/flash problem, but it should be flashing amber, not green (and I've gotten others to verify it is in fact green). There's NO indication of what a flashing fan LED means.

The fact it works fine on cold boot (and will work fine for as long as you can keep power on it) or 4+ hours after a power outage seems to indicate a PSU problem. The time makes me think it's got to drain a capacitor or something though I should indicate I'm not really an EE so I may not know what I'm talking about LOL.

Markings indicate it's a 12V only PSU so I got my trusty meter out (Fluke 77 IV) and did some minor poking around. Each of the three red wires shows 11.96V so that appears to be good (a little low maybe but still probably within tolerance?). Working off the capacitor idea I unplugged it with the meter still attached and waited until it got down to 0.002V and plugged it back in but it still refused to boot. Maybe I needed to wait for 0.000V but it took a good little while to get to 0.002V and impatience got to me lol.
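
For a rough sanity check of that 11.96V reading, here is a minimal Python sketch. The ±5% window is an assumption (a common rule of thumb for 12V rails), not a figure from the Delta datasheet, and `within_tolerance` is just an illustrative helper name.

```python
def within_tolerance(measured_v, nominal_v=12.0, tolerance=0.05):
    """Return (ok, deviation) where deviation is the fractional error vs nominal.

    tolerance=0.05 is an assumed +/-5% window, not a spec for this PSU.
    """
    deviation = (measured_v - nominal_v) / nominal_v
    return abs(deviation) <= tolerance, deviation


ok, deviation = within_tolerance(11.96)
print(f"deviation: {deviation:+.2%}, within assumed window: {ok}")
```

Under that assumption the reading is only about a third of a percent below nominal, so the DC level by itself is probably fine. A multimeter on DC won't show ripple, though, so this doesn't rule the PSU out.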

I see no obviously burned components or markings on the top side of the board but I have not pulled the board from the chassis yet. Attached are several pics of the PSU and the board.

TheMG
Premium
join:2007-09-04
Canada
kudos:3
If I had to guess I'd say you're probably dealing with a bad solder joint somewhere. Possibly around wherever the boot EEPROM chip is located if it keeps working fine when it has successfully booted.

Once the switch is warm, chuck it in the fridge for about 15 minutes then try to boot it. If it boots up, then that confirms a thermally intermittent failure, most likely of a solder joint.

Freeze spray can help narrow down the location of the fault further.

JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
Interesting thought. That's likely the case but unfortunately I haven't a clue what chip that would be. Any ideas? I know there's only so much that can be done from afar.


leibold
Premium,MVM
join:2002-07-09
Sunnyvale, CA
kudos:10
Thermal troubleshooting can be done with the above mentioned freeze spray (e.g.: Kälte-75 from Kontakt Chemie) and hair dryer.

Start the device and wait for failure symptoms. Start spraying each likely component with a short blast of freeze spray until device works again (wait for 1 minute after each blast). Be sure to double and triple check when you think you identified the correct location due to possibly delayed response to heat and cold (alternately heat and cool the suspect part).

The hair dryer (on low/medium) can also be used to accelerate the failure mode (for the impatient or when it just takes too long to fail on its own). Just don't overdo it or you'll successfully find a brand new damaged part.
--
Got some spare cpu cycles ? Join Team Helix or Team Starfire!


SmokChsr
Who let the magic smoke out?
Premium
join:2006-03-17
Saint Augustine, FL
reply to JoelC707
I don't know, from the described symptoms, I would tend to think it's more of a CMOS memory problem getting ambiguous data until the charge drops to a point where it does a "cold boot".

This just doesn't strike me as a thermal failure.

I don't see a battery, but perhaps it's cap maintained.


alphapointe
Don't Touch Me
Premium,MVM
join:2002-02-10
Columbia, MO
kudos:2
reply to JoelC707
It may just be the image, but it looks like the big filter cap in the first image is bulging. If I had it, I'd put a scope on the power supply and see if there's too much ripple...

If it is that cap, as it gets warm, the ripple would get worse, until something gets pissed off and crashes.

Just my 10 bits...
--
"When the hammer drops, the bullshit stops"


mackey
Premium
join:2007-08-20
kudos:13
reply to SmokChsr
I doubt there's CMOS memory on that board since there's a nice large chunk of flash memory there.

/M


mackey
Premium
join:2007-08-20
kudos:13
reply to JoelC707
said by JoelC707:

That's likely the case but unfortunately I haven't a clue what chip that would be. Any ideas?

They're actually easy to spot once you know what to look for. The flash chip is the one with the white dot in the top left corner in that last pic. The 2 "Samsung" chips right below it are the RAM.

Even if it is a bad solder joint causing it to not read the flash correctly, the bad joint could also be on the CPU itself or on a passive.

/M


SmokChsr
Who let the magic smoke out?
Premium
join:2006-03-17
Saint Augustine, FL
reply to mackey
said by mackey:

I doubt there's CMOS memory on that board since there's a nice large chunk of flash memory there.

Either way, this sounds more like bad data, than a heat problem to me.


leibold
Premium,MVM
join:2002-07-09
Sunnyvale, CA
kudos:10
said by SmokChsr:

Either way, this sounds more like bad data, than a heat problem to me.

Of course, one (heat) is sometimes the cause of the other (bad data). For example, dynamic memory cells start "losing it" when they get too warm (leakage currents increase).


SmokChsr
Who let the magic smoke out?
Premium
join:2006-03-17
Saint Augustine, FL
said by leibold:

Of course, one (heat) is sometimes the cause of the other (bad data).

Quite true, but it doesn't seem so in this case. 4 hours is quite a bit longer than it would need to cool. Also, that it runs indefinitely (until the next power glitch) once it correctly boots just doesn't sound like a thermal problem.

For troubleshooting purposes I would plug it in, unplug it, and check all the caps rather quickly to see which ones were still charged. Each time I found a charged cap I'd discharge it and see if it would boot normally. When I found the one that allowed it to boot normally, it would get a bleeder resistor.
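
To get a feel for the timing behind the stuck-charge idea, and for sizing a bleeder resistor, here is a rough Python sketch of plain RC discharge. The 1000 µF and 10 kΩ values are made-up placeholders, not parts read off this board, and the model ignores every other load on the cap.

```python
import math


def discharge_time_s(resistance_ohm, capacitance_f, v_start, v_end):
    """Time for an ideal RC circuit to decay from v_start to v_end.

    Uses t = R * C * ln(v_start / v_end).
    """
    return resistance_ohm * capacitance_f * math.log(v_start / v_end)


def bleeder_power_w(voltage_v, resistance_ohm):
    """Power continuously burned in the bleeder while the rail is up: P = V^2 / R."""
    return voltage_v ** 2 / resistance_ohm


R_BLEED = 10e3   # hypothetical 10 kOhm bleeder
CAP = 1000e-6    # hypothetical 1000 uF capacitor

t = discharge_time_s(R_BLEED, CAP, 12.0, 0.002)
p = bleeder_power_w(12.0, R_BLEED)
print(f"12 V down to 2 mV takes about {t:.0f} s; bleeder burns {p * 1000:.1f} mW")
```

A smaller resistor drains the cap faster but wastes more power while the switch is running, so the value is a trade-off.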

lutful
... of ideas
Premium
join:2005-06-16
Ottawa, ON
kudos:1
reply to JoelC707
The Altera MAX CPLD, which probably handles the management, also has a built-in temperature sensor which takes some time to settle.

My hunch is that some firmware routine is trying to read it too quickly after rebooting causing the fan+management LED warnings.

JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
I had a similar thought. I didn't get into the temp sensor level of it but I just thought it might be a firmware issue, as in maybe something corrupt or such. I did re-flash both image 1 and image 2, and I also tried booting from image 2 but no dice. Sadly there's no newer release to correct a potential firmware bug.

So here's where I stand on the power supply front. My soon-to-be father-in-law has a small handheld oscilloscope and I got him to probe the output of the power supply looking for ripple. He hooked up to the positive side of the output and saw some ripple. He tried to refine the scale on the display to get a better look at it but was unable to.

Then he hooked up to the negative side of the output and blew the fuse on the input of the power supply. He thinks he has some around here and we're currently hunting them but for now, it appears ripple may be the cause? Won't really know for sure until we get the fuse replaced but it's something.


shdesigns
Powered By Infinite Improbabilty Drive
Premium
join:2000-12-01
Stone Mountain, GA
reply to JoelC707
What was he trying to measure when he blew the fuse? Sounds like he was trying to see the voltage on the input cap.

I very much doubt it is a power supply issue. Or at least it wasn't.

JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
To be honest I'm not sure lol. I have found the fuses online, though every supplier via Google was shipping from China (though I didn't check places like Mouser). I may have a source on the entire PSU if it comes to it (and depending on the cost of the fuse, such as if I have to buy a bunch instead of just one) which ideally should also take care of any other PSU issues too.

lutful
... of ideas
Premium
join:2005-06-16
Ottawa, ON
kudos:1
You can use a generic 12V power supply and mate to the existing connector. Just check the amp rating from Dell's label or manual.

JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
It's a Delta ADP-80GP B power supply (not sure what the B stands for cause it's not the revision). It's 12V 6.66A and I haven't come across one with that much amperage in my box-o-power supplies yet (I had the same thought lol).

lutful
... of ideas
Premium
join:2005-06-16
Ottawa, ON
kudos:1
Digikey/Mouser/etc. sells equivalent AC/DC power supplies. You could also use very inexpensive 12V LED power supplies from Amazon which supply 8A or 10A.
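
As a quick back-of-the-envelope check on those replacement options, here is a small Python sketch comparing candidate current ratings against the 12V 6.66A figure quoted from the original PSU label. It only compares current ratings; connector pinout, ripple, and regulation quality are not modeled, and `headroom` is an illustrative helper name.

```python
REQUIRED_V = 12.0
REQUIRED_A = 6.66  # rating quoted in the thread for the original PSU


def headroom(candidate_a, required_a=REQUIRED_A):
    """Fractional current headroom of a candidate supply over the original rating.

    Negative means the candidate is undersized.
    """
    return candidate_a / required_a - 1.0


print(f"original rating: {REQUIRED_V * REQUIRED_A:.1f} W")
for amps in (8.0, 10.0):
    print(f"{amps:.0f} A supply: {headroom(amps):+.0%} headroom")
```

The original rating works out to roughly 80 W, and both the 8A and 10A options come out with positive headroom on paper.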

JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
The PSU I have utilizes three 12V pins on the output, but they appear to be all tied together underneath (though those three capacitors at the end are all on the output). That's still a single rail/output PSU right, and not a 3 rail/output?


SmokChsr
Who let the magic smoke out?
Premium
join:2006-03-17
Saint Augustine, FL
reply to lutful
said by lutful:

My hunch is that some firmware routine is trying to read it too quickly after rebooting causing the fan+management LED warnings.

I wouldn't get locked into what the warning lights say they are, since quite often the POST flash codes will have nothing to do with the labeling of the lights.


SmokChsr
Who let the magic smoke out?
Premium
join:2006-03-17
Saint Augustine, FL
reply to JoelC707
I just looked at the manual and it does give an indication for a green flashing Managed LED, but it gives nothing for a flashing fan LED.

lutful
... of ideas
Premium
join:2005-06-16
Ottawa, ON
kudos:1
I was part of various ASIC design teams for switches and routers and used to interact with firmware guys.

Fan LED flashing green usually means that the fans are OK but firmware could not start them running for some reason.

JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
They're definitely running when it does this. Maybe that the firmware couldn't read their RPM (in this case)?

Given the green flashing management LED indicates diagnostics/booting in progress, but it will continue like that indefinitely, I wonder if the LEDs can take on a different meaning other than what's documented? Of course to find this out means talking to someone who designed the firmware or someone who knows how to code it and can look at it and maybe find the answer.

lutful
... of ideas
Premium
join:2005-06-16
Ottawa, ON
kudos:1
OK, based on fan running, this is my best guess ...

a) firmware reads temp higher than threshold during warm boots; the sensor probably just takes longer to give stable and accurate readings due to old age or previous episodes of high temp operation.

b) firmware tries to verify fan(s) are running but can't, probably because some fan related routines have not initialized yet.

c) firmware does not proceed to next step.

During cold boots, first temp reading is below threshold and firmware does not attempt to verify fan until all routines have initialized. So when temp does get hotter, even if it is inaccurate, firmware can actually verify that fans are running at necessary RPM.
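
The guess above can be written out as a toy Python model. This is purely an illustration of the hypothesis, not the actual PowerConnect firmware: the threshold value, the function name, and the `fan_routines_ready` flag are all invented for the sketch.

```python
from enum import Enum


class BootResult(Enum):
    BOOTED = "booted"
    STUCK = "stuck (LEDs flashing)"


TEMP_THRESHOLD_C = 50.0  # hypothetical threshold, not a documented value


def boot(first_temp_reading_c, fan_routines_ready=False):
    """Toy model of the hypothesised boot sequence.

    fan_routines_ready is False at the moment of the first temperature
    check, on the assumption that the fan code has not initialised yet.
    """
    if first_temp_reading_c > TEMP_THRESHOLD_C:
        # (a) reading is over threshold, so (b) firmware tries to verify fans now
        if not fan_routines_ready:
            # (c) verification cannot succeed, firmware never proceeds
            return BootResult.STUCK
    # Cold boot path: reading is under threshold, fan check is deferred
    return BootResult.BOOTED


print("warm boot:", boot(70.0).value)
print("cold boot:", boot(25.0).value)
```

If this picture were right, a serial console log during a warm boot would be the way to confirm where the firmware stops.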


mackey
Premium
join:2007-08-20
kudos:13
reply to JoelC707
That 4-pin header in the top right looks like a (TTL level) serial port. Have you tried hooking into it to look for boot/debug messages?

/M