
sk1939
Premium
join:2010-10-23
Mclean, VA
kudos:10
Reviews:
·T-Mobile US
·Verizon FiOS

[H/W] 3750G Lockup

So here's a question: has anyone ever experienced a 3750G lockup simply due to the sheer amount of traffic in terms of simultaneous connections?

Experienced that today (those late night phone calls you oh-so-hate) when this particular 3750G got overwhelmed due to the number of TCP connections flowing through it (best explanation I can come up with). The switch logs indicate nothing out of the ordinary, the configuration is basic with just VLANs and no routing, and the switch throws no error messages. It's not a thermal issue (that's obvious at least): the switch is clean and the fans are functioning fine. However, it's done this twice now in the past week, and it basically just locks up completely (all the connected port lights are on but solid, console inaccessible, SSH inaccessible). I'm inclined to say that it's dying, but at the same time I don't want to order a new one until I can get confirmation of this.

aryoba
Premium,MVM
join:2002-08-22
kudos:6
I assume you opened up a TAC case already?


RyanG1
Premium
join:2002-02-10
San Antonio, TX
kudos:1
reply to sk1939
If you are not using this for routing and it's just a plain layer 2 switch, then I doubt it has anything to do with TCP connections. Could be some bug in the IOS that you are running on it, but I'm kinda leaning towards hardware.

I have a 3750G-24T that's running Cisco IOS Software, C3750 Software (C3750-ADVIPSERVICESK9-M), Version 12.2(46)SE and pushing a large amount of packets and a high number of concurrent connections without issue.

If there's nothing in the logs and no traps being sent leading up to the lockup, my guess would be hardware, but I agree with aryoba: if it's under contract, get TAC on the horn.

Ryan
--
Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so. -Douglas Adams


tubbynet
reminds me of the danse russe
Premium,MVM
join:2008-01-16
Chandler, AZ
kudos:1
reply to sk1939
said by sk1939:

So here's a question: has anyone ever experienced a 3750G lockup simply due to the sheer amount of traffic in terms of simultaneous connections?

c3k is hardware based platform with limited support for software type functions (i.e. think "punting"). however, vanilla ipv4 functions (switching and routing) occur in the hardware fast-path. whether i push 1pps or 1mpps, the switch won't have any larger "load" if everything is performed in the asics (and you'd *know* if it wasn't).

i'd look into open bugs on the software or possible faulty memory modules. look at overall mem utilization to see if it grows over time due to a leaky process or if it hits a bad bank and crashes the box.
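
A rough way to check the "grows over time" theory is to poll used memory at intervals and fit a trend line to it. The sketch below is only an illustration: it assumes you already have (hours, used bytes) samples from SNMP polling or scraped CLI output, and the function names and the 100 KB/hour threshold are made up, not anything Cisco-defined.

```python
from statistics import mean

def growth_per_hour(samples):
    """Least-squares slope of used memory (bytes) against time (hours).

    `samples` is a list of (hours_since_start, used_bytes) pairs.
    How they are collected is up to you (hypothetical input format).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [t for t, _ in samples]
    ys = [m for _, m in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("samples must cover more than one point in time")
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom

def looks_leaky(samples, threshold_bytes_per_hour=100_000):
    """True if memory use trends upward faster than the (arbitrary) threshold."""
    return growth_per_hour(samples) > threshold_bytes_per_hour
```

A steady positive slope over days points toward a leaky process; a flat line followed by a sudden crash is more consistent with bad hardware.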

q.
--
"...if I in my north room dance naked, grotesquely before my mirror waving my shirt round my head and singing softly to myself..."

cramer
Premium
join:2007-04-10
Raleigh, NC
kudos:9
reply to sk1939
I concur... this is a "call Cisco" problem.


sk1939
Premium
join:2010-10-23
Mclean, VA
kudos:10
Reviews:
·T-Mobile US
·Verizon FiOS
reply to RyanG1
There's nothing in the logs, and the traps indicate the connection has been lost, but that's to be expected. Unfortunately, this particular switch does not have a service contract; it was deemed "not cost effective" (go figure).

Also, I found that two of the connecting servers also had NICs that locked up at the same time (would not reset in Windows/disable-enable), but I have no idea if there is a correlation or not.


vipergg22

@frontiernet.net
reply to sk1939
What kind of traffic are we talking about? Is it really busy? These switches have pretty good throughput, especially if you're just using it as an L2 switch. Look at the utilization using the show controllers utilization command to see how busy it is during the day. I don't know how old your IOS is, but they are up to something like 12.2(55)SE now.


Da Geek Kid

join:2003-10-11
::1
kudos:1
Reviews:
·Callcentric
reply to sk1939
This is an IOS bug... I have played with 500+ 3750, 3750G, and 3750-X switches, enough to know that this does not happen from sheer data push alone unless some other feature that strains the processor (ACLs, etc.) is enabled.

also, you can always police the COPP for SSH/Telnet to see WTF!

cramer
Premium
join:2007-04-10
Raleigh, NC
kudos:9
lockups can be due to hardware malfunctions (or "bugs" [aka. a design fault], but those tend to hit more than one person.)

HELLFIRE
Premium
join:2009-11-25
kudos:21
reply to sk1939
Got any performance data -- CPU or memory util -- up until the device went down sk1939?

What version of code?

3750 platform is DEFINITELY not what I'd call 'designed for performance anything,' yet I see them put into
collapsed core setups and set to run BGP and OSPF. If you're doing this, make D**N SURE you've set the
SDM template to desktop routing, or watch your network fall apart.

Otherwise this is gonna be a bit of a shot in the dark with no service contract. My 00000010 bits.

Regards


sk1939
Premium
join:2010-10-23
Mclean, VA
kudos:10
Reviews:
·T-Mobile US
·Verizon FiOS
I have CPU data, was around 60% or so, but started to go up once the device went down.

12.2(35)SE5 C3750-IPBASE-M

I've seen the collapsed core running off of a 3750 as well, I've even seen it run off of a 2950 (cringe). I'd prefer to stick something beefier in there like a Nexus 5k, 4503E, or at least a 3750X so we're working with something more than single gig links.

As far as traffic goes, the VLAN for web-facing traffic hits about 150Mbps (it's packet-shaped and load balanced) total, but from this device it's about half that at most. Total traffic from the device saturates a gig link (which is all there is back to the core).
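
For a sense of scale, here's a rough sketch of what a saturated gig link works out to in packets per second. It assumes standard Ethernet framing (20 bytes of preamble and inter-frame gap on the wire per frame); the function name is just for illustration.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap

def line_rate_pps(link_bps, frame_bytes):
    """Maximum frames per second a link can carry at a given frame size."""
    if frame_bytes < 64:
        raise ValueError("minimum Ethernet frame is 64 bytes")
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire

# Print the ceiling for small, medium, and full-size frames on 1 Gbps.
for size in (64, 512, 1518):
    print(f"{size:>5}-byte frames: {line_rate_pps(1_000_000_000, size):,.0f} pps")
```

Even the worst case of all minimum-size frames comes to roughly 1.5 Mpps on that one uplink, and real traffic with a mix of frame sizes will be well under that.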


Da Geek Kid

join:2003-10-11
::1
kudos:1
Oh, I see you answered yourself, your own question. You cap yourself with packet shaping AND load balancing. OK Nice info. I am sure you could reduce it below 150Mbps...


TomS_
Git-r-done
Premium,MVM
join:2002-07-19
London, UK
kudos:5
reply to sk1939
Just curious, but it's not due to a loop, is it?

Lights going solid, loss of forwarding: sounds like it could be a loop to me, as I've seen similar behaviour when I've created loops on Cisco switches before. But of course this would only be an issue while the loop is in place, and the switch should resume normal operation once it has been removed.

But make sure spanning tree is running, and configure portfast for ports that connect to PCs, servers and the like so they come up straight away (i.e. no delay when a cable is plugged in, which can cause issues obtaining a DHCP lease).

If a loop is created somehow, then STP will stop it.

But as others have indicated, 3750s are pretty beefy boxes when it comes to forwarding performance, so I'd be surprised if it's locking up due to excessive traffic.

Also try a more recent IOS. That's the usual first step when you suspect a bug.


sk1939
Premium
join:2010-10-23
Mclean, VA
kudos:10
Reviews:
·T-Mobile US
·Verizon FiOS
said by TomS_:

Just curious, but it's not due to a loop, is it?

Lights going solid, loss of forwarding: sounds like it could be a loop to me, as I've seen similar behaviour when I've created loops on Cisco switches before. But of course this would only be an issue while the loop is in place, and the switch should resume normal operation once it has been removed.

But make sure spanning tree is running, and configure portfast for ports that connect to PCs, servers and the like so they come up straight away (i.e. no delay when a cable is plugged in, which can cause issues obtaining a DHCP lease).

If a loop is created somehow, then STP will stop it.

But as others have indicated, 3750s are pretty beefy boxes when it comes to forwarding performance, so I'd be surprised if it's locking up due to excessive traffic.

Also try a more recent IOS. That's the usual first step when you suspect a bug.

I don't believe so since both the core and switch have spanning-tree running.

That's what I thought as well, but spanning tree is running and disconnecting all but one link doesn't seem to solve the issue.

Portfast is enabled on interfaces fa0/1-32, while the uplinks are fa0/35 & fa0/47 going to one core, and fa0/36 and fa0/48 going to the other.

I'm guessing it was due to excessive traffic, but it could very well be an IOS bug. Updated the IOS to the latest version last night, seems to be running smoothly so far.

HELLFIRE
Premium
join:2009-11-25
kudos:21

reply to sk1939
@sk1939
Try 2x 2950s with 14 WAN connections shared between them -- forget the pipe size and circuit types offhand
-- and multiple TAC cases opened to determine "reasons for buffer failures from 'show buffer' output."

Now THAT was cringeworthy... or this

What version of code did you move up to? Definitely keep us posted on how it goes.

Regards


sk1939
Premium
join:2010-10-23
Mclean, VA
kudos:10
Reviews:
·T-Mobile US
·Verizon FiOS
said by HELLFIRE:

@sk1939
Try 2x 2950s with 14 WAN connections shared between them -- -- and multiple TAC cases opened to determine "reasons for buffer failures from 'show buffer' output."

Gee imagine that.


c3750-ipbasek9-mz.122-55.SE5 since we didn't have a need for ipservices.


sk1939
Premium
join:2010-10-23
Mclean, VA
kudos:10
reply to sk1939
Since we upgraded the IOS it hasn't crashed, so I'm attributing it to a firmware bug rather than a hardware issue. Thanks all.