sk1939
Premium Member
join:2010-10-23
Frederick, MD
ARRIS SB8200
Ubiquiti UDM-Pro
Juniper SRX320

[H/W] 3750G Lockup

So here's a question: has anyone ever experienced a 3750G lockup simply due to the sheer amount of traffic in terms of simultaneous connections?

I experienced that today (one of those late-night phone calls you oh-so-hate) when this particular 3750G got overwhelmed by the number of TCP connections flowing through it (the best explanation I can come up with). The switch logs show nothing out of the ordinary; the configuration is basic, just VLANs with no routing, and it throws no error messages. It's not a thermal issue (that much is obvious, at least): the switch is clean and the fans are working fine. However, it has done this twice in the past week, and it basically locks up completely: all the connected port lights are on but solid, and both the console and SSH are inaccessible. I'm inclined to say it's dying, but I don't want to order a new one until I can confirm that.

aryoba
MVM
join:2002-08-22

I assume you've opened a TAC case already?

RyanG1
Premium Member
join:2002-02-10
San Antonio, TX

RyanG1 to sk1939

If you're not using this for routing, just as a plain layer 2 switch, then I doubt it has anything to do with TCP connections, since a layer 2 switch forwards frames by MAC address and doesn't track TCP sessions at all. It could be a bug in the IOS version you're running, but I'm kind of leaning towards hardware.

I have a 3750G-24T that's running Cisco IOS Software, C3750 Software (C3750-ADVIPSERVICESK9-M), Version 12.2(46)SE, and it pushes a large number of packets and a high number of concurrent connections without issue.

If there's nothing in the logs and no traps being sent leading up to the lockup, my guess would be hardware, but I agree with aryoba: if it's under contract, get TAC on the horn.

Ryan

tubbynet
reminds me of the danse russe
MVM
join:2008-01-16
Gilbert, AZ

tubbynet to sk1939

said by sk1939:

So here's a question, has anyone ever experienced a 3750G lockup simply due to the sheer amount of traffic in terms of simultanous connections?

c3k is a hardware-based platform with limited support for software-type functions (i.e. think "punting" to the cpu). however, vanilla ipv4 functions (switching and routing) occur in the hardware fast-path. whether i push 1pps or 1mpps, the switch won't carry any more "load" as long as everything is performed in the asics (and you'd *know* if it wasn't).

i'd look into open bugs in the software or possibly faulty memory modules. look at overall memory utilization to see whether it grows over time due to a leaky process, or whether it hits a bad bank and crashes the box.
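A minimal Python sketch of that "does it grow over time" check, assuming you already poll free processor memory at a fixed interval (for example via SNMP, or by saving 'show memory statistics' output) and list the values oldest first. The function names and the threshold are hypothetical, chosen for illustration. A steady negative trend hints at a leak, while a flat trend followed by a sudden crash fits the bad-memory theory better.

```python
from collections.abc import Sequence


def trend_per_hour(samples: Sequence[float], interval_s: float) -> float:
    """Least-squares slope of evenly spaced samples, in units per hour.

    Hypothetical helper: samples are free-memory readings (e.g. bytes),
    oldest first, taken every interval_s seconds.
    """
    n = len(samples)
    if n < 2 or interval_s <= 0:
        raise ValueError("need at least two samples and a positive interval")
    hours = [i * interval_s / 3600.0 for i in range(n)]
    mean_x = sum(hours) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, samples))
    den = sum((x - mean_x) ** 2 for x in hours)
    return num / den


def looks_like_leak(samples: Sequence[float], interval_s: float,
                    min_drop_per_hour: float) -> bool:
    """True if free memory shrinks faster than min_drop_per_hour on average."""
    return trend_per_hour(samples, interval_s) < -min_drop_per_hour


if __name__ == "__main__":
    # Hypothetical hourly readings of free memory, in MB.
    readings = [100, 90, 80, 70]
    print(trend_per_hour(readings, 3600))  # -> -10.0 (MB per hour)
    print(looks_like_leak(readings, 3600, min_drop_per_hour=5))  # -> True
```

The slope is fitted over all samples rather than just comparing the first and last readings, so a single noisy poll doesn't trigger a false alarm.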

q.
cramer
Premium Member
join:2007-04-10
Raleigh, NC

cramer to sk1939

I concur... this is a "call Cisco" problem.

sk1939 to RyanG1

There's nothing in the logs, and the traps only indicate that the connection was lost, but that's to be expected. Unfortunately, this particular switch doesn't have a service contract; it was deemed "not cost effective" (go figure).

I also found that two of the connected servers had NICs that locked up at the same time (they wouldn't reset with a disable/enable in Windows), but I have no idea whether there's a correlation.

vipergg22
Anon
@frontiernet.net

vipergg22 to sk1939

What kind of traffic are we talking about? Is it really busy? These switches have pretty good throughput, especially if you're just using one as an L2 switch. Look at the utilization with the show controllers utilization command to see how busy it is during the day. I don't know how old your IOS is, but they're up to something like 12.2(55)SE now.
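Alongside 'show controllers utilization', you can estimate how busy a link is from two polls of an interface's octet counter, such as the IF-MIB ifHCInOctets or ifHCOutOctets counters over SNMP. A minimal Python sketch, assuming 64-bit counters by default and at most one counter wrap between polls; the function name is hypothetical.

```python
def utilization_pct(octets_start: int, octets_end: int, interval_s: float,
                    link_bps: float, counter_bits: int = 64) -> float:
    """Average link utilization (%) between two octet-counter readings.

    Hypothetical helper. Assumes the counter wrapped at most once between
    the two polls; use counter_bits=32 for 32-bit ifInOctets-style counters.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    delta = octets_end - octets_start
    if delta < 0:
        # The counter rolled over: add back one full counter range.
        delta += 2 ** counter_bits
    bits_per_s = delta * 8 / interval_s
    return 100.0 * bits_per_s / link_bps


if __name__ == "__main__":
    # 3.75 GB moved in 60 s on a 1 Gbps link -> 500 Mbps average.
    print(utilization_pct(0, 3_750_000_000, 60, 1_000_000_000))  # -> 50.0
```

Keep the polling interval short if you're stuck with 32-bit counters on a gig link: at line rate a 32-bit octet counter wraps in roughly 34 seconds, which is why the 64-bit HC counters are the usual choice.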

Da Geek Kid
Member
join:2003-10-11
::1

Da Geek Kid to sk1939

This is an IOS bug... I've worked with 500+ 3750s, 3750Gs and 3750-Xs, enough to know that this doesn't happen from sheer data volume alone unless some other feature that strains the processor is enabled, e.g. ACLs.

Also, you can always police SSH/Telnet with CoPP (control plane policing) to see WTF is going on!

cramer

Lockups can be due to hardware malfunctions (or "bugs", i.e. a design fault, but those tend to hit more than one person).

HELLFIRE
MVM
join:2009-11-25

HELLFIRE to sk1939

Got any performance data (CPU or memory utilization) from before the device went down, sk1939?

What version of code?

The 3750 platform is DEFINITELY not what I'd call 'designed for performance anything,' yet I see them put into collapsed-core setups and set to run BGP and OSPF. If you're doing this, make D**N SURE you've set the SDM template to desktop routing, or watch your network fall apart.

Otherwise this is gonna be a bit of a shot in the dark with no service contract. My 00000010 bits.

Regards

sk1939

I have CPU data: it was around 60% or so, but it started to climb once the device went down.

12.2(35)SE5 C3750-IPBASE-M

I've seen a collapsed core running off a 3750 as well; I've even seen one run off a 2950 (cringe). I'd prefer to stick something beefier in there, like a Nexus 5k, a 4503-E, or at least a 3750-X, so we're working with something more than single gig links.

As far as traffic goes, the VLAN for web-facing traffic hits about 150 Mbps total (it's packet-shaped and load-balanced), but from this device it's about half that at most. Total traffic from the device saturates a gig link (which is all there is back to the core).

Da Geek Kid
Da Geek Kid

Member

Oh, I see you answered your own question. You cap yourself with packet shaping AND load balancing. OK, nice info. I'm sure you could reduce it below 150 Mbps...

TomS_
Git-r-done
MVM
join:2002-07-19
London, UK

TomS_ to sk1939

Just curious, but it's not due to a loop, is it?

Lights going solid, loss of forwarding: sounds like it could be a loop to me, as I've seen similar behaviour when I've created loops on Cisco switches before. But of course this would only be an issue while the loop is in place, and the switch should return to normal operation once it has been removed.

But make sure spanning tree is running, and configure portfast on ports that connect to PCs, servers and the like so they come up straight away (i.e. no delay when a cable is plugged in, which can cause issues obtaining a DHCP lease).

If a loop is created somehow, then STP will stop it.
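To illustrate what "STP will stop it" means, here is a deliberately simplified Python sketch: it builds a tree outward from the root bridge by hop count and marks every link it doesn't need as blocked. It is not the real 802.1D/PVST+ algorithm (no path costs, bridge or port priorities, BPDUs or timers). The topology is hypothetical: four uplinks named like the ones sk1939 lists later in the thread, plus an assumed core-to-core link, with core1 assumed to be the root bridge.

```python
from collections import deque

# Hypothetical topology: (link name, switch A, switch B).
LINKS = [
    ("fa0/35", "access", "core1"),
    ("fa0/47", "access", "core1"),
    ("fa0/36", "access", "core2"),
    ("fa0/48", "access", "core2"),
    ("core-link", "core1", "core2"),  # assumed inter-core link
]


def spanning_tree(root, links):
    """Return (forwarding, blocked) sets of link names.

    Simplified: breadth-first search from the root bridge; the first link
    that reaches a switch forwards, every other redundant link is blocked.
    Ties are broken by link name, not by STP cost or port priority.
    """
    adjacency = {}
    for name, a, b in links:
        adjacency.setdefault(a, []).append((name, b))
        adjacency.setdefault(b, []).append((name, a))

    visited = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        for name, neighbor in sorted(adjacency.get(switch, [])):
            if neighbor not in visited:
                visited.add(neighbor)
                forwarding.add(name)
                queue.append(neighbor)

    blocked = {name for name, _, _ in links} - forwarding
    return forwarding, blocked


if __name__ == "__main__":
    fwd, blk = spanning_tree("core1", LINKS)
    print(sorted(fwd))  # -> ['core-link', 'fa0/35']
    print(sorted(blk))  # -> ['fa0/36', 'fa0/47', 'fa0/48']
```

With three switches only two links can forward loop-free; each of the other three would close a loop, so they are held in blocking until a forwarding link fails.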

But as others have indicated, 3750s are pretty beefy boxes when it comes to forwarding performance, so I'd be surprised if it's locking up due to excessive traffic.

Also try a more recent IOS. That's the usual first step when you suspect a bug.

sk1939

said by TomS_:

Just curious, but it's not due to a loop, is it?

Lights going solid, loss of forwarding: sounds like it could be a loop to me, as I've seen similar behaviour when I've created loops on Cisco switches before. But of course this would only be an issue while the loop is in place, and the switch should return to normal operation once it has been removed.

But make sure spanning tree is running, and configure portfast on ports that connect to PCs, servers and the like so they come up straight away (i.e. no delay when a cable is plugged in, which can cause issues obtaining a DHCP lease).

If a loop is created somehow, then STP will stop it.

But as others have indicated, 3750s are pretty beefy boxes when it comes to forwarding performance, so I'd be surprised if it's locking up due to excessive traffic.

Also try a more recent IOS. That's the usual first step when you suspect a bug.

I don't believe so, since both the core and the switch have spanning tree running.

That's what I thought as well, but spanning tree is running, and disconnecting all but one link doesn't seem to solve the issue.

Portfast is enabled on interfaces fa0/1-32, while the uplinks are fa0/35 and fa0/47 going to one core and fa0/36 and fa0/48 going to the other.

I'm guessing it was due to excessive traffic, but it could very well be an IOS bug. I updated the IOS to the latest version last night, and it seems to be running smoothly so far.

HELLFIRE to sk1939

@sk1939
Try 2x 2950s with 14 WAN connections shared between them (I forget the pipe sizes and circuit types offhand) and multiple TAC cases opened to determine "reasons for buffer failures from 'show buffer' output."

Now THAT was cringeworthy... or this

What version of code did you move up to? Definitely keep us posted on how it goes.

Regards

sk1939

said by HELLFIRE:

@sk1939
Try 2x 2950s with 14 WAN connections shared between them and multiple TAC cases opened to determine "reasons for buffer failures from 'show buffer' output."

Gee, imagine that.


c3750-ipbasek9-mz.122-55.SE5, since we didn't need IP Services.

sk1939

Since we upgraded the IOS it hasn't crashed again, so I'm attributing it to a firmware bug rather than a hardware issue. Thanks, all.