Level3 Suffers Brief But Massive Outage
Knocks ISPs Like Time Warner Cable Offline
by Karl Bode Monday 07-Nov-2011 tags: business · bandwidth · cable · trouble · consumers · Time Warner Cable
Tipped by battleop
As users in our Time Warner Cable forum can attest, Level3 appears to have suffered a massive outage today that knocked some ISPs, like Time Warner Cable, offline completely for a brief while. A Time Warner Cable employee has confirmed on Twitter that the company suffered an outage that impacted "most of our service areas." Most users are back online after the five to ten minute nationwide network belch. Time Warner Cable has yet to give an official confirmation, but the outage appears to have been traced back to a problem at Level3, which gave this statement:

"Shortly after 9 am ET today, Level 3's network experienced several outages across North America and Europe relating to some of the routers on our network," the company said. "Our technicians worked quickly to bring systems back online. At this time, all connection issues have been resolved, and we are working hard with our equipment vendors to determine the exact cause of the outage and ensure all systems are stable."

Other reports claim that Juniper Networks routers choked on a BGP update.


whfsdude
Premium
join:2003-04-05
Washington, DC

TATA

Had problems with TATA as well during the same period.
dadarkside
Premium
join:2006-05-20
The Moon

Re: TATA

And TW Telecom and Global Crossing as well
iansltx

join:2007-02-19
Golden, CO
kudos:2

Re: TATA

So it could be a TWC problem?
dadarkside
Premium
join:2006-05-20
The Moon

Re: TATA

Well, Level3 crash dumped too, so I suppose that means it could be all TW Cable's problem too.....

ok, all kidding aside.

MANY operators of networks had the exact same problem at the exact same time. TW Cable was caught up in it I guess.

In that sense, you could say that

skuv

@rr.com
TW Telecom is not TWC and is not even related anymore, not even by name. The TW in TW Telecom doesn't mean Time Warner anymore.
dadarkside
Premium
join:2006-05-20
The Moon

Re: TATA

I know, hope others do as well

whfsdude
Premium
join:2003-04-05
Washington, DC
Reviews:
·T-Mobile US
said by iansltx:

So it could be a TWC problem?

Anyone who uses MX Series Juniper routers and is running BGP. I feel like this whole story needs a complete rewrite to clarify the outage was a lot larger than just Level3 and TWC.
iansltx

join:2007-02-19
Golden, CO
kudos:2

Re: TATA

Yeah...

banditws6
Shrinking Time and Distance
Premium
join:2001-08-18
Frisco, TX
Reviews:
·RoadRunner Cable

So that's what happened

I've got Time Warner as my ISP at home...was wondering why I lost all connectivity right around 8 AM Central this morning. Guess that explains it!
--
"The counsel of fools is all the more dangerous the more of them there are." -Ólafr Höskuldsson

CptGemini
Inside your computer
Premium
join:2004-11-29
Corpus Christi, TX
kudos:6

Re: So that's what happened

Yea same here. I was sitting in the auction house in World of Warcraft when, after about 13 minutes of being in there, wham-o, I got disconnected.

At least I still have my phone to tether to in times like that when the main line goes out.
dadarkside
Premium
join:2006-05-20
The Moon

Re: So that's what happened

Verizon's network was bitten by this too. Wireless carriers use Juniper routers in the core network too.

CptGemini
Inside your computer
Premium
join:2004-11-29
Corpus Christi, TX
kudos:6

Re: So that's what happened

That's really odd because I have T-Mobile and my mobile web was still working, rather slowly but it was still working. It is usually fast though.

whfsdude
Premium
join:2003-04-05
Washington, DC
Reviews:
·T-Mobile US

Re: So that's what happened

said by dadarkside:

And TW Telecom and Global Crossing as well

Global Crossing is Level3.

said by CptGemini:

That's really odd because I have T-Mobile and my mobile web was still working, rather slowly but it was still working. It is usually fast though.

Multi-homed and they obviously don't use Juniper routers at the edge.

Note that this only affected routers that do BGP.

said by syslock:

Test Lab? Staggered Updates? Geeesh....
Would have been the smart thing...

Let's wait till Monday morning at 8am on the
production network.

Uhh... This was an unknown bug in Juniper's implementation of BGP on certain routers that have been running in production for years.

Posted today after the incident occurred.

Bulletin: PSN-2011-08-327
Title: MX Series MPC crash in Ktree::createFourWayNode after BGP UPDATE
Products Affected: This issue can affect any MX Series router with port concentrators based on the Trio chipset -- such as the MPC or embedded into the MX80 -- with active protocol-based route prefix additions/deletions occurring.
Platforms Affected: Security, JUNOS 11.x, MX-series, JUNOS 10.x, SIRT Security Advisory, SIRT Security Notice
Revision Number: 1
Issue Date: 2011-08-08

PSN Issue:
MPCs (Modular Port Concentrators) installed in an MX Series router may crash upon receipt of very specific and unlikely route prefix install/delete actions, such as a BGP routing update. The set of route prefix updates is non-deterministic and exceedingly unlikely to occur. Junos versions affected include 10.0, 10.1, 10.2, 10.3, 10.4 prior to 10.4R6, and 11.1 prior to 11.1R4. The trigger for the MPC crash was determined to be a valid BGP UPDATE received from a registered network service provider, although this one UPDATE was determined to not be solely responsible for the crashes. A complex sequence of preconditions is required to trigger this crash. Both IPv4 and IPv6 routing prefix updates can trigger this MPC crash.

There is no indication that this issue was triggered maliciously. Given the complexity of conditions required to trigger this issue, the probability of exploiting this defect is extremely low.

The assertions (crash) all occurred in the code used to store routing information, called Ktree, on the MPC. Due to the order and mix of adds and deletes to the tree, certain combinations of address adds and deletes can corrupt the data structures within the MPC, which in turn can cause this line card crash. The MPC recovers and returns to service quickly, and without operator intervention.

This issue only affects MX Series routers with port concentrators based on the Trio chipset, such as the MPC or embedded into the MX80. No other product or platform is vulnerable to this issue.

Solution:
The Ktree code has been updated and enhanced to ensure that combinations and permutations of routing updates will not corrupt the state of the line card. Extensive testing has been performed to validate an exceedingly large combination and permutation of route prefix additions and deletions.

All Junos OS software releases built on or after 2011-08-03 have fixed this specific issue. Releases containing the fix specifically include: 10.0S18, 10.4R6, 11.1R4, 11.2R1, and all subsequent releases (i.e. all releases built after 11.2R1).

This issue is being tracked as PR 610864. While this PR may not be viewable by customers, it can be used as a reference when discussing the issue with JTAC.

KB16765 - "In which releases are vulnerabilities fixed?" describes which release vulnerabilities are fixed as per our End of Engineering and End of Life support policies.

Workarounds
No known workaround exists for this issue.
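
To make the version ranges in the bulletin easier to check, here is a minimal Python sketch that classifies a Junos version string using only the releases the bulletin names. The function name `ktree_status`, the regular expression, and the three return labels are assumptions made for illustration; this is not Juniper tooling. It assumes version strings look like 10.4R5 or 10.0S18, and it returns "unknown" for anything the bulletin does not cover (for example 11.0, or a release type other than the one named as the fix). It says nothing about whether the hardware is a Trio-based MX, which the bulletin also requires.

```python
import re

# First fixed release per train, as listed in the bulletin's Solution section.
FIRST_FIXED = {(10, 0): ("S", 18), (10, 4): ("R", 6), (11, 1): ("R", 4)}
# Trains the bulletin lists as affected without naming a fixed release.
NO_FIX_LISTED = {(10, 1), (10, 2), (10, 3)}

# Assumed version shape: <major>.<minor><type letter><number>, e.g. "10.4R5.5".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)([A-Z])(\d+)")


def ktree_status(version: str) -> str:
    """Return "affected", "fixed", or "unknown" for a Junos version string."""
    match = VERSION_RE.match(version.strip())
    if not match:
        return "unknown"
    train = (int(match.group(1)), int(match.group(2)))
    kind, number = match.group(3), int(match.group(4))

    if train >= (11, 2):
        # The bulletin says 11.2R1 and all subsequent releases carry the fix.
        return "fixed"
    if train in NO_FIX_LISTED:
        return "affected"
    if train in FIRST_FIXED:
        fixed_kind, fixed_number = FIRST_FIXED[train]
        if kind != fixed_kind:
            # The bulletin does not say how other release types compare.
            return "unknown"
        return "fixed" if number >= fixed_number else "affected"
    # Trains the bulletin does not mention at all (e.g. 11.0 or 9.x).
    return "unknown"


if __name__ == "__main__":
    for v in ("10.4R5", "10.4R6", "11.1R3.2", "11.2R1", "11.0R2"):
        print(v, ktree_status(v))
```

Returning "unknown" rather than guessing keeps the sketch from claiming more than the bulletin does; for a real fleet, the release in question should be confirmed with JTAC.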

dadarkside
Premium
join:2006-05-20
The Moon

Re: So that's what happened

Understood that L3 bought up GLBX recently, but GLBX is still a unique ASN and probably will be for some time yet. I doubt support teams and network maps have been integrated yet.

As such, it's still a unique organization from an internet perspective.

The Juniper announcement was expected, hadn't seen it yet though.

whfsdude
Premium
join:2003-04-05
Washington, DC
Reviews:
·T-Mobile US

Re: So that's what happened

said by dadarkside:

Understood that L3 bought up GLBX recently, but GLBX is still a unique ASN and probably will be for some time yet. I doubt support teams and network maps have been integrated yet.

+1 You're right!
dadarkside
Premium
join:2006-05-20
The Moon

Re: So that's what happened

It would be kind of satisfying in a Schadenfreude kind of way if the whole cascade of router crashes was triggered because Level3 was trying to integrate the GLBX routing table into its own....

Look at this analysis of BGP RouteViews data collected just before the 9:05 am EDT crash began:

»mailman.nanog.org/pipermail/nano···691.html

Big spikes in BGP data flow between TATA, Level3 and GLBX

whfsdude
Premium
join:2003-04-05
Washington, DC
Reviews:
·T-Mobile US

Re: So that's what happened

said by dadarkside:

It would be kind of satisfying in a Schadenfreude kind of way if the whole cascade of router crashes was triggered because Level3 was trying to integrate the GLBX routing table into its own....

Wasn't implying that. I was originally implying that a Level3 routing outage was also a GBLX routing outage. But you're right that they haven't combined anything ops-wise or network-wise.

Sales and peering agreements have been combined.
dadarkside
Premium
join:2006-05-20
The Moon

Re: So that's what happened

Billing is always first to integrate. "SHOW ME THE MONEY!!!!!"

We still peer with AS3549

Duramax08
A Challenger Appears
Premium
join:2008-08-03
San Antonio, TX

ah yes

amazing how the internet works.
supergeeky

join:2003-05-09
United State
kudos:3

Re: ah yes

or how it doesn't work, as the case may be
BosstonesOwn

join:2002-12-15
Everett, MA
Reviews:
·Comcast
This was a concerted effort for all the governments to insert spy gear !!!

The interwebs are doomed I tell ya doomed.....

Suffered this outage today also but our backup provider did not thankfully.
--
"It's always funny until someone gets hurt......and then it's absolutely friggin' hysterical!"

KA3SGM
- -... ...- -
Premium
join:2006-01-17
West Chester, PA

Internet Kill Switch Test ???

The kill switch works!!!!

Yay!!!
--
ROCK 'TIL SUNSET
Wilsdom

join:2009-08-06

Modem needs a Level3 light

Couldn't tell what was wrong with my internets. Rebooted modem and router like a fool! Almost changed DNS servers...but couldn't Google alternatives!

tshirt
Premium,MVM
join:2004-07-11
Snohomish, WA
kudos:3
Reviews:
·Comcast

Re: Modem needs a Level3 light

said by Wilsdom:

Couldn't tell what was wrong with my internets. Rebooted modem and router like a fool! Almost changed DNS servers...but couldn't Google alternatives!

And now you know. Have that setup/printed out BEFORE this happens again. (not that that would probably help in this case, but backups of any kind require action BEFORE they are needed.)

skuv

@rr.com

Juniper Networks... not Jupiter...

Other reports claim that Jupiter Networks routers choked on a BGP update.


syslock
Premium
join:2007-02-03
Ann Arbor, MI
Reviews:
·Comcast

Re: Juniper Networks... not Jupiter...

said by skuv :

Other reports claim that Jupiter Networks routers choked on a BGP update.

Test Lab? Staggered Updates? Geeesh....
Would have been the smart thing...

Let's wait till Monday morning at 8am on the
production network.
brad

join:2007-09-06
Etobicoke, ON

Re: Juniper Networks... not Jupiter...

said by syslock:

Test Lab? Staggered Updates? Geeesh....
Would have been the smart thing...

Let's wait till Monday morning at 8am on the
production network.

There was no updating of router OS's on the affected networks.
Having a clue? Oh wait this is DSLR. Geeeeeshh..
acoustix

join:2004-01-30
Fort Dodge, IA
said by syslock:

said by skuv :

Other reports claim that Jupiter Networks routers choked on a BGP update.

Test Lab? Staggered Updates? Geeesh....
Would have been the smart thing...

Let's wait till Monday morning at 8am on the
production network.

You can't stagger BGP updates because they are basically routing tables. If you were to update some routers and not others you would create a routing loop - effectively taking down networks. BGP (and other routing protocols like OSPF, EIGRP, etc) all need to update at the same time.

It's also worth noting that the updates were valid. It's just that Juniper's routers had a bad implementation of BGP that caused them to crash. Other manufacturers' core routers did not suffer the same flaws.
dadarkside
Premium
join:2006-05-20
The Moon
BGP works on triggered updates. In other words, an update of the routing table is only sent when a change occurs, and the only update sent is relative to that change. The full routing table is rarely sent during normal BGP operation between peers.
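
The triggered-update behaviour described here can be illustrated with a toy model. This is a rough Python sketch, not a BGP implementation: the `Update` class, the `apply_update` function, and the peer names are hypothetical, and the prefixes are documentation ranges. It only shows the idea that a peer sends the routes that changed (announcements and withdrawals) and the receiver patches its existing table instead of reloading all of it.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    """A simplified stand-in for a BGP UPDATE: only the routes that changed."""
    announced: dict = field(default_factory=dict)  # prefix -> next hop
    withdrawn: list = field(default_factory=list)  # prefixes no longer reachable


def apply_update(rib: dict, update: Update) -> dict:
    """Patch the routing table in place with one incremental update."""
    for prefix in update.withdrawn:
        rib.pop(prefix, None)      # ignore withdrawals for unknown prefixes
    rib.update(update.announced)   # add new routes or replace changed ones
    return rib


if __name__ == "__main__":
    # Table learned when the session first came up (the one full exchange).
    rib = {"192.0.2.0/24": "peer-a", "198.51.100.0/24": "peer-a"}

    # Later, one route goes away and one appears; only that delta is sent.
    change = Update(announced={"203.0.113.0/24": "peer-b"},
                    withdrawn=["198.51.100.0/24"])
    apply_update(rib, change)
    print(rib)
```

In this model there is no rollout to stagger: each update is just the neighbor reporting a change, and the receiver has to process it when it arrives.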
tmc8080

join:2004-04-24
Brooklyn, NY
Reviews:
·Optimum Online
·Verizon FiOS

l3 routes?

I didn't notice a thing.. except maybe a website not routing for a few mins. These anomalies can usually be explained by new antivirus software website blocks or other phenomena. Suffice to say I wasn't annoyed by being offline. My routing was working around this time (I think).

The days of leaving my PC on 24/7/365 are largely over... I might have my tablet on 24/7 some days. That doesn't chew 120-750 watts of power.. probably less than 25 under heavy usage.

Level 3 is NOT the entire internet and Verizon has redundancies. It's probably more telling that TWC does not.

whfsdude
Premium
join:2003-04-05
Washington, DC
Reviews:
·T-Mobile US

Re: l3 routes?

said by tmc8080:

Level 3 is NOT the entire internet and Verizon has redundancies. It's probably more telling that TWC does not.

They do have redundancies. ( »bgp.he.net/AS7843 ) Most likely they were using Juniper routers at the border.

t3ln3t

@clearwire-wmx.net

slight chuckle

This made me chuckle, just a bit.

over 12.5 years online © 1999-2012 dslreports.com.