 whfsdudePremium join:2003-04-05 Washington, DC | TATA Had problems with TATA as well during the same period. | |
|
 |  | | Re: TATA And TW Telecom and Global Crossing as well | |
|
 |  |  iansltx join:2007-02-19 Golden, CO kudos:2 | Re: TATA So it could be a TWC problem? | |
|
 |  |  |  | | Re: TATA Well, Level3 crash dumped too, so I suppose that means it could be all TW Cables problem too.....
ok, all kidding aside.
MANY operators of networks had the exact same problem at the exact same time. TW Cable was caught up in it I guess.
In that sense, you could say that | |
|
 |  |  |  | | TW Telecom is not TWC and is not even related anymore, not even by name. The TW in TW Telecom doesn't mean Time Warner anymore. | |
|
 |  |  |  |  | | Re: TATA I know, hope others do as well | |
|
 |  |  |  whfsdudePremium join:2003-04-05 Washington, DC Reviews:
·T-Mobile US
| said by iansltx:So it could be a TWC problem? Anyone who uses MX Series Juniper routers and is running BGP. I feel like this whole story needs a complete rewrite to clarify the outage was a lot larger than just Level3 and TWC. | |
|
 |  |  |  |  iansltx join:2007-02-19 Golden, CO kudos:2 | Re: TATA Yeah... | |
|
 banditws6Shrinking Time and DistancePremium join:2001-08-18 Frisco, TX Reviews:
·RoadRunner Cable
| So that's what happened I've got Time Warner as my ISP at home...was wondering why I lost all connectivity right around 8 AM Central this morning. Guess that explains it! -- "The counsel of fools is all the more dangerous the more of them there are." -Ólafr Höskuldsson | |
|
 |  CptGeminiInside your computerPremium join:2004-11-29 Corpus Christi, TX kudos:6 | Re: So that's what happened Yea same here. I was sitting in the auction house in world of warcraft when after about 13 minutes of being in there wham-o got disconnected.
At least I still have my phone to tether to in times like that when the main line goes out. | |
|
 |  |  | | Re: So that's what happened Verizon's network was bitten by this too. Wireless carriers use Juniper routers in the core network too. | |
|
 |  |  |  CptGeminiInside your computerPremium join:2004-11-29 Corpus Christi, TX kudos:6 | Re: So that's what happened That's really odd because I have t-mobile and my mobile web was still working, rather slowly but it was still working. It is usually fast though. | |
|
 |  |  |  |  whfsdudePremium join:2003-04-05 Washington, DC Reviews:
·T-Mobile US
| Re: So that's what happened said by dadarkside:And TW Telecom and Global Crossing as well Global Crossing is Level3.
said by CptGemini:That's really odd because I have t-mobile and my mobile web was still working, rather slowly but it was still working. It is usually fast though. Multi-homed and they obviously don't use Juniper routers at the edge.
Note that this only affected routers that do BGP.
said by syslock:Test Lab? Staggered Updates? Geeesh.... Would have been the smart thing...
Lets wait till Monday morning at 8am on the production network. Uhh... This was an unknown bug in Juniper's implementation of BGP on certain routers that have been running in production for years.
Posted today after the incident occurred.
Bulletin PSN-2011-08-327
Title: MX Series MPC crash in Ktree::createFourWayNode after BGP UPDATE
Products Affected: This issue can affect any MX Series router with port concentrators based on the Trio chipset -- such as the MPC or embedded into the MX80 -- with active protocol-based route prefix additions/deletions occurring.
Platforms Affected: Security, JUNOS 11.x, MX-series, JUNOS 10.x, SIRT Security Advisory, SIRT Security Notice
Revision Number: 1
Issue Date: 2011-08-08
PSN Issue: MPCs (Modular Port Concentrators) installed in an MX Series router may crash upon receipt of very specific and unlikely route prefix install/delete actions, such as a BGP routing update. The set of route prefix updates is non-deterministic and exceedingly unlikely to occur. Junos versions affected include 10.0, 10.1, 10.2, 10.3, 10.4 prior to 10.4R6, and 11.1 prior to 11.1R4. The trigger for the MPC crash was determined to be a valid BGP UPDATE received from a registered network service provider, although this one UPDATE was determined to not be solely responsible for the crashes. A complex sequence of preconditions is required to trigger this crash. Both IPv4 and IPv6 routing prefix updates can trigger this MPC crash.
There is no indication that this issue was triggered maliciously. Given the complexity of conditions required to trigger this issue, the probability of exploiting this defect is extremely low.
The assertions (crash) all occurred in the code used to store routing information, called Ktree, on the MPC. Due to the order and mix of adds and deletes to the tree, certain combinations of address adds and deletes can corrupt the data structures within the MPC, which in turn can cause this line card crash. The MPC recovers and returns to service quickly, and without operator intervention.
This issue only affects MX Series routers with port concentrators based on the Trio chipset, such as the MPC or embedded into the MX80. No other product or platform is vulnerable to this issue.
Solution: The Ktree code has been updated and enhanced to ensure that combinations and permutations of routing updates will not corrupt the state of the line card. Extensive testing has been performed to validate an exceedingly large combination and permutation of route prefix additions and deletions.
All Junos OS software releases built on or after 2011-08-03 have fixed this specific issue. Releases containing the fix specifically include: 10.0S18, 10.4R6, 11.1R4, 11.2R1, and all subsequent releases (i.e. all releases built after 11.2R1).
This issue is being tracked as PR 610864. While this PR may not be viewable by customers, it can be used as a reference when discussing the issue with JTAC.
KB16765 - "In which releases are vulnerabilities fixed?" describes which release vulnerabilities are fixed as per our End of Engineering and End of Life support policies.
Workarounds: No known workaround exists for this issue.
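For anyone checking their own boxes against the release list in the bulletin, here is a rough Python sketch. The function names and the `None` ("can't tell") result are my own invention, not anything from Juniper. It only understands simple strings like `10.4R5`, and it does not try to compare R releases against S releases on the same train since the bulletin doesn't say how they order.

```python
import re

# First fixed build per release train, as listed in the bulletin.
FIXED = {
    (10, 0): ("S", 18),
    (10, 4): ("R", 6),
    (11, 1): ("R", 4),
    (11, 2): ("R", 1),
}
# Trains the bulletin lists as affected without naming a fixed build.
AFFECTED_NO_FIX = {(10, 1), (10, 2), (10, 3)}

_RELEASE = re.compile(r"^(\d+)\.(\d+)([RS])(\d+)")


def parse_release(text):
    """Split e.g. '10.4R5' into ((10, 4), 'R', 5). Hypothetical helper."""
    match = _RELEASE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {text!r}")
    train = (int(match.group(1)), int(match.group(2)))
    return train, match.group(3), int(match.group(4))


def is_affected(text):
    """True = affected, False = fixed, None = bulletin does not say."""
    train, kind, number = parse_release(text)
    if train in AFFECTED_NO_FIX:
        return True
    if train > (11, 2):
        # "11.2R1, and all subsequent releases" are listed as fixed.
        return False
    if train not in FIXED:
        # Trains the bulletin does not mention (e.g. before 10.0).
        return None
    fixed_kind, fixed_number = FIXED[train]
    if kind != fixed_kind:
        # Assumption: R vs. S ordering is unknown, so do not guess.
        return None
    return number < fixed_number


for release in ("10.4R5", "10.4R6", "11.1R3", "10.2R3", "10.0R4"):
    print(release, is_affected(release))
```

Treat a `None` as "go ask JTAC", not as "safe".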
| |
|
 |  |  |  |  |  | | Re: So that's what happened Understood that L3 bought up GLBX recently, but GLBX is still a unique ASN and probably will be for some time yet. I doubt support teams and network maps have been integrated yet.
As such, it's still a unique organization from an internet perspective.
The Juniper announcement was expected, hadn't seen it yet though. | |
|
 |  |  |  |  |  |  whfsdudePremium join:2003-04-05 Washington, DC Reviews:
·T-Mobile US
| Re: So that's what happened said by dadarkside:Understood that L3 bought up GLBX recently, but GLBX is still a unique ASN and probably will be for some time yet. I doubt support teams and network maps have been integrated yet. +1 You're right! | |
|
 |  |  |  |  |  |  |  | | Re: So that's what happened It would be kind of satisfying in a Schadenfreude kind of way if the whole cascade of router crashes was triggered because Level3 was trying to integrate the GLBX routing table into its own....
Look at this analysis of BGP RouteViews data collected just before the 9:05 am EDT crash began:
»mailman.nanog.org/pipermail/nano···691.html
Big spikes in BGP data flow between TATA, Level3, and GLBX | |
|
 |  |  |  |  |  |  |  |  whfsdudePremium join:2003-04-05 Washington, DC Reviews:
·T-Mobile US
| Re: So that's what happened said by dadarkside:It would be kind of satisfying in a Schadenfreude kind of way if the whole cascade of router crashes was triggered because Level3 was trying to integrate the GLBX routing table into its own.... Wasn't implying that. I was originally implying that a Level3 routing outage was also a GLBX routing outage. But you're right that they haven't combined anything ops-wise or network-wise.
Sales and peering agreements have been combined. | |
|
 |  |  |  |  |  |  |  |  |  | | Re: So that's what happened Billing is always first to integrate. "SHOW ME THE MONEY!!!!!"
We still peer with AS3549 | |
|
 Duramax08A Challenger AppearsPremium join:2008-08-03 San Antonio, TX | ah yes amazing how the internet works. | |
|
 |  | | Re: ah yes or how it doesn't work, as the case may be  | |
|
 |  Reviews:
·Comcast
| This was a concerted effort for all the governments to insert spy gear !!!
The interwebs are doomed I tell ya doomed.....
Suffered this outage today also but our backup provider did not thankfully. -- "It's always funny until someone gets hurt......and then it's absolutely friggin' hysterical!" | |
|
 KA3SGM- -... ...- -Premium join:2006-01-17 West Chester, PA | Internet Kill Switch Test ??? The kill switch works!!!!
Yay!!!  -- ROCK 'TIL SUNSET | |
|
 | | Modem needs a Level3 light Couldn't tell what was wrong with my internets. Rebooted modem and router like a fool! Almost changed DNS servers...but couldn't Google alternatives! | |
|
 |  tshirtPremium,MVM join:2004-07-11 Snohomish, WA kudos:3 Reviews:
·Comcast
| Re: Modem needs a Level3 light said by Wilsdom:Couldn't tell what was wrong with my internets. Rebooted modem and router like a fool! Almost changed DNS servers...but couldn't Google alternatives! And now you know. Have that setup/printed out BEFORE this happens again. (not that that would probably help in this case, but backups of any kind require action BEFORE they are needed.) | |
|
 | | Juniper Networks... not Jupiter... Other reports claim that Jupiter Networks routers choked on a BGP update. | |
|
 |  syslockPremium join:2007-02-03 Ann Arbor, MI Reviews:
·Comcast
| Re: Juniper Networks... not Jupiter... said by skuv :Other reports claim that Jupiter Networks routers choked on a BGP update. Test Lab? Staggered Updates? Geeesh.... Would have been the smart thing...
Lets wait till Monday morning at 8am on the production network. | |
|
 |  |  brad join:2007-09-06 Etobicoke, ON | Re: Juniper Networks... not Jupiter... said by syslock:Test Lab? Staggered Updates? Geeesh.... Would have been the smart thing...
Lets wait till Monday morning at 8am on the production network. There was no updating of router OS's on the affected networks. Having a clue? Oh wait this is DSLR. Geeeeeshh.. | |
|
 |  |  | | said by syslock:said by skuv :Other reports claim that Jupiter Networks routers choked on a BGP update. Test Lab? Staggered Updates? Geeesh.... Would have been the smart thing... Lets wait till Monday morning at 8am on the production network. You can't stagger BGP updates because they are basically routing tables. If you were to update some routers and not others you would create a routing loop - effectively taking down networks. BGP (and other routing protocols like OSPF, EIGRP, etc) all need to update at the same time.
It's also worth noting that the updates were valid. It's just that Juniper's routers had a bad implementation of BGP that caused them to crash. Other manufacturers' core routers did not suffer the same flaws. | |
|
 |  |  | | BGP works on triggered updates. In other words, an update of the routing table is only sent when a change occurs. And the only update sent is relative to that change, the full routing table is rarely sent during normal BGP operation between peers. | |
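To make the "triggered updates" point concrete, here is a toy Python sketch of a router applying one incremental change to its table instead of receiving the whole table again. This is not the real BGP UPDATE wire format: the `Update` class, `apply_update`, the peer name, and the documentation prefixes are all made up for illustration, and path attributes are reduced to a single next-hop string.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    """Toy stand-in for one BGP UPDATE: only what changed."""
    withdrawn: list = field(default_factory=list)    # prefixes to remove
    announced: dict = field(default_factory=dict)    # prefix -> next hop


def apply_update(rib, update):
    """Apply one incremental update to a routing table held as a dict."""
    for prefix in update.withdrawn:
        rib.pop(prefix, None)          # ignore withdrawals of unknown prefixes
    rib.update(update.announced)       # add new routes or replace existing ones
    return rib


# Table learned earlier from a hypothetical peer.
rib = {
    "192.0.2.0/24": "peer-a",
    "198.51.100.0/24": "peer-a",
}

# A single change arrives: one prefix withdrawn, one announced.
change = Update(
    withdrawn=["198.51.100.0/24"],
    announced={"203.0.113.0/24": "peer-a"},
)
apply_update(rib, change)
print(sorted(rib))
```

Only the two touched prefixes move; the untouched `192.0.2.0/24` entry is never resent.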
|
 Reviews:
·Optimum Online
·Verizon FiOS
| l3 routes? I didn't notice a thing.. except maybe a website not routing for a few mins. These anomalies can usually be explained by new antivirus software website blocks or other phenomena. Suffice to say I wasn't annoyed by being offline. My routing was working around this time (I think).
The days of leaving my PC on 24/7/365 are largely over... I might have my tablet on 24/7 some days. That doesn't chew 120-750 watts of power.. probably less than 25 under heavy usage.
Level 3 is NOT the entire internet and Verizon has redundancies. It's probably more telling that TWC does not. | |
|
 |  whfsdudePremium join:2003-04-05 Washington, DC Reviews:
·T-Mobile US
| Re: l3 routes? said by tmc8080:Level 3 is NOT the entire internet and Verizon has redundancies. It's probably more telling that TWC does not.
They do have redundancies. ( »bgp.he.net/AS7843 ) Most likely is that they were using Juniper routers at the border. | |
|
 | | slight chuckle This made me chuckle, just a bit. | |
|
 |
|