reply to matthopp
Re: Nice job L3 :) Speaking from the other side of things, and judging from your individual gripes, matthopp, you've never been part of such work before, am I right?
Sometimes this stuff is well planned out and you have a couple months of lead time. Other times you're tossed
into the boiling pot and told to sink or swim, for managerial, business unit, or political reasons; in either case, you
just buckle down and get the job done. I won't say that an 80-router upgrade is always a walk in
the park, BUT so long as you plan it out right AND don't get a visit from Brother Murphy, it's doable. I've seen
and pushed such upgrades myself before, but a) I agree, when this blows up it sucks to be caught in the backblast,
whether you're a downstream user or at the epicenter of it all trying to fix the mess, and b) without that RFO,
anything at this point is pure speculation.
My 00000010 bits anyways.
If you get that RFO and it's not under any sort of NDA restrictions, please do share.
Yeah, I work on the "other side"... I just don't post over here much.
Here's the RFO... it was a "perfect storm", and Level3 should not have done the GCR with circuits down.
Yesterday at 16:10 GMT an excavating company drove right through our pole line, which took down 4 spans of fiber (total fiber count: 300 pairs). Poles and fiber were restored at 12:06 GMT; total outage: 20 hours 10 minutes.
Level 3 had a planned Global Change Request issued for over 100 Juniper core routers in Europe and on the Eastern seaboard, including Chicago and much of the Midwest. Level 3 could not postpone the Global Change Request because a maintenance window had been issued to 10,000+ customers. Unfortunately, when the Juniper core was upgraded the working path was taken down, and your protect path was cut due to the larger fiber issue.
---- END QUOTE ----
During a scheduled maintenance (GCR 6287157) to upgrade a Washington DC router, a configuration issue caused the two routing engines on the router not to sync. Once the Technical Service Center notified the IP NOC of the customer impact, the NOC wiped out the configuration and loaded a current configuration to restore services.
... I agree, perfect storm indeed. And a total FML / date-with-Crown-Royal moment.
As for that routing engine sync issue... I've worked with enough gear to know that that's
a LOT harder to zero in on... and if it was tangled up with the fiber issue, it makes tracking it down
that much harder, since all your other issues can mask the problem.
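For anyone who hasn't run dual-RE Juniper boxes: the usual safety net is to make every commit sync to the backup routing engine and to enable graceful switchover, so the backup RE always carries a current config. A generic Junos sketch (not Level 3's actual config, and GCR specifics are unknown to me):

```
# Check that both routing engines are up and see which is master
show chassis routing-engine

# Push every commit to the backup RE automatically
set system commit synchronize

# Graceful Routing Engine switchover: backup RE mirrors state,
# so a master failure doesn't force a cold reload
set chassis redundancy graceful-switchover

# Or sync a single commit explicitly
commit synchronize
```

If commit synchronize was missing or the sync silently failed, you'd get exactly the kind of "two REs with different configs" mess the RFO describes.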
Moral of the story... I hope the guys who fixed the thing were given
a two-week, all-expenses-paid trip to the Bahamas, minimum. 'Cause I can say I wouldn't have wanted
any part of that cleanup job ... [facepalm]