dennismurphy (Put me on hold? I'll put YOU on hold), Premium Member, join 2002-11-19, Parsippany, NJ
to limegrass69
Re: [OOL] Internet went out at 7:40pm
said by limegrass69: It's frustrating for customers...and not to minimize your suffering, but there's no point in fretting over it...there's nothing you can do, and it should hopefully be resolved by morning.
An outage of this magnitude is an "all hands on deck" affair. I’ve been in the midst of one of those kinds of events ... more than once ... and they are NOT fun. There’s always some gear that doesn’t come back. Fans, disks, and power supplies that have been spinning for years on end do NOT like sudden stops. I really feel for the Altice folks. Data center outages are not fun, and the redundancy isn’t always perfect. Raise your hand if you’ve seen a switchboard’s bus bar melt down to slag. ✋
Anon65c37
Anon
2019-Sep-7 8:17 am
This is why we have backup service providers. My Lightpath service has been amazing through the years. But you can’t anticipate a transfer switch not working, no matter how many times you’ve tested it.
dennismurphy
said by Anon65c37: This is why we have backup service providers. My Lightpath service has been amazing through the years. But you can’t anticipate a transfer switch not working, no matter how many times you’ve tested it.
... exactly right. I’ve seen everything from a transfer switch jam (see the melted bus bar message) to generators wired out of phase (which brought the whole datacenter down during its first test) to cars racing and crashing into the main transformer, surging the power so hard that overcurrent protection shut the whole street down. The number of things that can go wrong, both mechanical and in “meatspace” (i.e., human error), is amazing.
|
|
limegrass69
It is odd. I don't profess to be an expert on their network topology, but if the White Plains (Greenburgh) site was out, it's interesting that parts of Westchester were up while others were out...along with some folks up in Dutchess and down in the Bronx.
Also, Lightpath is supposedly designed to route to two POPs (in my company's case, White Plains and Stamford). What happened to their self-healing ring...unless the Stamford POP (and the rest of Fairfield) feeds back to White Plains? People in Norwalk and Milford were having issues too.
Interesting for sure.
|
dennismurphy
said by limegrass69: It is odd. I don't profess to be an expert on their network topology, but if the White Plains (Greenburgh) site was out, it's interesting that parts of Westchester were up while others were out...along with some folks up in Dutchess and down in the Bronx. Also, Lightpath is supposedly designed to route to two POPs (in my company's case, White Plains and Stamford). What happened to their self-healing ring...unless the Stamford POP (and the rest of Fairfield) feeds back to White Plains? People in Norwalk and Milford were having issues too. Interesting for sure.
I don’t know anything about Lightpath or Altice’s network guts, but even if the fiber ring is intact, outages can still happen if something critical, such as AAA or a security gateway, isn’t fully seamless in failover. Or sometimes a function is built redundantly but, because of maintenance or another failure, is running in simplex mode. I’ve seen cases of awful timing like that too. “Bring down the biz-recovery server so we can patch it. We’ll cold-soak the patches, then next week fail traffic over and patch production.” Boom, production blows a CPU while the secondary site is in maintenance. Now both sides are down ... Been there, done that, have the scars to prove it.
|
tired_runner · Frontier Communi..
to dennismurphy
said by dennismurphy: I’ve been in the midst of one of those kinds of events ... more than once ... and they are NOT fun. There’s always some gear that doesn’t come back. Fans, disks, power supplies that have been spinning for years on end do NOT like sudden stops.
Where I work, if a building is critical, they buy at least two of everything: aggregation, access, and fiber. And the failover is seamless, almost transparent, whether you do BGP, EIGRP, HSRP, etc. But not everyone's wallet is the same size. When administration says it will have to wait, it has to wait.
I wonder if Altice uses DWDM rings in their topology at all. Those give you two chances at surviving a cut by provisioning working and protected paths in physically diverse runs. But nothing survives the human factor like you describe. It's always fun to imagine that when things like this happen, it's because someone's street racer barreled down and into a datacenter. Whoops!
By now, residential customers should already know the deal. You are leasing best-effort service for a monthly fee. If it breaks, you have the choice to wait or to order from a different carrier.
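The working/protected path idea reads roughly like 1+1 path protection. Here's a rough sketch of the selection logic (nothing Altice-specific, and the class and names are made up for illustration): traffic rides both a working and a protect path through physically diverse runs, and the receiver selects whichever one is still alive.

```python
# Illustrative 1+1 path-protection selector for a DWDM circuit:
# the signal is bridged onto two physically diverse paths, and the
# receive end picks the surviving one.

class ProtectedCircuit:
    def __init__(self) -> None:
        self.path_ok = {"working": True, "protect": True}

    def fiber_cut(self, path: str) -> None:
        """Mark one of the diverse runs as cut."""
        self.path_ok[path] = False

    def selected_path(self):
        # Receiver prefers the working path, falls back to protect.
        if self.path_ok["working"]:
            return "working"
        if self.path_ok["protect"]:
            return "protect"
        return None  # both diverse runs cut: circuit is down

ckt = ProtectedCircuit()
ckt.fiber_cut("working")                  # backhoe finds the primary conduit
assert ckt.selected_path() == "protect"   # circuit survives the first cut
ckt.fiber_cut("protect")                  # second cut before repair
assert ckt.selected_path() is None        # now it's an outage
```

Which is exactly the "two chances at surviving a cut" in the post: the design buys you one failure of the fiber plant, not two.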
|
dennismurphy
said by tired_runner: But nothing survives the human factor like you describe. It's always fun to imagine that when things like this happen, it's because someone's street racer barreled down and into a datacenter. Whoops!
Oh, the stories I can tell ... the situation above is EXACTLY what happened to us in the early 2000s. Some kid smashed into the building’s transformer at 70+ mph. Those things aren’t exactly sitting on a shelf; it usually takes months to get one custom-built. That was fun: the power spike tripped all the overcurrent protection and shut the UPS down. Power came back pretty quickly, but just that fast blip was enough to drop the 100,000+ square foot data center. Whoops. You ever hear a data center go completely silent? It’s unbelievably spooky. Given the size of the employer I worked for at the time, that was a ... significant event. Sure, we had triply-redundant data centers, but application, network, and database failovers don’t happen instantly ... even a couple-of-minutes blip is a gigantic deal.
Same building: we had a sewage pump stop working. So some knucklehead electrician decided the power feed to the pump was bad, and found that the closest power point was the PDU... which fed the mainframes. He managed to take out the entire street’s power. Or the guy who was cutting a new power whip, tossed the wire to the other side of the room, and managed to snag it on the bus bar. We found him on the other side of the room, still breathing (but barely).
Stuff happens. And at a certain scale, with a certain load, all the redundancy in the world doesn’t make it a seamless event. It makes it *survivable* but not unnoticeable ... especially for us monkeys who have to put Humpty Dumpty back together again.
|
tired_runner · Frontier Communi..
said by dennismurphy: Same building, we had a sewage pump stop working. So some knucklehead electrician decided the power feed to the pump was bad, and found that the closest power point was the PDU... which fed the mainframes.
Party story time.... One building I was in charge of designing and deploying from scratch ran on one single circuit in the beginning, connected to a regular power outlet. The requirement was to get partial connectivity going while I built the permanent topology; the building was in the early stages of construction at the time. One day we lost the entire building. I thought, hmm... maybe FDNY was running fire drills and did a mock power cut. Nope. Turns out the datacenter had itself a flood, and a contractor hooked up a high-power humidifier to the same outlet the circuit was connected to while they drained the place out. The power breaker did its job and shut itself down before the contractor had a chance to melt it all to shit. Trust me..... I know where you're coming from....
|
dennismurphy
said by tired_runner: Trust me..... I know where you're coming from....
Beers on me! We could tell stories all night ... now you know why I left IT engineering for sales... trying to preserve what little hair I have left.
|
momcat1 (No Relation To The Bobcat), join 2002-10-21, Wappingers Falls, NY
to tired_runner
said by tired_runner: By now residential customers should already know the deal. You are leasing best-effort service for a monthly fee. If it breaks, you have the choice to wait or to order from a different carrier.
That's all well and good, but many of us consumers have no alternative service available. It really makes us feel that we're being taken advantage of.
|
said by momcat1: That's all well and good, but many of us consumers have no alternative service available. It really makes us feel that we're being taken advantage of.
Yep, it’s either Optimum or Frontier, and Frontier’s 18 Mbps down and 3 Mbps up sucks d1ck.
|
momcat1
Between Frontier's last-century speeds and highway-robbery prices, they are not an option, IMO.
|
chip89, Premium Member, join 2012-07-05, Columbia Station, OH
to dennismurphy
Yeah, DSLR crashed and burned because of a power loss, remember.
|
|
dennismurphy
said by chip89: Yeah, DSLR crashed and burned because of a power loss, remember.
Oh, I remember it well. I’m on the same power grid as the (former) colo facility DSLR used. So when I came back online after the power outage and saw DSLR down, I knew something bad had happened. I’m truly less than 3 miles from there ... it was a pretty significant power event as I recall. Not Sandy bad (that was 9 days without power for me), but still significant. Ugh. Data center outages are never, ever fun.
|
I think I gave Justin a lot of grief after the NAC debacle. I was running a web site back then, and the datacenter I used had a backup generator and a backup-backup generator, both tested weekly, plus multiple refueling contracts to allow generator operation indefinitely. Not to mention that their UPS system could run the entire facility on battery alone for 4 hours.
NAC didn't even bother to test their generators.
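For what it's worth, a "4 hours on battery" claim is just capacity arithmetic, and it only holds at the design load. A quick back-of-envelope sketch (the numbers here are made up for illustration, not that facility's actual specs):

```python
# Back-of-envelope UPS battery runtime: energy in the battery plant,
# derated by inverter efficiency, divided by the load being carried.

def runtime_hours(battery_kwh: float, load_kw: float,
                  inverter_efficiency: float = 0.94) -> float:
    """Hours of runtime for a given load drawn through the inverter."""
    return battery_kwh * inverter_efficiency / load_kw

# A hypothetical 2,000 kWh battery plant carrying a 470 kW critical load
# delivers the advertised 4 hours:
print(round(runtime_hours(2000, 470), 1))   # → 4.0

# Let the load creep up to 600 kW and the same plant delivers much less:
print(round(runtime_hours(2000, 600), 1))   # → 3.1
```

Which is why facilities that advertise long battery runtimes still test generators weekly: the batteries only buy time, and only as much time as the current load allows.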
|