Tuesday's Power Outage
Data center picks bad time to tout reliability...
by Karl Bode Thursday 26-Jul-2007 tags: business · networking
Tuesday's power outage in San Francisco once again brought forth a flood of reports noting that yes, the Internet isn't infallible. Some faulty PG&E electrical breakers caused multi-hour outages for websites like Technorati, Craigslist and Netflix, who had apparently put all their digital eggs into one basket (though it's expensive to do otherwise):

"What the episode exposed is that some companies operate entirely from one data center, a decision described by some security experts as risky. In emergencies, such companies can't shift traffic to an alternative facility where they keep additional servers."

Of course if you use one data center you expect it to have backup power, but San Francisco data center 365 Main didn't turn on its backup generators until 45 minutes after the outage began (they're investigating). The outage was bad timing considering the company just got done bragging extensively about their reliability in a press release issued the same day as the outage:

"To ensure uptime for key tenants such as RedEnvelope, 365 Main provides modern power and cooling infrastructure. The company's San Francisco facility includes two complete back-up systems for electrical power to protect against a power loss. In the unlikely event of a cut to a primary power feed, the state-of-the-art electrical system instantly switches to live back-up generators, avoiding costly downtime for tenants and keeping the data center continuously running."

Not so much.

Bobcat
Premium
join:2001-02-04

1 edit

Eggs in one basket

Having all your eggs in one basket is generally OK, as long as you pick the right basket.

Given that 365 Main and NAC (in an outage a few weeks ago) had backup generators that didn't work, it would seem they are the wrong baskets.
vanDSLuser
from Vancouver 2010
Premium
join:2004-07-28
White Rock, BC
Reviews:
·Shaw

Re: Eggs in one basket

Well, considering the costs of a second location, it isn't cheap at all. When you are not making any money, it is pretty hard to justify spending another huge chunk of change for redundant servers.

That being said, I think I'm going to bite the bullet and add another site... downtime's expensive!
BosstonesOwn

join:2002-12-15
Everett, MA
Reviews:
·Comcast

Re: Eggs in one basket

said by vanDSLuser:

Well, considering the costs of a second location, it isn't cheap at all. When you are not making any money, it is pretty hard to justify spending another huge chunk of change for redundant servers.

That being said, I think I'm going to bite the bullet and add another site... downtime's expensive!
2 Words my friend. Global Clustering
--
"It's always funny until someone gets hurt......and then it's absolutely friggin' hysterical!"

cdru
Go Colts
Premium,MVM
join:2003-05-14
Fort Wayne, IN
kudos:5
Reviews:
·Frontier FiOS
said by vanDSLuser:

Well, considering the costs of a second location, it isn't cheap at all. When you are not making any money, it is pretty hard to justify spending another huge chunk of change for redundant servers.

That being said, I think I'm going to bite the bullet and add another site... downtime's expensive!
I think it really depends on the application. If it's a simple site and there isn't a critical need to keep things in sync, a second location may just be 2x the cost but manageable if you need the uptime. Things get quite a bit more complicated when you need to start keeping things in sync, replication, etc. It can go from 2x the cost to x^2 the cost. If you need the reliability, then sometimes you have to bite the bullet. But if $COST_OF_HIGH_AVAILABILITY > $COST_OF_DOWNTIME, then you don't do it.
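
That rule of thumb can be sketched in a few lines of Python. This is only an illustration: the function name, the parameters, and every dollar figure are hypothetical, not numbers from the thread or from any of the affected sites.

```python
def second_site_is_worth_it(yearly_ha_cost, expected_outage_hours, loss_per_outage_hour):
    """Return True when expected downtime losses exceed the cost of redundancy.

    All inputs are assumptions supplied by the caller (same currency, per year).
    """
    yearly_downtime_cost = expected_outage_hours * loss_per_outage_hour
    return yearly_downtime_cost > yearly_ha_cost


# Hypothetical small site: redundancy costs more than the outages would.
print(second_site_is_worth_it(120_000, 4, 5_000))    # False
# Hypothetical busy site: a few hours offline costs more than a second location.
print(second_site_is_worth_it(120_000, 4, 50_000))   # True
```

The hard part in practice is estimating the inputs, especially once replication pushes the redundancy cost well past 2x.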
--
Go Colts

53059959
Temp banned from BBR more then anyone

join:2002-10-02
PwnZone
I thought I read somewhere that a drunk employee caused the power outage by ripping all the cords out, and that they were trying to cover that up by saying it was the power company's fault.

besides, don't these places have battery backup and diesel generators?

bokamba
Chengdu Rocks
Premium
join:2002-04-05
Falls Church, VA
I would be very angry if I were one of their customers. You shell out all this money and when the power goes out, the backup generators fail. Unbelievably stupid.

KrK
Heavy Artillery For The Little Guy
Premium
join:2000-01-17
Tulsa, OK
Reviews:
·AT&T DSL Service

Re: Eggs in one basket

said by bokamba:

I would be very angry if I were one of their customers. You shell out all this money and when the power goes out, the backup generators fail. Unbelievably stupid.
Exactly, if you're a major datacenter, and you have major accounts that you care about, you make damn sure you have done everything in your power to deal with these emergencies. Ok, if your datacenter is destroyed in a major disaster or attack, well, that's one thing.... but if the power fails and your backup generators are faulty?!? Puh-leeze. That's gonna cost you valuable clientele, and damage your company reputation.

Someone should be fired (Upper management) for this.
--
"Regulatory capitalism is when companies invest in lawyers, lobbyists, and politicians, instead of plant, people, and customer service." - former FCC Chairman William Kennard (A real FCC Chairman, unlike the current Corporate Spokesperson in the job!)

supergirl

join:2007-03-20
Pensacola, FL
California has the ridiculous outages because all of the "we don't want another power plant" or "you're not building it here" garbage. Way too much regulation in California is the problem.
--
Saving the world keeps me busy. However, I find Earth very primitive from my home planet of Krypton.
-Supergirl

MysticGogeta
The Robot Devil
Premium
join:2005-03-14
League City, TX

Re: Eggs in one basket

No kidding. If the Terminator would push for more power plants in California, with all the support he has it can be done.
--
Team Discovery-Join the fight
Bobcat
Premium
join:2001-02-04
Reviews:
·Verizon Online DSL
·Optimum Online
·EarthLink
said by supergirl:

California has the ridiculous outages because all of the "we don't want another power plant" or "you're not building it here" garbage. Way too much regulation in California is the problem.
Please explain how not enough power plants and/or too much regulation caused a PG&E transformer in a manhole under 560 Mission St to fail.

PolarBear03
The bear formerly known as aaron8301
Premium
join:2005-01-03
NIMBY! NIMBY! Hey, how come the power's out?

jhrvta

join:2000-04-16
Ventura, CA

Re: Eggs in one basket

Umm, as a proud Californian, I have to chime in here...

This was not a problem of "capacity" or "generation." The California Independent System Operator (CAISO) reported plenty of capacity for this mild summer day. There were also no issues of "transmission." These are the very high voltage power lines that distribute power over long ranges. This was a problem of "distribution." Our good friends at PG&E had some issues in the complex system that leads power to our homes and businesses. There are transformers, power factor correction capacitors, power poles (that get hit by cars), and the like: things can go wrong. It is a very complicated system. To sum it up as simply as "stupid NIMBYs" really shows some... naivete.

The bankrupt PG&E may (or may not) have been paying too much attention to their distribution plant.

This leads us to what happened at the turn of this century and so called "deregulation."

I can go on for days and days about why this was not good for all California consumers of electricity: from the delaying of building power plants, the disconnect of the power generators (which used to be the local utilities, now third party "Enron" type companies) from the local utilities (and their users), and the apparent greed of everybody (generators, utilities, lawmakers, and maybe some rate payers).

PS: I am moving to Washington. Don't want me there? NIMBY!!!

PPS: Just kidding!

jhrvta

RR Conductor
Happy 40th Amtrak
Premium
join:2002-04-02
Redwood Valley, CA
kudos:1
said by supergirl:

California has the ridiculous outages because all of the "we don't want another power plant" or "you're not building it here" garbage. Way too much regulation in California is the problem.
Another NON CALIFORNIAN California expert lol
Kearnstd
Elf Wizard
Premium
join:2002-01-22
Mullica Hill, NJ
A UPS switches over so fast even a computer can't notice; while on batteries, the generator should be spooling up. If the diesels can't handle the load and you tout your 99.9999% uptime BS, then it's time to get a turbine or a 300 kW fuel cell.
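
For a sense of what uptime figures like that allow, here is a rough Python sketch. It assumes a 365-day year and, purely for illustration, treats the 45 minutes the article says passed before the generators came on as the full outage seen by tenants; the function names are made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def allowed_downtime_minutes(availability_percent):
    """Downtime per year permitted by a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_after_outage(outage_minutes):
    """Best possible yearly uptime percentage after one outage of this length."""
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


for target in (99.999, 99.9999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime per year")

# Assumption: the outage lasted the 45 minutes mentioned in the article.
print(f"One 45-minute outage caps the year at about {availability_after_outage(45):.4f}% uptime")
```

Under those assumptions, five nines leaves a little over five minutes a year and six nines about half a minute, so a single 45-minute outage blows through either budget.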
--
[65 Arcanist]Filan(High Elf) Zone: Broadband Reports
donaldk
Premium
join:2000-10-19
Thunder Bay, ON
Reviews:
·Eastlink Cable
WARNING: LONG RANT
SUMMARY: Their backup power plant setup, SOPs, and operations are 100% unsatisfactory and I would definitely recommend moving business elsewhere. This is from my own work experience over many years with diesel electric power plants. Anyone who directly deals with or invests in 365Main should read the very last paragraph.

After reading their log on what happened with their gensets (N+2 setup, 2.1MW per genset), what I do not get is why their generators do not have the option to at least parallel with each other, besides the backups. I work with marine diesel electric power plants all the time, which are even harder to work with than this setup. I will not get into it, but I'll say a 715kW load transfer is laughable. OK, enough of my rant, now to the point.

They have 16.8MW of 100% rated capacity (8 gensets). From the looks of things their generators are not normally loaded higher than 1MW per genset, which is understandable: it leaves capacity to expand and an allowance for overloading. Five main gensets remained running after 3 main gensets and BOTH backup gensets failed; the load that was put onto backup #2 was 2.5kW before it tripped off.

Now they had five remaining gensets on-line. Assuming each was supplying 1MW to its co-lo room, there was still 5.5MW of safe headroom on these gensets to take the 2.4MW of load that gensets 1, 3, and 4 dropped and both backups failed to take on. IF they had the option of paralleling to share the load, even if only for emergency use, it would have saved the data center from any part of it going dark. If my assumptions are correct, they had more than double the needed capacity among their good generators to keep the data center up. Now, since they claim N+2, they ought to have load sharing and synchronization controls that can distribute load across their power plant as needed. If that was limited to just the two backup generators, then whoever came up with that setup should be fired / sued for such a bad, unsafe design, and their main switchboard should be required to have the logic and power runs necessary to do this. If they did have the capability to load share and synchronize them all together as needed, then the on-watch engineers should be fired on the spot, as the log they posted does not indicate any attempt to load share or any problems with load sharing (except for gensets 1, 3, and 4 passing load to the two backups and then both backups totally failing).
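
The headroom arithmetic above can be checked with a short Python sketch. The 2.1MW rating, the five healthy gensets, and the 2.4MW of dropped load come from this post; the 1MW normal load per genset is the post's own assumption, and the names below are invented for illustration.

```python
RATED_MW_PER_GENSET = 2.1         # rating quoted in the post
ASSUMED_LOAD_MW_PER_GENSET = 1.0  # assumption made in the post, not a measured value
HEALTHY_GENSETS = 5               # main gensets that kept running
DROPPED_LOAD_MW = 2.4             # load left stranded by the failed gensets


def headroom_mw(gensets, rated_mw, load_mw):
    """Spare capacity across the gensets if they could be paralleled."""
    return gensets * (rated_mw - load_mw)


headroom = headroom_mw(HEALTHY_GENSETS, RATED_MW_PER_GENSET, ASSUMED_LOAD_MW_PER_GENSET)
print(f"Spare capacity on healthy gensets: {headroom:.1f} MW")
print(f"Load needing a new home: {DROPPED_LOAD_MW:.1f} MW")
print(f"Headroom is {headroom / DROPPED_LOAD_MW:.1f}x the dropped load")
```

This only confirms the sums; whether the switchgear could actually parallel the units is the open question raised in the post.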

That is my 0.02 on the emergency operation of the plant. Now a stab at their maintenance and SOPs. With 4 generators failing their start-up sequences right when they were needed, there is a sense that preventive maintenance and routine checks/tests were not done. If they were, then they should post their logs in PDF right on-line to be scrutinized (you laugh, but I know they have them if they are what they say they are), never mind the lawsuit that will probably happen. There is no excuse for not doing full load testing: they could have load banks set up to work the diesels to 110% load, with all non-computing loads of the building wired in to function with the load banks too (100% normal, and 110% for 1 hour per 24 hours). At the minimum, there should be weekly rotation of machinery at 50% load, testing all features of the genset and switchboard (I get the hint that this happens but not much comes of it: it's started up and shut down right away? No testing of the switchboards?), and transferring from the power grid to the genset on a live bus and a dead bus (load banks). I could go on.

I might be a little steep with my comments, but I am talking about a company that is expected to have near 100% up-time (what, 99.999%? akin to the military and health care), and obviously they cannot meet that. I wonder if all their data centers are like this; my biggest qualm is the apparent lack of adequate load sharing and syncing for additional redundancy. Working as an engineer on a diesel electric warship, if something like this happened because of stupidity (losing half the power plant instantly, risking total loss of propulsion), there would for sure be an investigation, and if it was due to human error, heads would roll.

Anyone who has invested in this company has some serious things to worry about: shake things up, and consider moving your assets elsewhere. If your investment in 365Main is large and you still have faith in them, obtain the engineering specifications of their data center and arrange for a private engineering firm to tour it and give you their own recommendations. If 365Main wanted to keep you they would allow it. Consider this too: did they even have UPSs running to allow for genset hiccups? I did not notice any fed/state/muni government agencies listed as clients when quickly browsing through their site, but I can guarantee you that if they do have them, they will not for long. There is a comment about an ISP intentionally tripping off land power and testing their own UPS banks and genset power plant every week to ensure proper operation, done long enough to stress their gensets but short enough that their UPSs absorb any hiccups. That is the BARE MINIMUM!!

Nuff said about these assholes.

EMPWasUsed

@rcn.com
There is significant evidence that an Electromagnetic Pulse was used to shut down this datacenter as there was NO DRUNK MAN and the problem started on the roof.

Now the question is WHO and WHY (I have my hunches but will let you discuss this amongst yourselves)
donaldk
Premium
join:2000-10-19
Thunder Bay, ON

Re: Eggs in one basket

roflmao...... I have not had enough beer yet.. more more more.. weeeeee
Bobcat
Premium
join:2001-02-04
Nutcases are to post here - »/dev/null
Thank you.

jose3030
Premium
join:1999-08-17
Manassas, VA

Oops


ColorBASIC
8-bit Fun
Premium
join:2006-12-29
Corona, CA

S happens

That's life.

trparky
Apple... YUM
Premium,MVM
join:2000-05-24
Cleveland, OH
kudos:1
Reviews:
·Time Warner Cable
·Time Warner VOIP
·AT&T U-Verse

1 edit

I suspect....

I suspect that a lot of people are going to be moving Data Centers soon.

I do find this incident to have a bit of irony in it. Yeah, tout high reliability and then go offline that very day. Yep, irony at its best.
--
Tom

kba4

join:2001-10-23
Canton, OH

the backup has to be switched on?!

shouldn't that type of thing be mostly automated? I mean seriously: my $50 UPS is, why shouldn't one that costs as much as a house be too?

decadent
Premium
join:2002-04-02
Piscataway, NJ

Re: the backup has to be switched on?!

It should. But it is all about testing. I can unplug a UPS to test it, but it does not seem like a great idea to turn off the electricity in a big building and turn on the generators just to see how it works. Also, it may work the first time, but you need to do it each week to make sure it works all the time. This is the problem with high availability: it works best when nothing happens.
ajanis

join:2004-10-19
Oswego, IL

Re: the backup has to be switched on?!

Also, weekly "tests" of the generator usually don't drop a load on the genset, and also don't test the load transfer aspects of the system. Just running it for 10-20 minutes and shutting it down is usually all that happens in a weekly "test".

My company brings in a load bank (tractor trailer) once per quarter and runs at 100% load for 4 hours. Even that doesn't test the transfer parts of the system... and if street power drops during the test we have 20 minutes to remove the load bank before the batteries die.

quetwo
That VoIP Guy
Premium
join:2004-09-04
East Lansing, MI
The ISP I worked at cut commercial power once a week (Sunday, like at 4am) to test their equipment.

It's like saying that you don't want to test your smoke alarms because pushing the button is not like a real fire.
donaldk
Premium
join:2000-10-19
Thunder Bay, ON

Re: the backup has to be switched on?!

Agreed! Kudos to your ISP for at least doing proper testing, 365Main after this looks like a joke, well compared to my work IT IS (Canadian Navy).

IF you know anyone who deals with 365Main they ought to move their stuff elsewhere.
smcallah

join:2004-08-05
Home
Are you saying that your $50 UPS is wired into your house wiring with an automatic transfer switch, and a generator that detects loss of street power and starts automatically is connected to that as well?

Because, that's the only way your $50 UPS would be comparable.

exocet_cm
Buckle up, it's the law
Premium
join:2003-03-23
New Orleans, LA
kudos:2

Re: the backup has to be switched on?!

said by smcallah:

Are you saying that your $50 UPS is wired into your house wiring with an automatic transfer switch, and a generator that detects loss of street power and starts automatically is connected to that as well?

Because, that's the only way your $50 UPS would be comparable.
Yes, and my $25 dollar UPS is too!

just kidding
--
"I have measured out my life with coffee spoons..." - T.S Eliot
Check out ma blog: »www.johndball.com

Jason Levine
Premium
join:2001-07-13
USA
I work in a hospital and have personally been here during power outages (including the "Great Northeast Blackout" in 2003). In every case, the lights turned off (as they aren't on the generator), but the servers kept running smoothly. The generators kicked on within milliseconds (I'm guessing since I can't time that quickly) of the power loss and kept the servers running fine.

Now, if a medium-sized health care organization/hospital can do this, why not a large data center? Especially one that just that day touted its backup power generators. If anything, they should be able to outlast us during a long power outage.

kba4

join:2001-10-23
Canton, OH
Reviews:
·RoadRunner Cable
·AT&T U-Verse
my cheapo UPS can automatically sense an outage, with/without a connected PC, and transfer to a 5-10min battery for safe shutdown of connected equipment. it seems to me that it's only logical that this technology would be built into what must be at least a multi-thousand dollar system at a data center. I don't think anyone here should be making excuses for these failures, let the companies involved apologize.

at the very least, shouldn't there be a dedicated monitoring room for this stuff? considering that the data center probably profits at least a little, they could afford to pay people to watch for problems, with overlapping shifts, and flip the switch...
--
illegal wars, prisoners with no trials, and state controlled media. welcome to the land of the free!

Maxxxt
Peculiar Mental Twist
Premium
join:2001-06-12
Denver, CO

Smells fishy

I thought those expensive generators click on instantly after losing power? Someone screwed up big time. Maybe a disgruntled employee with a sense of humor?

Also, why would big web-dependent companies (I mean they exist only on the web) not have a backup data center for backups and alternate web servers? Geographically separate servers would be a must, especially with the mains in San Francisco, which will likely slip into the ocean in the next decade or at least have a few disruptive quakes.

Maxxx
--
Don't argue! with an idiot; people watching may not be able to tell the difference.

Whome

join:2005-10-10
Newbury Park, CA

You got to love it

Yiddish expression - Man plans, God laughs

Titus Pullo
I came, I saw, I slept

join:2004-06-26
kudos:1

Re: You got to love it

Roman expression: "arimané come ddon Farcuccio" - the customers, that is.

»www.geocities.com/mp_pollett/idiomexp.htm

--

iLive4Fusion
Premium
join:2006-07-13
Reviews:
·AT&T U-Verse
·T-Mobile US
·AT&T Wireless Br..
·ViaTalk
·Verizon Broadban..

Reliable servers haha

I have two $50 APC UPS battery backups, one connected to my desktop computer and one for the internet modem/VoIP, to provide power for 25 seconds until my whole-house Generac generator kicks in to provide power to everything but the stove and A/C. Seems like a big datacenter like 365 could at least afford a battery backup until the generators kick in.


cline3621
Mr. Yuk is MEAN Mr. Yuk is GREEN
Premium
join:2006-06-14
Clarksville, TN
Reviews:
·CDE

Power

It might be a caustic and smart-assed answer, but I think about it this way. These companies are housed in California. People there bitch, piss, moan and whine about not having enough electricity, and they have to endure rolling blackouts. On the other side of the coin, they won't allow anyone to build the necessary electrical capacity to serve the public. Now these net companies listed above stand to lose millions a day when the power goes out and their generators fail. Why not do this: build their own power system and their own power generating system to run on natural gas, solar (which I use in the summer), wind, or what-have-you. No need for PG&E anymore, no worries about losing power, and they would have to maintain their own power grid, so there would be no need for testing, as the system would have to run constantly.
Raficoo

join:2006-11-14

aa

So that would explain why www.topbb.com (free forum hosting) is down.
ricep5
Premium
join:2000-08-07
Jacksonville, FL

Laughable

I think it's laughable that the area that touts itself as technology central insists on maintaining datacenter assets so close to its HQ.

This is a provincial attitude at its highest.

I don't care if it's earthquakes, grid failures or poor staffing levels.

Any of those websites could have had a global load balancer perform a redirect to a back up farm. While yes, it costs money, remember that stock prices and company revenue are also based on its ability to deal with risk.
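
As a rough illustration of that redirect decision, here is a small Python sketch. It is not how any particular load balancer product works: the `Site` record, the priority scheme, and the site names are all hypothetical, and a real deployment would feed `healthy` from actual health checks.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool   # would come from periodic health checks in practice
    priority: int   # lower number = preferred site


def pick_site(sites):
    """Return the preferred healthy site, or None if every site is down."""
    healthy = [site for site in sites if site.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda site: site.priority)


# Hypothetical setup: primary farm in San Francisco, backup farm elsewhere.
farms = [
    Site("sf-primary", healthy=False, priority=1),  # primary has lost power
    Site("backup-farm", healthy=True, priority=2),
]
target = pick_site(farms)
print(target.name if target else "no healthy site")  # backup-farm
```

The selection logic is the cheap part; the cost is in keeping the backup farm's data current enough to be worth redirecting to.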

Any company that relies on that single point of technology for so much of its revenue (regardless of the SLAs the datacenter has) deserves any loss in revenue or stock price the outage creates.

This attitude that the assets have to be within "show off" distance of your VCs, investors or other technorati is poor planning of the utmost.

There is no technical reason on the planet that these sites can't be hosted anywhere other than California. The decision to host so close to the HQ is an emotional and irrational one and will be their undoing in the future.
Raficoo

join:2006-11-14

1 edit

oops(edited)

Sorry for the double post, though I didn't see the one below it.
Raficoo

join:2006-11-14

2 edits

Advantage for a few of us

Well, since datacenters are down, some people can register domain names that WERE only used for useless, strange advertisements. Finally, now I can register my gameserver name.
sumdumgai

join:2007-07-23

best place for a data center?

On a kind of related note, I believe I heard that somewhere like Iowa or Nebraska was a good place to have a data center because there are not too many disasters there... ??

RR Conductor
Happy 40th Amtrak
Premium
join:2002-04-02
Redwood Valley, CA
kudos:1

Re: best place for a data center?

They have tornadoes there, and the Midwest is not immune to earthquakes. Some of the biggest earthquakes in US history took place in Missouri, and it will happen again there in the future.

CyBrChRsT

join:2003-02-28
Overland Park, KS

1 edit

Eh hem! ONLY 2?

I remember back in like 2001 msn messenger was down for like 3 days for some 10,000 accounts, of which mine was one of those, all over a faulty hard drive AND **YES** AND 2 backups behind it failed also. Since then I've learned from Micro-Sloth 2 backups is never enough, I believe in at least 3 and 4 for safe measure, and these guys boast their 2 backups? Sounds like whoever told them to only get two and not even test them to switch over in a test run of power failure has no business in the IT industry. I'd be going after his job and suing over liability for incompetency lol (if it's possible)!!! But that's just me. I'm sure a company that makes and spends that much money could surely afford a good enough lawyer to find a way to sue over that. =)
