Understand..well, I was really interested in the title of the thread..security and redundancy for the cloud, which everyone seems to be preaching these days.
Back in 2011..when Dublin went out..Amazon painfully got the cloud back in ways that almost parallel what Justin and crew had to go through to get DSLR back up..and at the time they claimed redundancy was going to be the byword.
Amazon promises to improve cloud computing redundancy after Dublin outage
The problem may have been with the utility provider rather than a lightning strike
16 August 2011
Amazon Web Services (AWS) will work to improve power redundancy, load balancing and the way it communicates when something goes wrong with its cloud, following the outage that affected its Dublin data centre.
A post mortem delved deeper into what caused the outage, which affected the availability of Amazon's EC2 (Elastic Compute Cloud), EBS (Elastic Block Store), the RDS database and Amazon's network. The service disruption began Aug. 7, at 10:41 a.m., when Amazon's utility provider suffered a transformer failure.
At first, a lightning strike was blamed, but the provider now believes it actually wasn't the cause, and is continuing to investigate, according to Amazon.
The service that caused Amazon the biggest problem was EBS, which is used to store data for EC2 instances. The service replicates each volume's data across a set of nodes for durability and availability. Following the outage, the nodes started talking to each other to re-replicate changes. Amazon keeps spare capacity to absorb this, but the sheer volume of re-mirroring traffic proved too much this time.
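The capacity problem described above is easy to see with some arithmetic: when every affected volume tries to re-mirror at once, the total copy demand can dwarf the spare pool. A minimal sketch (the numbers are hypothetical, for illustration only; this is not AWS code):

```python
# Illustrative sketch: why a simultaneous re-mirroring "storm" after a
# large power event can exhaust spare storage capacity.

def remirror_demand_gb(volumes_affected: int,
                       replicas_per_volume: int,
                       gb_per_volume: int) -> int:
    """Total data that must be copied if every affected volume
    re-replicates all of its replicas at once."""
    return volumes_affected * replicas_per_volume * gb_per_volume

# Hypothetical numbers, chosen only to show the shape of the problem.
spare_capacity_gb = 50_000
demand_gb = remirror_demand_gb(volumes_affected=2_000,
                               replicas_per_volume=2,
                               gb_per_volume=100)

print(demand_gb)                       # 400000
print(demand_gb > spare_capacity_gb)   # True: demand swamps the spares
```

In normal operation only a trickle of volumes re-mirrors at any moment, so the spare pool holds; a datacentre-wide power loss turns that trickle into a flood.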
When all nodes related to one volume lost power, Amazon in some cases had to re-create the data by putting together a recovery snapshot. The process of producing these snapshots was time-consuming, because Amazon had to move all of the data to Amazon Simple Storage Service (S3), process it, turn it into the snapshot storage format and then make the data accessible from a user's account.
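The multi-step recovery pipeline described above can be sketched as a sequence of stages. All function names here are hypothetical stand-ins, not AWS APIs; the real processing is far more involved:

```python
# Hedged sketch of the recovery-snapshot pipeline: stage surviving
# volume data to object storage, process it, convert it into the
# snapshot storage format, and hand back something attachable to a
# user's account. Every stage below is a simplified stub.

def stage_to_object_store(data: bytes) -> bytes:
    # In reality: upload the data to S3; here it just passes through.
    return data

def verify_and_repair(data: bytes) -> bytes:
    # In reality: consistency checks and repair of the recovered bytes.
    assert isinstance(data, bytes)
    return data

def to_snapshot_format(data: bytes) -> dict:
    # In reality: re-encode into the snapshot storage layout.
    return {"format": "snapshot-v1", "size": len(data), "data": data}

def build_recovery_snapshot(volume_data: bytes) -> dict:
    staged = stage_to_object_store(volume_data)
    processed = verify_and_repair(staged)
    return to_snapshot_format(processed)

snap = build_recovery_snapshot(b"surviving-volume-bytes")
print(snap["size"])  # 22
```

Because each volume has to pass through every stage before a customer can touch it again, the end-to-end recovery time multiplies across thousands of volumes, which is why the process took days.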
By 8:25 p.m. PDT on Aug. 10, 98 percent of the recovery snapshots had been delivered, with the remaining few requiring manual attention, Amazon said.
For EBS, Amazon's goal will be to drastically reduce the recovery time after a significant outage. It will, for example, create the capability to recover volumes directly on the EBS servers upon restoration of power, without having to move the data elsewhere.
www.computerworlduk.com/ ··· -outage/

And after redundancy, you must have a seamless transition.
www.slideshare.net/Conso ··· recovery
www.telecomassociation.c ··· 0309.htm