HiVolt (Toronto, ON) | Recovery progress? Any update on how the recovery is progressing? No update from Justin on that Google Docs page for a while now. -- BUCK FELL ,,!,,('-'),,!,,
mjfwish (New Orleans, LA) | I have heard from a very good source that his guess is within a week.
I wouldn't bet the house on that!
HiVolt (Toronto, ON) | Awesome!
shortckt (Tenant Hell) | Reply to HiVolt: I hope Justin (or someone else with access to all the details) eventually posts a detailed description of the failure and restoration, with all the small technical details, once this is all over and they have time.
state (Purgatory) | There is an update here: »docs.google.com/document/d/1kll8···rHY/edit
Update, 7th May: I am told by the second lab that the missing data is intact up to the time of the power failure. I now have to wait for the payment to go through and for the media to be returned by courier. Then there will be time spent checking the data and restoring both the data and the hardware without adding too much downtime. For the curious: this event has cost $28,000 in lab recovery fees. That does not include any money we must now spend on new hardware, the financial impact of the downtime, permanently lost traffic, and so on.
HiVolt (Toronto, ON) | Yeah, I just read that... Great that the recovery was successful.
The cost for recovery is nuts though...
J E F F (Kitchener, ON) | Reply to state: Is Justin going to come on here so we can talk to him? I had figured this mess was costing about $40,000, so I was close. Someone needs to buy Justin and me a beer.
Do you think Justin will set up a PayPal donation drive, like Wikipedia does? Or is he fine without one? -- Not all men are idiots. There are still a lot of bachelors out there.
Bobcat79 | Reply to shortckt
said by shortckt: "I hope Justin (or someone else who has access to all the details) eventually posts up a detailed description of the failure and restoration with all the small technical details when this is all over and they have time."
Maybe Justin will write a front page article about how RAID is not backup.
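A toy illustration of the "RAID is not backup" point, as a minimal Python sketch: a mirror copies every write, good or bad, to both drives at once, while a backup is a separate point-in-time copy that a later bad write cannot touch. The `Mirror` class and the file name are hypothetical; this does not model how the MD3000 or any real RAID controller works.

```python
from __future__ import annotations

# Toy model only: a two-way mirror (RAID 1 style) versus a separate snapshot.
# Class and file names are hypothetical.


class Mirror:
    """Every write lands on both member disks at the same time."""

    def __init__(self) -> None:
        self.disks: list[dict[str, str]] = [{}, {}]

    def write(self, name: str, contents: str) -> None:
        for disk in self.disks:
            disk[name] = contents

    def read(self, name: str, disk: int = 0) -> str | None:
        return self.disks[disk].get(name)


mirror = Mirror()
mirror.write("users.sql", "good data")

# A backup is an independent, point-in-time copy kept somewhere else.
backup = dict(mirror.disks[0])

# A bad write (corruption, a mistaken delete, a confused controller)
# is replicated to both halves of the mirror immediately.
mirror.write("users.sql", "garbage")

print(mirror.read("users.sql", disk=0))  # garbage
print(mirror.read("users.sql", disk=1))  # garbage: the mirror copied the damage
print(backup["users.sql"])               # good data: the backup still has it
```

Mirroring protects against a drive dying; it does nothing against the array itself writing, losing, or locking up the data, which is the case a backup covers.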
shortckt (Tenant Hell) | said by Bobcat79: "Maybe Justin will write a front page article about how RAID is not backup."
That would definitely be a timely article, since backups are an often-ignored subject in both personal and business settings. BBR got lucky... A white paper I downloaded some time back gave some stark, eye-opening figures on business losses and failures caused by data loss. A timeline of what happened here, along with the associated numbers, would make a good example to point to when a client wonders why backups are so important.
Mostly out of curiosity, I would still like to read a technical writeup of the incident: how the drives are configured and what is stored where, where the damage was and how it was recovered, what prevented use of the mirrored data, how backups are performed and what backups were available*, whether the hosting site's UPS had a way to signal a power failure, and whether they ever determined why the generator didn't start.
Along with that, it would be interesting to see site stats for the first few days BBR was online again.
From "sequence of unfortunate events", Tuesday 17th:
Dell support says they don't know the cause, but we must wipe the entire array, do firmware upgrades, and start again. I don't trust this gear. *Check backups: mail OK, NFS OK, site files OK. The SQL backup is incomplete.
Since NAC is a large hosting facility, I wonder how many other clients had problems or losses caused by the power failure.
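On the "SQL backup is incomplete" note: one cheap sanity check is to confirm that a dump actually finished. A minimal sketch, assuming (the thread does not say) that the database was MySQL dumped with `mysqldump`, which with its default `--comments` setting ends its output with a `-- Dump completed` comment line; a file cut off mid-write lacks that trailer. `dump_looks_complete` is a hypothetical helper, and passing it says nothing about whether the dump will actually restore cleanly.

```python
import tempfile
from pathlib import Path

# Assumption: dumps come from mysqldump with --comments (its default), which
# writes a final line beginning "-- Dump completed" only after everything else.
COMPLETION_MARKER = b"-- Dump completed"


def dump_looks_complete(path: Path, tail_bytes: int = 4096) -> bool:
    """Hypothetical helper: True if the dump's last non-blank line is the trailer."""
    size = path.stat().st_size
    if size == 0:
        return False
    with path.open("rb") as f:
        f.seek(max(0, size - tail_bytes))  # only the end of the file matters
        tail = f.read()
    lines = [line for line in tail.splitlines() if line.strip()]
    return bool(lines) and lines[-1].startswith(COMPLETION_MARKER)


# Usage with two made-up dump files: one finished, one truncated mid-statement.
with tempfile.TemporaryDirectory() as tmp:
    finished = Path(tmp) / "finished.sql"
    finished.write_bytes(b"INSERT INTO t VALUES (1);\n-- Dump completed on 2000-01-01  0:00:00\n")
    truncated = Path(tmp) / "truncated.sql"
    truncated.write_bytes(b"INSERT INTO t VALUES (1);\nINSERT INTO t VAL")
    print(dump_looks_complete(finished))   # True
    print(dump_looks_complete(truncated))  # False
```

A check like this only catches truncation; the real test of a backup is a periodic trial restore.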
state (Purgatory) | Reply to HiVolt: Just a quick status update:
Data restoration is underway. User accounts, ISP reviews and news have been restored. There will be some hours of scheduled downtime for the final restoration. Full restoration is anticipated within days (May 9th).
HiVolt (Toronto, ON) | Awesome!
dvd536 (Phoenix, AZ) | Reply to HiVolt
said by HiVolt: "Yeah I just read that... Great that the recovery was successful. The cost for recovery is nuts though..."
Expensive event! What is nac.net kicking back to Justin, since it was their fault? I saw on a site that estimates sites' ad revenue that DSLR was making around $1300 per day. OUCH!
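Putting the two dollar figures from this thread side by side, as a rough back-of-the-envelope sketch: it treats the ~$1300/day number (a third-party estimate, not confirmed by Justin) as revenue lost for each day down, and it ignores hardware, staff time, and permanently lost traffic, which the May 7th update says come on top. The 10-day outage below is a made-up example, not the actual length of the downtime.

```python
# Figures quoted in this thread; the ad-revenue number is an unverified outside estimate.
LAB_FEES_USD = 28_000
AD_REVENUE_PER_DAY_USD = 1_300


def outage_cost(days_down: int) -> int:
    """Lab fees plus lost ad revenue only (hardware, lost traffic, etc. excluded)."""
    return LAB_FEES_USD + days_down * AD_REVENUE_PER_DAY_USD


# The lab bill alone equals roughly three weeks of ad revenue.
print(round(LAB_FEES_USD / AD_REVENUE_PER_DAY_USD, 1))  # 21.5

# A hypothetical 10-day outage:
print(outage_cost(10))  # 41000
```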
Matt_31 (Jasper, IN) | Reply to state: Feels good to be back. What a mess; I have missed this place.
Jackarino (Allendale, NJ) | Reply to HiVolt: You never realize what you have until it's gone.
Reply to state: Just saw the announcement at the top of the page... Excellent news. Great work.
UmmaGumma | I guess I don't understand the whole process. If all the data was intact and only something with the SQL got messed up, why the need for recovery? Why not just use, or copy, the existing drives?
Weirdal (Grand Island, NE) | Reply to HiVolt: Looks like a few threads got mixed up in the recovery process. For example: »We're back... (most of that thread was originally in the cooler).
Good job getting everything back on the site, though. -- »[Info] The DSLR Orangeface extension 2.0!
cdru (Fort Wayne, IN) | Reply to UmmaGumma
Re: Recovery progress? said by UmmaGumma: "I guess I don't understand the whole process. If all the data was intact, but just something with the SQL got messed, why the need for recovery? Why not be able to use, or just copy the existing drives?"
According to the status update document Justin was maintaining, the site ran on multiple servers off a common storage array: a Dell MD3000 plus an MD1000 expansion module. The storage array keeps track of the drives, their RAID configuration, and so on, and presents storage to the servers' operating systems as one or more virtual disks spread across one or more physical drives.
The problem was that the storage array decided to go on vacation and left the virtual drives in an inaccessible state. All the bits were still there, or at least almost all of them, depending on what had or hadn't been committed when the power was lost. Working out precisely where all those bits were, and in what order, was the first step toward determining the state of the rest of the system.
Once they determined that the virtual drives could be recovered, the "fun" task began: recovering the files, databases, etc. and reincorporating them into the site, which had been limping along in the meantime.
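To illustrate why the drives couldn't simply be copied off one by one: with a striped layout, each physical drive holds only interleaved chunks of a virtual disk, and the array's own metadata (drive order, stripe size) is what says how to put them back together. A minimal Python sketch, assuming plain RAID 0-style striping over two drives with a two-block stripe; the thread does not say how the MD3000 virtual disks were actually configured, and real RAID 5/6/10 layouts add parity or mirroring on top of this. `locate`, `write_striped`, and `read_striped` are hypothetical helpers.

```python
from __future__ import annotations

# Hypothetical geometry; not the actual configuration of the site's array.


def locate(lba: int, n_drives: int, stripe_blocks: int) -> tuple[int, int]:
    """Map a virtual-disk block number to (drive index, block number on that drive)."""
    chunk = lba // stripe_blocks      # which stripe-sized chunk of the virtual disk
    offset = lba % stripe_blocks      # position inside that chunk
    drive = chunk % n_drives          # chunks are dealt round-robin across drives
    row = chunk // n_drives           # how many chunks deep on that drive
    return drive, row * stripe_blocks + offset


def write_striped(data: list[str], drives: list[list[str | None]], stripe_blocks: int) -> None:
    for lba, block in enumerate(data):
        d, pb = locate(lba, len(drives), stripe_blocks)
        drives[d][pb] = block


def read_striped(n_blocks: int, drives: list[list[str | None]], stripe_blocks: int) -> list[str | None]:
    out = []
    for lba in range(n_blocks):
        d, pb = locate(lba, len(drives), stripe_blocks)
        out.append(drives[d][pb])
    return out


data = [f"blk{i}" for i in range(8)]
drives = [[None] * 4 for _ in range(2)]
write_striped(data, drives, stripe_blocks=2)

print(drives[0])                              # ['blk0', 'blk1', 'blk4', 'blk5']
print(read_striped(8, drives, 2) == data)     # True
# Same drives, wrong order: every block is present, but the data comes back scrambled.
print(read_striped(8, [drives[1], drives[0]], 2))
# ['blk2', 'blk3', 'blk0', 'blk1', 'blk6', 'blk7', 'blk4', 'blk5']
```

No single drive holds a usable copy of anything; the layout metadata is as important as the bits themselves, which is why an array that "forgets" its configuration needs a lab to piece it back together.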
The real problems:
1. Justin used Dell hardware.
2. Justin didn't have real backups.
3. NAC is a lousy datacenter.