Our arrays are designed with redundant local UPSes within the cabinet. Incoming power to the cabinet is split across two rails, each fed from a separate PSU within the data center. Preferably, each of those PSUs has not only a dedicated UPS but also a dedicated generator, though that last piece isn't always the case.
Now, we have thousands of customers across the globe, and we RARELY reach the scenario given here. I'm pretty sure the setup I describe is pretty standard, to a degree, so I still fail to see how this perfect storm emerged.
In my mind, what SHOULD have happened is this: once power is lost within the cabinet (and we're talking both rails here, so this is a FATAL power loss situation), the Server Management should have detected the loss, the local UPS within the cabinet should have kicked in, and the servers should have SHUT DOWN GRACEFULLY, since there is no power coming in at all. Better to be safe and take the outage at this point.
Assuming that the generators will kick in before the batteries die is a fail point, imo. It can be minimized by putting a UPS at the PSU: the cabinet would never feel the loss of power, so it would not shut down immediately in a fatal power loss scenario and would stay up, giving the generators time to come up. If the generators don't come up, you've got PSU UPS power until those batteries go, and at that point the cabinet sees the fatal power loss and goes down gracefully.
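To make the policy I'm describing concrete, here is a rough Python sketch. It is only an illustration of the decision logic, not anyone's actual server management code: the names Feed, rail_live, cabinet_action and Action are made up for this post, and it assumes the cabinet can see whether each rail is live.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RUN = "keep running"
    GRACEFUL_SHUTDOWN = "shut down gracefully on cabinet UPS"


@dataclass
class Feed:
    """One upstream power feed (hypothetical model, for illustration only)."""
    utility_ok: bool           # mains power present on this feed
    generator_ok: bool         # generator is up and carrying the load
    upstream_battery_s: float  # seconds of runtime left on the UPS at the PSU

    def rail_live(self) -> bool:
        # The rail stays live while any source can carry it. The upstream UPS
        # is what buys the generator time to come up.
        return self.utility_ok or self.generator_ok or self.upstream_battery_s > 0


def cabinet_action(feed_a: Feed, feed_b: Feed) -> Action:
    """Decide what the cabinet should do given the state of both rails."""
    if feed_a.rail_live() or feed_b.rail_live():
        # At least one rail is live, so the cabinet never feels the loss.
        return Action.RUN
    # Both rails are dead: fatal power loss. The cabinet's local UPS only has
    # to last long enough for a clean shutdown.
    return Action.GRACEFUL_SHUTDOWN
```

With mains down on both feeds and the generators not yet up, cabinet_action keeps returning RUN as long as either upstream battery has runtime left. Once both are drained, it returns GRACEFUL_SHUTDOWN.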
Maybe I'm missing something, but this is a facility design problem in power redundancy and/or a bad server management power policy, imo.