
Why Amazon’s Outage Does NOT Expose Risk Of Cloud Computing

Over the weekend, Amazon experienced another major outage with some of its cloud services.  This time, apparently it was due to the power outages that swept the east coast as a result of some major storms.  There are a ton of articles out there today claiming that this exposes the risk of cloud adoption and cloud services in general.  Here is one example on the Huffington Post.  I would encourage you to go and read the article so you can understand how off base most of the industry is when it comes to understanding what cloud computing is and the benefits it brings to the overall industry.

To start, I admit that I know nothing about the root cause of Amazon’s recent outage, other than what I’ve read in the numerous articles out there.  At this point, all of them claim it was due to a power outage in one of their major data centers around the Washington, DC area.  Let’s just assume that is the case since that is what everyone else is doing.

If it’s true that the outage was caused by a power failure to one of the grids serving that data center, it has nothing at all to do with “The Cloud”.  It has everything to do with traditional IT services and DR/BC planning.  Ten years ago, when the cloud did not exist, companies were still hosting parts of their technology infrastructure in data centers they did not own.  There were lots of data center companies out there that owned a lot of space that could be rented and companies could move part or all of their IT infrastructure to those locations.  Because of how expensive it is to build data center space, it made a lot of sense for companies to outsource the building and daily operations of their data center locations.  Why spend millions in capital on building your own when there were a lot of companies already providing that service as part of their core business?

When a particular company was interested in moving their infrastructure to one of these locations, they went to the data center and met with the teams running them.  They took tours, they asked lots of questions about redundancy and daily operations, and they generally had an RFP that detailed the level of redundancy the company had to have as a bare minimum.

A large part of those requirements revolved around power.  Data center operators are, in fact, some of the largest consumers of commercial power in the country.  Look at my home state, North Carolina, to see why our state has seen companies such as Google, Apple, and Facebook build huge data centers here over the past year.  We have cheap power and they use a lot of it.  Our state government offers tax incentives, sure, but don’t kid yourself into thinking those tax incentives have much to do with the 15 – 30 jobs a lights-out data center might generate, most of which are entry to mid level tech jobs that don’t pay big salaries.  North Carolina wants these facilities in large part due to the power consumption and the tax revenues that come along with it.  It is no coincidence that our state is also now home to the largest US power company, following the Duke Energy and Progress Energy merger.  So you get it.  Data centers consume power and it’s important to make sure that base is covered if I’m going to move my infrastructure there.

Typically a data center will be serviced by more than one power feed.  Ideally, though it isn’t common because of the way utilities are set up, you would bring feeds from multiple power grids into your facility and have those feeds provided by different utilities.  Since that isn’t always possible, you go the next best route.  You get redundant power feeds from the same provider and get each of those fed from different substations, different grids.  The power companies are going to have failures.  When they do, you don’t want that to mean your facility has to go down.

Next, you go out and pay millions for massive generators to power the facility in the event that you do lose your multiple external feeds.  Typically, those generators are also redundant and you have multiple to serve the needs of the entire facility, even if you lose one of those generators.  The generators are fueled from large onsite fuel supplies.  Those fuel supplies are tied to contracts that guarantee your generators get refueled during an emergency on a priority basis.  Data center operators generally can structure their contracts so they receive refueling right after hospitals, which always get first priority.

That covers the power to the data center.  Once inside, you have each of those power feeds split into multiple power grids within your data center.  When a company moves their infrastructure into the space or racks they have rented, they will have A and B power feeds that go to different UPS facilities that then tie into the redundant power grids mentioned above.

What you end up with is redundancy from end to end.  So what if a tornado hits the primary substation providing power to your data center?  You’ve still got power feeds from other grids.  So what if a hurricane takes out all of the substations or lines feeding your facility?  You’ve got N+1 generators to back it up and, thanks to those priority refueling contracts, you’ve got an effectively unlimited supply of fuel that will be ready when you need it.  In many cases, you’ll even have N+2 generators so if you lose one during an emergency, you still have N+1.
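To put some numbers on the N+1 and N+2 idea, here is a minimal Python sketch.  The function names, the 4,500 kW facility load, and the 2,000 kW generator size are all made up for illustration; they are assumptions, not figures from Amazon or any real facility.  N is simply the number of generators needed to carry the full load, and the "+k" is how many spares sit on top of that.

```python
import math


def generators_needed(load_kw: float, unit_kw: float) -> int:
    """N: the minimum number of generators required to carry the full load."""
    return math.ceil(load_kw / unit_kw)


def redundancy_level(installed: int, load_kw: float, unit_kw: float) -> int:
    """k in "N+k": how many installed generators are spares.

    A negative result means the installed generators cannot carry the load
    even when all of them are running.
    """
    return installed - generators_needed(load_kw, unit_kw)


def survives(installed: int, failed: int, load_kw: float, unit_kw: float) -> bool:
    """True if the remaining generators can still carry the full load."""
    return (installed - failed) * unit_kw >= load_kw


# Hypothetical facility: 4,500 kW of load, 2,000 kW per generator (assumed values).
LOAD_KW = 4500
UNIT_KW = 2000

n = generators_needed(LOAD_KW, UNIT_KW)          # 3 generators carry the load
k_full = redundancy_level(5, LOAD_KW, UNIT_KW)   # 5 installed is N+2
k_after = redundancy_level(4, LOAD_KW, UNIT_KW)  # lose one, still N+1
```

With five generators installed in this made-up facility, `survives(5, 2, LOAD_KW, UNIT_KW)` is true and `survives(5, 3, LOAD_KW, UNIT_KW)` is false, which is the whole point of buying N+2: you can lose one during an emergency and still have a spare.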

All of this infrastructure costs millions of dollars and lots of time to put into place.  That’s why a lot of companies choose to outsource it and let companies that specialize in that business, like my old company Windstream, run their data center operations.

You go to those companies.  You make them prove to you that they know data centers.  You make them prove to you that you don’t go down when there are power outages.  You make them show you their routine power testing, results of those tests, and mitigating actions they’ve had to take.  You make them show you their SLAs around power for the past several years.  You get customer references.  In the end, data centers can’t go down.  When that is your core business, you can’t afford even the smallest outage because it could result in significant revenue loss for your customers and yourself.  You DON’T take shortcuts on infrastructure.

So let’s get back to Amazon.  If a power outage truly did cause the disruption this weekend, someone didn’t do their homework.  Amazon didn’t plan well and if they did, they didn’t execute.  If they did execute well, they didn’t test well.  If they did test well, they didn’t test often enough.  I promise you, this is not rocket science.  Understanding the cloud may be getting closer to rocket science, but I can assure you we are a long way from talking about cloud when we’re dealing with basic physical data center infrastructure.

In addition, Amazon’s customers failed in that they didn’t hold Amazon accountable for proving their reliability.  Netflix, Pinterest, and Instagram are all quoted as having suffered outages due to this.  Those companies are large enough and bring enough revenue to the table that Amazon should have had to prove their infrastructure before these companies moved their first service onto their platform.  Someone didn’t.  It’s that simple.  If they did, they didn’t do a good job or they got lazy.

The bottom line: this is a decades-old problem that has nothing to do with cloud.  Ten years ago, Netflix or Instagram might have had all of their physical servers located in this same data center managed by an Exodus or Terremark.  The same problem could have happened and it has absolutely nothing to do with cloud.

In fact, we are now in a much better situation to deal with these types of problems and have more options than we’ve ever had.  We can run our cloud environments active/active in multiple data centers.  We can choose to migrate virtual workloads from one data center to the other in a matter of hours if we foresee a major event coming that could cause a disruption in one data center.  We can virtualize storage across data centers so not only do our front end servers exist in multiple locations, so does our data.  The possibilities are becoming endless.
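A rough back-of-the-envelope calculation shows why active/active matters.  The sketch below is Python, the function name and the 99.9% figure are hypothetical, and it leans on one big assumption: that data centers fail independently of each other.  A storm that hits a whole region breaks that assumption, which is exactly why you would want your sites far apart.

```python
def combined_availability(per_site: float, sites: int) -> float:
    """Chance that at least one site is up, ASSUMING independent failures.

    per_site is the availability of a single data center as a fraction,
    e.g. 0.999 for "three nines". The service is down only when every
    site is down at the same time.
    """
    if not 0.0 <= per_site <= 1.0:
        raise ValueError("per_site must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    return 1.0 - (1.0 - per_site) ** sites


# Hypothetical numbers: each site is up 99.9% of the time.
one_site = combined_availability(0.999, 1)   # about 0.999
two_sites = combined_availability(0.999, 2)  # about 0.999999
```

Under those assumed numbers, going from one site to two active sites cuts the expected downtime by a factor of roughly a thousand, provided both sites can each carry the full load on their own.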

Finally, the people writing about our industry really need to learn to embrace this and understand how it all works so they can write about it intelligently.  That will help foster a quicker and smoother migration to the cloud.  I live and breathe this stuff day in and day out and I see how it helps transform the operations of companies and gives them tools they’ve never had.  At no point in history have we had more at our disposal to prevent a single emergency from impacting our business and it’s because of, not in spite of, “the cloud.”

