Defrag This | Read. Reflect. Reboot.

Amazon Outage and Your Business Recovery Strategy

Missy Januszko | March 02 2017 | News & Events

An outage in Amazon Web Services’ Simple Storage Service (S3) reminded us why you need a business recovery strategy.

Your business relies on your data to be available when and where you want it.  But on February 28th, 2017, many businesses were reminded that “the cloud is just someone else’s computer,” as an outage in Amazon Web Services’ Simple Storage Service (S3) impacted sites across the internet. This is why you need a business recovery strategy.

For businesses whose sites were impacted, it meant lost sales while the sites were unavailable, or lost productivity until functionality was restored.  But for many CIOs, CTOs, and IT professionals, it also raised questions: “What if it were my site?  Am I doing enough to ensure that my business continues to operate in the event of an outage?”

Moving to the Cloud Alone Isn’t a Panacea

Cloud service providers advertise SLAs (service level agreements) of “99-point-very-many-nines” uptime, but even a brief outage of a few minutes can cause an SLA miss.  These advertised SLAs, combined with cost savings on hardware, real estate, and some IT staff, can make a move to the cloud very attractive from a financial perspective.  As the February 28 incident showed, however, simply moving systems or data offsite doesn’t guarantee incident-free operation.  Proper business recovery planning requires an understanding of where your data resides, an understanding of your redundancy needs, and a plan to mitigate the impact if a service provider has an incident.
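To put those nines in perspective, here is a minimal sketch of the arithmetic behind an uptime percentage. The availability figures and the 30-day period are illustrative assumptions, not any provider's published SLA terms.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime that fit inside a period at a given availability percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Illustrative availability levels only (assumed, not taken from a real SLA).
for sla in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% over 30 days allows about {minutes:.1f} minutes of downtime")
```

At 99.99%, the budget for a 30-day month is only a few minutes, which is why an outage lasting hours can consume many months of downtime allowance at once.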

Return on Investment (ROI)

How much protection is enough?  How much is too much?  Measuring return on investment is always an exercise in metrics and statistics.  Understanding the value of a system being available or unavailable can help determine how much is worthwhile to spend on a recovery strategy. 

A blog site may not be worth a recovery strategy, but a blog site that generates revenue through advertisements would lose money in an outage.  The cost of lost business should be weighed against the cost of investing in higher-tier business recovery strategies.
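That comparison can be sketched as a simple calculation. Every number below (hourly revenue, expected outage hours, strategy cost) is a hypothetical placeholder, and the function names are made up for this illustration; substitute your own measurements.

```python
def expected_annual_outage_cost(revenue_per_hour: float, outage_hours_per_year: float) -> float:
    """Revenue lost per year for a given number of expected outage hours."""
    return revenue_per_hour * outage_hours_per_year


def recovery_strategy_net_benefit(
    revenue_per_hour: float,
    outage_hours_without: float,
    outage_hours_with: float,
    annual_strategy_cost: float,
) -> float:
    """Avoided outage cost minus what the recovery strategy costs per year."""
    avoided_hours = outage_hours_without - outage_hours_with
    avoided_cost = expected_annual_outage_cost(revenue_per_hour, avoided_hours)
    return avoided_cost - annual_strategy_cost


# Hypothetical ad-supported blog: $50/hour in ad revenue.
small_blog = recovery_strategy_net_benefit(50.0, 8.0, 1.0, 1200.0)
# Hypothetical storefront: $2,000/hour in sales, same outage profile and strategy cost.
storefront = recovery_strategy_net_benefit(2000.0, 8.0, 1.0, 1200.0)
print(small_blog, storefront)
```

A negative result suggests the strategy costs more than the downtime it prevents; a positive one suggests it pays for itself. This leaves out softer costs such as reputation and productivity, which you may want to estimate separately.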

Analysis of business recovery strategies should also include figuring out the recovery time objective and recovery point objective (RTO and RPO, respectively).  RTO analysis determines how long the business could tolerate an outage.  RPO analysis defines the maximum period of data loss that the business could tolerate.

That blog site from the previous example may have a low tolerance for recovery time (a short RTO) because of the lost ad revenue, but the data on the site may only change once a week.  It can therefore accept a higher threshold for recovery point (a longer RPO) in the case where the fastest recovery procedure is a restore from backup.  Consider both factors in the analysis to determine how much protection is enough.
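As a rough sketch, the two objectives can be checked separately against what a recovery plan actually delivers. The `RecoveryPlan` fields and the durations used here are assumptions for illustration; the worst-case data loss is taken to be one full backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    backup_interval: timedelta   # worst-case data loss is one full interval
    restore_duration: timedelta  # measured time to restore from backup


def meets_objectives(plan: RecoveryPlan, rto: timedelta, rpo: timedelta) -> dict:
    """Report whether the plan satisfies each objective independently."""
    return {
        "rto_met": plan.restore_duration <= rto,
        "rpo_met": plan.backup_interval <= rpo,
    }


# Hypothetical blog: weekly backups, and a restore that takes four hours.
blog_plan = RecoveryPlan(backup_interval=timedelta(days=7), restore_duration=timedelta(hours=4))
result = meets_objectives(blog_plan, rto=timedelta(hours=1), rpo=timedelta(days=7))
print(result)
```

In this made-up case the weekly backup satisfies the recovery point objective, but a four-hour restore misses a one-hour recovery time objective, so the investment belongs in faster recovery rather than more frequent backups.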

Multi-Region Redundancy

The AWS S3 incident was limited to the region named US-EAST-1, and regions are segregated from one another.  You can use a region-specific endpoint (e.g. http://s3-eu-west-1.amazonaws.com), but if the default endpoint (http://s3.amazonaws.com) is used, requests are routed by default through the US-EAST-1 region for redirection to the correct endpoint.  So although the incident was limited to the US-EAST-1 region, the impact may have been more widespread for sites that relied on that redirect.

Because regions are segregated from one another, a site owner could have taken some steps to try to restore service, provided that cross-region replication was set up on the S3 buckets, all objects had been replicated, and the application could be redirected to a different S3 bucket.
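One way to picture the "redirect the application" step is a read path that falls back to a replica bucket when the primary does not answer. This is only a sketch: the bucket names are hypothetical, and `fetch` stands in for whatever storage client call your application really uses (it is passed in so the logic can be exercised without touching AWS).

```python
from typing import Callable, Sequence, Tuple


def fetch_with_failover(
    key: str,
    buckets: Sequence[str],
    fetch: Callable[[str, str], bytes],
) -> Tuple[str, bytes]:
    """Try each bucket in order and return (bucket, data) from the first that answers."""
    errors = []
    for bucket in buckets:
        try:
            return bucket, fetch(bucket, key)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{bucket}: {exc}")
    raise RuntimeError("all buckets failed: " + "; ".join(errors))


def fake_fetch(bucket: str, key: str) -> bytes:
    """Stand-in for a storage client; simulates the primary region being down."""
    if bucket == "assets-us-east-1":
        raise ConnectionError("region unavailable")
    return f"{key} from {bucket}".encode()


bucket, data = fetch_with_failover(
    "logo.png", ["assets-us-east-1", "assets-eu-west-1"], fake_fetch
)
print(bucket, data)
```

The fallback only helps if the replica actually holds the object, which is why the replication conditions matter as much as the redirect itself.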

Since the actual root cause is not yet known, it’s not possible to determine whether this would have restored service faster, but there are likely many options for faster recovery if you have an IT team that knows the systems and their layout, even those in the cloud.

Proactive Testing

The previous statements regarding cross-region replication contained a lot of “ifs”. 

  • “If” you had cross-region replication set up  
  • “If” all objects had been replicated
  • “If” the application can be redirected

The only way to know if your team’s recovery strategies are going to work is by testing the recovery procedures often enough to have confidence in them.  You don’t want to be in a recovery situation trying to figure it out for the very first time. 
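Part of such a test can be automated. The sketch below checks the "all objects had been replicated" condition by comparing two listings of object keys and checksums. How you obtain those listings depends on your tooling; here they are plain dictionaries with made-up contents.

```python
def replication_gaps(primary: dict, replica: dict) -> dict:
    """Compare {key: checksum} listings and report what the replica lacks."""
    missing = sorted(key for key in primary if key not in replica)
    stale = sorted(
        key for key in primary
        if key in replica and primary[key] != replica[key]
    )
    return {"missing": missing, "stale": stale}


# Hypothetical listings gathered during a recovery drill.
primary_listing = {"index.html": "a1", "logo.png": "b2", "posts.json": "c3"}
replica_listing = {"index.html": "a1", "posts.json": "c0"}

gaps = replication_gaps(primary_listing, replica_listing)
print(gaps)
```

An empty report on both counts is one piece of evidence that a failover would serve complete, current data. It does not replace actually redirecting the application in a drill.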

Redundancy as Part of a Business Recovery Strategy

You’ve heard the saying “don’t put all your eggs in one basket”.  Depending on the criticality of your data, you may not want to store all your data with a single cloud provider either.  Spreading data across providers is likely both costly and complex, so return to that ROI calculation to determine how much you would benefit from having your data stored with multiple providers.

AWS, Microsoft Azure, Rackspace, and others offer enterprise-class storage options, and another option is to have a hybrid cloud/on-premises solution.  Yes, you may have gone to the cloud to get rid of the on-premises systems, but depending on your ROI metrics, a hybrid recovery solution may make more financial sense than a multiple cloud provider solution.

Assess Your Situation

Downtime for your systems can result in loss of business, revenue, or productivity, all of which equate to real dollars and cents.  No matter where your systems reside, the S3 outage is a reminder to assess your business recovery strategy and procedures, to ensure that the damage is minimized in the event of an incident.  Don’t wait until it’s too late.


THIS POST WAS WRITTEN BY Missy Januszko

Missy Januszko is an independent IT consultant, with more than 20 years of experience as an enterprise hosting architect, large-scale infrastructure designer, and hosted application designer. She specializes in DevOps, automation and configuration management, PowerShell, and Active Directory, and has broad experience across the entire line of Microsoft business technologies. Missy is a co-author of “The DSC Book” with Microsoft MVP Don Jones, and she is also a conference speaker on DSC-related topics. She is a contributor to a number of open-source projects, including “Tug”, the open-source DSC pull server, and “Autolab”, an automated, rapid-install lab build.
