I Work For Dell

Whilst I work for Dell, the opinions expressed in this blog are those of the author.
Dell provides IT products, solutions and services to all industry sectors including end user organisations and cloud / service providers.

Friday 21 March 2014

How Will You Cope With A Cloud Disaster?

What happens when a cloud service fails?

What happens when a cloud provider has a disaster?

How does your business continue?

Why does my internal IT cost me so much more than buying a few VMs from a cloud provider?

I've recently read some interesting points on this topic (paraphrased below; the authors will not be named), which gave me cause to think about the approaches that businesses are taking to cloud services, and about some of the areas that appear to be adding risk to their business-critical IT services.  Here are a couple of examples and why they're not correct:

"One of the benefits of cloud is that IT service continuity becomes the problem of the service provider".  No!  This is one of the most concerning statements I've seen recently.  Service continuity is never the problem of the service provider.  It is always the concern of the business consuming that service.  The way in which service continuity is provided may be within the remit of the service provider, but the continuity of the service itself is the problem of the consuming business.  Your business must ensure that the right provisions are in place, either through contractual arrangements (guaranteed service level agreements, recovery point objectives and recovery time objectives, all with penalties that are commensurate with the business you will be losing if they are not met) or through service design as part of adopting cloud. See thoughts further down this article.

"Public clouds do not offer disaster recovery".  That's also not strictly true.  By default, most probably don't.  However, many will offer the option to add such services.  Also, if you have an application that can run in multiple sites to provide high availability (HA), and your service provider guarantees to run it in multiple sites, then your HA can become your disaster recovery (DR) approach.  The public cloud could also be your DR facility - more later.

So a cloud strategy, just like any other IT strategy, needs backups and DR plans. To dismiss public cloud as not providing DR is to miss some of the options; equally, to assume that the service provider will look after DR is problematic. There are many approaches to this, and I'll suggest some below, but this isn't a comprehensive list or guide; it's here to provoke some thoughts and ideas.

By default, public cloud usually does not include DR in the traditional sense. However, by carefully selecting multiple cloud providers for a single business process or a cloud provider who can guarantee your systems will run in multiple sites, DR can be achieved - as long as your application is also designed with multi-site capabilities. In this case, DR is essentially just an extension of your HA approach.

Additionally, cloud can be part of your DR strategy. For example, you can run all your systems in house and upload data and source code to cloud services on a regular basis. With the right process design and cloud service provider contract, you can then invoke your DR by running that code against the data stored in the cloud, wherever your cloud provider has capacity available.
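How regular "a regular basis" needs to be follows from your recovery point objective. As a minimal sketch, assuming a hypothetical helper called `meets_rpo` and made-up timestamps and objectives (none of these come from a real provider's API), a freshness check on the newest copy held in the cloud might look like this:

```python
from datetime import datetime, timedelta, timezone


def meets_rpo(last_copy_at: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True if the newest copy in the cloud is no older than the RPO.

    Hypothetical helper for illustration: `last_copy_at` is when the most
    recent upload completed, `rpo` is the recovery point objective agreed
    with the business.
    """
    return now - last_copy_at <= rpo


# Illustrative values only: the last upload finished six hours ago.
now = datetime(2014, 3, 21, 12, 0, tzinfo=timezone.utc)
last_copy = datetime(2014, 3, 21, 6, 0, tzinfo=timezone.utc)

print(meets_rpo(last_copy, now, timedelta(hours=4)))  # six hours old, four-hour RPO
print(meets_rpo(last_copy, now, timedelta(hours=8)))  # six hours old, eight-hour RPO
```

A check like this only tells you the copy is recent; it says nothing about whether the copy can actually be restored, which is why testing the invocation still matters.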

So be careful how you design your service provision.  Think about the options, of which these are some:

- multiple physical locations offered by your service provider (make sure your data is still geographically compatible with relevant regulations, and your applications are designed for multi-site use);
- service provider provisioning of DR processes and facilities.  Make sure the design and contractual arrangements meet the business criticality of the system.  Where contractual penalties are agreed, make sure an hour of penalties equals or exceeds an hour of lost business, and that this is maintained as business volumes fluctuate;
- choose multiple service providers for the same service on a continuous basis - make sure they don't share data centre facilities;
- choose one service provider to provide the service on an ongoing basis, and another to provide quick burstable capacity for use in a DR situation.  Design processes to ensure that the second service provider has current copies of your systems and regular updates of data, commensurate with your recovery point and recovery time objectives;
- use the cloud as your DR strategy for your in-house systems.  Send regular copies of systems and data to the cloud provider, with a contract that allows you to start up your systems very quickly.  Treating the cloud provider as your second "warm" standby data centre is a viable approach.
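The point about penalties keeping pace with business volumes can be checked with simple arithmetic. The sketch below is illustrative only: the function name `uncovered_periods`, the period labels and all the figures are assumptions, not numbers from any real contract.

```python
from __future__ import annotations


def uncovered_periods(hourly_penalty: float,
                      hourly_loss_by_period: dict[str, float]) -> list[str]:
    """Return the periods where an hour of penalties is less than an hour of lost business."""
    return [
        period
        for period, hourly_loss in hourly_loss_by_period.items()
        if hourly_penalty < hourly_loss
    ]


# Hypothetical figures: business lost per hour of outage at different volumes.
hourly_loss = {"quiet": 2_000.0, "normal": 5_000.0, "peak": 12_000.0}

# A penalty of 5,000 per hour covers quiet and normal trading, but not the peak.
print(uncovered_periods(5_000.0, hourly_loss))
```

Re-running a comparison like this whenever volumes change is one way to notice that a contract which was adequate at signing no longer covers the business at risk.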

Make sure you use one of the above, or a similar approach, to ensure your business can continue when there's an IT disaster, and make sure that whichever approach you choose can meet your security requirements.  Test it frequently to confirm that it actually delivers to the design parameters and service requirements.  Remember that when to invoke, how to invoke and who is responsible for what during invocation are just as important as the IT elements.  Also cover what happens when the disaster is over and you need to return to your regular service provision - getting back to normal can be just as hard as, or harder than, invoking disaster recovery.

When comparing the costs of your internal IT with the cost of buying a few VMs from a cloud provider, remember that you are unlikely to be comparing apples with apples - make sure the capacities, performance and guarantees of uptime and recovery time are equal.  Only then will the cost of the insurance service provided by most internal IT teams become apparent.  It's surprising how quickly the cost of a cloud service escalates when you truly match the service levels to existing provision.  This is one of the greatest risks with shadow IT services - they are often not comparable to internal services and often put critical business processes at risk.
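One way to put uptime guarantees on the same footing is to convert each availability percentage into the downtime it permits per year. This is a minimal sketch; the percentages are illustrative assumptions rather than figures from any particular provider or internal team.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year that still satisfy the given availability guarantee."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Illustrative guarantees only: compare what each one actually permits.
for sla in (99.0, 99.9, 99.99):
    print(f"{sla}% availability allows {allowed_downtime_hours(sla):.2f} hours of downtime per year")
```

A guarantee of 99% and one of 99.9% look similar on a price list, yet the first permits roughly ten times as much downtime as the second, which is the kind of gap that disappears from view when only the VM price is compared.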

Or choose to take the risk, of course - that's always an option, as some business services are not critical to the business.  If you can afford for a service to be unavailable for a few days or weeks, then it's probably appropriate not to pay for contingency planning up front.
