I Work For Dell

Whilst I work for Dell, the opinions expressed in this blog are those of the author.
Dell provides IT products, solutions and services to all industry sectors including end user organisations and cloud / service providers.

Monday 10 February 2014

Counting Cores

Once upon a time, an application ran on a CPU.

Over time, those CPUs got faster and more powerful.

Then, one day, those CPUs had more than one core per CPU.  The multi-core CPU was born.

Over more time, those cores have multiplied many times and 8, 10, 12 and 16 core CPUs are now common.  Whilst CPU clock speeds have settled in the 2.0GHz to 3.0GHz range over the past few years, more "bang per buck" has been delivered through adding more and more cores, and through improvements in memory bandwidth, caching and moving connectivity ever closer to or onto the CPU.

So the applications will have kept pace and will be making use of all of these extra cores of compute, right?  Well, yes and no.  The advent of virtualization and hypervisors has effectively allowed multiple applications to share a common CPU compute platform.  This is an excellent way to consolidate and get more value out of this very powerful infrastructure.  It also creates an opportunity for laziness amongst application developers, as virtualization lets them "hide" applications that haven't been adapted to the multi-core world.  So those "single threaded" apps are still out there, although they are becoming less and less common.

Why do I raise this now?  Well, just at the end of last week I was in conversation with a customer who had a bit of a dilemma when migrating a VM from an older virtual server farm to a newer one - from a 3 year old cluster to a 3 month old cluster.  The VM ran slower on the new farm than it did on the old.  We had a good discussion about resource utilisation (everything unstressed), storage IOPs and performance (nothing of real note) and versions of firmware, VMtools etc.  But the point of raising this here is that the old cluster had servers with 4 core 3.05GHz CPUs, while the new one has 8 core 2.5GHz CPUs.  Multi-thread capable apps would probably run faster on the new cluster, which has more cores.  However, it turns out that this particular application is still single threaded.  When the hypervisor schedules it across multiple cores, all we get is a queuing effect, and when the work does get done, it's getting done on slower cores.  The single thread is interrupted as it is moved from core to core, which is slower than running on a dedicated core.  On the older 4 core CPU there's less scheduling going on (perhaps very little, depending on what else is happening on that host), so the application runs quicker.
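To make the effect concrete, here's a minimal, hypothetical sketch (Python, purely illustrative - not the customer's application) of a CPU-bound, single-threaded task.  However many cores the host has, only one is ever busy, so the wall-clock time is governed by the speed of whichever core the work lands on:

```python
# Illustrative only: a CPU-bound, single-threaded task cannot use extra cores.
import os
import time

def busy_work(iterations: int) -> int:
    """A purely CPU-bound loop standing in for a single-threaded application."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total

if __name__ == "__main__":
    print(f"Logical cores available: {os.cpu_count()}")
    start = time.perf_counter()
    busy_work(20_000_000)
    # Only one core is ever busy; on a host with slower per-core clocks this
    # loop simply takes longer, no matter how many cores sit idle.
    print(f"Single-threaded run: {time.perf_counter() - start:.2f}s")
```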

So, do we have to go back to the drawing board to re-write the application?  Scary from a cost point of view, I'm sure.  Well, that would be best, but probably isn't practical in very many circumstances.  Instead, the hypervisor setting that ties a particular VM to a single CPU core ("core affinity") can be selected to overcome this performance hit.  It's a good and effective fix for the short to medium term.
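For illustration only, the same idea can be seen from inside a Linux guest by pinning a process to one core.  This is just an analogy for the hypervisor-level affinity setting, not the setting itself:

```python
# Illustrative, Linux-only sketch: pin the current process to a single core,
# analogous in spirit to giving a VM affinity to one physical core.
import os

os.sched_setaffinity(0, {0})                      # 0 = this process; allow core 0 only
print("Now allowed on cores:", os.sched_getaffinity(0))
```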

To fix this for the longer term, the next time the hood's up on that application for some other fix or development, see if you can persuade the developers to move it to multi-thread - it's about time!
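As a hypothetical sketch of what that change might look like, the same CPU-bound loop from the earlier example can be split across a pool of worker processes so the extra (slower) cores actually contribute:

```python
# Illustrative only: spread the same CPU-bound work across all available cores.
from concurrent.futures import ProcessPoolExecutor
import os
import time

def busy_work(iterations: int) -> int:
    total = 0
    for i in range(iterations):
        total += i * i
    return total

if __name__ == "__main__":
    iterations = 20_000_000
    workers = os.cpu_count() or 1
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each worker runs its slice of the loop on its own core.
        partial_sums = list(pool.map(busy_work, [iterations // workers] * workers))
    print(f"Split across {workers} cores: {time.perf_counter() - start:.2f}s")
```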

Thursday 6 February 2014

When Should You Not Go To The Cloud?

This post is in response to a question posed on LinkedIn by the Chief Executive Officer at TriVu Media:


Disclosure:  I work for Dell - we sell data centre products and services to end user organisations and to cloud / service provider organisations.

There are many factors in the decision to use or not use cloud.  For the purposes of responding to your question, I'm going to assume you are questioning the use of public cloud vs internal provision.

Scale / Cost:
If you are a very large enterprise, there is every reason for you to be capable of delivering an internal cloud that rivals or betters an external cloud in terms of performance, cost and security.  So if you have the scale, with the skills, then the potential is there to do something better within your organisation.
We should also consider the scale of individual projects.  For dev, test and small scale production, external cloud can make sense, and pay-as-you-grow models at the lower end of the scale are attractive.  However, the "$ per GB" model (as an example) of quick-to-deploy commodity compute environments is only initially attractive - because costs scale linearly with consumption, there is a real chance that, at some point, the multiples of unit cost will gross up to a total cost that exceeds the cost of providing the service in a more traditional way.  So it makes sense to fully understand what you expect a service to consume in 18 months or 3 years time when deciding between external cloud and a more traditional route.

You also need to consider that, over time, the cost / benefit model will change as the scale of the service grows or shrinks.  What originally made sense internally might become smaller over time, or less critical to the business, and should eventually be farmed out to a commodity cloud provider.  And vice-versa: what started as an experimental service to "see how it goes" could grow into something large scale on which you are now spending more than it would cost to provide internally.  Typically I don't see much reference to this kind of change over time in the cloud discussion, and ignoring it is a high risk strategy.
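As a rough illustration of that break-even point, here's a small sketch with entirely made-up figures (the per-GB prices and the fixed internal cost are assumptions, not quotes):

```python
# Hypothetical break-even sketch: linear pay-as-you-grow pricing vs an internal
# platform with a fixed monthly cost plus a small marginal cost. Figures invented.
CLOUD_PER_GB = 0.10          # assumed $/GB/month for the external cloud service
INTERNAL_FIXED = 4_000.0     # assumed fixed monthly cost of the internal platform
INTERNAL_PER_GB = 0.02       # assumed marginal internal $/GB/month

def cloud_cost(gb: float) -> float:
    return gb * CLOUD_PER_GB

def internal_cost(gb: float) -> float:
    return INTERNAL_FIXED + gb * INTERNAL_PER_GB

for gb in (1_000, 10_000, 40_000, 100_000):
    c, i = cloud_cost(gb), internal_cost(gb)
    print(f"{gb:>7,} GB/month: cloud ${c:>9,.0f} vs internal ${i:>9,.0f}"
          f" -> {'cloud' if c < i else 'internal'} is cheaper")
```

At small scale the linear cloud price wins easily; as consumption grows, the fixed cost of the internal platform is amortised and the picture reverses.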
For those organisations with a large investment in existing systems, the business case for ANY change (whether it involves cloud or not) must take into account the value of the existing service provision.  For example, it's likely to be difficult to justify the business case for migrating a service to the cloud, if you've just made a multi-million dollar investment in the existing platform, and that needs to be depreciated over 4 or 5 years.  Similarly, if you have a very large data centre facility - as you migrate services out to the cloud, the unit cost per square metre of data centre space becomes significantly larger for each of those services that remain in the data centre.  So if team A move their service out to the cloud to save 25% on their costs, but it increases the running costs of teams B and C that stay behind by 30% then the holistic business case doesn't stack up.  So be careful about the big picture, not just the individual services.
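To put some (made-up) numbers on that last point, here's a quick worked example:

```python
# Hypothetical figures: team A saves 25% by moving to the cloud, but the fixed
# data-centre overhead they leave behind raises teams B and C's costs by 30%.
team_a, team_b, team_c = 100.0, 100.0, 100.0       # arbitrary cost units

before = team_a + team_b + team_c                   # 300
after = team_a * 0.75 + team_b * 1.30 + team_c * 1.30

print(f"Before: {before:.0f}, after: {after:.0f}")  # 300 -> 335
# Team A's saving is real, but the organisation as a whole now pays more.
```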

Control and Risk:
For some, control and security of data is paramount to their business - particularly in heavily regulated environments.  Whilst many public cloud offerings now offer the potential for good security, for some this just isn't enough, as any compromise is potentially ruinous to their business, including the risk of hosting customer data outside appropriate geographic boundaries.  The more a company needs an iron grip on who accesses data, where that data lives, what audit rights it has to that data and when it needs to guarantee physical separation, the more heavily all of that weighs against a public cloud service.  The economics of cookie-cutter services, shared infrastructure etc. of the public cloud don't stack up as soon as you start to add customization.
I would also suggest, for many companies, that if they put something into an external cloud service then it may be wise to use more than one provider, if that is practical.  This has the benefit of keeping the providers in competition with each other, and provides a measure of protection in the event that one provider ceases to exist, under-performs, over-charges or removes that service from their offerings.

Integration Complexity:
Some organisations are struggling to cope with the complexity of their internal systems and how they all interact with each other, often without full documentation.  This means they struggle to understand what service impacts will occur when they un-pick old systems or add in new functionality.  If you then add the need to include external connectivity and industry standard APIs, it's not easy.  The cost of change may well exceed the benefits of hooking existing services to new external services.  Requirements such as a single customer view, mandated by some regulations, encourage organisations to keep services internal so that the data remains under control and accurate; moving large volumes of data up and down external connections doesn't always make much sense in these situations.

Management:
Many larger companies that have outsourced services (including management of on-premises services) such as networks, data / storage management and some commodity processing applications have found managing the interfaces to these providers to be complex and onerous.  Getting all of these providers to work together for the benefit of the company buying the services takes a great deal of management time and rigorous processes, and that's just when everything is going smoothly.  When service issues cross provider boundaries, identifying who is at fault and who needs to rectify the issue can be complex at best, and can end up in long and expensive legal disputes at worst.  Add the potential for multiple cloud service vendors into that mix and you can see how quickly the costs of managing such an environment (and by that I don't just mean the operational costs, but also the risk / cost to customer services and regulatory compliance) could outweigh the expected benefits.

Some of the above issues can be mitigated with a rigorous management approach (which is not happening in the grey IT economy) and/or the right tooling (such as Dell Boomi or Dell Multi-Cloud Manager).  This is why internal IT needs to become the conduit for IT services - so control to meet the organisation's objectives can be maintained whilst ensuring that IT services are provided in the most effective way - internally, private cloud or public cloud.  The correctly balanced mix will be the best solution for many organisations.  One size fits all is not likely to lead to a good result.

VMWORLD Europe 2013 BLOG LINKS

Whilst attending VMWORLD Europe 2013, I made notes on the sessions and captured some early thoughts about the potential impact of what I saw and heard.  Here's the summary of links to each of the blog posts, to help you quickly get the information that interests you most:

Technical Content


VMWORLD General Content


Local and Travel Content

The First Musing


This blog will be about thoughts that occur to me as I go about my daily role. Currently I work for Dell in the UK, in a mixed infrastructure architecture / design role for our largest global customer organisations. Much of the time I'm working on specific virtualisation / cloud solutions for specific customers or helping them develop their strategic direction. This gives me a good insight into the common challenges I see across a number of customers in different markets, which allows the development of reference architectures that we can take as solutions to many of our customers.

Brief career history, prior to Dell (most recent first):

  • Head of x86 Server Infrastructure Architecture, Lloyds Banking Group: 5 year overall infrastructure strategy & direction; 3 year investment planning and ownership of architecture assets for x86 server, Windows server, VMware and Linux OS; team of 11 architects; design of VMware platform for bank merger – 1100 host servers. 
  • Head of x86 Infrastructure Architecture for HBOS plc – role as above. 
  • Head of Infrastructure for Bank of Scotland Corporate Banking 
  • Head of IT Audit for Bank of Scotland 
  • Several roles in Civil Service IT