I Work For Dell

Whilst I work for Dell, the opinions expressed in this blog are those of the author.
Dell provides IT products, solutions and services to all industry sectors including end user organisations and cloud / service providers.

Monday 10 February 2014

Counting Cores

Once upon a time, an application ran on a CPU.

Over time, those CPUs got faster and more powerful.

Then, one day, those CPUs had more than one core per CPU.  The multi-core CPU was born.

Over more time, those cores have multiplied many times and 8, 10, 12 and 16 core CPUs are now common.  Whilst CPU clock speeds have settled in the 2.0GHz to 3.0GHz range over the past few years, more "bang per buck" has been delivered through adding more and more cores, and through improvements in memory bandwidth, caching and moving connectivity ever closer to or onto the CPU.

So applications will have kept pace and will be making use of all these extra cores of compute, right?  Well, yes and no.  The advent of virtualization and hypervisors has effectively allowed multiple applications to share a common CPU compute platform.  This is an excellent way to consolidate and get more value out of this very powerful infrastructure.  It also leaves room for laziness amongst application developers, as virtualization can "hide" those applications that haven't been adapted to the multi-core world.  So those "single-threaded" apps are still out there, although they are becoming less and less common.
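To make the distinction concrete, here is a minimal Python sketch of the same job done single-threaded and then fanned out to a pool of workers. The function names, data and four-way split are illustrative, not from any real application. Note that in CPython the GIL means threads only overlap I/O-bound work; a CPU-bound job would use `ProcessPoolExecutor` instead, with the identical structure.

```python
from concurrent.futures import ThreadPoolExecutor

def work(chunk):
    # Stand-in for a CPU-bound task: sum of squares over a slice of the data.
    return sum(i * i for i in chunk)

data = list(range(100_000))
chunks = [data[i::4] for i in range(4)]  # split the data four ways

# Single-threaded: one core grinds through every chunk in turn.
serial_total = sum(work(c) for c in chunks)

# Multi-threaded: the same chunks handed to a pool of workers.
# (For CPU-bound work in CPython, swap in ProcessPoolExecutor to
# sidestep the GIL; the calling pattern is unchanged.)
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_total = sum(pool.map(work, chunks))

# Both paths compute the same answer; only the scheduling differs.
print(serial_total == parallel_total)
```

The point of the pattern is that the work is split into independent chunks up front; a truly single-threaded app has no such seam for the scheduler to exploit, however many cores the host offers.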

Why do I raise this now?  Just at the end of last week I was in conversation with a customer who had a bit of a dilemma when migrating a VM from an older virtual server farm to a newer one, moving from a three-year-old cluster to a three-month-old one.  The VM ran slower on the new farm than it did on the old.  We had a good discussion about resource utilisation (everything unstressed), storage IOPS and performance (nothing of real note), and versions of firmware, VMtools and so on.  But the point of raising this here is that the old cluster had servers with 4-core 3.05GHz CPUs, while the new one has 8-core 2.5GHz CPUs.  Multi-thread-capable apps would probably run faster on the new cluster, which has more cores.  However, it turns out that this particular application is still single-threaded.  When the hypervisor schedules it across multiple cores, all we get is a queuing effect, and once the work gets done, it's getting done on slower cores.  The single thread is interrupted as the work is moved between cores, which is slower than staying on a dedicated core.  On the older 4-core CPU there is less scheduling going on (perhaps very little, if any, depending on what else is happening on that host), so the application runs quicker.
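A back-of-envelope check makes the gap plausible. Ignoring differences in instructions-per-clock and all scheduling overhead (a deliberate simplification), a single thread is bounded by the speed of the one core it runs on, so the clock ratio of the two clusters caps its relative throughput:

```python
old_clock_ghz = 3.05  # old cluster: 4-core CPUs
new_clock_ghz = 2.50  # new cluster: 8-core CPUs

# A single-threaded app can only ever use one core, so core count is
# irrelevant and the per-core clock ratio caps relative throughput.
ratio = new_clock_ghz / old_clock_ghz
print(f"best-case single-thread throughput on the new cluster: {ratio:.0%}")
```

That is roughly 82% of the old throughput before any scheduling penalty is counted; the core migrations described above only push it lower.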

So, do we have to go back to the drawing board and re-write the application?  Scary from a cost point of view, I'm sure.  That would be the best answer, but it probably isn't practical in very many circumstances.  Instead, the hypervisor setting that ties a particular VM to a single CPU core ("core affinity") can be selected to overcome this performance hit.  It's a good and effective fix for the short to medium term.
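The affinity setting itself lives in the hypervisor's VM configuration, but the same idea can be demonstrated from inside a Linux guest, which exposes an equivalent per-process control. This sketch assumes a Linux system (the API is Linux-specific) and the choice of core 0 is illustrative:

```python
import os

# Pin the current process to core 0 so the scheduler stops migrating
# its single thread between cores. The first argument, 0, means
# "this process"; the set lists the cores it may run on.
os.sched_setaffinity(0, {0})

# Confirm the process is now restricted to core 0.
print(os.sched_getaffinity(0))
```

Whether done per-process in the guest or per-VM in the hypervisor, the effect is the same: the thread keeps its cache-warm core and stops paying the migration cost.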

To fix this for the longer term, the next time the hood's up on that application for some other fix or development, see if you can persuade the developers to make it multi-threaded.  It's about time!
