The Bane of the Software-Defined Universe: Software-Defined Latency

November 11th, 2014

Software-defined storage.

SDN.

Private Cloud.

Converged Infrastructure.

Public Cloud.

VDI.

Hybrid Cloud.

Containers.

CloudOS.

It seems that the entire universe is pivoting away from hardware to be defined in software.

I’ve been involved in IT in some form for the last four decades, and I’ve never seen a more exciting, more innovative time. Every part of the IT stack is being abstracted away from hardware into software, which is unleashing incredible opportunities for the kind of disruptive innovation that fuels our economy.

This disruption also leaves the feeling that IT is in chaos.

Every last bit of the stack that we used to be able to touch, install with our hands, and fix with a Torx driver is now 0’s and 1’s we have a hard time imagining, let alone touching. And every last bit now comes with a question (or task) from above: What’s our cloud strategy? What type of converged infrastructure should we pursue? Should we software-define just our storage, or our network as well?

This may leave many experienced IT professionals feeling as if this “software-defined” universe is out of control. How are we supposed to manage something that’s seemingly changing every few days?

Ultimately, I’ve argued it is all about tradeoffs:

  • between budget and cost,
  • between resiliency, performance and agility,
  • between application performance and infrastructure utilization,
  • between workload QoS and sweating assets,
  • between compute, storage and network bandwidth,
  • between compute, storage and endpoint latencies,
  • between infrastructure constraints and business constraints,
  • between compute and storage,
  • between CPU, memory, IO, network, ready Qs, latency, etc.,
  • between application priorities
  • and among business priorities.

As I argued a few weeks ago, these tradeoffs aren’t simple. These tradeoffs are between huge, conflicting forces that pull the data center in different, opposing directions – in real time, all the time.

And because these tradeoffs aren’t simple – and are indeed conflicting all the time in real time – it has gone beyond human capacity to manage this n-dimensional challenge manually in the decades-old tradition of IT where we stare at screens waiting for things to break.

Come to think of it, that tradition seems sillier and sillier. After all, we wouldn’t try to solve for the square root of 76,597,504 with paper and a slide rule today, would we? (The answer, for the record, is 8,752 – using a calculator, of course.)
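Or, for that matter, with a couple of lines of Python – a trivial sketch, just to make the calculator’s point:

```python
import math

# The square root the slide rule would struggle with:
root = math.isqrt(76_597_504)   # integer square root (Python 3.8+)
print(root)                     # 8752
print(root * root)              # 76597504 -- confirms the root is exact
```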

Then why do we keep trying to do this with one of our most valuable, most complex assets – our datacenters?

Just like with advanced computations, some things are better handled in software. Human beings are incredibly creative problem solvers, but less adept at data processing, particularly at scale. That’s exactly why humans invented computers.

Since we invented computers a few decades back, we’ve discovered newer and more interesting ways to put them to use. Consider the rise of the modular, multi-tier app. Compared to the old monolithic architectures of the past, multi-tier apps are more flexible, more resilient, more scalable, more standardized (leading to lower development costs) and of higher quality. There are literally no downsides to the move to n-tier architecture.

Save, perhaps one – latency. After all, for all the benefits of n-tier architecture – the modularity, the reusability, the resiliency – one must tolerate some latency as data is passed between the various layers of the architecture. The key is to minimize that latency – or else you may run into spinning beach balls, or worse, blue screens of death.
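To make that tradeoff concrete, here is a minimal sketch of how latency stacks up across the tiers of an application – every per-hop number below is hypothetical, chosen only to illustrate the effect:

```python
# Hypothetical one-way latencies, in milliseconds, for each hop a request
# makes through a simple three-tier app. The numbers are illustrative only.
hops_same_rack = {
    "web -> app": 0.1,
    "app -> db": 0.1,
    "db -> app": 0.1,
    "app -> web": 0.1,
}

# The same four hops once the tiers have drifted several switches apart
# (again hypothetical: ~2 ms of extra network transit per hop).
hops_spread_out = {hop: ms + 2.0 for hop, ms in hops_same_rack.items()}

def round_trip_ms(hops):
    """End-to-end latency is roughly the sum of every hop on the path."""
    return sum(hops.values())

print(round(round_trip_ms(hops_same_rack), 1))   # ~0.4 ms when the tiers are co-located
print(round(round_trip_ms(hops_spread_out), 1))  # ~8.4 ms once they drift apart
```

The individual hops look harmless; it is the sum across every tier boundary, on every request, that turns into spinning beach balls.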

Latency, however, is poised to become the bane of the software-defined universe. The more we abstract into software, the less we can keep track of where workloads physically live. As those workloads migrate away from hosts with high CPU or memory congestion, they may end up residing in parts of the datacenter (or, with SDN, perhaps in a different datacenter entirely) that are many hops away from the workloads they must communicate with in an n-tier architecture.

The result might not only be slow response time. In a world of gigabit-and-beyond switches, where a single second of lost packets amounts to far more data than the entire text of War and Peace, it might mean outright application failure – ever lost a Skype call? Now imagine something far more mission critical.
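The back-of-the-envelope arithmetic (all sizes are rough approximations) is easy to check:

```python
# Back-of-the-envelope figures -- rough approximations, not measurements.
link_bits_per_second = 10 * 10**9                 # a 10-gigabit link
link_bytes_per_second = link_bits_per_second / 8  # ~1.25 GB crossing the wire each second

war_and_peace_bytes = 3.3 * 10**6                 # the novel's plain text is roughly 3.3 MB

copies_per_second = link_bytes_per_second / war_and_peace_bytes
print(round(copies_per_second))                   # ~379 copies of the novel, every second
```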

The key to combating latency is to localize the components (or workloads, or containers) that communicate the most. Keep them together underneath the same switch (ideally on the same host) and you not only minimize latency, you also reduce the potential points of failure – and in an internet bursting with Netflix streams and high-velocity trades pouring into 10, 40 or 50 gigabit switches, port overload is going to become a greater and greater issue.
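One way to picture that placement problem: given a matrix of how much each pair of workloads talks, co-locate the chattiest pairs first. Here is a toy greedy sketch – the workload names, traffic figures and host size are all made up:

```python
# Toy placement sketch: walk workload pairs from chattiest to quietest and
# try to put both ends of each conversation on the same host.
# All traffic figures and the host size are hypothetical.
traffic_mb_per_s = {
    ("web1", "app1"): 220,
    ("app1", "db1"): 180,
    ("web2", "app2"): 90,
    ("app2", "db1"): 60,
    ("web1", "db1"): 5,
}
HOST_SLOTS = 3            # workloads per host in this toy model

placement = {}            # workload -> host index
load = {}                 # host index -> number of workloads placed

def place(workload, host):
    placement[workload] = host
    load[host] = load.get(host, 0) + 1

for (a, b), _ in sorted(traffic_mb_per_s.items(), key=lambda kv: -kv[1]):
    if a in placement and b in placement:
        continue                                    # both already pinned somewhere
    if a in placement or b in placement:
        placed, other = (a, b) if a in placement else (b, a)
        host = placement[placed]
        if load[host] < HOST_SLOTS:                 # room next to its chatty peer?
            place(other, host)
            continue
    # otherwise reuse the first host with two free slots, or open a new one
    host = next((h for h, n in load.items() if n + 2 <= HOST_SLOTS), len(load))
    for w in (a, b):
        if w not in placement:
            place(w, host)

print(placement)   # -> {'web1': 0, 'app1': 0, 'db1': 0, 'web2': 1, 'app2': 1}
```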

But considering how often certain components chat with one another, and how that chatter shifts with demand from minute to minute, how on earth are we supposed to keep track of chatty workloads? With a slide rule? With traditional monitoring? By prayer?

We built VMTurbo to stop praying and gain control. We designed VMTurbo as a Software-Driven Control platform that would abstract your software-defined datacenter into a software-driven commodities market. And to do so, we created a way to abstract every datacenter entity – from server to storage, memory to fabric, and now even network – into a common currency so software negotiates the tradeoffs we’re all forced to make on a daily basis (some 35,000 decisions every day for the average human, according to researchers).

Now flow – the stream of communication between application workloads – has been added to this abstraction, so that your environment can find a healthy equilibrium among CPU, memory, storage, network (and on and on) constraints, assuring performance without making your CFO scream.
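The economic analogy can be sketched in a few lines: price each resource by how congested it is, and let a workload shop for the cheapest place to run. To be clear, this is only a toy illustration of the idea, not VMTurbo’s actual pricing model, and every name and number below is hypothetical:

```python
# Toy "commodities market" sketch: price each resource by how congested it is,
# then let a workload buy from the cheapest seller. Purely illustrative --
# every host name and utilization figure below is made up.

def price(utilization):
    """Price climbs steeply as a resource nears saturation (1 / (1 - u))."""
    return 1.0 / max(1.0 - utilization, 0.01)

hosts = {
    "host-a": {"cpu": 0.85, "memory": 0.60, "flow": 0.30},
    "host-b": {"cpu": 0.40, "memory": 0.45, "flow": 0.70},
    "host-c": {"cpu": 0.55, "memory": 0.50, "flow": 0.20},
}

def bill(resources):
    """Total price a workload would pay to run against these utilizations."""
    return sum(price(u) for u in resources.values())

for host, resources in hosts.items():
    print(host, round(bill(resources), 2))   # prices rise fastest on congested resources

best = min(hosts, key=lambda h: bill(hosts[h]))
print("cheapest placement:", best)           # -> host-c in this toy example
```

The steeper the price curve near saturation, the harder the market pushes workloads away from congested resources before they ever become bottlenecks.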

In doing so, VMTurbo not only enables more efficient networks – localizing chatty workloads reduces east-west traffic and congestion at the top of the switching hierarchy – it also makes the SDN strategy your CIO is demanding less risky. But it doesn’t end there.

By bringing flow into the equation and localizing chatty components, VMTurbo helps with your hybrid cloud strategy as well. After all, you wouldn’t want to burst something chatty out to the public cloud and introduce latency, would you?

By adding flow as one of the resource commodities workloads must buy, VMTurbo enables your environment to self-optimize – the workloads effectively work it out amongst themselves through millions of tiny negotiations over the price (based on utilization and constraints) of CPU, storage, memory, flow or even Ready Queue, so that they can tell you where they should reside, how large they should be, and how they should be configured. All you have to do is agree…

That’s what makes me most proud of what we’ve designed at VMTurbo – the more entities we add to the marketplace of your environment, the better it will perform and the more efficient it will become.

In the coming months you will see more about how we’ve added hybrid cloud control, VDI control, applications control, even containers control to our Software-Driven Control universe. When you put them all together, you begin to realize the power of Demand-Driven Control.
