This article was originally published by The New Stack.
The learning curve for deploying containers is steep. Most organizations are only beginning to experiment before moving to production. The performance, cost and scalability benefits that containers deliver to an organization’s infrastructure (on-premises or in the cloud) are undeniable, which is likely why the smart folks over at Gartner predict that, by 2020, more than 50 percent of global organizations will be running containerized applications in production — up from less than 20 percent today.
And get ready, because once an organization figures out where and how to best deploy containers, adoption will grow at a much faster rate than the transition from physical to virtual ever did. It’s easy for containers to spread like wildfire because they can pop up in minutes, almost instantly servicing customers. This is fantastic for agility and elasticity, but the instant gratification will lead to challenges — especially when it comes to cost overruns in the cloud.
The main character in all of this is Kubernetes. “Kubernetes, here’s my application: now run it for me.” That’s the end game advocated by Google’s Kelsey Hightower. With Kubernetes, you can run your legacy applications, microservices or functions across one or more public clouds and on-prem. Sounds simple, right?
Not so fast! It might be simple enough for the first few applications. However, as adoption increases, so does the complexity of managing many services, each with fluctuating demand. It’s critical to anticipate these complexities and understand every layer of a container deployment so that you’re prepared to control the wildfire once the spark hits your organization.
To ensure that your services run well and that the platform can handle the rapid growth of deployments, you must manage the multiple dimensions that factor into your Service Level Objectives (SLOs). That raises several questions:
- Workloads can scale, but how do you optimize scaling across all the key dimensions: CPU, memory and transactions (response time, throughput)?
- Should a workload scale horizontally or vertically, and when?
- If you scale for one dimension, do you risk congestion in another?
- Services can depend on other services or data, or need to be close to clients, so where should they run: closer to clients or closer to data?
- While the platform provides elasticity, should you scale out your on-prem cluster or burst to the cloud, and how do you identify when you can save money?
These are important questions. So, let’s explore the options.
Option 1: Auto-Scaling Workloads
Container orchestration platforms provide auto-scaling policies. One challenge, however, is that you need to choose the metric (CPU and memory are the built-in choices) and set a static threshold on each one individually. This approach requires a user to determine each threshold value, maintain it manually and analyze trends to assess whether the thresholds are still good enough. These policies also only scale horizontally, and they assume the container is already in the best configuration for its resources. Consider that even a single container may be constrained in one dimension (like memory) but over-allocated in another (like CPU). Scaling horizontally without first resizing means propagating a configuration that does not assure performance. The best first action may be to resize the container instead of relying only on horizontal scaling. Any scaling action also needs to assess the availability of node resources, and if none are available, the system should provision more.
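To make the contrast concrete, here is a minimal Python sketch. `hpa_desired_replicas` applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × currentValue / targetValue), skipping changes whose ratio is within a tolerance of 1.0 (0.1 by default). `should_resize_first` is a hypothetical heuristic, not part of Kubernetes: it flags a container that is constrained on one dimension while over-allocated on another, which is exactly the case where replicating it copies the imbalance. The 0.8 and 0.3 cut-offs are illustrative assumptions.

```python
import math

def hpa_desired_replicas(current_replicas: int, current_value: float,
                         target_value: float, tolerance: float = 0.1) -> int:
    """Replica count from the HPA formula: ceil(current * current_value / target_value).

    One metric, one static target: nothing here looks at any other dimension.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    return math.ceil(current_replicas * ratio)

def should_resize_first(usage: dict, requests: dict,
                        high: float = 0.8, low: float = 0.3) -> bool:
    """Hypothetical check: is the container's shape wrong before we copy it?

    usage and requests map a dimension (e.g. "cpu" in cores, "memory" in MiB)
    to the observed usage and the requested amount for ONE container.
    """
    util = {dim: usage[dim] / requests[dim] for dim in requests}
    constrained = [dim for dim, u in util.items() if u > high]
    over_allocated = [dim for dim, u in util.items() if u < low]
    # Constrained on one dimension AND mostly idle on another: resize, don't just replicate.
    return bool(constrained) and bool(over_allocated)

# A memory-bound container that barely touches its CPU request:
usage = {"cpu": 0.1, "memory": 900}
requests = {"cpu": 1.0, "memory": 1000}
print(hpa_desired_replicas(4, current_value=90, target_value=60))  # → 6
print(should_resize_first(usage, requests))  # → True
```

The HPA happily grows this workload from four to six copies of a container that is wasting 90 percent of its CPU request, while the second check suggests reshaping it first.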
Managing resources alone is not enough. You can add custom metrics around transactions and response times, but first you need a solution that generates the telemetry data (such as a service mesh), and then you need to configure Prometheus to collect it. Even then, you are managing to a myopic threshold that is not dynamically maintained. A self-managing system should continuously analyze workload demand and, without manual intervention, determine whether vertical or horizontal scaling is needed while ensuring that you can accommodate peak demand. That requires a multi-dimensional analysis.
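One simplified way to express that multi-dimensional decision is sketched below, under stated assumptions: demand is the observed peak per dimension summed across replicas, a fixed headroom fraction is reserved for peaks, and `max_per_pod` is a hypothetical policy cap on how large a single pod may grow. `plan_scaling` and `Decision` are illustrative names. The function prefers no action, then a vertical resize, and falls back to horizontal scaling only when a single pod cannot absorb the demand; a production system would also weigh node availability and the cost of restarting pods.

```python
import math
from dataclasses import dataclass

@dataclass
class Decision:
    action: str     # "none", "vertical" or "horizontal"
    replicas: int
    per_pod: dict   # allocation for each pod, per dimension

def plan_scaling(peak_demand: dict, replicas: int, per_pod_alloc: dict,
                 max_per_pod: dict, headroom: float = 0.2) -> Decision:
    """Choose between no action, vertical and horizontal scaling across ALL dimensions.

    peak_demand:   peak demand summed over all replicas, e.g. {"cpu": cores, "memory": MiB}
    per_pod_alloc: what one pod is currently allocated (same keys)
    max_per_pod:   hypothetical cap on how large one pod may be resized to
    """
    padded = {d: v * (1 + headroom) for d, v in peak_demand.items()}
    needed_per_pod = {d: v / replicas for d, v in padded.items()}

    # 1. Every dimension already fits with headroom: leave the workload alone.
    if all(needed_per_pod[d] <= per_pod_alloc[d] for d in padded):
        return Decision("none", replicas, dict(per_pod_alloc))

    # 2. Pods can grow enough within the cap: resize vertically, keep the replica count.
    if all(needed_per_pod[d] <= max_per_pod[d] for d in padded):
        resized = {d: max(per_pod_alloc[d], needed_per_pod[d]) for d in padded}
        return Decision("vertical", replicas, resized)

    # 3. Otherwise add replicas until the most demanding dimension fits.
    new_replicas = max(math.ceil(padded[d] / per_pod_alloc[d]) for d in padded)
    return Decision("horizontal", new_replicas, dict(per_pod_alloc))

plan = plan_scaling({"cpu": 3.0, "memory": 2000}, replicas=2,
                    per_pod_alloc={"cpu": 1.0, "memory": 1500},
                    max_per_pod={"cpu": 4.0, "memory": 4096}, headroom=0.25)
print(plan.action)  # → vertical
```

Because every dimension is checked together, CPU alone triggers the resize here while memory keeps its existing allocation instead of being inflated along with it.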
Even before workloads scale, the system should identify how to better manage fluctuating demand and avoid resource congestion caused by noisy neighbors or node performance issues, whether those come from containers peaking together or from compute or storage congestion in the underlying cluster. To ensure workloads can access the resources they need, the system can redistribute them by rescheduling pods onto other compliant nodes that have resources available.
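The rescheduling idea can be sketched as follows, assuming a simplified model: each pod reports its current usage per dimension, a node counts as congested when any dimension exceeds a utilization threshold (0.85 here, an arbitrary choice), and “compliant” is reduced to matching a pod’s node-selector labels (real placement rules also cover taints, affinity and quotas). `propose_move` and its helpers are hypothetical. It works through the most congested node, trying the largest consumers of its hottest dimension first, and accepts a target only if every dimension there stays under the threshold after the move, so congestion is relieved rather than relocated.

```python
def node_utilization(node: dict, pods: list) -> dict:
    """Fraction of each capacity dimension used by the pods currently on `node`."""
    used = {d: 0.0 for d in node["capacity"]}
    for pod in pods:
        if pod["node"] == node["name"]:
            for d in used:
                used[d] += pod["usage"].get(d, 0.0)
    return {d: used[d] / node["capacity"][d] for d in used}

def is_compliant(pod: dict, node: dict) -> bool:
    """Simplified stand-in for placement rules: every node-selector label must match."""
    return all(node["labels"].get(k) == v for k, v in pod.get("selector", {}).items())

def propose_move(nodes: list, pods: list, threshold: float = 0.85):
    """Return (pod_name, target_node_name) that relieves congestion, or None."""
    util = {n["name"]: node_utilization(n, pods) for n in nodes}
    congested = sorted((n for n in nodes if max(util[n["name"]].values()) > threshold),
                       key=lambda n: max(util[n["name"]].values()), reverse=True)
    for src in congested:
        src_util = util[src["name"]]
        hot_dim = max(src_util, key=src_util.get)  # dimension under the most pressure
        residents = sorted((p for p in pods if p["node"] == src["name"]),
                           key=lambda p: p["usage"].get(hot_dim, 0.0), reverse=True)
        for pod in residents:
            for dst in nodes:
                if dst["name"] == src["name"] or not is_compliant(pod, dst):
                    continue
                after = {d: util[dst["name"]][d] + pod["usage"].get(d, 0.0) / dst["capacity"][d]
                         for d in dst["capacity"]}
                if max(after.values()) <= threshold:  # target stays healthy after the move
                    return pod["name"], dst["name"]
    return None

nodes = [
    {"name": "n1", "capacity": {"cpu": 4, "memory": 8}, "labels": {"zone": "a"}},
    {"name": "n2", "capacity": {"cpu": 4, "memory": 8}, "labels": {"zone": "a"}},
    {"name": "n3", "capacity": {"cpu": 4, "memory": 8}, "labels": {"zone": "b"}},
]
pods = [
    {"name": "api", "node": "n1", "usage": {"cpu": 2.5, "memory": 2}, "selector": {"zone": "a"}},
    {"name": "worker", "node": "n1", "usage": {"cpu": 1.2, "memory": 1}},
    {"name": "cache", "node": "n2", "usage": {"cpu": 1.0, "memory": 1}},
]
print(propose_move(nodes, pods))  # → ('worker', 'n2')
```

Moving `api` would push `n2` over the threshold, and `n3` fails its zone selector, so the smaller `worker` pod is the safe move.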
You also want to be efficient with your resources: better workload redistribution should free up larger “blocks” of resources so that fewer pods are stuck in a pending state, without introducing risk to performance. So how do you know which pod to move and to which node, while ensuring that you don’t introduce risk to another pod? This is where multi-dimensional analysis optimizes rescheduling and scaling decisions, and a third-party scheduler working with the native one makes them actionable by executing the moves.
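Freeing a larger “block” for a pending pod can be illustrated with a deliberately small search. The sketch assumes that placement is judged on resource requests (as the Kubernetes scheduler does when filtering nodes), that at most one existing pod is moved, and that compliance rules are ignored for brevity. `plan_for_pending` and its helpers are hypothetical; they only plan a move and do not call any scheduler API.

```python
def free_capacity(node: dict, pods: list) -> dict:
    """Unrequested capacity left on `node`, per dimension."""
    return {d: node["capacity"][d] - sum(p["requests"][d] for p in pods if p["node"] == node["name"])
            for d in node["capacity"]}

def fits(requests: dict, available: dict) -> bool:
    return all(requests[d] <= available[d] for d in requests)

def plan_for_pending(pending: dict, nodes: list, pods: list):
    """Return ("place", node), ("move", pod, from_node, to_node) or None.

    A "move" frees enough room on from_node for the pending pod, and the moved
    pod still fits on to_node, so neither pod ends up short of its requests.
    """
    free = {n["name"]: free_capacity(n, pods) for n in nodes}
    for node in nodes:  # no defragmentation needed if it already fits somewhere
        if fits(pending["requests"], free[node["name"]]):
            return ("place", node["name"])
    for node in nodes:
        for victim in (p for p in pods if p["node"] == node["name"]):
            room = {d: free[node["name"]][d] + victim["requests"][d] for d in node["capacity"]}
            if not fits(pending["requests"], room):
                continue
            for dst in nodes:
                if dst["name"] != node["name"] and fits(victim["requests"], free[dst["name"]]):
                    return ("move", victim["name"], node["name"], dst["name"])
    return None

nodes = [{"name": "n1", "capacity": {"cpu": 4, "memory": 8}},
         {"name": "n2", "capacity": {"cpu": 4, "memory": 8}}]
pods = [{"name": "x", "node": "n1", "requests": {"cpu": 3, "memory": 2}},
        {"name": "y", "node": "n2", "requests": {"cpu": 1, "memory": 1}},
        {"name": "z", "node": "n2", "requests": {"cpu": 1, "memory": 1}}]
pending = {"name": "batch", "requests": {"cpu": 3, "memory": 2}}
print(plan_for_pending(pending, nodes, pods))  # → ('move', 'y', 'n2', 'n1')
```

The cluster has three free cores in total, but split one and two across the nodes, so the pending pod stays stuck until `y` is consolidated onto `n1`.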
Option 2: Auto-Scaling Infrastructure
It’s typical to be over-provisioned when beginning container adoption. However, you waste money if you keep operating at 40 to 60 percent utilization because you have no better way to assess the impact of scaling workloads. On the other side of the Kube-coin, you want to auto-scale the cluster based on current demand, growth seen in the environment and knowledge of pods in a pending state. Intelligent scaling needs to assess demand against the availability of the underlying infrastructure and, in the public cloud, against cost. Once you know when nodes should scale out or in, you need to consider how to execute the action. Public clouds offer scale sets and auto-scaling groups, but their thresholds look only at the node.
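The cluster-level decision can be sketched the same way. Under the assumptions that demand is measured as summed pod requests, that pending pods add to it, that expected growth is a single fractional rate and that the pool consists of identical nodes at a flat hourly price, the hypothetical `plan_nodes` below sizes the pool for a target utilization on its most demanding dimension and reports the hourly cost change. Mixed instance types, reserved pricing or spot capacity would need a richer model.

```python
import math

def plan_nodes(requested: dict, pending: dict, growth: float, node_capacity: dict,
               current_nodes: int, hourly_price: float, target_util: float = 0.75) -> dict:
    """Size a pool of identical nodes for current demand, pending pods and growth.

    requested / pending: summed resource requests per dimension (e.g. cores, GiB)
    growth: expected fractional growth over the planning horizon (0.25 = 25%)
    """
    needed = 1  # never plan an empty pool
    for d, cap in node_capacity.items():
        demand = (requested.get(d, 0) + pending.get(d, 0)) * (1 + growth)
        needed = max(needed, math.ceil(demand / (cap * target_util)))
    delta = needed - current_nodes
    action = "scale out" if delta > 0 else "scale in" if delta < 0 else "hold"
    return {"nodes": needed, "action": action, "hourly_cost_change": delta * hourly_price}

print(plan_nodes({"cpu": 12, "memory": 60}, {"cpu": 2, "memory": 4}, growth=0.25,
                 node_capacity={"cpu": 8, "memory": 32}, current_nodes=6, hourly_price=0.5))
# → {'nodes': 4, 'action': 'scale in', 'hourly_cost_change': -1.0}
```

In this example CPU alone would justify three nodes, but memory needs four; a node-level CPU threshold would miss that, while the pool as a whole can still shrink from six nodes.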
Having a consistent way to scale Kubernetes nodes would allow for multi-dimensional analysis and consistent execution. My team at Turbonomic is closely watching the Kubernetes SIG work on Cluster API (the use of MachineSets shows promise), Cluster Registry and Multicluster for the breakthrough that will make this possible.
Option 3: Multicloud
While you may not be there yet, containerized Kubernetes workloads offer true portability and can run anywhere, allowing organizations to use different infrastructure providers both to optimize cost and to access other services or functions. This requires application services, an architecture and patterns that are ready for multicloud. Multicloud also brings DevOps challenges, adding the location of services and their proximity to each other as further dimensions in the analysis. This raises the question: what’s next?
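Location and proximity can be folded into the same kind of analysis. The sketch below assumes each candidate location has a flat hourly cost and a measured round-trip latency to each dependency (a database, another service, or the clients themselves), weighted by how often the service talks to it, and expresses the SLO as a maximum call-weighted average latency. `choose_location` and all of its inputs are hypothetical. It picks the cheapest location that meets the SLO, breaking ties on latency.

```python
def choose_location(candidates, hourly_cost, latency_ms, call_rate, slo_ms):
    """Cheapest location whose call-weighted average latency meets the SLO, or None.

    latency_ms[loc][dep]: round-trip time from `loc` to dependency `dep`
    call_rate[dep]:       calls per second exchanged with `dep`
    """
    total_calls = sum(call_rate.values())
    feasible = []
    for loc in candidates:
        avg = sum(rate * latency_ms[loc][dep] for dep, rate in call_rate.items()) / total_calls
        if avg <= slo_ms:
            feasible.append((hourly_cost[loc], avg, loc))  # sort by cost, then latency
    return min(feasible)[2] if feasible else None

candidates = ["on-prem", "cloud-a", "cloud-b"]
hourly_cost = {"on-prem": 1.2, "cloud-a": 0.9, "cloud-b": 0.6}
latency_ms = {"on-prem": {"db": 1, "clients": 40},
              "cloud-a": {"db": 30, "clients": 10},
              "cloud-b": {"db": 60, "clients": 20}}
call_rate = {"db": 300, "clients": 100}
print(choose_location(candidates, hourly_cost, latency_ms, call_rate, slo_ms=30))  # → cloud-a
print(choose_location(candidates, hourly_cost, latency_ms, call_rate, slo_ms=20))  # → on-prem
```

Treating the SLO as a hard constraint and minimizing cost within it, rather than blending cost and latency into one score, keeps the cheapest location (`cloud-b`, far from the data) from winning when it cannot meet the objective.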
The answer looks like composite services that leverage containerized services and functions, which will further increase the need to manage performance and efficiency based on Service Level Objectives. But that’s a topic for another day.