Digital transformation is creating intense pressure on IT to deliver applications and services faster and with greater efficiency. This requires transforming IT and how it operates.
The status quo – IT operating in silos – must change. It’s an illusion that resources can be managed in each layer of the stack, in isolated technology silos, and that somehow applications will magically get the resources they need when they need them.
Over the last few decades, IT management has been on a journey to nowhere. Many moving parts with complex interactions make environments difficult to track, monitor and control effectively. For years, in trying to address the broad range of pain points, the industry has thrown more and more management tools and products at the problem, while still failing to address the pain. Instead of solving the management challenges, this increased TCO and created an operational and administrative nightmare – what I call "IT management on drugs."
For example, when a developer deploys an application, the PaaS services it requires must be provisioned. The IaaS must allocate the compute, storage and network those services need. The compute, storage and network must have enough CPU, memory, I/O and bandwidth to support the demand of the deployed application. Once the application is deployed, its demand fluctuates continuously, and as it does, the resources allocated at every layer must be adjusted. Most environments now have several layers of abstraction and decoupling between the applications and the CPU, memory, I/O and bandwidth they consume – but at the end of the day, the application needs these resources in order to perform.
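The layered demand chain described above can be sketched in a few lines. This is a minimal illustration, not a real orchestration API: the entity names, capacities and the single scalar "demand" are hypothetical assumptions standing in for the many resource dimensions an actual stack carries.

```python
# Hypothetical model of the stack: each entity consumes from a provider
# one layer below it (app -> PaaS service -> VM -> physical host).
stack = {
    "app-checkout": {"consumes_from": "paas-db", "demand": 400},
    "paas-db":      {"consumes_from": "vm-7",    "demand": 0},
    "vm-7":         {"consumes_from": "host-3",  "demand": 0},
    "host-3":       {"consumes_from": None,      "demand": 0},
}

def propagate(stack, entity):
    """Push an entity's demand down through every layer beneath it,
    so each provider's allocation reflects the application's need."""
    provider = stack[entity]["consumes_from"]
    while provider is not None:
        stack[provider]["demand"] += stack[entity]["demand"]
        entity, provider = provider, stack[provider]["consumes_from"]

propagate(stack, "app-checkout")
# The application's demand has now flowed through every layer;
# when it fluctuates, the propagation must be re-run end to end.
```

The point of the sketch is the shape of the problem: a change at the top cannot be satisfied by adjusting one layer in isolation, because every layer below it must move in step.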
It is not possible to deliver services with the required agility, elasticity and scale if each layer (PaaS and IaaS) and each technology (compute, storage and network) continues to operate in silos.
Breaking Down Silos is Not Enough
At every layer, in every silo, humans are analyzing massive amounts of data to make three types of decisions: Do we need more of something? Do we need less of something? And who should consume what, and from whom (or where should something be placed)? This doesn't scale. At a typical enterprise, thousands of such decisions must be made each day. No human can keep up and continuously, consistently make the correct decisions. This means IT will be unable to deliver the agility, elasticity and scale the digital world requires if it continues to rely on humans to make these decisions. IT operations must embrace software systems that make these decisions and take action automatically.
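The three decision types above can be expressed as a simple automated loop. This is an illustrative sketch only: the 80%/20% thresholds, the entity fields and the action names are assumptions, not any product's actual policy engine, and a real system would weigh many dimensions rather than one utilization figure.

```python
# Sketch of the three decisions the article names: more of something,
# less of something, or a different placement. Thresholds are invented.
def decide(entity):
    """Return 'provision', 'suspend', 'move', or 'ok' for one entity."""
    util = entity["utilization"]          # fraction of capacity in use
    if util > 0.80:                       # "Do we need more of something?"
        return "provision"
    if util < 0.20:                       # "Do we need less of something?"
        return "suspend"
    if entity["provider_congested"]:      # "Who should consume from whom?"
        return "move"
    return "ok"

fleet = [
    {"name": "vm-1", "utilization": 0.92, "provider_congested": False},
    {"name": "vm-2", "utilization": 0.10, "provider_congested": False},
    {"name": "vm-3", "utilization": 0.55, "provider_congested": True},
]
actions = {e["name"]: decide(e) for e in fleet}
# actions -> {'vm-1': 'provision', 'vm-2': 'suspend', 'vm-3': 'move'}
```

Each decision is trivial in isolation; the scaling problem is that thousands of entities need this evaluated continuously, which is exactly what humans in silos cannot sustain.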
This requires more than a collection of niche tools. IT needs integrated topology scope, data collection and management functions with context and purpose. It calls for a common data model that semantically represents all the interdependencies among the broad range of entities spanning physical compute, virtual compute, physical network devices, virtual networks, physical storage devices, virtual storage, operating systems, applications and public cloud workloads running in AWS, Azure, etc.
IT operations needs a common abstraction and common semantics that aren't present in a patchwork of automation tools, basic performance monitoring tools, anomaly-detection tools, capacity management tools, change management tools, log management tools, compliance tools and troubleshooting tools.
The Desired State
IT’s goal is to assure application performance, and to do so an environment must be maintained in the “Desired State”: a state in which application performance is assured WHILE the environment is utilized as efficiently as possible, minimizing cost and maintaining compliance with business policies. Achieving this requires making the right trade-offs across many dimensions, continuously and in real time. The desired state is a trade-off between:
- budget and cost
- resiliency, performance and agility
- application performance and infrastructure utilization
- workload QoS and sweating assets
- compute, storage and network bandwidth
- compute, storage and endpoint latencies
- infrastructure constraints and business constraints
- compute and storage
- CPU, memory, I/O, network, ready Qs, latency, etc.
- application priorities
- liquidity and compliance regulations
To complicate matters even more, this occurs in an “N-dimensional” space. Every layer of the stack consists of different resources, which add multiple dimensions that must be considered among multiple trade-offs. And they must be satisfied across multiple entities in the data center in order to achieve a desired state. Don’t forget that all of this must be managed while balancing business priorities.
Managing IT is complex, involving many different entities that may each be in many different states. Managing such an environment with endless low-level rules is untenable, which is why the abstraction of generic concepts and behaviors that are simple and able to scale matters so much.
Abstraction hides the messy details of the managed environment yet exposes what is necessary to control and maintain a “healthy” environment. Abstraction scales to large environments by collecting and analyzing only the required information, avoiding the “big data” problem that can manifest in larger environments. It also simplifies the management of complex heterogeneous environments and allows managing them without having to be skilled in all of the underlying platforms. Additionally, abstraction alleviates platform lock-in, eliminating the need to implement a series of platform-specific proprietary tools. Finally, it reduces analysis complexity: with a common abstraction, analytics deal with only one resource type, say disk I/O, rather than many incarnations of different device models.
Last, but certainly not least, abstraction solves the N-dimensional trade-off equation. Without the proper abstraction, the fundamental problem of figuring out the trade-offs can’t be solved. How would you evaluate the proper trade-off between CPU, memory, IOPS, network and storage latency, response time, TPS, heap size, connection pools, etc.? To solve the problem, we must be able to compare all the dimensions, and the only way to do that is with a common abstraction that provides a mechanism for comparison across all of them.
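One way such a comparison mechanism can work is to reduce every resource, whatever its native unit, to the same scale: utilization of its capacity, priced on a shared curve that rises steeply as the resource nears exhaustion. The pricing function and the figures below are illustrative assumptions, a sketch of the idea rather than any particular product's model.

```python
# Illustrative common abstraction: map heterogeneous resources
# (MHz, GB, IOPS) onto one comparable "price" scale.
def price(used, capacity):
    """Price rises sharply as utilization approaches capacity
    (here 1 / (1 - u)^2, an assumed curve for illustration)."""
    u = used / capacity
    return 1.0 / (1.0 - u) ** 2

resources = {
    "cpu_mhz":   (7000, 10000),  # (used, capacity) in native units
    "memory_gb": (48, 64),
    "disk_iops": (900, 1000),
}
prices = {name: price(*uc) for name, uc in resources.items()}

# Once every dimension shares one scale, trade-offs become comparable:
# the most "expensive" resource is the bottleneck to relieve first.
bottleneck = max(prices, key=prices.get)
```

With such a common currency, the N-dimensional question "more CPU, more memory, or a different placement?" collapses into comparing numbers on one scale, which is what makes the trade-off tractable for automation.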
Don’t mistake the automation of any single decision by a point tool for a viable solution. Any approach that looks at a specific resource, or subset of resources, in isolation cannot drive the environment to a desired state. A common abstraction is necessary, and the only way to manage massive and increasingly complex IT environments in the desired state is through automation. And in order to automate the countless moving parts of an environment, abstraction is key.
The Challenge Ahead
The industry is poised for disruption. This is an opportunity to take a new approach and make the next evolution of IT one that scales more effectively and efficiently along with the capabilities of the technologies driving it. If we try to scale the infrastructure without changing the way we manage and operate that infrastructure, successes can only occur in silos – if at all. A dynamic future needs a dynamic, self-managing infrastructure that provides the right level of abstraction and removes the barriers and burden on business and IT operations teams.
Editorial Note: This article was originally published by Data Center Knowledge.