09.10.20

What’s Next in DevOps: AIOps?

By Juan Ignacio Giro

Since the days of gigantic data warehouses and big data analysis, experts have been warning that IT infrastructures are growing so large that they will soon be too massive and complex to manage manually. New tools for simplifying IT infrastructure management seem to appear all the time. Containerization, for example, is now a highly automated process that abstracts applications away from the hardware and software layers underneath them.

Regardless of the automation tools and the many options for simplifying IT infrastructure, it is only a matter of time before we need to address the operations process itself. DevOps specialists are already working at capacity. Once again, artificial intelligence (AI) is the solution everyone is focusing on. Imagine continuous monitoring, adaptive (or predictive) cloud management, and containerization at a level we have never seen before; wouldn’t that be the future?

AIOps as a Solution

AIOps is not a new term. It came into use around 2017 to describe the application of big data and modern machine learning directly to IT operations, mostly in the form of continuous monitoring, faster and more efficient alerting, and the generation of tailored insights.

Since then, AIOps has been seen as the way forward. Organizations collect operational data in data warehouses to find new, better ways to manage IT services and infrastructure. This enables a proactive rather than reactive approach to IT operations management: instead of waiting for problems to occur, continuous monitoring and predictive analysis let AI help prevent them.
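
To make continuous monitoring concrete, here is a minimal sketch of one common technique, a rolling z-score anomaly detector, assuming a single numeric metric sampled at regular intervals. The class name, window size, and 3-sigma threshold are illustrative choices, not part of any particular AIOps product.

```python
from collections import deque
from statistics import mean, stdev


class RollingZScoreDetector:
    """Flags a sample as anomalous when it lies more than `threshold`
    standard deviations from the mean of the last `window` samples."""

    def __init__(self, window: int = 30, threshold: float = 3.0) -> None:
        self.samples = deque(maxlen=window)  # rolling baseline of recent values
        self.threshold = threshold           # z-score above which we alert

    def observe(self, value: float) -> bool:
        """Record `value` and return True if it is anomalous."""
        anomalous = False
        if len(self.samples) >= 2:  # stdev needs at least two points
            baseline = list(self.samples)
            mu, sigma = mean(baseline), stdev(baseline)
            if sigma == 0:
                # Perfectly flat baseline: any change at all is unusual.
                anomalous = value != mu
            else:
                anomalous = abs(value - mu) / sigma > self.threshold
        # Every sample joins the baseline; a production system might
        # exclude anomalies so they do not distort future comparisons.
        self.samples.append(value)
        return anomalous


if __name__ == "__main__":
    detector = RollingZScoreDetector(window=10)
    for latency_ms in [100, 101, 99, 100, 102, 98, 100, 101, 150]:
        if detector.observe(latency_ms):
            print(f"anomaly: latency {latency_ms} ms")
```

A rolling baseline adapts as normal behavior drifts, which is why it suits always-on monitoring better than a single fixed threshold.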

For operations personnel, predicting problems manually at scale is close to impossible. For AI, it is a matter of taking a deep dive into historical data, recognizing patterns and outage causes, and then building a predictive model for early alerting and issue prevention. Downtime can be kept to a minimum, too, because AIOps can also automate switching over to redundant resources.
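
A predictive model for early alerting can start as simply as a trend line. The following sketch, which assumes hourly disk-usage samples and a hypothetical 90% threshold, fits an ordinary least-squares line to historical data and estimates how long until the threshold is crossed, so an alert can fire before the outage rather than after it. Real AIOps platforms use far richer models; this only illustrates the idea.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    timestamps_h: Sequence[float],
    usage_pct: Sequence[float],
    threshold_pct: float = 90.0,
) -> Optional[float]:
    """Fit an ordinary least-squares line to historical usage samples and
    estimate how many hours after the latest sample the trend reaches
    `threshold_pct`. Returns None when usage is flat or falling."""
    n = len(timestamps_h)
    if n < 2 or n != len(usage_pct):
        raise ValueError("need at least two paired samples")
    mean_t = sum(timestamps_h) / n
    mean_u = sum(usage_pct) / n
    cov = sum((t - mean_t) * (u - mean_u) for t, u in zip(timestamps_h, usage_pct))
    var = sum((t - mean_t) ** 2 for t in timestamps_h)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    slope = cov / var                      # percentage points per hour
    intercept = mean_u - slope * mean_t
    if slope <= 0:
        return None                        # no upward trend, nothing to predict
    crossing_t = (threshold_pct - intercept) / slope
    # Clamp at zero: the fitted trend may already be past the threshold.
    return max(0.0, crossing_t - max(timestamps_h))


if __name__ == "__main__":
    # Hypothetical disk usage climbing two points per hour.
    eta = hours_until_threshold([0, 1, 2, 3, 4], [50, 52, 54, 56, 58])
    if eta is not None and eta < 24:   # assumed 24-hour early-alert horizon
        print(f"early alert: disk projected to reach 90% in {eta:.1f} h")
```

With the sample data the fitted line is 50 + 2t, so the threshold is reached at t = 20, which is 16 hours after the last reading.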

Because AIOps relies on data, IT infrastructure and operational decisions can be based entirely on data and insights. Operators no longer have to analyze existing conditions manually; instead, they can give the AI a set of guidelines, or ground rules, within which it can make instantaneous decisions when needed.
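
Ground rules like these can be expressed as explicit policy that every automated decision is checked against. The sketch below is a hypothetical example: the action names, replica limits, and the three outcomes (execute, escalate to a human, reject) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class GroundRules:
    """Hypothetical operator-defined limits on automated remediation."""
    allowed_actions: FrozenSet[str] = frozenset({"scale_out", "scale_in", "restart_pod"})
    needs_approval: FrozenSet[str] = frozenset({"restart_pod"})
    min_replicas: int = 2
    max_replicas: int = 20


@dataclass(frozen=True)
class ProposedAction:
    """An action the AI wants to take (illustrative structure)."""
    action: str
    service: str
    target_replicas: Optional[int] = None


def evaluate(proposal: ProposedAction, rules: GroundRules) -> str:
    """Return 'execute', 'escalate' (ask a human first), or 'reject'."""
    if proposal.action not in rules.allowed_actions:
        return "reject"                      # outside the permitted playbook
    if proposal.target_replicas is not None and not (
        rules.min_replicas <= proposal.target_replicas <= rules.max_replicas
    ):
        return "reject"                      # would breach capacity limits
    if proposal.action in rules.needs_approval:
        return "escalate"                    # allowed, but a human signs off
    return "execute"                         # within the ground rules


if __name__ == "__main__":
    rules = GroundRules()
    print(evaluate(ProposedAction("scale_out", "checkout", 8), rules))  # execute
```

Keeping the rules as data rather than buried in model logic lets operators audit and tighten them without retraining anything.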

Last but certainly not least, AIOps can significantly reduce Mean Time to Repair (MTTR). While prevention can cut downtime by a significant margin, some problems are simply impossible to avoid. In these situations, AIOps helps operators diagnose and solve the problem and speeds up the whole process.
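
MTTR is commonly computed as total repair time divided by the number of incidents, though organizations differ on when the repair clock starts (at failure, at detection, or at acknowledgment). The sketch below assumes each incident is recorded as a (started, resolved) pair of timestamps; the function name is illustrative.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def mean_time_to_repair(incidents: Sequence[Tuple[datetime, datetime]]) -> timedelta:
    """MTTR = total repair time / number of incidents.

    Each incident is a (started_at, resolved_at) pair; where "started" begins
    (failure, detection, or acknowledgment) is an organizational choice."""
    if not incidents:
        raise ValueError("MTTR is undefined with no incidents")
    total = sum((resolved - started for started, resolved in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    t = datetime(2020, 9, 1, 8, 0)
    incidents = [
        (t, t + timedelta(minutes=30)),
        (t, t + timedelta(minutes=90)),
        (t, t + timedelta(minutes=60)),
    ]
    print(mean_time_to_repair(incidents))  # 1:00:00
```

Three incidents totaling 180 minutes give an MTTR of 60 minutes; AIOps aims to shrink the individual durations that feed this average.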

Why AIOps?

There is no doubt that AIOps, and the use of AI in IT infrastructure, brings advantages. Shorter MTTR and better automation are just two examples. With enough data and a deeper understanding of the infrastructure, it is not difficult to imagine AI taking on more tasks, such as deploying updates and managing the cost efficiency of complex cloud infrastructures.

However, we have to return to the most basic question: why do we need AI in IT operations? The answer has been in front of us all along. We’ve been pushing for digital transformation for several years now, and that push means demand for more cloud resources, and more complex architectures, is rising steeply.

IT infrastructure and operations are quickly exceeding our limits as operators. While microservices and today’s tooling make cloud-native apps immensely scalable, we human operators are the ones who will eventually become the bottleneck. AI doesn’t have the same limit.

At the same time, AI can crunch vast amounts of data in seconds. Larger data warehouses lead to better analysis models, better analysis models lead to better insights, and better insights eventually lead to better IT decisions, made at a faster pace and more consistently.

Edge computing is a primary driver of AIOps, especially in widely distributed network environments. As edge devices gain processing power, AI and computational runtimes can be moved to the last-mile portion of the cloud infrastructure, closer to where data is generated. The result is faster, more efficient infrastructure, even as its complexity grows.
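
One common pattern for pushing computation toward the edge is to summarize raw telemetry locally and forward only compact statistics plus unusual readings to the central platform. The sketch below is a hypothetical illustration; the `EdgeSummary` fields and the fixed outlier limit are assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass
class EdgeSummary:
    """Compact record an edge node forwards instead of every raw reading."""
    count: int
    minimum: float
    maximum: float
    average: float
    outliers: List[float]


def summarize_at_edge(samples: Sequence[float], limit: float) -> EdgeSummary:
    """Reduce a batch of raw readings locally, keeping only readings above
    `limit` verbatim so the central platform can analyze them in detail."""
    if not samples:
        raise ValueError("empty batch")
    return EdgeSummary(
        count=len(samples),
        minimum=min(samples),
        maximum=max(samples),
        average=mean(samples),
        outliers=[s for s in samples if s > limit],
    )


if __name__ == "__main__":
    # Hypothetical temperature readings from one edge device.
    print(summarize_at_edge([41.0, 42.5, 40.8, 78.3, 41.9], limit=70.0))
```

Forwarding one summary per batch instead of every reading cuts backhaul traffic, while the retained outliers still give central models the detail they need.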

AI also frees up valuable resources, mainly the time and energy of lead developers and IT specialists. Since developers and operators no longer have to handle routine monitoring by hand, they can focus on more important work: improving their applications, making the organization more agile, and bringing more innovation to the table.

Better still, AI is not here to replace us. It augments our ability to perform IT ops tasks and fill strategic roles. Specialists will still need to expand their skills to remain relevant, but AIOps creates plenty of opportunities to do so. You may be surprised by how many new jobs are already being created.

AIOps as the Future

The next thing to understand about AIOps is its components. AI relies on data and some degree of human assistance to operate properly. While we now have systems capable of unsupervised learning, most machine learning models still require annotated, labeled data before they can recognize patterns.

A more diverse collection of data sets is also necessary, and many AIOps experts are already pushing for it. Operations management and service management need to work together for the organization to build capable AIOps, which means breaking down the data silos we typically deal with today.
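
Breaking down silos often starts with joining data that used to live in separate tools. As a minimal sketch, assuming monitoring alerts and service-management tickets both carry a service name and a timestamp, the function below links each ticket to the alerts for the same service that fired shortly before it was opened. The record types and the 30-minute window are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Alert:
    """Monitoring-side event (hypothetical schema)."""
    service: str
    fired_at: datetime
    message: str


@dataclass(frozen=True)
class Ticket:
    """Service-management-side record (hypothetical schema)."""
    ticket_id: str
    service: str
    opened_at: datetime


def correlate(
    alerts: Sequence[Alert],
    tickets: Sequence[Ticket],
    window: timedelta = timedelta(minutes=30),
) -> Dict[str, List[Alert]]:
    """Map each ticket ID to alerts for the same service that fired
    within `window` before (or at) the moment the ticket was opened."""
    return {
        ticket.ticket_id: [
            alert
            for alert in alerts
            if alert.service == ticket.service
            and ticket.opened_at - window <= alert.fired_at <= ticket.opened_at
        ]
        for ticket in tickets
    }


if __name__ == "__main__":
    opened = datetime(2020, 9, 10, 12, 0)
    alerts = [
        Alert("checkout", opened - timedelta(minutes=10), "p99 latency high"),
        Alert("checkout", opened - timedelta(hours=2), "disk warning"),
        Alert("search", opened - timedelta(minutes=5), "cpu saturated"),
    ]
    tickets = [Ticket("INC-1", "checkout", opened)]
    for ticket_id, linked in correlate(alerts, tickets).items():
        print(ticket_id, [a.message for a in linked])  # INC-1 ['p99 latency high']
```

Linked alert-ticket pairs like these are exactly the kind of labeled, cross-silo data that later model training depends on.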

Finally, there is the machine learning element itself. Machine learning produces more accurate models when the data it consumes is relevant and contextual. You cannot expect to apply ML and get an accurate model out of the box; it takes time and close observation before AIOps is mature enough for long-term use.

With these components in mind, it is easy to see AIOps as the future. AI is the perfect technology to augment IT operations and service management, and AIOps implementation on a larger scale is inevitable.


Caylent provides a critical DevOps-as-a-Service function to high growth companies looking for expert support with Kubernetes, cloud security, cloud infrastructure, and CI/CD pipelines. Our managed and consulting services are a more cost-effective option than hiring in-house, and we scale as your team and company grow.