Taking a proactive and efficient approach to Kubernetes cluster monitoring can help engineering teams identify and predict many critical problems like CPU outage, memory outage, storage issues well in advance of these issues taking a toll on a business. Companies of all sizes such as enterprises like CERN monitor petabytes of their Kubernetes cluster data to understand all their cluster workloads. Solving critical problems before they have the chance to make too significant an impact saves money, time, and reputation. The task is a challenge though as proper cluster monitoring can be a pain point for many companies as it’s important to be aware of what exactly we want to monitor in a cluster.
This article will discuss cluster monitoring fundamentals and how we can use Prometheus Operator to deploy Prometheus and Grafana to monitor a Kubernetes cluster.
What Is Cluster Monitoring?
Cluster monitoring is the process of monitoring all the components and resources running on a cluster. With this process, you actively check the health of all your services and applications and set up monitoring systems to send alerts to administrators to immediately notify them about problems. We can monitor CPU utilization, memory utilization, numbers of namespaces/pods/deployments/services running on the cluster, and many more resources.
Tools for Cluster Monitoring – Prometheus & Grafana
Prometheus and Grafana are two very popular tool choices for cluster monitoring.
Prometheus is an open-source monitoring system that collects the cluster data by sending HTTP requests to the metrics endpoints of the various resources running on the cluster. Prometheus stores data in a time-series database for analysis and alerting purposes.
Prometheus does generate raw visualizations of the metrics it collects. However, the final data images are not necessarily easy to navigate and understand. Optimizing Grafana to work alongside Prometheus allows you to combine the best features of both tools together. Grafana provides excellent cluster and data visualization images, plus the tool integrates with Prometheus seamlessly and generates beautiful dashboards for the cluster data.
Business Advantages of Cluster Monitoring
Cluster monitoring is crucial for any organization whose applications run on clusters. Any problem with the cluster can lead to a huge loss to the organization. For example, Moonlight had a 100% traffic outage due to their Kubernetes cluster issues.
Cluster monitoring:
- Saves a lot of time and money for the organization by identifying critical issues in the cluster.
- Helps in analyzing the cluster performance and measures critical information proactively.
- Identifies and helps avoid any upcoming downtime due to bad cluster resource consumption.
- Alerts the individual responsible in real-time about the problems in the cluster.
- Can prevent or predict any massive issue which can bring down the cluster.
- Maintains a pro-active health check on all the deployments and services.
Use Cases of Cluster Monitoring
- We can curate and visualize cluster data for a better understanding of the cluster by selecting the desired metrics we want to monitor.
- Cluster monitoring dashboards are easily shareable with the teams to share cluster insights with them.
- We can run ad-hoc queries on the cluster monitoring tool to explore the cluster data. We can also explore data in different time ranges, data sources, queries.
- Exploring logs is a fundamental use case for cluster monitoring which every administrator must do daily. We can also explore log metrics to understand data in detail that might not be visible in dashboards.
- We can write our own conditions to generate alerts via email, chat tools like slack, webhook, etc., for critical cases.
Monitoring with Prometheus Operator
We can use Prometheus Operator to manage Prometheus-based Kubernetes monitoring stack by implementing the Kubernetes operator pattern. These Kubernetes operators configure, manage, and optimize the deployment on a Kubernetes cluster automatically. Prometheus Operator uses four custom resource definitions (CRDs) – Prometheus, ServiceMonitor, PrometheusRule, Alertmanager to act on. As the advantages of using the operator pattern for deploying and configuring Prometheus, Grafana, and Alertmanager have become clear, several companies have also made this easier by packaging Prometheus Operator using Helm to make it easier to deploy and manage, for example:
- The Prometheus Operator entry on operatorhub.io originally written by the coreos team, and now maintained by Red Hat
- The loki-stack helm charts created by the team at Grafana Labs can install the Prometheus Operator along with Promtail and Grafana Loki to give you a unified observability option for metrics-based monitoring as well as powerful consolidated and searchable access to your logs for your Kubernetes workloads.
Prometheus Operator also has a kube-prometheus repository which is a combination of Kubernetes manifests, Grafana dashboard templates, and pre-generated Prometheus rules which configure the Prometheus Operator to enabling monitoring, observability, and alerting for the Kubernetes Cluster itself. Kube Prometheus consists of the below packages in the monitoring stack:
- The Prometheus Operator
- Highly available Prometheus
- Highly available Alertmanager
- Prometheus node-exporter
- Prometheus Adapter for Kubernetes Metrics APIs
- kube-state-metrics
- Grafana
Set-up Steps
Now, we will monitor a Kubernetes cluster with Prometheus Operator and visualize the monitoring components in Grafana. But you must have an up and running Kubernetes cluster before following the steps shown below.
Step 1: Clone Kube Prometheus from Prometheus operator git repository.
Conclusion
We hope this article helped you in understanding the importance of cluster monitoring and how Prometheus Operator can be the one-stop solution necessary to monitor your Kubernetes clusters with ease.
Caylent provides a critical DevOps-as-a-Service function to high growth companies looking for expert support with Kubernetes, cloud security, cloud infrastructure, and CI/CD pipelines. Our managed and consulting services are a more cost-effective option than hiring in-house, and we scale as your team and company grow. Check out some of the use cases, learn how we work with clients, and read more about our DevOps-as-a-Service offering.