A proactive approach to Kubernetes cluster monitoring helps engineering teams identify, and often predict, critical problems such as CPU exhaustion, memory exhaustion, and storage issues well before they take a toll on the business. Organizations of all sizes do this, including large enterprises like CERN, which monitors petabytes of Kubernetes cluster data to understand its cluster workloads. Solving critical problems before they have a significant impact saves money, time, and reputation. Proper cluster monitoring remains a pain point for many companies, though, because it requires knowing exactly what to monitor in a cluster.
This article will discuss cluster monitoring fundamentals and how we can use Prometheus Operator to deploy Prometheus and Grafana to monitor a Kubernetes cluster.
What Is Cluster Monitoring?
Cluster monitoring is the process of monitoring all the components and resources running on a cluster. It involves actively checking the health of all your services and applications and setting up alerts that notify administrators as soon as problems occur. We can monitor CPU utilization, memory utilization, the number of namespaces, pods, deployments, and services running on the cluster, and many other resources.
Tools for Cluster Monitoring – Prometheus & Grafana
Prometheus and Grafana are two very popular tool choices for cluster monitoring.
Prometheus is an open-source monitoring system that collects the cluster data by sending HTTP requests to the metrics endpoints of the various resources running on the cluster. Prometheus stores data in a time-series database for analysis and alerting purposes.
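To make the scraping model more concrete, the sketch below filters the plain-text format that a metrics endpoint returns. It is only an illustration: the `metric_samples` helper and the sample metrics are made up for this example, and a real scrape contains many more lines.

```shell
#!/usr/bin/env bash
# Print the sample lines for one metric from Prometheus text-format input on stdin.
metric_samples() {
  local name="$1"
  awk -v name="$name" '
    /^#/ { next }                          # skip "# HELP" and "# TYPE" comment lines
    {
      metric = $1
      brace = index(metric, "{")           # labels, if any, start at the first "{"
      if (brace > 0) metric = substr(metric, 1, brace - 1)
      if (metric == name) print
    }
  '
}

# Hypothetical scrape output, made up for illustration.
sample_scrape='# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="200"} 3
process_open_fds 12'

printf '%s\n' "$sample_scrape" | metric_samples http_requests_total
```

Against a live target, the same helper could read the output of an HTTP request to the target's metrics endpoint (conventionally the /metrics path) instead of the sample text.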
Prometheus can generate basic visualizations of the metrics it collects, but they are not always easy to navigate and understand. Running Grafana alongside Prometheus combines the best features of both tools: Grafana integrates seamlessly with Prometheus as a data source and turns the cluster data into clear, well-designed dashboards.
Business Advantages of Cluster Monitoring
Cluster monitoring is crucial for any organization whose applications run on clusters, because a problem with the cluster can lead to a huge loss for the organization. For example, Moonlight had a 100% traffic outage caused by issues with its Kubernetes cluster. Effective cluster monitoring:
- Saves the organization time and money by identifying critical issues in the cluster early.
- Helps analyze cluster performance and measure critical information proactively.
- Identifies and helps avoid upcoming downtime caused by excessive cluster resource consumption.
- Alerts the responsible individual in real time about problems in the cluster.
- Helps predict or prevent major issues that could bring down the cluster.
- Maintains a proactive health check on all deployments and services.
Use Cases of Cluster Monitoring
- We can curate and visualize cluster data for a better understanding of the cluster by selecting the desired metrics we want to monitor.
- Cluster monitoring dashboards are easy to share, so teams can see the same cluster insights.
- We can run ad-hoc queries in the cluster monitoring tool to explore the cluster data across different time ranges, data sources, and queries.
- Exploring logs is a fundamental use case for cluster monitoring that every administrator should perform daily. We can also explore log metrics to understand details that might not be visible in dashboards.
- We can write our own conditions to generate alerts for critical cases via email, chat tools like Slack, webhooks, and so on.
Monitoring with Prometheus Operator
Prometheus Operator manages a Prometheus-based Kubernetes monitoring stack by implementing the Kubernetes operator pattern, in which an operator automatically configures, manages, and optimizes a deployment on a Kubernetes cluster. Prometheus Operator acts on custom resource definitions (CRDs), including Prometheus, ServiceMonitor, PrometheusRule, and Alertmanager. As the advantages of using the operator pattern for deploying and configuring Prometheus, Grafana, and Alertmanager have become clear, several teams have packaged Prometheus Operator to make it easier to deploy and manage, for example:
- The Prometheus Operator entry on operatorhub.io, originally written by the CoreOS team and now maintained by Red Hat.
- The loki-stack Helm charts created by the team at Grafana Labs, which can install the Prometheus Operator along with Promtail and Grafana Loki to give you a unified observability option: metrics-based monitoring plus consolidated, searchable access to the logs of your Kubernetes workloads.
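To make the custom resources named above more concrete, here is a minimal sketch of a ServiceMonitor, the resource that tells the operator which Services Prometheus should scrape. The name "example-app", the label, and the "web" port name are hypothetical placeholders, not part of any real deployment; the script only writes the manifest to a local file.

```shell
#!/usr/bin/env bash
# Write a minimal ServiceMonitor manifest to a local file.
# "example-app" and the "web" port name are hypothetical placeholders.
manifest="example-servicemonitor.yaml"

cat > "$manifest" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  # Without a namespaceSelector, only Services in this namespace are matched.
  selector:
    matchLabels:
      app: example-app        # must match the labels on the target Service
  endpoints:
    - port: web               # name of the Service port that serves metrics
      interval: 30s           # how often Prometheus scrapes the endpoint
EOF

echo "wrote $manifest"
# To create the resource (requires the Prometheus Operator CRDs):
#   kubectl apply -f example-servicemonitor.yaml
```

Describing scrape targets as Kubernetes objects lets the operator regenerate the Prometheus configuration whenever a matching Service appears or disappears, rather than requiring hand edits to a configuration file.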
The Prometheus Operator project also maintains the kube-prometheus repository, a combination of Kubernetes manifests, Grafana dashboard templates, and pre-generated Prometheus rules that configure the Prometheus Operator to enable monitoring, observability, and alerting for the Kubernetes cluster itself. The kube-prometheus monitoring stack includes the following components:
- The Prometheus Operator
- Highly available Prometheus
- Highly available Alertmanager
- Prometheus node-exporter
- Prometheus Adapter for Kubernetes Metrics APIs
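Alerting rules in this stack are expressed as PrometheusRule objects, and a custom rule can be sketched the same way. The example below is a hedged illustration: the rule name, the five-minute window, and the severity are invented, and the two metadata labels are an assumption about how a given Prometheus resource selects its rules, so check them against your own setup. The expression relies on the `up` metric that Prometheus records for every scrape target.

```shell
#!/usr/bin/env bash
# Write a minimal PrometheusRule manifest to a local file.
# The rule name, duration, and severity are hypothetical examples.
manifest="example-prometheusrule.yaml"

cat > "$manifest" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-target-down
  namespace: monitoring
  labels:                      # assumption: labels your Prometheus ruleSelector expects
    prometheus: k8s
    role: alert-rules
spec:
  groups:
    - name: example.rules
      rules:
        - alert: ExampleTargetDown
          expr: up == 0        # "up" is 0 when the last scrape of a target failed
          for: 5m              # condition must hold this long before the alert fires
          labels:
            severity: warning
          annotations:
            summary: "A scrape target has been unreachable for 5 minutes."
EOF

echo "wrote $manifest"
# To create the resource (requires the Prometheus Operator CRDs):
#   kubectl apply -f example-prometheusrule.yaml
```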
Now, we will monitor a Kubernetes cluster with Prometheus Operator and visualize the monitoring components in Grafana. You need a running Kubernetes cluster before following the steps below.
Step 1: Clone kube-prometheus from the Prometheus Operator GitHub repository.
ubuntu@ubuntu:~$ git clone https://github.com/prometheus-operator/kube-prometheus
Receiving objects: 100% (11526/11526), 5.89 MiB | 3.33 MiB/s, done.
Resolving deltas: 100% (7136/7136), done.
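Step 2: Create the monitoring stack using the configs in the manifests directory. The manifests/setup step creates the “monitoring” namespace and a number of CRDs. Once the ServiceMonitor CRD is being served (the until loop waits for this), the remaining manifests can be created.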
ubuntu@ubuntu:~$ cd kube-prometheus
ubuntu@ubuntu:~/kube-prometheus$ kubectl create -f manifests/setup
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
ubuntu@ubuntu:~/kube-prometheus$ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
No resources found
ubuntu@ubuntu:~/kube-prometheus$ kubectl create -f manifests/
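The until loop in the commands above retries forever if the CRD never becomes available. As a rough sketch, the same wait can be wrapped in a helper that gives up after a timeout. The `wait_for` name and its arguments are made up for this example and are not part of kubectl or kube-prometheus.

```shell
#!/usr/bin/env bash
# Retry a command once per second until it succeeds or the timeout (in seconds) expires.
# Usage: wait_for <timeout-seconds> <command> [args...]
wait_for() {
  local timeout="$1"; shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example (needs access to the cluster):
#   wait_for 120 kubectl get servicemonitors --all-namespaces && kubectl create -f manifests/
```

Returning a non-zero status on timeout lets a script stop before creating the remaining manifests against a cluster that is not ready for them.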
Step 3: Check the resources created in the monitoring namespace. Multiple pods, DaemonSets, and services are now running on the cluster.
ubuntu@ubuntu:~/kube-prometheus$ kubectl get all -n monitoring
NAME                                       READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                    2/2     Running   0          3m35s
pod/alertmanager-main-1                    2/2     Running   0          3m35s
pod/grafana-665447c488-9snqs               1/1     Running   0          3m32s
pod/kube-state-metrics-6f4dfb9ffb-g4gb7    3/3     Running   0          3m32s
pod/prometheus-k8s-0                       2/2     Running   1          3m30s
pod/prometheus-k8s-1                       2/2     Running   2          3m30s
pod/prometheus-operator-764cb46c94-jdd28   2/2     Running   0          5m1s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-main       ClusterIP   10.110.145.114   <none>        9093/TCP                     3m36s
service/alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   3m35s
service/grafana                 ClusterIP   10.102.87.41     <none>        3000/TCP                     3m33s
service/kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            3m33s
service/prometheus-operator     ClusterIP   None             <none>        8443/TCP                     5m2s

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               1/1     1            1           3m33s
deployment.apps/kube-state-metrics    1/1     1            1           3m33s
deployment.apps/prometheus-adapter    1/1     1            1           3m31s
deployment.apps/prometheus-operator   1/1     1            1           5m3s
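Scanning a long listing by eye is error-prone, so a small filter can help. The sketch below reads the default table output of a pod listing on stdin and prints the pods whose READY column shows fewer ready containers than expected. The `not_ready_pods` helper and the sample rows are invented for illustration, and it assumes the default column order (NAME, READY, ...).

```shell
#!/usr/bin/env bash
# Print the names of pods whose READY column (for example "1/2") is not fully ready.
# Expects the default table output of a pod listing on stdin, header line included.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")                  # READY looks like "2/2"
    if (ready[1] != ready[2]) print $1
  }'
}

# Hypothetical listing; the second pod is invented to show a failing case.
sample_pods='NAME                  READY   STATUS             RESTARTS   AGE
prometheus-k8s-0      2/2     Running            1          3m30s
example-broken-pod    1/2     CrashLoopBackOff   4          3m30s'

printf '%s\n' "$sample_pods" | not_ready_pods
```

On a real cluster the input could come from `kubectl get pods -n monitoring`. Note that pods from finished jobs would also be listed, since their containers are no longer ready.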
Step 4: If we go to the Kubernetes dashboard, we can see all the namespaces and custom resource definitions present on the cluster.
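Step 5: Access the Prometheus and Grafana dashboards using the commands below. Each port-forward runs in the foreground, so use a separate terminal for each. While they are running, Prometheus is available on local port 9090 and Grafana on local port 3000.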
ubuntu@ubuntu:~/kube-prometheus$ kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
Forwarding from 127.0.0.1:9090 -> 9090
ubuntu@ubuntu:~/kube-prometheus$ kubectl --namespace monitoring port-forward svc/grafana 3000
Forwarding from 127.0.0.1:3000 -> 3000
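With the Prometheus port-forward active, the data can also be queried from the command line through the Prometheus HTTP API, which is a quick way to confirm that metrics are being collected. The sketch below is a minimal example: the `prom_query` helper and the `PROM_URL` variable are names chosen for this illustration, and it assumes curl is installed.

```shell
#!/usr/bin/env bash
# Base URL assumes the Prometheus port-forward shown above is still running.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query against the Prometheus HTTP API and print the JSON reply.
prom_query() {
  # -G sends the URL-encoded data as a query string on a GET request.
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example (needs the port-forward to be active):
#   prom_query 'up'
```

A successful reply is a JSON document whose "status" field is "success", with the matching series under "data".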
Step 6: Monitor the cluster components and resources using Grafana.
Click on Manage.
Select the Default folder to see the many available dashboards for cluster resources, and choose the ones you want to monitor.
Finally, your cluster monitoring visualization will be ready.
In this snapshot, the Grafana dashboard monitors the cluster compute resources such as CPU utilization, memory limits, etc.
In this snapshot, the dashboard is monitoring the bandwidth used on the Kubernetes cluster.
We hope this article helped you understand the importance of cluster monitoring and how Prometheus Operator can serve as a one-stop solution for monitoring your Kubernetes clusters with ease.