How to Set Up Prometheus Monitoring on a Kubernetes Cluster

Kubernetes is the most popular orchestrator for running containerized workloads in production. It gives you a complete set of tools for deploying, scaling, and administering your containers.

Kubernetes alone isn’t enough to successfully operate apps, though. You also need visibility into cluster utilization, performance, and any errors that occur. Prometheus is an open-source monitoring system that collects metrics in a time series database, allowing you to answer these questions.

In this article, you’ll learn how to set up and use Prometheus with your Kubernetes cluster. We’ll cover the basics of installing Prometheus, querying data, setting up visual dashboards, and managing alerting rules. You’ll need Kubectl, Helm, and a Kubernetes cluster before you begin. Let’s get started!

Using Kube-Prometheus-Stack

The kube-prometheus-stack Helm chart is the simplest way to bring up a complete Prometheus stack inside your Kubernetes cluster. It bundles several different components in one automated deployment:

  • Prometheus – Prometheus is the time series database that scrapes, stores, and exposes the metrics from your Kubernetes environment and its applications.
  • Node-Exporter – Prometheus works by scraping data from a variety of configurable sources called exporters. Node-Exporter is an exporter which collects resource utilization data from the Nodes in your Kubernetes cluster. The kube-prometheus-stack chart automatically deploys this exporter and configures your Prometheus instance to scrape it.
  • Kube-State-Metrics – Kube-State-Metrics is another exporter that supplies data to Prometheus. It exposes information about the API objects in your Kubernetes cluster, such as Pods and containers.
  • Grafana – Although you can directly query Prometheus, this is often tedious and repetitive. Grafana is an observability platform that works with several data sources, including Prometheus databases. You can use it to create dashboards that surface your Prometheus data.
  • Alertmanager – Alertmanager is a standalone Prometheus component that provides notifications when metrics change. You can use it to get an email when CPU utilization spikes or a Slack notification if a Pod is evicted, for example.

Deploying, configuring, and maintaining all these components individually can be burdensome for administrators. Kube-Prometheus-Stack provides an automated solution that performs all the hard work for you.

Installing Kube-Prometheus-Stack

First register the chart’s repository in your Helm client:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories

Next, update your repository lists to discover the chart:

$ helm repo update
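
Optionally, you can inspect the chart’s many configurable values before installing it. This is a standard Helm command that simply prints the chart’s default values file:

$ helm show values prometheus-community/kube-prometheus-stack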

Now you can run the following command to deploy the chart into a new namespace in your cluster:

$ helm install kube-prometheus-stack \
  --create-namespace \
  --namespace kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack
NAME: kube-prometheus-stack
LAST DEPLOYED: Tue Jan  3 14:26:18 2023
NAMESPACE: kube-prometheus-stack
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace kube-prometheus-stack get pods -l "release=kube-prometheus-stack"

It can take a couple of minutes for the chart’s components to start. Run the following command to check how they’re progressing:

$ kubectl -n kube-prometheus-stack get pods
NAME                                                       READY   STATUS    RESTARTS      AGE
alertmanager-kube-prometheus-stack-alertmanager-0          2/2     Running   1 (66s ago)   83s
kube-prometheus-stack-grafana-5cd658f9b4-cln2c             3/3     Running   0             99s
kube-prometheus-stack-kube-state-metrics-b64cf5876-52j8l   1/1     Running   0             99s
kube-prometheus-stack-operator-754ff78899-669k6            1/1     Running   0             99s
kube-prometheus-stack-prometheus-node-exporter-vdgrg       1/1     Running   0             99s
prometheus-kube-prometheus-stack-prometheus-0              2/2     Running   0             83s

Once all the Pods show as Running, your monitoring stack is ready to use. The data exposed by the exporters will be automatically scraped by Prometheus.
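
It can also be useful to list the Services the chart created, as you’ll be port-forwarding to several of them in the next steps:

$ kubectl -n kube-prometheus-stack get svc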

Now you can start querying your metrics.

Running a Prometheus Query

Prometheus includes a web UI that you can use to query your data. This is not exposed automatically. You can access it by using Kubectl port forwarding to redirect local traffic to the service in your cluster:

$ kubectl port-forward -n kube-prometheus-stack svc/kube-prometheus-stack-prometheus 9090:9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090

This command redirects traffic sent to localhost:9090 to the Prometheus service. Visiting this URL in your web browser will reveal the Prometheus UI:

Screenshot: the Prometheus web UI

The “Expression” input at the top of the screen is where you enter your queries as PromQL expressions. Start typing into the input to reveal autocomplete suggestions for the available metrics.

Try selecting the node_memory_Active_bytes metric, which reports the actively used memory on each of the Nodes in your cluster. Press the “Execute” button to run your query. The results will be displayed in a table containing the query’s raw output.

Most metrics are easier to interpret as graphs.

Switch to the “Graph” tab at the top of the screen to see a visualization of the metric over time. You can use the controls above the graph to change the time period that’s displayed.

Screenshot: graphing a metric in the Prometheus UI

PromQL queries allow detailed interrogation of your data. Manually running individual queries in the Prometheus UI is an inefficient form of monitoring, however.
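
Before moving on, it’s worth trying a couple of richer expressions that combine the raw Node-Exporter gauges. These are illustrative only and assume the standard Node-Exporter metric names scraped by the chart; run them one at a time in the “Expression” input:

# Percentage of memory currently in use on each Node
100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

# Approximate CPU utilization per Node over the last five minutes
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))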

Next, let’s use Grafana to visualize metrics conveniently on live dashboards.

Using Grafana Dashboards

Start a new Kubectl port forwarding session to access the Grafana UI. Use port 80 as the target because that’s the port the Grafana service listens on; you can map it to a different local port, such as 8080 in this example:

$ kubectl port-forward -n kube-prometheus-stack svc/kube-prometheus-stack-grafana 8080:80
Forwarding from 127.0.0.1:8080 -> 3000
Forwarding from [::1]:8080 -> 3000

Next visit http://localhost:8080 in your browser. You’ll see the Grafana login page. The default user account is admin with a password of prom-operator.
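
If you’ve changed the password in your chart values, or you don’t want to rely on the documented default, you can also read the credentials from the Secret the chart creates. The Secret name below assumes the release name used in this article:

$ kubectl get secret -n kube-prometheus-stack kube-prometheus-stack-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d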

After you’ve logged in, you’ll initially reach the Grafana welcome screen:

Use the sidebar to switch to the Dashboards screen. Its icon is four squares arranged to resemble panes of glass. This is where all your saved dashboards can be found, including the prebuilt ones that come with Kube-Prometheus-Stack deployments.

Screenshot: the Grafana Dashboards screen

Exploring the Grafana Prebuilt Dashboards

There are several included dashboards that contain the metrics scraped from Node-Exporter, Kube-State-Metrics, and various Kubernetes and Prometheus components. Here are a few notable ones:

Monitoring Cluster Utilization With “Kubernetes / Compute Resources / Cluster”

This dashboard provides an overview of the resource utilization for your entire cluster. Headline statistics are displayed at the top, with more detailed information presented in panels below.

Screenshot: the Kubernetes / Compute Resources / Cluster dashboard

Viewing a Node’s Resource Consumption With “Node Exporter / Nodes”

This dashboard presents the data collected by Node-Exporter, showing detailed resource utilization information on a per-Node basis. You can change the selected Node using the “instance” dropdown at the top of the dashboard.

Screenshot: the Node Exporter / Nodes dashboard

Viewing the Resource Consumption of Individual Pods With “Kubernetes / Compute Resources / Pod”

This dashboard shows the resource requests, limits, quotas, and utilization for individual Pods. You can select the namespace and Pod to view from the dropdowns at the top of the screen.

The time frame can be customized on all Grafana dashboards using the controls in the top-right corner of the screen. You can refresh the data or change the auto-refresh interval with the button next to the time frame selector.

Configuring Alerts With Alertmanager

Monitoring must be automated to be effective. You need to receive alerts when an important metric stops meeting expectations, such as when a spike in memory consumption occurs. Otherwise, you have to continually check your dashboards or run queries to determine whether you need to take action.

Prometheus includes Alertmanager to send you a notification when your metrics trigger an alert. Alertmanager supports multiple receivers that act as destinations for your alerts, such as email, Slack, messaging apps, and your own webhooks.

Kube-Prometheus-Stack’s bundled Alertmanager is configured by merging in custom chart values when you deploy the stack with Helm. First, prepare a YAML file that nests your Alertmanager settings under the top-level alertmanager key. Here’s an example that sends all alerts to a webhook URL:

alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      receiver: demo-webhook
      group_wait: 5s
      group_interval: 10s
      repeat_interval: 1h
    receivers:
      - name: "null"
      - name: demo-webhook
        webhook_configs:
          - url: http://example.com/webhook
            send_resolved: true

The route section specifies that alerts should be directed to the demo-webhook receiver. This is configured to send a POST request to http://example.com/webhook each time an alert is triggered or resolved. The request’s payload is described in the Alertmanager documentation. Note that the extra "null" receiver is required due to a bug that otherwise prevents your route from working.
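
If you later want to send different alerts to different destinations, the route block also accepts nested child routes with matchers. Here’s a minimal sketch of that pattern; the pager-webhook receiver is hypothetical and would need its own entry under receivers:

alertmanager:
  config:
    route:
      receiver: demo-webhook
      routes:
        # Hypothetical child route: alerts labeled severity="critical" go to pager-webhook
        - receiver: pager-webhook
          matchers:
            - severity = "critical"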

Save your YAML file to alertmanager-config.yaml in your working directory. Next run the following command to redeploy the Prometheus stack and apply your Alertmanager settings:

$ helm upgrade --reuse-values \
  -f alertmanager-config.yaml \
  -n kube-prometheus-stack \
  kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack

Don’t worry – you won’t lose any of your existing data. The command performs an in-place upgrade of your deployment.
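
To confirm that your settings were merged into the release, you can inspect the user-supplied values Helm is now tracking:

$ helm get values kube-prometheus-stack -n kube-prometheus-stack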

It could take a few minutes for Alertmanager to reload its configuration after the deployment completes. You’ll then begin to receive requests to your webhook URL as alerts are triggered.

To send a test alert, first start a port forwarding session to your Alertmanager instance:

$ kubectl port-forward -n kube-prometheus-stack svc/kube-prometheus-stack-alertmanager 9093:9093

Next run the following command to simulate triggering a basic alert from a Kubernetes service in a specific namespace:

$ curl -H 'Content-Type: application/json' -d '[{"labels":{"alertname":"alert-demo","namespace":"demo","service":"demo"}}]' http://127.0.0.1:9093/api/v1/alerts

After a few moments, you should receive a request to your webhook URL. The request’s body will describe the alert’s details.
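
While the port forwarding session is still running, you can also check that the alert registered by opening http://localhost:9093 in your browser, or by listing the active alerts over the same API. This assumes your Alertmanager version still serves the v1 endpoint used above:

$ curl http://127.0.0.1:9093/api/v1/alerts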

Key Points

Good observability is essential for Kubernetes clusters running production workloads. You need to understand resource utilization, see where Pods are being scheduled, and track the errors and logs emitted by your applications.

Kube-Prometheus-Stack is a convenient route to setting up monitoring for your cluster. It configures Prometheus, Grafana, Alertmanager, and vital metrics exporters for you, reducing maintenance overheads. The basic installation comes with useful prebuilt dashboards that you can extend with custom queries and metrics scraped from your own applications. Instrumenting a system for Prometheus is a complex topic, but you can get started by exploring the official client libraries for exporting metrics from your code.
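
As a rough sketch of what this looks like in a kube-prometheus-stack cluster: once your application exposes a /metrics endpoint, the chart’s Prometheus Operator can discover it via a ServiceMonitor object. The my-app names and the metrics port below are placeholders, and the release label assumes the release name used in this article:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: kube-prometheus-stack
  labels:
    # Matches the default serviceMonitorSelector configured by the chart
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: my-app            # Label on the Service that fronts your application
  namespaceSelector:
    matchNames:
      - default              # Namespace where that Service lives
  endpoints:
    - port: metrics          # Named port on the Service that serves /metrics
      interval: 30s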

Need an even simpler way to manage CI/CD pipelines on Kubernetes? Check out how Spacelift can help you cut down complexity and automate your infrastructure. It’s even got a Prometheus exporter ready to deliver metrics from your Spacelift account to your Grafana dashboards and other tools! Learn more with our tutorial on Monitoring Your Spacelift Account via Prometheus.
