For a while now, we have received feedback from customers that they would like to be able to connect their Spacelift account to external monitoring systems. Today we are excited to announce the Prometheus Exporter for Spacelift!
The Prometheus exporter allows you to monitor various metrics about your Spacelift account over time. You can then use tools like Grafana to visualize those changes and Alertmanager to take actions based on account metrics. Several metrics are available, and you can find the complete list of available metrics here. Below are a few examples of the information the exporter currently provides:
- The number of runs pending and currently executing in both public and private worker pools.
- The number of workers in a pool.
- Usage information, including the number of public and private worker minutes used during the current billing period.
Once you have that information, it opens a number of possibilities, including visualizing information about your account via Grafana dashboards, alerting on events like a lack of private workers, as well as using that information to autoscale worker pools via the Horizontal Pod Autoscaler.
To give you a taste, here’s an example Grafana dashboard showing some of the information available from the exporter:
The Prometheus exporter is an adaptor between Prometheus and the Spacelift GraphQL API. Whenever Prometheus asks for the current metrics, the exporter makes a GraphQL request and converts it into the metrics format Prometheus expects.
The following diagram gives an overview of this process:
Ok, great, but how do I use the Prometheus Exporter?
To help with setup and deployment, we are going to guide you through the following:
- Deploy a Prometheus stack to a Kubernetes cluster.
- Deploy the exporter.
- Configure Prometheus to monitor it.
The Quick Start section in the exporter repo outlines several options for deploying the exporter. Choose the deployment option that best fits your setup and requirements.
If you already have a Prometheus stack set up and have plenty of experience with it, feel free to skip to the “Installing the Prometheus Exporter” section.
We also assume that you already have a Kubernetes cluster provisioned (hopefully via Spacelift!) and available to complete the following steps. If not, a local installation like Minikube will work fine for illustration purposes.
Installing kube-prometheus-stack
The first step is to install the kube-prometheus-stack Helm chart. You can do that with the following commands:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install --create-namespace -n monitoring kube-prometheus prometheus-community/kube-prometheus-stack
That will install a Prometheus, Grafana, and Alertmanager stack into a namespace called monitoring in your cluster. Run the kubectl get pods -n monitoring command to see the installed components.
$ kubectl get pods -n monitoring
NAME                                                     READY   STATUS    RESTARTS       AGE
alertmanager-kube-prometheus-kube-prome-alertmanager-0   2/2     Running   14 (58m ago)   20d
kube-prometheus-grafana-5cd6d47467-2rt88                 3/3     Running   21 (58m ago)   20d
kube-prometheus-kube-prome-operator-54b7488f58-7fmfv     1/1     Running   7 (58m ago)    20d
kube-prometheus-kube-state-metrics-8ccff67b4-zlfx9       1/1     Running   10 (58m ago)   20d
kube-prometheus-prometheus-node-exporter-zv8zn           1/1     Running   7 (58m ago)    20d
prometheus-kube-prometheus-kube-prome-prometheus-0       2/2     Running   14 (58m ago)   20d
Getting API Credentials
The Prometheus exporter authenticates to the Spacelift GraphQL API using an API key. Follow the guide to create a new API key for the exporter. Please make sure that the API key you create has Admin permissions to your root space.
After you create your key, make a note of the API Key ID and API Key Secret, as you’ll need both when configuring the exporter.
Installing the Prometheus Exporter
The exporter is available via a Docker image published to the public.ecr.aws/spacelift/promex container registry. To deploy the exporter to Kubernetes, we need to create the following resources:
- A Deployment – to run the exporter container.
- A Service – to allow Prometheus to scrape the exporter.
- A ServiceMonitor – to let Prometheus know that it needs to scrape the exporter.
The following is an example Deployment definition for running the exporter. Make sure to replace the <account name>, <API Key> and <API Secret> placeholders with the correct values:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spacelift-promex
  labels:
    app: spacelift-promex
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spacelift-promex
  template:
    metadata:
      labels:
        app: spacelift-promex
    spec:
      containers:
        - name: spacelift-promex
          image: public.ecr.aws/spacelift/promex:latest
          ports:
            - name: metrics
              containerPort: 9953
          readinessProbe:
            httpGet:
              path: /health
              port: metrics
            periodSeconds: 5
          env:
            - name: "SPACELIFT_PROMEX_API_ENDPOINT"
              value: "https://<account name>.app.spacelift.io"
            - name: "SPACELIFT_PROMEX_API_KEY_ID"
              value: "<API Key>"
            - name: "SPACELIFT_PROMEX_API_KEY_SECRET"
              value: "<API Secret>"
            - name: "SPACELIFT_PROMEX_LISTEN_ADDRESS"
              value: ":9953"
Note: The example above defines the API key and secret as normal environment variables. We would recommend that you use Kubernetes Secrets for anything other than testing purposes.
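As a rough sketch of that recommendation, the credentials could live in a Kubernetes Secret and be referenced from the Deployment through secretKeyRef. The Secret name spacelift-promex-credentials and its key names are placeholders chosen for this example; the exporter does not require them.

```yaml
# Hypothetical Secret holding the API key. The name and the key names
# (api-key-id, api-key-secret) are illustrative, not mandated by the exporter.
apiVersion: v1
kind: Secret
metadata:
  name: spacelift-promex-credentials
type: Opaque
stringData:
  api-key-id: "<API Key>"
  api-key-secret: "<API Secret>"
```

The two credential entries in the container's env list would then read from the Secret instead of carrying literal values:

```yaml
# Fragment of the Deployment's container spec, not a complete manifest.
env:
  - name: "SPACELIFT_PROMEX_API_KEY_ID"
    valueFrom:
      secretKeyRef:
        name: spacelift-promex-credentials
        key: api-key-id
  - name: "SPACELIFT_PROMEX_API_KEY_SECRET"
    valueFrom:
      secretKeyRef:
        name: spacelift-promex-credentials
        key: api-key-secret
```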
Next, create a Service to expose the exporter:
apiVersion: v1
kind: Service
metadata:
  name: spacelift-promex
  labels:
    app: spacelift-promex
spec:
  selector:
    app: spacelift-promex
  ports:
    - name: http-metrics
      protocol: TCP
      port: 80
      targetPort: metrics
And finally create your ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-monitor
  labels:
    app: app-monitor
    release: kube-prometheus
spec:
  jobLabel: app-monitor
  selector:
    matchExpressions:
      - {key: app, operator: Exists}
  namespaceSelector:
    matchNames:
      - monitoring
  endpoints:
    - port: http-metrics
      interval: 15s
      path: "/metrics"
The ServiceMonitor definition above tells Prometheus to scrape any Service in the monitoring namespace that has an app label, so apply the exporter's Deployment and Service to that namespace. The scrape interval can be raised or lowered depending on your requirements and organizational standards; the example above scrapes every 15 seconds.
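If scraping every Service that carries an app label is broader than you want, a narrower selector is one option. The variant below is a sketch that matches only the exporter's Service through its app: spacelift-promex label; the resource name spacelift-promex is our choice for this example.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: spacelift-promex          # illustrative name
  labels:
    release: kube-prometheus      # the Helm release name used earlier
spec:
  selector:
    matchLabels:
      app: spacelift-promex       # only the exporter's Service
  namespaceSelector:
    matchNames:
      - monitoring
  endpoints:
    - port: http-metrics          # the port name defined on the Service
      interval: 15s
      path: "/metrics"
```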
Viewing your Metrics
Once you have your Prometheus stack up and running and have deployed the Spacelift exporter, you can use port-forwarding to access each component. First, let’s port-forward the exporter to port 8080 locally using the following command:
kubectl port-forward service/spacelift-promex -n monitoring 8080:80
Assuming all is well, you should be able to see the raw metrics output by accessing http://localhost:8080/metrics:
You can also port-forward to your Grafana instance to view the metrics in Grafana:
kubectl port-forward service/kube-prometheus-grafana -n monitoring 8081:80
You can then quickly discover the available metrics via the Grafana Explore view:
One of the things that I think is amazing about the Prometheus stack is the ability to use PromQL queries not just for monitoring but for defining alerts. For example, if we want to trigger an alert whenever our worker pool has no available workers for a specific time period, we can use a query like this:
max(spacelift_worker_pool_workers) by (worker_pool_id, worker_pool_name) <= 0
Similarly, if we want to alert when the number of queued runs for a pool gets too high, we can use a query like this:
max(spacelift_worker_pool_runs_pending) by (worker_pool_id, worker_pool_name) >= 10
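To turn queries like these into alerts with kube-prometheus-stack, one option is a PrometheusRule resource, where the for field expresses how long the condition must hold before the alert fires. The sketch below rests on several assumptions: the rule and alert names, the 5m and 10m durations, and the severity labels are placeholders, and the release: kube-prometheus label assumes the chart's default behaviour of selecting rules labelled with the Helm release name.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: spacelift-worker-pool-alerts      # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus
spec:
  groups:
    - name: spacelift-worker-pools
      rules:
        # Fires when a pool has had no workers for 5 minutes (duration is an example).
        - alert: SpaceliftWorkerPoolNoWorkers
          expr: max(spacelift_worker_pool_workers) by (worker_pool_id, worker_pool_name) <= 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Worker pool {{ $labels.worker_pool_name }} has no workers"
        # Fires when 10 or more runs have been queued for 10 minutes (duration is an example).
        - alert: SpaceliftWorkerPoolQueueTooLong
          expr: max(spacelift_worker_pool_runs_pending) by (worker_pool_id, worker_pool_name) >= 10
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Worker pool {{ $labels.worker_pool_name }} has a long run queue"
```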
NOTE: This section is intended to outline some of the possibilities that exist when using the Prometheus exporter and should NOT be used as a guide for a production-ready autoscaling solution. For example, it does not consider properly draining workers before scaling them down to avoid in-progress runs being terminated.
Since we have metrics related to queued runs, we can use them to autoscale private workers. We can use the spacelift_worker_pool_runs_pending metric, which tells us how many runs for a given worker pool are waiting to be scheduled, to detect when we need to add more workers to our pool. Similarly, we can use the spacelift_worker_pool_workers and spacelift_worker_pool_workers_busy metrics to decide when to scale down.
We need to take both sets of metrics into account to avoid scaling down just because there are no queued runs. In this situation, we might still have runs in progress that need workers.
For this to work, we need the following components available:
- A working Prometheus stack with the Spacelift Prometheus Exporter running.
- A Spacelift worker pool.
- A prometheus-adapter installation.
- A Horizontal Pod Autoscaler resource to tell Kubernetes how to scale our worker pool.
For the sake of simplicity, I’m going to assume that you already have the first two items covered, and that you’ve deployed the Spacelift worker pool chart with the default settings. I’ll also assume that you have a standard installation of the kube-prometheus-stack chart.
The first thing we need to do is set up a prometheus-adapter installation. The prometheus-adapter works like a bridge between the Kubernetes metrics API and your Prometheus installation. It allows you to use Prometheus metrics to make autoscaling decisions within your cluster. We can visualize it something like this:
The prometheus-adapter uses a configuration format to map between Prometheus queries and Kubernetes metrics. In our case, we can use something like the following to generate the two metrics we need:
prometheus:
  # The URL points at the Kubernetes Service for Prometheus
  url: "http://kube-prometheus-kube-prome-prometheus"
rules:
  default: false
  external:
    # Define the spacelift_worker_pool_runs_pending metric
    - seriesQuery: '{__name__=~"^spacelift_worker_pool_runs_pending$"}'
      resources:
        template: <<.Resource>>
      name:
        as: "spacelift_worker_pool_runs_pending"
      metricsQuery: max(spacelift_worker_pool_runs_pending) by (worker_pool_id, worker_pool_name)
    # Define the spacelift_worker_pool_utilization metric
    - seriesQuery: '{__name__=~"^spacelift_worker_pool_workers$"}'
      resources:
        template: <<.Resource>>
      name:
        as: "spacelift_worker_pool_utilization"
      metricsQuery: |
        (max(spacelift_worker_pool_workers_busy) by (worker_pool_id, worker_pool_name)
          / max(spacelift_worker_pool_workers) by (worker_pool_id, worker_pool_name))
        or vector(0)
After deploying the adapter, you can query your metrics via the Kubernetes API, using the following commands (assuming you’ve deployed everything to a namespace called monitoring):
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/monitoring/spacelift_worker_pool_runs_pending
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/monitoring/spacelift_worker_pool_utilization
Finally, we can create a Horizontal Pod Autoscaler definition to scale our worker pool Deployment based on those metrics automatically:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spacelift-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spacelift-worker
  minReplicas: 1
  maxReplicas: 15
  metrics:
    - type: External
      external:
        metric:
          name: spacelift_worker_pool_runs_pending
          selector:
            matchLabels:
              worker_pool_id: "01G8Y06VGHCT17453VEE9T4YBZ"
        target:
          type: AverageValue
          averageValue: 1
    - type: External
      external:
        metric:
          name: spacelift_worker_pool_utilization
          selector:
            matchLabels:
              worker_pool_id: "01G8Y06VGHCT17453VEE9T4YBZ"
        target:
          type: Value
          value: 0.8
Notice the metrics are grouped by the worker_pool_id; we can use this to target an individual worker pool when creating our HPA!
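To see why both metrics matter, it helps to work through how the HPA combines them. The following is a simplified reading of the documented Horizontal Pod Autoscaler algorithm as we understand it: it ignores the tolerance band and stabilization windows, and the input values in the example (5 workers, all busy, nothing queued) are made up for illustration.

```latex
% Each metric proposes a replica count. For an AverageValue target the metric
% is divided across the pods, so the proposal reduces to value / averageValue.
\[
\text{desired}_{\text{pending}} = \left\lceil \frac{\text{runs\_pending}}{\text{averageValue}} \right\rceil,
\qquad
\text{desired}_{\text{util}} = \left\lceil \text{currentReplicas} \times \frac{\text{utilization}}{\text{value}} \right\rceil
\]
% The HPA takes the largest proposal and clamps it to the configured bounds.
\[
\text{desiredReplicas} = \min\bigl(\text{maxReplicas},\ \max(\text{minReplicas},\ \text{desired}_{\text{pending}},\ \text{desired}_{\text{util}})\bigr)
\]
% Illustrative example: 5 workers, all busy (utilization 1.0), 0 runs pending.
\[
\text{desired}_{\text{pending}} = \left\lceil \tfrac{0}{1} \right\rceil = 0,
\qquad
\text{desired}_{\text{util}} = \left\lceil 5 \times \tfrac{1.0}{0.8} \right\rceil = \lceil 6.25 \rceil = 7,
\qquad
\text{desiredReplicas} = \min(15,\ \max(1,\ 0,\ 7)) = 7
\]
```

With the pending-runs metric alone, the empty queue would push the pool towards its minimum even though every worker is busy; the utilization metric is what keeps those workers in place.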
That’s it – bask in your autoscaling glory:
We are excited to learn how our customers use the exporter and the problems they solve with it! Feel free to provide feedback and comments via Issues in the Prometheus Exporter GitHub repository.