Autoscaling Native Kubernetes Workers

28 Aug 2024·7 min read

Autoscaling automatically adjusts the number of workers in the cluster based on the current queue length. This lets the worker pool scale out or in to follow the number of jobs pending inside the Spacelift platform.

This tutorial combines a few open-source technologies to achieve this:

A Spacelift Worker Pool Controller is installed in the cluster to manage the worker pool.
A Prometheus Exporter is installed in the cluster to expose the queue length of the worker pool.
A KEDA ScaledObject is installed in the cluster to read the queue length through a Prometheus query and scale the worker pool accordingly.

Note: This feature is in beta and may have some limitations. Spacelift is asking for customer feedback on this feature to improve it. If you have any feedback, please reach out to your customer success manager or support.

Prerequisites

A Kubernetes cluster with kubectl configured
Helm 3.x
A Spacelift account with private workers
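KEDA installed in the cluster
kube-prometheus-stack installed in the cluster (the commands in this tutorial assume a Helm release named kube-prom-stack)
- Ensure Prometheus can discover PodMonitors outside the Helm release. This can be done by running the following: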

helm upgrade kube-prom-stack prometheus-community/kube-prometheus-stack --reuse-values --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
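Before starting, it can help to confirm that the command-line tools used in this tutorial are available. The following is a minimal sketch. The function name require_tools is ours, and the tool list in the example comment is simply what the later steps call.

```shell
#!/bin/sh
# Report any required CLI tool that is missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: tools used later in this tutorial.
# require_tools kubectl helm openssl base64 sed
```

The check only looks at PATH. It does not confirm that kubectl points at the right cluster or that KEDA and Prometheus are healthy.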

Step 1: Create the worker pool CSR

First, you need to create a worker pool in Spacelift. To do this, you need to create a Certificate Signing Request (CSR) and send it to Spacelift.

To create the CSR, run the following command:

openssl req -new -newkey rsa:4096 -nodes -keyout spacelift.key -out spacelift.csr

Fill out the prompts as you like. Spacelift does not use the subject details from the CSR, so values such as country code, domain, and company name don’t matter.

This will generate two files: spacelift.csr and spacelift.key. Save these files — you will need both of them later in the process.
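If you prefer to skip the interactive prompts, the same request can be generated in one shot. This is a sketch. The subject value is an arbitrary placeholder of ours (the tutorial notes that Spacelift ignores these fields), and the function names are hypothetical.

```shell
#!/bin/sh
# Generate spacelift.key and spacelift.csr in the current directory without prompts.
# The -subj value is an arbitrary placeholder; any valid subject works.
generate_csr() {
  openssl req -new -newkey rsa:4096 -nodes \
    -keyout spacelift.key -out spacelift.csr \
    -subj "/CN=spacelift-worker-pool"
}

# Check that the CSR parses and that its self-signature verifies.
verify_csr() {
  openssl req -in spacelift.csr -noout -verify
}

# Example:
# generate_csr && verify_csr
```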

Step 2: Create the worker pool

The second step is to actually create the worker pool in Spacelift using the CSR we generated in the previous step.

In Spacelift, head to the Worker pools screen and click Create worker pool.

On the next screen, give the worker pool a name, upload the spacelift.csr file you created previously, and assign a space to the worker pool.

Click Create to create the worker pool. A file will automatically download. Keep it safe with the CSR and Key files as you will need it later.

Step 3: Create an API key for the Prometheus Exporter

Next, we need to create an API key for the Prometheus exporter to authenticate with Spacelift.

In the Spacelift UI, head to Organization settings -> API Keys and click Create API Key.

Give the API key a name and Space, set the Type to Secret, then click Create. This will also download a file. Keep that file with your CSR, key, and worker pool files. Also note the Key ID shown in the UI.

Note: If you are using a login policy, you may need to allow the API key in the policy. Follow up with our Login Policy docs on how to do this.

Checkup One

At this point in the tutorial, you should have four files in total, and you should have noted the API key ID.

spacelift.csr
spacelift.key
The file that was downloaded when you created the worker pool
The file that was downloaded when you created the API key

If you do not have these four files, go back and complete the steps above to generate them.
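A quick way to run this checkup from the shell is sketched below. The helper name check_files is ours, and the two .config file names in the example comment are placeholders for whatever your downloads are called.

```shell
#!/bin/sh
# Print a status line per file; return 0 only if every file exists and is non-empty.
check_files() {
  status=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "ok:      $f"
    else
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}

# Example (replace the two .config names with your downloaded files):
# check_files spacelift.csr spacelift.key worker-pool.config api-key.config
```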

Step 4: Create the Kubernetes namespace and secrets for the worker pool controller and Prometheus exporter

Now we need to create a namespace in Kubernetes for the worker pool controller and Prometheus exporter as well as all the secrets those controllers will need to authenticate to Spacelift.

Create the namespace:

kubectl create ns spacelift-worker-pool-controller-system

Now, switch to the directory where you have the four files from the previous steps and run the following commands to create the secrets:

kubectl create secret generic spacelift-worker-pool-credentials \
    --namespace=spacelift-worker-pool-controller-system \
    --from-literal=privateKey="$(base64 < spacelift.key | tr -d '\n')" \
    --from-file=token={your-workerpool-config}.config

Note: Replace {your-workerpool-config}.config with the file that was downloaded when you created the worker pool.

kubectl create secret generic spacelift-prometheus-exporter-credentials \
    --namespace=spacelift-worker-pool-controller-system \
    --from-literal=SPACELIFT_PROMEX_API_KEY_SECRET="$(sed '4q;d' {your-api-key-config}.config)"

Note: Replace {your-api-key-config}.config with the file that was downloaded when you created the API key.

Now when you run kubectl get secrets -n spacelift-worker-pool-controller-system you should see the two secrets you just created.
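The two secret commands rely on two small shell idioms that are easy to get wrong: printing a single line of a file with sed, and producing a base64 string with no line breaks. The sketch below isolates both so you can try them on a throwaway file before touching real credentials. The function names are ours.

```shell
#!/bin/sh
# Print only line N of a file: 'Nq' prints line N and quits, 'd' discards earlier lines.
# $1 = line number, $2 = file.
nth_line() {
  sed "${1}q;d" "$2"
}

# Base64-encode a file as one line. GNU base64 wraps at 76 columns by default,
# so the newlines are stripped explicitly to stay portable.
b64_one_line() {
  base64 < "$1" | tr -d '\n'
}

# Example:
# nth_line 4 your-api-key-file.config
# b64_one_line spacelift.key
```

If nth_line 4 prints nothing for your API key file, the file has fewer than four lines and the secret value would be empty, so check the download before creating the secret.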

Step 5: Install the worker pool controller and Prometheus exporter

Now we need to install the worker pool controller and Prometheus exporter into the Kubernetes cluster using the Helm chart and the secrets we just created.

Add the Spacelift Helm repository:

helm repo add spacelift https://downloads.spacelift.io/helm
helm repo update

Then install the chart:

helm upgrade spacelift-worker-pool-controller spacelift/spacelift-workerpool-controller --install \
  --namespace spacelift-worker-pool-controller-system \
  --set spacelift-promex.apiEndpoint="https://{your-account-name}.app.spacelift.io" \
  --set spacelift-promex.apiKeyId="{your-noted-api-key-id}" \
  --set spacelift-promex.apiKeySecretName="spacelift-prometheus-exporter-credentials" \
  --set spacelift-promex.enabled=true

Note: Replace {your-account-name} with your Spacelift account name and {your-noted-api-key-id} with the API key ID you noted when you created the API key.
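If you would rather keep these settings in a file than in --set flags, the same values can be written as a Helm values file. The sketch below only restates the four flags above as nested keys. The function name render_values and the example arguments are placeholders of ours.

```shell
#!/bin/sh
# Emit Helm values equivalent to the --set flags used in the install command.
# $1 = Spacelift account name, $2 = API key ID.
render_values() {
  cat <<EOF
spacelift-promex:
  enabled: true
  apiEndpoint: "https://$1.app.spacelift.io"
  apiKeyId: "$2"
  apiKeySecretName: "spacelift-prometheus-exporter-credentials"
EOF
}

# Example (account name and key ID are placeholders):
# render_values my-account MY_API_KEY_ID > values.yaml
# helm upgrade spacelift-worker-pool-controller spacelift/spacelift-workerpool-controller \
#   --install --namespace spacelift-worker-pool-controller-system -f values.yaml
```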

Checkup Two

You should now have two pods running in your cluster in the spacelift-worker-pool-controller-system namespace. You can check this by running kubectl get pods -n spacelift-worker-pool-controller-system.

$ kubectl get pods -n spacelift-worker-pool-controller-system
NAME                                                              READY   STATUS    RESTARTS   AGE
spacelift-worker-pool-controller-controller-manager-6dc789b5bh2k   1/1     Running   0          46s
spacelift-worker-pool-controller-spacelift-promex-98f4d7fb4sdgx5   1/1     Running   0          46s

If either of these pods does not start, look at the pod's logs to see whether there is an issue communicating with the Spacelift API, and ensure you’ve closely followed the tutorial up to this point.
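For troubleshooting, the helpers below wrap two standard kubectl commands. This is a sketch. The function names are ours, and the label selector is the app.kubernetes.io/name=spacelift-promex label that the PodMonitor in Step 7 matches on.

```shell
#!/bin/sh
NS=spacelift-worker-pool-controller-system

# Block until every pod in the namespace is Ready, or fail after the timeout.
# $1 = optional timeout (defaults to 120s).
wait_for_pods() {
  kubectl wait --for=condition=Ready pods --all -n "$NS" --timeout="${1:-120s}"
}

# Show the last 50 log lines from the Prometheus exporter pod(s), selected by label.
promex_logs() {
  kubectl logs -n "$NS" -l app.kubernetes.io/name=spacelift-promex --tail=50
}

# Example:
# wait_for_pods 60s || promex_logs
```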

Step 6: Create the worker pool in Kubernetes

Now we need to create the worker pool in Kubernetes, using the worker pool CRD that was installed with the controller.

kubectl apply -f - <<EOF
apiVersion: workers.spacelift.io/v1beta1
kind: WorkerPool
metadata:
  name: my-amazing-worker-pool
  namespace: spacelift-worker-pool-controller-system
spec:
  poolSize: 1
  token:
    secretKeyRef:
      name: spacelift-worker-pool-credentials
      key: token
  privateKey:
    secretKeyRef:
      name: spacelift-worker-pool-credentials
      key: privateKey
EOF

Once you create this WorkerPool resource in Kubernetes, the worker pool controller should immediately pick it up and report one worker in your Spacelift account as READY.
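You can also confirm the result from the cluster side. The sketch below reads spec.poolSize back from the resource and lists the pods in the namespace. The function names are ours, and the plural resource name workerpools.workers.spacelift.io is an assumption based on the usual Kubernetes naming for the WorkerPool kind, so adjust it if kubectl api-resources shows something different.

```shell
#!/bin/sh
NS=spacelift-worker-pool-controller-system

# Print the desired worker count (spec.poolSize) of a WorkerPool resource.
# $1 = WorkerPool name. The plural resource name is assumed, not confirmed.
pool_size() {
  kubectl get workerpools.workers.spacelift.io "$1" -n "$NS" -o jsonpath='{.spec.poolSize}'
}

# List the pods in the namespace, which should include the worker(s).
list_pods() {
  kubectl get pods -n "$NS"
}

# Example:
# pool_size my-amazing-worker-pool
# list_pods
```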

Step 7: Create the Kube Prom Stack Pod Monitor

Now we need to create a PodMonitor so that the Prometheus instance from kube-prometheus-stack scrapes the Prometheus exporter pods.

kubectl apply -f - << EOF
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: spacelift-promex-monitor
  namespace: spacelift-worker-pool-controller-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: spacelift-worker-pool-controller
      app.kubernetes.io/name: spacelift-promex
  namespaceSelector:
    matchNames:
      - spacelift-worker-pool-controller-system
  podMetricsEndpoints:
  - path: /metrics
    port: metrics
EOF

This will enable the Prometheus instance from kube-prometheus-stack to scrape the Prometheus exporter pod, which is important because KEDA will query that Prometheus instance for the queue length.
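Before wiring up KEDA, it is worth checking that the metric actually reaches Prometheus. The sketch below runs an instant query through the standard Prometheus HTTP API. The function name prom_query is ours, and the service name and namespace in the example comment are taken from the serverAddress used in Step 8, so adjust them if your release is named differently.

```shell
#!/bin/sh
# Run an instant query against the Prometheus HTTP API and print the raw JSON response.
# $1 = Prometheus base URL, $2 = PromQL expression.
prom_query() {
  curl -sS -G "$1/api/v1/query" --data-urlencode "query=$2"
}

# Example, after forwarding the Prometheus service to localhost in another terminal:
# kubectl port-forward -n kube-prom-stack svc/kube-prom-stack-kube-prome-prometheus 9090:9090
# prom_query http://localhost:9090 'spacelift_worker_pool_runs_pending'
```

An empty result list usually means the exporter is not being scraped yet, so give Prometheus a minute or two after creating the PodMonitor before concluding that something is wrong.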

Step 8: Create the KEDA ScaledObject

Finally, we need to create a KEDA ScaledObject to monitor the queue length and scale the worker pool based on the queue length.

The example below always keeps at least 1 worker in the worker pool. When there are more jobs in the queue than workers, it scales out to match the number of pending jobs, up to a maximum of 5. Because the query takes the maximum pending count over the last 5 minutes, the pool scales in only after the higher count has left that window, and it never drops below the minimum of 1.

You can change the query to scale based on your requirements.

kubectl apply -f - << EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: workerpool-autoscaler
  namespace: spacelift-worker-pool-controller-system
spec:
  scaleTargetRef:
    apiVersion: workers.spacelift.io/v1beta1
    kind: WorkerPool
    name: my-amazing-worker-pool
  pollingInterval: 30
  cooldownPeriod: 10
  initialCooldownPeriod: 0
  minReplicaCount: 1
  maxReplicaCount: 5
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://kube-prom-stack-kube-prome-prometheus.kube-prom-stack.svc.cluster.local:9090
        query: max_over_time(spacelift_worker_pool_runs_pending{worker_pool_name='my-amazing-worker-pool'}[5m])
        threshold: '1.0'
        activationThreshold: '0'
        unsafeSsl: "true"
EOF

Notes:

You may need to update the Prometheus server address and/or worker pool name.
If you did not give your WorkerPool resource the same name as your worker pool in Spacelift, you must update the worker_pool_name in the query to match the name in Spacelift.
By setting the maxReplicaCount, you can set the max scale of your workers, and in this way, you can avoid overages.
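To reason about what a given threshold and replica range will do, the arithmetic can be approximated as the query result divided by the threshold, rounded up, then clamped between minReplicaCount and maxReplicaCount. The sketch below is a simplification under that assumption. It ignores the autoscaler's tolerance and stabilization behavior, takes whole-number inputs only, and the function name is ours.

```shell
#!/bin/sh
# Approximate the replica count: ceil(pending / threshold), clamped to [min, max].
# $1 = pending runs, $2 = threshold, $3 = minReplicaCount, $4 = maxReplicaCount (all integers).
desired_replicas() {
  want=$(( ($1 + $2 - 1) / $2 ))   # integer ceiling division
  if [ "$want" -lt "$3" ]; then want=$3; fi
  if [ "$want" -gt "$4" ]; then want=$4; fi
  echo "$want"
}

# Example with the values from the ScaledObject above (threshold 1, min 1, max 5):
# desired_replicas 2 1 1 5
```

With a threshold of 1, each pending run asks for one worker. Raising the threshold to 2 would ask for roughly one worker per two pending runs.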

Once this is in place, you can assign your stack(s) to the worker pool.

Finish: Run a stack

When you trigger two stacks, they should queue for a few seconds. Then you can see KEDA scale the worker pool to 2. Once the runs are complete and the higher pending count has dropped out of the query window, the worker pool will scale back to 1.

You can see in the events of the ScaledObject when it scales (it will not notify you of scale in, but it will indeed scale in).

Events:
  Type    Reason                    Age                  From           Message
  ----    ------                    ----                 ----           -------
  Normal  KEDAScalersStarted        4m51s                keda-operator  Started scalers watch
  Normal  ScaledObjectReady         4m51s                keda-operator  ScaledObject is ready for scaling
  Normal  KEDAScalersStarted        17s (x7 over 4m51s)  keda-operator  Scaler prometheus is built.
  Normal  KEDAScaleTargetActivated  17s                  keda-operator  Scaled workers.spacelift.io/v1beta1.WorkerPool spacelift-worker-pool-controller-system/my-amazing-worker-pool from 1 to 2, triggered by prometheusScaler
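The events above come from describing the ScaledObject. The helpers below are a sketch for pulling them up, along with the HorizontalPodAutoscaler that KEDA manages behind the scenes. The function names are ours, and the HPA name follows KEDA's default keda-hpa-<ScaledObject name> pattern.

```shell
#!/bin/sh
NS=spacelift-worker-pool-controller-system

# Describe the ScaledObject, including its Events section.
# $1 = optional ScaledObject name (defaults to the one created in Step 8).
scaler_events() {
  kubectl describe scaledobject "${1:-workerpool-autoscaler}" -n "$NS"
}

# Show the HPA that KEDA creates for the ScaledObject, with current and target metric values.
scaler_hpa() {
  kubectl get hpa "keda-hpa-${1:-workerpool-autoscaler}" -n "$NS"
}

# Example:
# scaler_events
# scaler_hpa
```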

Written by

Joey Stout

Joey Stout is a seasoned programmer with over a decade of experience using AWS, specializing in Site Reliability Engineering and DevOps. His interests include OpenTofu, Kubernetes, and the broader landscape of technology. Outside of his professional work, Joey is an avid outdoorsman, enjoying activities like hunting, fishing, and camping.