For a while now at Spacelift we’ve been hearing from users that they’d like to be able to run Spacelift worker pools in their Kubernetes clusters. As a first step towards that we decided to investigate whether we could run workers in Kubernetes using a Docker-in-Docker sidecar container. The rest of this post gives a rough overview of the architecture of Spacelift workers, and also explains how the sidecar container strategy works.
I’d just like to say thanks to Vadym Martsynovskyy for his great article about running Jenkins agents in Kubernetes. In that article he outlines the general approach taken here, and also describes some of the pros and cons of using Docker-in-Docker. It’s definitely well worth a read.
Before going into more detail, it’s worth explaining what a Spacelift worker is and giving a quick overview of our architecture. Like many other CI/CD systems, Spacelift uses a hybrid-SaaS model: the control plane that schedules runs is managed by us, while the runs themselves execute in a separate process called a worker. We provide a public worker pool, but also allow you to self-host your own workers, giving you complete control over your infrastructure.
Perhaps confusingly, if you follow the instructions to create a private worker pool, you’ll notice that it involves downloading something called the “launcher”. The launcher is responsible for communicating with the Spacelift mothership (control plane), and creating and destroying worker containers using Docker for each Spacelift run:
As you can see, the launcher process needs access to a Docker daemon in order to create a container for each run.
We found that the launcher architecture maps very nicely onto Kubernetes using a sidecar container to run the Docker daemon. For each worker in the pool, we need to create a Kubernetes Pod with two containers: one for the launcher and another for the Docker daemon. The following shows a stripped-down Deployment definition for creating a worker pool with 4 workers:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker-pool-1-spacelift-worker
spec:
  replicas: 4
  template:
    spec:
      containers:
        - name: launcher
          image: "public.ecr.aws/spacelift/launcher:latest"
          imagePullPolicy: Always
          env:
            - name: DOCKER_HOST
              value: tcp://localhost:2375
            - name: SPACELIFT_TOKEN
              value: "..."
            - name: SPACELIFT_POOL_PRIVATE_KEY
              value: "..."
          volumeMounts:
            - name: launcher-storage
              mountPath: /opt/spacelift
              subPath: spacelift
        - name: dind
          image: "docker:dind"
          imagePullPolicy: Always
          command: ["dockerd", "--host", "tcp://127.0.0.1:2375"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: launcher-storage
              mountPath: /var/lib/docker
              subPath: docker
            - name: launcher-storage
              mountPath: /opt/spacelift
              subPath: spacelift
      volumes:
        - name: launcher-storage
          emptyDir: {}
The launcher communicates with the Docker-in-Docker sidecar container over TCP, configured via the DOCKER_HOST environment variable for the launcher container. This works because the containers in a Kubernetes Pod share the same IP address and port space, and can communicate with each other via localhost.
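As a quick sanity check, you can exec into the dind sidecar of one of the worker pods and list the containers the launcher has created there. The pod name below is just an example; substitute one from your own kubectl get pods output:

```shell
# List the worker pods (names will differ in your cluster).
kubectl get pods

# Exec into the dind sidecar of one pod and list running containers.
# Any worker containers created by the launcher will show up here.
kubectl exec worker-pool-1-spacelift-worker-7fcfc9f594-f94tj \
  -c dind -- docker ps
```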
In addition, a shared volume called launcher-storage is mounted into each container. This is used by the launcher to store the workspaces for runs, along with other things like cached tool binaries (for example Terraform). The Docker sidecar needs access to the run workspaces in order to mount them into the worker containers, and it also uses that volume to store its image cache.
The last thing worth pointing out is that the dind container sets securityContext.privileged to true. This is required for Docker-in-Docker to function correctly, because the Docker daemon needs elevated privileges to manage the namespaces, cgroups, and mounts of the containers it runs.
The Spacelift launcher is distributed as a statically linked binary, making it simple to build an image. As you can see, the Dockerfile simply copies the launcher into an Alpine container, and then sets the startup command:
FROM alpine:3.14
COPY spacelift-launcher /usr/bin/spacelift-launcher
RUN chmod 755 /usr/bin/spacelift-launcher
CMD [ "/usr/bin/spacelift-launcher" ]
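For reference, building and publishing an image like this yourself could look something like the following; the registry and tag here are illustrative, not Spacelift’s actual pipeline:

```shell
# Build the image from the Dockerfile above (run in the directory
# containing the Dockerfile and the spacelift-launcher binary).
docker build -t registry.example.com/spacelift-launcher:latest .

# Push it to the registry so your cluster can pull it.
docker push registry.example.com/spacelift-launcher:latest
```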
This image is rebuilt and published to our public ECR repository any time the launcher binary is updated.
We provide Terraform modules to make it really easy for users to deploy worker pools to AWS, Azure and GCP, and we wanted to provide a similar experience for Kubernetes. To achieve this, we created a Helm chart that makes it simple to deploy workers to Kubernetes clusters: https://github.com/spacelift-io/spacelift-workerpool-k8s.
Assuming you’ve already configured your worker pool in Spacelift and have access to the credentials for the pool, deploying this chart to your cluster simply requires two steps.
First, add the Spacelift Helm chart repository and update your local chart cache:
helm repo add spacelift https://downloads.spacelift.io/helm
helm repo update
Next, install the chart:
helm upgrade worker-pool-1 spacelift/spacelift-worker --install --set "credentials.token=<worker-pool-token>,credentials.privateKey=<worker-pool-private-key>"
Replace <worker-pool-token> and <worker-pool-private-key> with your own credentials, and make sure to base64-encode the private key.
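For example, assuming your private key lives in a file called spacelift.key (the filename here is hypothetical), you could encode it into a single-line value like this; tr -d '\n' strips newlines so the result works the same on Linux and macOS:

```shell
# Hypothetical key file for illustration; use your real private key here.
printf 'fake-private-key-material' > spacelift.key

# Base64-encode the key into a single line (GNU base64 also supports
# -w0 for the same effect; tr is more portable).
ENCODED_KEY=$(base64 < spacelift.key | tr -d '\n')
echo "$ENCODED_KEY"
```

The resulting value can then be passed to Helm via --set "credentials.privateKey=${ENCODED_KEY}".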
If all goes well, you should be able to view the pods for your worker pool using kubectl get pods:
kubectl get pods
NAME READY STATUS RESTARTS AGE
worker-pool-1-spacelift-worker-7fcfc9f594-f94tj 2/2 Running 0 22m
You should also be able to view the workers in your pool in Spacelift:
When implementing in a production environment, you may want to use an alternative approach to providing credentials, and may want to configure a custom storage volume for the worker Pods. For more information about configuring the chart, check out the README in the chart GitHub repo.
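As one illustration of the storage point, the emptyDir volume from the Deployment above could be swapped for a PersistentVolumeClaim so that cached tools and images survive pod restarts. This is only a sketch; the claim name is illustrative, and the chart’s README describes the actual configuration options:

```
      volumes:
        - name: launcher-storage
          persistentVolumeClaim:
            claimName: spacelift-worker-storage  # illustrative claim name
```

Note that each dind daemon needs its own /var/lib/docker, so multiple replicas should not share a single claim; per-pod claims (for example via a StatefulSet’s volumeClaimTemplates) are the safer shape.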
If you’re interested in running Spacelift workers in Kubernetes, we’d welcome any feedback about the approach, contributions to the Helm chart, and reports of any issues you encounter so that we can make improvements. Also, if you aren’t already using Spacelift but are interested in trying it, you can sign up and start your free evaluation of Spacelift.