In this article, we will take a look at liveness probes in Kubernetes (K8s), with some useful examples. Defining probes correctly can improve pod resilience and availability.
The liveness probe ensures that an application within a container is live and operational based on a specified test.
The kubelet uses liveness probes to know when to restart a container. Applications that hit errors or transition into broken states are detected, and in many cases a restart fixes them.
If the configured liveness probe is successful, no action is taken, and no logs are recorded. If it fails, the event is logged, and the kubelet kills the container according to the configured restartPolicy.
A liveness probe should be used when a pod may appear to be running, but the application may not function correctly. For example, in a deadlock situation, the pod may be running but will be unable to serve traffic and is effectively not working.
They are not necessary where the application is configured to crash the container on failure, as the kubelet will check the restartPolicy and will automatically restart the container if it is set to Always or OnFailure. NGINX, for example, starts up quickly and exits if it runs into an error that stops it from serving pages. In this situation, we do not need a liveness probe.
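As a minimal sketch of that case, the manifest below runs NGINX with no liveness probe and relies only on the restartPolicy to bring the container back after a crash. The Pod name and the image tag are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-no-probe        # hypothetical name
spec:
  # Restart the container whenever it exits with a non-zero status.
  # (If omitted, restartPolicy defaults to Always.)
  restartPolicy: OnFailure
  containers:
  - name: nginx
    image: nginx:1.25         # assumed tag, any recent NGINX image would do
    ports:
    - containerPort: 80
    # No livenessProbe: NGINX exits on fatal errors, so the kubelet
    # already sees the failure through the container's exit status.
```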
This article will focus on the use of liveness probes, but you should be aware of the other types of probes available for use in Kubernetes:
Readiness probes monitor when the application becomes available. If a readiness probe fails, no traffic is sent to the Pod: the endpoints controller removes the Pod's IP address from the endpoints of every Service that matches it. These probes are used when an app needs configuration before it becomes ready. An application may also become overloaded with traffic and fail the probe, which stops more traffic from being sent to it and allows it to recover.
If the readiness probe fails but the liveness probe succeeds, the kubelet determines that the container is not ready to receive network traffic but is still working to become ready.
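A readiness probe is declared in the same way as a liveness probe, under the readinessProbe key. The sketch below assumes the application exposes a /ready endpoint on port 8080; that path and the Pod name are hypothetical, and the image is simply reused from the liveness examples in this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http        # hypothetical name
  labels:
    app: readiness-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/liveness:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /ready          # assumed endpoint, adjust to your application
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      # A failure here only removes the Pod from Service endpoints;
      # the container is not restarted.
```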
Startup probes are used by the kubelet to know when a container application has started. When a startup probe is configured, liveness and readiness checks are disabled until it succeeds, ensuring those probes don’t interfere with the application startup.
These are particularly useful for slow-starting containers, which the kubelet might otherwise kill on a failed liveness probe before they are up and running. If a liveness probe uses the same endpoint as the startup probe, set the failureThreshold of the startup probe higher to allow for long startup times.
If a startup probe fails, the event is logged, and the kubelet kills the container according to the configured restartPolicy.
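The following sketch pairs a startup probe with a liveness probe on the same endpoint. The Pod name and the threshold values are assumptions chosen for illustration: with failureThreshold 30 and periodSeconds 10, the application gets up to 300 seconds (30 x 10) to start before the kubelet gives up on it.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-demo          # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/liveness:0.1
    ports:
    - containerPort: 8080
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30    # generous budget for slow startup
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 3     # stricter once the app is running
      periodSeconds: 5
```

Once the startup probe succeeds, the liveness probe takes over with its tighter threshold, so a deadlock after startup is still detected quickly.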
Probes are managed by the kubelet. The kubelet is the primary “node agent” that runs on each node.
To effectively use a Kubernetes probe, the application must support one of the following handlers:
- ExecAction handler: runs a command inside the container, and the diagnostic succeeds if the command completes with status code 0.
- TCPSocketAction handler: attempts a TCP connection to the IP address of the pod on a specific port. The diagnostic succeeds if the port is found to be open.
- HTTPGetAction handler: performs an HTTP GET request using the IP address of the pod, a specific port, and a specified path. The diagnostic succeeds if the response code is at least 200 and less than 400.
- gRPC handler: if your application implements the gRPC Health Checking Protocol, the kubelet can be configured to use it for application liveness checks. gRPC probes reached beta in Kubernetes v1.24, where the GRPCContainerProbe feature gate is enabled by default; on older clusters you must enable that feature gate explicitly in order to configure checks that rely on gRPC.
When the kubelet performs a probe on a container, the result is one of three values: Success if the diagnostic passed, Failure if it failed, or Unknown if the diagnostic did not complete for some reason.
In each example shown below, the periodSeconds field specifies that the kubelet should perform a liveness probe every 5 seconds. The initialDelaySeconds field tells the kubelet that it should wait 5 seconds before performing the first probe.
In addition to these options, you can also configure:
- timeoutSeconds: number of seconds to wait for a reply before the probe counts as failed (default 1).
- successThreshold: number of consecutive successful probe executions needed to mark the container healthy (default 1; must be 1 for liveness probes).
- failureThreshold: number of consecutive failed probe executions needed to mark the container unhealthy (default 3).
These five parameters can be used in all types of liveness probes.
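As an illustration, the fragment below sets all five parameters explicitly on an HTTP liveness probe. The values are the defaults or the ones used throughout this article, not recommendations; the /health path is an assumption about the application.

```yaml
livenessProbe:
  httpGet:
    path: /health             # assumed endpoint
    port: 8080
  initialDelaySeconds: 5      # wait 5s after the container starts
  periodSeconds: 5            # probe every 5s
  timeoutSeconds: 1           # each probe must answer within 1s
  successThreshold: 1         # must be 1 for liveness probes
  failureThreshold: 3         # restart after 3 consecutive failures
```

With these settings, a container that stops responding is restarted after roughly three failed probes in a row, which is about 15 seconds at a 5-second period.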
Before defining a probe, the system behavior and average startup times of the Pod and its containers should be observed so you can determine the correct thresholds. Also, the probe options should be updated as the infrastructure or application evolves. For example, the Pod may be configured to use more system resources which might affect the values that need to be configured for the probes.
ExecAction handler example
The example below uses the exec handler to check whether a file exists at the path /usr/share/liveness/html/index.html by running the cat command. If no file exists, then the liveness probe will fail and the container will be restarted.
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness:0.1
    ports:
    - containerPort: 8080
    livenessProbe:
      exec:
        # Succeeds only if cat exits with status code 0
        command:
        - cat
        - /usr/share/liveness/html/index.html
      initialDelaySeconds: 5
      periodSeconds: 5
```
TCPSocketAction handler example
In this example, the liveness probe uses the TCP handler to check port 8080 is open and responding. With this configuration, the kubelet will attempt to open a socket to your container on the specified port. If the liveness probe fails, the container will be restarted.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness
  labels:
    app: liveness-tcp
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness:0.1
    ports:
    - containerPort: 8080
    livenessProbe:
      # Succeeds if a TCP connection to port 8080 can be opened
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
```
HTTPGetAction handler example
This example shows the HTTP handler, which will send an HTTP GET request on port 8080 to the /health path. If a code from 200 to 399 is returned, the probe is considered successful. If a code outside of this range is returned, the probe is unsuccessful, and the container is restarted. The httpHeaders option is used to define any custom headers you want to send.
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness:0.1
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
        # Optional custom headers sent with each probe request
        httpHeaders:
        - name: Custom-Header
          value: ItsAlive
      initialDelaySeconds: 5
      periodSeconds: 5
```
gRPC handler example
This example shows the use of the gRPC health checking protocol to check that port 2379 is responding. To use a gRPC probe, port must be configured. If the health endpoint is configured on a non-default service, you must also specify the service. All errors are considered probe failures, as there are no error codes for built-in gRPC probes.
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  # Pod names must be lowercase, so liveness-grpc rather than liveness-gRPC
  name: liveness-grpc
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness:0.1
    ports:
    - containerPort: 2379
    livenessProbe:
      grpc:
        port: 2379
      initialDelaySeconds: 5
      periodSeconds: 5
```
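When the health endpoint is registered under a non-default service, the probe needs the service field as well as the port. The fragment below is a sketch of that variant; the service name is hypothetical and would have to match whatever name your application registers with its gRPC health server.

```yaml
livenessProbe:
  grpc:
    port: 2379
    service: liveness         # hypothetical service name
  initialDelaySeconds: 5
  periodSeconds: 5
```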
Combining liveness probes with readiness and startup probes correctly can improve pod resilience and availability by triggering an automatic restart of a container once a failure of a specified test is detected. In order to correctly define them, the application must be understood so the correct options can be specified.
Also, take a look at how Spacelift helps you manage the complexities and compliance challenges of using Kubernetes. Anything that can be run via kubectl can be run within a Spacelift stack. Find out more about how Spacelift works with Kubernetes, and get started on your journey by creating a free trial account.