In this article, we will take a look at liveness probes in Kubernetes (K8S), with some useful examples. Defining probes correctly can improve pod resilience and availability.
The liveness probe ensures that an application within a container is live and operational based on a specified test.
The kubelet uses liveness probes to know when to restart a container. Applications that error or transition to a broken state are detected and, in many cases, can be fixed by a restart.
If the configured liveness probe is successful, no action is taken, and no logs are recorded. If it fails, the event is logged, and the kubelet kills the container according to the configured restartPolicy.
A liveness probe should be used when a pod may appear to be running, but the application may not function correctly. For example, in a deadlock situation, the pod may be running but will be unable to serve traffic and is effectively not working.
They are not necessary where the application is configured to crash the container on failure, as the kubelet will check the restartPolicy and will automatically restart the container if it is set to Always or OnFailure. NGINX is a good example: it starts up quickly and exits if it runs into an error that stops it from serving pages, so in this situation we do not need a liveness probe.
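To illustrate, here is a minimal sketch of this pattern: an NGINX Pod with no liveness probe, relying on the container exiting on error and the restartPolicy bringing it back (the image tag is an assumption).

apiVersion: v1
kind: Pod
metadata:
  name: nginx-no-probe
spec:
  restartPolicy: Always  # the default; the kubelet restarts the container whenever it exits
  containers:
  - name: nginx
    image: nginx:1.25  # assumed tag; NGINX exits on fatal errors, so no liveness probe is needed
    ports:
    - containerPort: 80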
This article will focus on the use of liveness probes, but you should be aware of the other types of probes available for use in Kubernetes:
Readiness probes
Readiness probes monitor when the application becomes available. If the probe fails, no traffic will be sent to the Pod: the endpoints controller removes the Pod's IP address from the endpoints of all Services that match it. These are used when an app needs configuration before it becomes ready. An application may also become overloaded with traffic and cause the probe to fail, preventing more traffic from being sent to it and allowing it to recover.
If the readiness probe fails but the liveness probe succeeds, the kubelet determines that the container is not ready to receive network traffic but is still working to become ready.
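As a rough sketch, a readiness probe is defined in the container spec with the same handlers and timing fields as a liveness probe, just under the readinessProbe key (the /ready path here is an assumption):

readinessProbe:
  httpGet:
    path: /ready  # assumed readiness endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5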
Startup probes
Startup probes are used by the kubelet to enable it to know when a container application has started. When these are configured, liveness and readiness checks are disabled until they are successful, ensuring startup probes don’t interfere with the application startup.
These are particularly useful with slow-starting containers, preventing the kubelet from killing them via a failing liveness probe before they are up and running. If a liveness probe is used on the same endpoint as a startup probe, set the failureThreshold of the startup probe higher to support long startup times (see the sketch below).
If a startup probe fails, the event is logged, and the kubelet kills the container according to the configured restartPolicy.
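A sketch of this pattern, reusing the /health endpoint and port from the examples later in this article: the startup probe below allows up to 30 × 10 = 300 seconds for the application to start before the liveness probe takes over.

startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30  # 30 failures x 10s period = up to 300s allowed for startup
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5  # only starts running once the startup probe has succeeded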
Probes are managed by the kubelet. The kubelet is the primary “node agent” that runs on each node.
To effectively use a Kubernetes probe, the application must support one of the following handlers:
- ExecAction handler — runs a command inside the container, and the diagnostic succeeds if the command completes with status code 0.
- TCPSocketAction handler — attempts a TCP connection to the IP address of the pod on a specific port. The diagnostic succeeds if the port is found to be open.
- HTTPGetAction handler — performs an HTTP GET request using the IP address of the pod, a specific port, and a specified path. The diagnostic succeeds if the response code returned is between 200–399.
- gRPC handler — as of Kubernetes v1.24, if your application implements the gRPC Health Checking Protocol, the kubelet can be configured to use it for application liveness checks. The GRPCContainerProbe feature gate must be enabled (it is on by default from v1.24) in order to configure checks that rely on gRPC.
When the kubelet performs a probe on a container, it responds with one of three results: Success if the diagnostic passed, Failure if it failed, or Unknown if the diagnostic did not complete for some reason.
In each example shown below, the periodSeconds field specifies that the kubelet should perform a liveness probe every 5 seconds. The initialDelaySeconds field tells the kubelet that it should wait 5 seconds before performing the first probe.
In addition to these options, you can also configure:
- timeoutSeconds – time to wait for the reply – default = 1.
- successThreshold – number of successful probe executions to mark the container healthy – default = 1 (for liveness and startup probes, this must be 1).
- failureThreshold – number of failed probe executions to mark the container unhealthy – default = 3.
These five parameters can be used in all types of liveness probes.
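For reference, a fragment showing all five fields together in one probe (the values are illustrative, not recommendations):

livenessProbe:
  exec:
    command:
    - cat
    - /usr/share/liveness/html/index.html
  initialDelaySeconds: 5  # wait before the first probe
  periodSeconds: 5        # run the probe every 5 seconds
  timeoutSeconds: 2       # fail the attempt if no reply within 2 seconds
  successThreshold: 1     # must be 1 for liveness probes
  failureThreshold: 3     # restart after 3 consecutive failures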
Before defining a probe, the system behavior and average startup times of the Pod and its containers should be observed so you can determine the correct thresholds. Also, the probe options should be updated as the infrastructure or application evolves. For example, the Pod may be configured to use more system resources which might affect the values that need to be configured for the probes.
ExecAction handler example
The example below uses the exec handler to check that a file exists at the path /usr/share/liveness/html/index.html by running the cat command. If the file does not exist, the liveness probe fails and the container is restarted.
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness:0.1
    ports:
    - containerPort: 8080
    livenessProbe:
      exec:
        command:
        - cat
        - /usr/share/liveness/html/index.html
      initialDelaySeconds: 5
      periodSeconds: 5
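You can test this probe by removing the file it checks for and watching the kubelet restart the container (assuming the file is present and removable in this image):

kubectl exec liveness-exec -- rm /usr/share/liveness/html/index.html
kubectl get pod liveness-exec -w  # the RESTARTS count increases once the probe fails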
TCPSocketAction handler example
In this example, the liveness probe uses the TCP handler to check port 8080 is open and responding. With this configuration, the kubelet will attempt to open a socket to your container on the specified port. If the liveness probe fails, the container will be restarted.
apiVersion: v1
kind: Pod
metadata:
  name: liveness
  labels:
    app: liveness-tcp
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness:0.1
    ports:
    - containerPort: 8080
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
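Note that port can also reference a named containerPort, which keeps the probe in sync if the port number ever changes. A hedged variant of the container spec above:

ports:
- name: liveness-port  # assumed port name
  containerPort: 8080
livenessProbe:
  tcpSocket:
    port: liveness-port
  initialDelaySeconds: 5
  periodSeconds: 5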
HTTPGetAction handler example
This example shows the HTTP handler, which will send an HTTP GET request on port 8080 to the /health path. If a code between 200 and 399 is returned, the probe is considered successful. If a code outside of this range is returned, the probe is unsuccessful, and the container is restarted. The httpHeaders option is used to define any custom headers you want to send.
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness:0.1
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: ItsAlive
      initialDelaySeconds: 5
      periodSeconds: 5
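If the application serves its health endpoint over TLS, httpGet also accepts a scheme field; note that the kubelet skips certificate verification when probing over HTTPS. The port here is an assumption:

livenessProbe:
  httpGet:
    path: /health
    port: 8443  # assumed TLS port
    scheme: HTTPS
  initialDelaySeconds: 5
  periodSeconds: 5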
gRPC handler example
This example shows the use of the gRPC Health Checking Protocol to check that port 2379 is responding. To use a gRPC probe, port must be configured. If the health endpoint is configured on a non-default service, you must also specify the service (see the variant after the manifest below). All errors are considered probe failures, as gRPC built-in probes have no error codes.
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-grpc
spec:
  containers:
  - name: liveness
    image: registry.k8s.io/liveness:0.1
    ports:
    - containerPort: 2379
    livenessProbe:
      grpc:
        port: 2379
      initialDelaySeconds: 5
      periodSeconds: 5
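If the gRPC health endpoint is registered under a non-default service name, the probe would look like this (the service name here is an assumption):

livenessProbe:
  grpc:
    port: 2379
    service: liveness-service  # assumed name registered with the gRPC health server
  initialDelaySeconds: 5
  periodSeconds: 5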
- Keep liveness probes simple and lightweight. Misconfigured probes can impact application performance if they run too frequently or cause containers to sit in an unhealthy state for extended periods of time. Some containers don’t need probes where they execute simple operations and terminate quickly, so avoid unnecessary probe configurations.
- Use a combination of readiness and liveness probes to ensure that your application is running properly and is able to handle incoming traffic.
- If a liveness probe takes too long to complete, Kubernetes may assume that the application is not running and restart it, even if it is actually still running, so setting a realistic and appropriate timeout value is recommended. Check how long the probe’s command, API request, or gRPC call takes to actually complete, then set a value with a small extra time buffer.
- If a liveness probe fails a certain number of times, Kubernetes will restart the application. Set a failure threshold for your liveness probes to avoid unnecessary restarts of your application.
- Check that your container restart policies are applied after probes. This means your containers need restartPolicy: Always (the default) or restartPolicy: OnFailure so Kubernetes can restart them after a failed probe. Using the Never policy will keep the container in a failed state.
- Use the kubectl command-line tool to test your probes and make sure that they are correctly configured (see the example commands after this list).
- Choose the appropriate probe type for your application. For example, use an HTTP probe for web applications, while a TCP probe might be more appropriate for a database. The target of your probe’s command or HTTP request should generally be independent of your main application, so it can run to completion even during failure conditions.
- Monitor your liveness probes to ensure that they are working as expected. As changes are made to your application, be sure to update the probes to reflect any changes. Set up alerts to notify you if a probe fails, and monitor the logs for any errors related to your probes.
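For example, probe activity is recorded as Events on the Pod, so commands like these (using the liveness-http Pod from earlier) are a quick way to verify that a probe behaves as expected:

kubectl describe pod liveness-http  # the Events section shows any liveness probe failures
kubectl get pod liveness-http -w    # watch the RESTARTS column
kubectl get events --field-selector involvedObject.name=liveness-http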
Combining liveness probes with readiness and startup probes correctly can improve pod resilience and availability by triggering an automatic restart of a container once a failure of a specified test is detected. In order to correctly define them, the application must be understood so the correct options can be specified.
Also, take a look at how Spacelift helps you manage the complexities and compliance challenges of using Kubernetes. Anything that can be run via kubectl can be run within a Spacelift stack. Find out more about how Spacelift works with Kubernetes, and get started on your journey by creating a free trial account.
The Most Flexible CI/CD Automation Tool
Spacelift is an alternative to using homegrown solutions on top of a generic CI. It helps overcome common state management issues and adds several must-have capabilities for infrastructure management.