Kubernetes health checks monitor Pod states. They enable Kubernetes to detect when Pod failures occur so it can automatically restart the affected replicas or direct traffic to other healthy replicas. This ensures high availability for your workloads.
In this article, we take a detailed look at the various types of Kubernetes health checks. We’ll cover how the checks work and explain how to properly configure your Pods. This will allow you to improve the reliability of your deployments.
How to check health in Kubernetes?
Pod health in Kubernetes is checked using liveness probes, readiness probes, and startup probes defined in pod specs. These probes can use HTTP, TCP, or command checks to determine if a container is functioning.
Kubernetes health checks automate the process of verifying whether your containerized apps are still alive. In Kubernetes terms, these checks are referred to as “probes,” but they implement the same functionality as “health checks” in other systems. Kubernetes “probes” your Pods to determine their internal states.
Probes allow the applications within your containers to provide crucial information to your cluster’s control plane. If a probe runs successfully, then Kubernetes knows your app’s still alive. Conversely, it can infer that Pods with failing probes are no longer running correctly.
Types of Kubernetes probes
The probe mechanism is essential to successful Kubernetes high availability. Without using probes, Kubernetes has no way of knowing whether a running Pod is actually serving requests.
For instance, the app within the Pod could have experienced a bug that stops it from handling new traffic. Probes enable Kubernetes to detect this state so your apps stay healthy. They also allow Kubernetes to avoid sending traffic to Pods that aren’t ready, such as when your app is still starting up.
Kubernetes supports three types of probes for different use cases:
- Liveness probes: Liveness probes run continually while your Pods are running to detect whether they have died. If a liveness probe fails, Kubernetes restarts the affected container.
- Readiness probes: Readiness probes tell Kubernetes whether Pods can currently accept traffic. Pods that fail a readiness probe won’t be restarted, but they’ll stop receiving traffic from Services. This prevents application errors due to problems such as an upstream database server becoming unavailable.
- Startup probes: Startup probes run when new Pods are created. They tell Kubernetes whether the application within the Pod has finished starting up. Liveness and readiness probes aren’t activated until the startup probe succeeds, which avoids errors caused by your app’s core functions being unavailable while it initializes.
Most applications benefit from using all three types of probes. This provides the most comprehensive health check coverage. The probes work together to restart failed Pods while gracefully handling temporary problems.
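As a preview of the configuration covered below, here is a minimal sketch of a container spec that combines all three probe types. The image name, port, and the /healthz and /readyz paths are illustrative assumptions; substitute the endpoints your application actually exposes.

containers:
  - name: app
    image: registry.example.com/my-app:1.0   # hypothetical image
    ports:
      - containerPort: 8080
    startupProbe:        # holds back the other probes until the app has booted
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 5
    livenessProbe:       # restarts the container if the app stops responding
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    readinessProbe:      # stops traffic while the app can't serve requests
      httpGet:
        path: /readyz
        port: 8080
      periodSeconds: 10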
Understanding Kubernetes probes vs Docker HEALTHCHECK
If you’re already running containers using Docker, then you may be familiar with the HEALTHCHECK Dockerfile instruction. This instructs the container runtime to periodically run a command inside the container to check that it is still alive. Docker reports container health statuses in its CLI and API, but it doesn’t automatically restart containers if they become unhealthy.
Compared with Docker’s HEALTHCHECK, Kubernetes probes are a more sophisticated health check mechanism designed for the needs of production apps running at scale. HEALTHCHECK only supports a single health check command, whereas Kubernetes uses multiple probes to distinguish between different liveness and readiness states. It then takes appropriate action to restart unhealthy Pods, or block their traffic.
These features mean Dockerfile HEALTHCHECK instructions are redundant in Kubernetes. They have no impact on your cluster, and you should configure probes instead, even if you’ve already configured a health check in your Dockerfile. Kubernetes won’t follow the instruction.
Now that we’ve covered the basics of Kubernetes probes, let’s take a closer look at using them.
Probes are configured at the container level within your Kubernetes Pod definitions, via the livenessProbe, readinessProbe, and startupProbe fields. Each probe must specify an action it will run to check whether the container is healthy. The following action types are supported:
- Command (exec): Kubernetes runs a command inside the container. If the command exits with a 0 exit code, the container is considered healthy. The container is marked as unhealthy if the command exits with a non-zero code or fails to complete.
- HTTP (httpGet): Kubernetes makes an HTTP GET request to a specified container port. Your HTTP endpoint inside the container should respond with a 2xx status code to indicate your app is still healthy; 4xx and 5xx codes signal failures. This is the most commonly used probe type because it’s easy to implement for services that already expose an HTTP API.
- TCP (tcpSocket): TCP probes attempt to open a socket connection to a specified container port. The container is considered healthy if the connection is opened successfully.
- gRPC (grpc): gRPC probes call the gRPC Health Checking Protocol on a configured container port.
We’ll show examples of each of these actions in the following sections. You can also find exhaustive documentation on the options supported for each action in the Kubernetes documentation.
Probe timing options
Probes share a common set of options for configuring their timing:
- initialDelaySeconds: This sets the number of seconds after a container starts before the probe first runs. It allows you to account for startup processes that have a known, fixed duration. The default value is zero seconds.
- periodSeconds: This sets the interval at which the probe runs. It defaults to 10 seconds, meaning the probe is repeated once every 10 seconds.
- timeoutSeconds: The probe timeout controls how long Kubernetes waits for a response after triggering the probe’s command or making a request to the configured network endpoint. It defaults to one second.
- failureThreshold: This setting allows you to avoid prematurely terminating containers that are experiencing temporary errors. It defines the number of consecutive probes that must fail before Kubernetes considers the container to be unhealthy. For example, if failureThreshold is 3 (the default), the container only becomes unhealthy after three consecutive probes fail.
- successThreshold: This works similarly to failureThreshold, except that it specifies the number of consecutive successful probes required before an unhealthy container is considered healthy again. If successThreshold is 3, then three consecutive probes must complete successfully before any traffic is directed back to the container. It defaults to 1, and values other than 1 are only allowed for readiness probes.
The correct probe configuration depends on these settings being carefully tuned to your workload’s requirements. If in doubt, the default values are usually a good fit for common scenarios. However, if you find probes aren’t detecting failures (or are running too frequently) then you may need to change the periodSeconds value.
Similarly, if you think you’re experiencing false positive failures, then you should consider changing failureThreshold, timeoutSeconds, or initialDelaySeconds as required.
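For example, here’s a sketch of a readiness probe with its timing options tuned for a service that can be slow to respond under load. The endpoint, port, and values are illustrative assumptions, not recommendations for any particular workload.

readinessProbe:
  httpGet:
    path: /readyz           # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 15   # wait 15s after the container starts before probing
  periodSeconds: 20         # repeat the probe every 20 seconds
  timeoutSeconds: 3         # allow up to 3 seconds for a response
  failureThreshold: 3       # mark unready after 3 consecutive failures
  successThreshold: 2       # require 2 consecutive successes to become ready again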
Now let’s look at some sample configurations for liveness, readiness, and startup probes.
Example 1. How to configure a liveness probe using a container command
Liveness probes are used to detect failed Pods and then automatically restart them. Therefore, your probe should only fail if the Pod is completely dead and cannot be recovered.
Don’t use liveness probes to check whether external services, such as databases, are available — those tests belong in readiness probes, as the services could become available again in the future.
Configure liveness probes using the livenessProbe field in a container spec, within a Pod or Deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-deployment
  template:
    metadata:
      labels:
        app: demo-deployment
    spec:
      containers:
        - name: demo
          image: nginx:latest
          livenessProbe:
            exec:
              command:
                - echo
                - "Healthy"
            periodSeconds: 5

The probe in the above example uses the exec action to run a command in the container every five seconds. The container will become unhealthy if the command fails.
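Because echo always succeeds, the probe above can never fail, so it only demonstrates the mechanism. A slightly more realistic sketch, assuming your application refreshes a /tmp/healthy file while it’s working correctly (a hypothetical convention your app would need to implement), would be:

livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy    # assumed file the app keeps in place while healthy
  periodSeconds: 5
  failureThreshold: 3   # only restart after three consecutive failed checks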
Example 2. How to configure a readiness probe using an HTTP request
Readiness probes are configured in the same way as liveness probes, but using the readinessProbe container spec field. These probes let you indicate whether your application can currently serve traffic, so your implementations should test whether all required dependencies are available.
The following example demonstrates how to use the httpGet request action to probe an HTTP endpoint served by the container:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-deployment
  template:
    metadata:
      labels:
        app: demo-deployment
    spec:
      containers:
        - name: demo
          image: nginx:latest
          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10

This probe makes an HTTP request to the root URL of the service running on container port 80. The container is considered healthy if the endpoint returns a 2xx status code. The probe uses initialDelaySeconds to wait five seconds after the Pod is created before running the probe for the first time. This helps ensure the web server within the container is ready to serve the probe endpoint.
HTTP probes also allow you to send custom HTTP headers with your request. Set the httpHeaders option within the httpGet field:
httpGet:
  path: /
  port: 80
  httpHeaders:
    - name: Authorization
      value: Secret-Token

Readiness probe with gRPC
The following example is a variation of the one above. It defines a readiness probe that makes a gRPC health check request, instead of relying on HTTP.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-deployment
  template:
    metadata:
      labels:
        app: demo-deployment
    spec:
      containers:
        - name: demo
          image: example-image:latest
          readinessProbe:
            grpc:
              port: 2379
            initialDelaySeconds: 5
            periodSeconds: 10

The port field is mandatory when using gRPC requests. You must ensure your application exposes a correctly configured gRPC Health Checking endpoint on the specified port.
Example 3. How to configure a startup probe using a TCP Socket
As discussed above, startup probes are a special type of probe that run immediately after Pod creation. You should use them to report when your containerized app has started up completely. Kubernetes will then start running your liveness and readiness probes.
Startup probes are defined in the same way as the other probe types, but via the startupProbe container field. The following example uses the tcpSocket action to mark the container as healthy once Kubernetes can successfully open a socket to the container’s port 80:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-deployment
  template:
    metadata:
      labels:
        app: demo-deployment
    spec:
      containers:
        - name: demo
          image: nginx:latest
          startupProbe:
            tcpSocket:
              port: 80
            initialDelaySeconds: 5

The initialDelaySeconds option is often useful for startup probes. If you know your app will take at least five seconds to start, then setting this option prevents the probe from running unnecessarily early.
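Startup probes also pair well with failureThreshold and periodSeconds to give slow-starting apps a generous initialization window before liveness checks begin. In the sketch below, the values are illustrative assumptions that allow up to five minutes (30 attempts, 10 seconds apart) for the app to begin accepting connections on port 80:

startupProbe:
  tcpSocket:
    port: 80
  failureThreshold: 30   # up to 30 failed attempts...
  periodSeconds: 10      # ...spaced 10 seconds apart, i.e. roughly 5 minutes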
How to use kubectl to monitor probes and detect failures
Kubernetes doesn’t provide detailed logging for each attempt to run a probe. However, you can understand what’s happening in your Pods using the information provided by kubectl.
The following manifest defines a Deployment with a liveness check that will always fail (it targets an unknown HTTP path):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-deployment
  template:
    metadata:
      labels:
        app: demo-deployment
    spec:
      containers:
        - name: demo
          image: nginx:latest
          livenessProbe:
            httpGet:
              path: /demo
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10

Inspecting the Pods created by this Deployment will reveal that the RESTARTS count is steadily increasing:
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS      AGE
demo-deployment-64795d5498-9jh4p   1/1     Running   1 (7s ago)    38s
demo-deployment-64795d5498-dd898   1/1     Running   1 (10s ago)   41s
demo-deployment-64795d5498-zcwqm   1/1     Running   1 (5s ago)    35s

Kubernetes runs the liveness probe every 10 seconds (after an initial five-second delay), but it fails each time, causing the Pods to continually restart.
Eventually, too many restarts will occur, and the Pod will enter the CrashLoopBackOff status. Kubernetes will continue to attempt to restart it, but with increasingly long intervals between attempts. If you see a Pod with this status and a high restart count, it typically indicates the Pod has a failing liveness probe.
You should check your app’s logs (kubectl logs) and the probe’s configuration to find the cause of the problem. Running kubectl describe pod on an affected Pod will also list recent probe failures in its Events section.
$ kubectl get pods
NAME                               READY   STATUS             RESTARTS      AGE
demo-deployment-64795d5498-9jh4p   0/1     CrashLoopBackOff   4 (24s ago)   2m55s
demo-deployment-64795d5498-dd898   0/1     CrashLoopBackOff   4 (27s ago)   2m58s
demo-deployment-64795d5498-zcwqm   0/1     CrashLoopBackOff   4 (22s ago)   2m52s

Failing readiness and startup probes have a slightly subtler effect. Instead of restarting, the Pod will show as Running, but with the affected containers failing to reach the Ready state:
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
demo-deployment-7f5c458b76-6h7pw   0/1     Running   0          5m13s
demo-deployment-7f5c458b76-fskvc   0/1     Running   0          5m13s
demo-deployment-7f5c458b76-jhx5p   0/1     Running   0          5m13s

Here, Pods with a single container are failing their readiness probe. As a result, kubectl shows 0/1 containers ready for each Pod. The containers are still running, but they won’t receive any traffic. In this scenario, check your readiness and startup probes to identify the issue.
Correctly configuring Kubernetes health checks is one key way to improve your service’s reliability. However, there are also some pitfalls you should look out for as you use liveness, readiness, and startup probes.
Here are some quick tips and best practices that’ll help you on your way to healthier apps and easier troubleshooting.
1. Check you’ve correctly configured your probe’s command or network endpoint
If you’re struggling to determine why probes are failing, check that you’ve correctly configured the probe’s command or request. A typo can make the probe itself fail, such as when it tries to run a command that doesn’t exist or requests a path your app doesn’t serve.
2. Set appropriate probe intervals
It’s crucial to use correct interval settings for each of your probes. Very frequently repeated probes could cause Pods to be marked as unhealthy when they’ve actually just hit a transient failure or have not yet started up.
Frequent probes may also create an excess load on your application. However, an excessively long interval between probe checks can delay the detection of real issues.
3. Use lightweight, dedicated probe commands and endpoints
Lightweight probes run quickly without creating extra load on your application. It’s good practice to create dedicated health check commands or API endpoints that perform just the bare minimum tests needed to prove that your app is alive.
For instance, a liveness check could be a simple API endpoint that issues a 200 response code. If that endpoint is live, then the rest of your API should also be accessible.
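For instance, if your app serves dedicated /livez and /readyz routes (hypothetical paths; use whatever lightweight endpoints your app actually provides), the liveness and readiness probes can each target the cheapest check that answers their question:

livenessProbe:
  httpGet:
    path: /livez    # minimal check: the process is up and serving HTTP
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz   # slightly deeper check: required dependencies are reachable
    port: 8080
  periodSeconds: 10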
4. Enable connection reuse to save resources in your cluster
Where possible, try configuring your containerized services to reuse connection pools for Kubernetes probes. This avoids opening a new connection each time a probe runs, saving resources and improving reliability.
5. Include health check probes every time you deploy pods in your cluster
It’s best to include liveness, readiness, and startup probes with every Kubernetes Pod you deploy. This enables reliable high availability and gives you clearer insights into Pod statuses. Without probes, it’s harder to debug operational issues because you can’t accurately see whether running Pods are actually serving traffic.
If you need help managing your Kubernetes projects, consider Spacelift. It brings with it a GitOps flow, so your Kubernetes Deployments are synced with your Kubernetes Stacks, and pull requests show you a preview of what they’re planning to change.
With Spacelift, you get:
- Policies to control what kind of resources engineers can create, what parameters they can have, how many approvals you need for a run, what kind of task you execute, what happens when a pull request is open, and where to send your notifications
- Stack dependencies to build multi-infrastructure automation workflows with dependencies, having the ability to build a workflow that can combine Terraform with Kubernetes, Ansible, and other infrastructure-as-code (IaC) tools such as OpenTofu, Pulumi, and CloudFormation.
- Self-service infrastructure via Blueprints, enabling your developers to do what matters – developing application code while not sacrificing control
- Creature comforts such as contexts (reusable containers for your environment variables, files, and hooks), and the ability to run arbitrary code
- Drift detection and optional remediation
If you want to learn more about Spacelift, create a free account today or book a demo with one of our engineers.
Kubernetes liveness, readiness, and startup probes implement powerful health checks for your deployed applications. Liveness probes allow Kubernetes to detect and restart failed Pods, while readiness probes prevent traffic from reaching Pods that can’t currently handle requests. Startup probes hold back liveness and readiness checks until the Pod signals that it’s fully initialized.
Adding probes to your Kubernetes Pods is usually quick and easy, as shown in the guide above. Taking the time to define them enables you to improve your app’s availability by allowing Kubernetes to understand what’s happening in your Pods. Probes also enhance your Kubernetes monitoring by giving you more detailed visibility into Pod lifecycles.
Manage Kubernetes easier and faster
Spacelift allows you to automate, audit, secure, and continuously deliver your infrastructure. It helps overcome common state management issues and adds several must-have features for infrastructure management.
Frequently asked questions
What do Kubernetes liveness, readiness, and startup probes each verify?
Liveness probes verify that a container is still running properly – if it fails, Kubernetes restarts the container.
Readiness probes verify if a container is prepared to serve traffic, controlling whether it receives requests through the Service.
Startup probes ensure the application has started successfully, preventing premature liveness or readiness checks during initialization.
When should I use HTTP vs TCP vs gRPC vs exec probes for health checks?
Use HTTP probes when your application exposes a health endpoint over HTTP, TCP probes when only a TCP port is available for connectivity checks, gRPC probes for applications with gRPC health checking support, and exec probes for custom logic not exposed via network protocols.
How do I choose safe defaults for initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold?
A safe and effective set of defaults for Kubernetes liveness and readiness probes depends on the application’s startup time, response time, and availability requirements. For most applications, the following values provide a reliable baseline:
- initialDelaySeconds: 10 – Allows the container 10 seconds after starting before the first probe, which is sufficient for a typical app boot time.
- periodSeconds: 10 – Probes every 10 seconds, balancing frequency with resource use.
- timeoutSeconds: 1 – Limits probe duration to one second; if the app needs longer to respond, you should increase accordingly.
- successThreshold: 1 – One successful probe is enough to mark the container as ready or healthy; usually fine unless network flakiness is expected.
- failureThreshold: 3 – Three consecutive failures before restart or unready status, avoiding false positives from transient issues.
Adjust initialDelaySeconds for slow-starting apps and timeoutSeconds for apps with known latency. Production tuning should be based on real-world response characteristics under load.
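Applied to a probe definition, that baseline looks like the following sketch (the path and port are placeholders):

readinessProbe:
  httpGet:
    path: /healthz   # placeholder endpoint
    port: 8080       # placeholder port
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3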
Why are my probes failing (e.g., CrashLoopBackOff or 503s) and what’s the fastest way to debug with kubectl?
Probes fail due to misconfigured readiness or liveness settings, application startup delays, or actual service errors, leading to issues like CrashLoopBackOff or HTTP 503s. The fastest way to debug is using kubectl describe pod and kubectl logs.
Do I still need a Dockerfile HEALTHCHECK if I’m using Kubernetes probes?
No, you do not need a Dockerfile HEALTHCHECK if you are using Kubernetes probes. Kubernetes liveness and readiness probes fully replace the functionality provided by the Docker HEALTHCHECK directive.
