
Troubleshoot and Fix Kubernetes CrashLoopBackOff Status

The status of a pod in your Kubernetes (K8S) cluster may show as ‘CrashLoopBackOff’. This status appears when a container in the pod has crashed and been restarted multiple times.

In this article, we will run through how to spot this error, how to fix it, and some reasons why it might occur.

What Exactly is the CrashLoopBackOff Error?

When a container in a pod keeps crashing, the K8S kubelet waits an increasing ‘backoff’ time after each crash before attempting to start it again. This happens when the pod’s restartPolicy is ‘Always’, which is the default, or when it has been changed to ‘OnFailure’. With a restartPolicy of ‘Never’, the container is not restarted at all.

The delay between restarts gives you time to troubleshoot and attempt a fix. The ‘backoff’ time starts at 10 seconds and increases exponentially (10s, 20s, 40s, and so on) until it reaches the maximum of 5 minutes. When a pod shows the CrashLoopBackOff status, it is waiting out the current ‘backoff’ time before restarting, and it will likely fail again unless the underlying problem is fixed. The container can briefly appear as ‘Running’ before crashing again.
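The backoff timing described above can be sketched as a small shell function. This is only an illustration of the arithmetic, assuming the delay doubles after every crash; the function name backoff_schedule is hypothetical and the code does not talk to a cluster.

```shell
# Print the wait before each of the first N restarts, assuming the values
# quoted above: start at 10s, double after every crash, cap at 300s (5 min).
backoff_schedule() {
  restarts="${1:-7}"
  delay=10
  max_delay=300
  attempt=1
  while [ "$attempt" -le "$restarts" ]; do
    echo "restart ${attempt}: wait ${delay}s"
    delay=$((delay * 2))
    if [ "$delay" -gt "$max_delay" ]; then
      delay=$max_delay
    fi
    attempt=$((attempt + 1))
  done
}
```

Calling backoff_schedule 7 prints seven lines, and from the sixth restart onward the wait stays at the 300-second cap.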

How to Find the ‘CrashLoopBackOff’ Error

To show the status of your pods, run the following command:

kubectl get pods -n <namespace>

The STATUS column shows the pod status. A pod with the CrashLoopBackOff status shows as not ready (0/1 in the example below) and has more than 0 restarts.

NAME                     READY     STATUS             RESTARTS   AGE
nginx-5796d5bc7d-xtl6q   0/1       CrashLoopBackOff   4          1m
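In a busy namespace it can help to filter that listing down to the affected pods. The sketch below assumes the single-namespace column layout shown above (NAME, READY, STATUS, RESTARTS, AGE); with --all-namespaces the columns shift by one. The function name crashlooping_pods is hypothetical.

```shell
# Read `kubectl get pods` output on stdin and print the name and restart
# count of every pod whose STATUS column is CrashLoopBackOff.
# NR > 1 skips the header row.
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1, $4 }'
}

# Usage (needs access to a cluster):
#   kubectl get pods -n <namespace> | crashlooping_pods
```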

‘Normal’ states that each container in a pod can report include:

  • Running

The container is running without any issues.

  • Waiting

The container is still starting up. It may be pulling the container image or receiving secret data. Once finished, it should transition to the Running state.

  • Terminated

Containers in this state either ran to completion or failed for some reason.

Troubleshooting Pods With CrashLoopBackOff Status

The three kubectl commands listed below are the recommended way to start troubleshooting your errored pods.

  1. kubectl describe pod
  2. kubectl logs
  3. kubectl get events

For more kubectl commands, see our Kubernetes Cheat Sheet.

Starting with the kubectl describe command, you can use this to get more detail.

kubectl describe pod <pod name> -n <namespace>

The pod status section will show any error messages associated with the pod.

The events section of the output will give you information on the pod’s status. Look for entries containing ‘Back-off restarting failed container’ as shown in the example below.

Name:         pod name
Namespace:    default
Priority:     0
State:        Waiting
Reason:       CrashLoopBackOff
Last State:   Terminated
Reason:       Error
Warning  BackOff                1m (x5 over 1m)   kubelet, ip-10-0-9-132.us-east-2.compute.internal  Back-off restarting failed container
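The describe output is long, so a filter that keeps only the crash-related lines can make it quicker to read. This is a minimal sketch: the function name crash_summary is hypothetical, and the ‘Exit Code’ field it also looks for appears in full describe output even though the shortened example above omits it.

```shell
# Read `kubectl describe pod` output on stdin and keep only the lines that
# usually explain a crash loop: container state, reasons, exit code, and
# back-off events. Leading indentation is stripped for readability.
crash_summary() {
  grep -E '^[[:space:]]*(State|Last State|Reason|Exit Code):|Back-off restarting' |
    sed -e 's/^[[:space:]]*//'
}

# Usage (needs access to a cluster):
#   kubectl describe pod <pod name> -n <namespace> | crash_summary
```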

Next, check the logs of the failed pod with the kubectl logs command. The -p (or --previous) flag retrieves the logs from the previous, crashed instance of the container, which is helpful for seeing what happened at the application level.

Use the -c flag to view the logs of one specific container, or the --all-containers flag to view the logs from every container in the pod. You can view only the last portion of the log by adding the --tail flag.

kubectl logs <pod name> -n <namespace> -p
kubectl logs <pod name> -n <namespace> --previous
kubectl logs <pod name> -n <namespace> --all-containers
kubectl logs <pod name> -n <namespace> -c mycontainer
kubectl logs <pod name> -n <namespace> --tail 50
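The flags above can be combined in a small helper. The sketch below asks for the previous instance first and falls back to the current one, since --previous returns an error when the container has not restarted yet. The function name previous_logs and the KUBECTL variable are hypothetical conveniences, not kubectl features; KUBECTL only exists so the function can be dry-run, for example with KUBECTL=echo.

```shell
# Show the last 50 log lines of a pod, preferring the crashed instance.
# Arguments: pod name, then namespace (defaults to "default").
previous_logs() {
  pod="$1"
  namespace="${2:-default}"
  kubectl_bin="${KUBECTL:-kubectl}"  # hypothetical override for dry runs

  # Try the previous (crashed) container first; if that fails, fall back
  # to the logs of the currently running container.
  "$kubectl_bin" logs "$pod" -n "$namespace" --previous --tail=50 2>/dev/null ||
    "$kubectl_bin" logs "$pod" -n "$namespace" --tail=50
}

# Usage (needs access to a cluster):
#   previous_logs <pod name> <namespace>
```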

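Finally, list recent events with the kubectl get events command. Add -n <namespace> to limit the output to one namespace, or --all-namespaces to include every namespace. The output is a list of events, as in the example below: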

kube-system 60m Normal Pulling pod/node-problem-detector-vmcf2                        pulling image "k8s.gcr.io/node-problem-detector:v0.7.0"
kube-system 60m Normal Pulled pod/node-problem-detector-vmcf2                        Successfully pulled image "k8s.gcr.io/node-problem-detector:v0.7.0"
kube-system 60m Normal Created pod/node-problem-detector-vmcf2                        Created container
kube-system 60m Normal Started pod/node-problem-detector-vmcf2
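Most events are of type Normal, as in the sample above, while crash loops show up as Warning events. A minimal sketch of a filter for those follows; the function name warning_events is hypothetical.

```shell
# Read `kubectl get events` output on stdin and keep only Warning events.
# TYPE is the 2nd column for a single namespace and the 3rd column with
# --all-namespaces (as in the sample above), so both positions are checked.
warning_events() {
  awk '$2 == "Warning" || $3 == "Warning"'
}

# Usage (needs access to a cluster):
#   kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp | warning_events
```

kubectl can also do this filtering on the server side with --field-selector type=Warning; the awk version is handy when the output has already been saved to a file.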

The Causes of the CrashLoopBackOff Error

There are many causes of the CrashLoopBackOff error. Listed below are a few common ones:

  1. Misconfiguration of the container — check for typos or misconfigured values in the configuration files.
  2. Out of memory or resources — check the resource limits are correctly specified. This can be caused by a sudden or unexpected increase in traffic or activity. Check the “resources: limits” section of the configuration file.
  3. Two or more containers are configured to use the same port, which will cause the error if they’re in the same pod.
  4. The pods are attempting to connect to a file or database that is locked due to other pods using it.
  5. The pods may be referencing non-existent resources or packages, such as scripts that cannot be found in the container or on a persistent storage volume.
  6. General error deploying the software — any bugs and exceptions specific to your software.
  7. Command line arguments may be incorrect or missing.
  8. The liveness probes are not configured correctly — check the configuration files.
  9. Incorrectly specified permissions or not enough permissions have been granted.
  10. The filesystem or folder the pod is trying to write to is read-only.
  11. Connection issues — networking configuration is incorrect, or DNS is unreachable. kube-dns may not be running, and the container cannot contact the external service.
  12. Incorrect environment variables are set — Use env to inspect the environment variables.
  13. An identity assigned to the pod is misconfigured, such as in Azure Kubernetes Service, where a managed identity from Azure Active Directory may be incorrectly assigned or inaccessible.
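Several of these causes (missing environment variables, read-only or unwritable folders) are easier to diagnose if the container's entrypoint script checks for them up front and exits with a clear message, which then appears in kubectl logs. The functions below are a minimal sketch of that idea; the names require_env and require_writable, and the variable and path in the usage comment, are hypothetical.

```shell
# Fail with a clear message if any named environment variable is unset or
# empty. Variable names are passed as arguments; they should be hard-coded
# by the script author, because eval is used to look them up.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required environment variable: $name" >&2
      return 1
    fi
  done
}

# Fail with a clear message if the given directory is missing or cannot be
# written to, for example because the filesystem is mounted read-only.
require_writable() {
  if [ ! -w "$1" ]; then
    echo "directory is not writable: $1" >&2
    return 1
  fi
}

# Usage in an entrypoint script (hypothetical variable and path):
#   require_env DATABASE_URL && require_writable /var/cache/app && exec "$@"
```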

Key Points

The CrashLoopBackOff status is a notification that the pod is being restarted due to an error and is waiting for the specified ‘backoff’ time before it tries to start again.

Running through the steps detailed above should help you get to the root cause of the status and rectify the problem.

If you need any assistance with managing your Kubernetes projects, take a look at Spacelift. It brings with it a GitOps flow, so your Kubernetes Deployments are synced with your Kubernetes Stacks, and pull requests show you a preview of what they’re planning to change. It also has an extensive selection of policies, which lets you automate compliance checks and build complex multi-stack workflows. You can check it for free by creating a trial account.
