The status of a pod in your Kubernetes (K8S) cluster may show the ‘CrashLoopBackOff’ error. This is shown when a pod has crashed and been restarted multiple times.
In this article, we will run through how to spot this error, how to fix it, and some reasons why it might occur.
The kubelet waits an increasing ‘backoff’ time after each crash before attempting to start the pod again. The K8S restartPolicy is set to Always by default, so crashed containers are restarted automatically. The error can also occur if the restartPolicy has been changed to OnFailure.
The delay between restarts gives us time to troubleshoot and fix the issue. The ‘backoff’ time starts at 10 seconds and increases exponentially (10s, 20s, 40s, and so on) until it reaches the maximum of 5 minutes. When a pod shows the CrashLoopBackOff status, it is waiting for the current ‘backoff’ period to elapse before it restarts, and it will likely fail again unless the underlying problem is fixed. The container can briefly appear as ‘running’ before crashing again.
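As a rough illustration of this timing, the sketch below prints the wait before each restart attempt. It assumes the delay simply doubles from 10 seconds and is capped at 300 seconds; the function name backoff_delay is hypothetical, and the real kubelet logic may differ in detail.

```shell
# Sketch only: approximates the backoff timing described above.
# Assumption: the delay starts at 10s, doubles after each crash, and is capped at 300s.
backoff_delay() {
  restarts=$1   # number of restarts that have already happened
  delay=10
  i=0
  while [ "$i" -lt "$restarts" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  # Clamp to the 5-minute maximum.
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

# Print the wait before each of the first seven restart attempts.
for n in 0 1 2 3 4 5 6; do
  echo "restart $((n + 1)): wait $(backoff_delay "$n")s"
done
```

Under these assumptions the waits are 10, 20, 40, 80, 160, 300 and 300 seconds, so a pod that keeps crashing reaches the 5-minute ceiling by its sixth restart.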
To show the status of your pods, run the following command:
kubectl get pods -n <namespace>
The STATUS column shows the pod status. If the pod has the CrashLoopBackOff status, it will show as not ready (0/1 in the example below) and will show more than 0 restarts.
NAME                     READY   STATUS             RESTARTS   AGE
nginx-5796d5bc7d-xtl6q   0/1     CrashLoopBackOff   4          1m
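When a namespace holds many pods, it can help to filter this output. The following is a minimal sketch, assuming the default column layout shown above (no --all-namespaces, which would shift the columns); the helper name crashlooping_pods is hypothetical.

```shell
# Sketch only: reads `kubectl get pods` output on stdin and prints the name and
# restart count of every pod whose STATUS column is CrashLoopBackOff.
# Assumption: default columns (NAME READY STATUS RESTARTS AGE).
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1, $4 }'
}

# Demonstration using the sample output from this article.
crashlooping_pods <<'EOF'
NAME                     READY   STATUS             RESTARTS   AGE
nginx-5796d5bc7d-xtl6q   0/1     CrashLoopBackOff   4          1m
EOF

# Against a live cluster (not run here):
#   kubectl get pods -n <namespace> | crashlooping_pods
```

For the sample above, this prints the pod name followed by its restart count of 4.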
‘Normal’ statuses include:
- Running: The pod is running without any issues.
- Waiting: The pod is still starting up. It may be pulling the container image or receiving secret data. Once finished, it should transition to the running state.
- Terminated: Pods with this status either ran to completion or failed for some reason.
The kubectl commands listed below are the recommended way to start troubleshooting your errored pods.
kubectl describe pod
kubectl logs
kubectl get events
For more kubectl commands, see our Kubernetes Cheat Sheet.
Start with the kubectl describe command to get more detail about the pod:
kubectl describe pod <pod name> -n <namespace>
The pod status section will show any error messages associated with the pod.
The events section of the output will give you information on the pod’s status. Look for entries containing ‘Back-off restarting failed container’ as shown in the example below.
Name:         pod name
Namespace:    default
Priority:     0
…
State:        Waiting
  Reason:     CrashLoopBackOff
Last State:   Terminated
  Reason:     Error
…
Warning  BackOff  1m (x5 over 1m)  kubelet, ip-10-0-9-132.us-east-2.compute.internal  Back-off restarting failed container
…
Next, check the logs of the failed pod with the kubectl logs command. The -p (or --previous) flag will retrieve the logs from the last failed instance of the pod, which is helpful for seeing what is happening at the application level.
Use the --all-containers flag to retrieve the logs from every container in the pod, or the -c flag to select a single container. You can view only the last portion of the log by adding the --tail flag.
kubectl logs <pod name> -n <namespace> -p
kubectl logs <pod name> -n <namespace> --previous
kubectl logs <pod name> -n <namespace> --all-containers
kubectl logs <pod name> -n <namespace> -c mycontainer
kubectl logs <pod name> -n <namespace> --tail 50
Finally, list the cluster events with the kubectl get events command. The output is shown as a list, as in the example below:
kube-system   60m   Normal   Pulling   pod/node-problem-detector-vmcf2   pulling image "k8s.gcr.io/node-problem-detector:v0.7.0"
kube-system   60m   Normal   Pulled    pod/node-problem-detector-vmcf2   Successfully pulled image "k8s.gcr.io/node-problem-detector:v0.7.0"
kube-system   60m   Normal   Created   pod/node-problem-detector-vmcf2   Created container
kube-system   60m   Normal   Started   pod/node-problem-detector-vmcf2
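Event lists like the one above can be long, and most entries are of type Normal. The sketch below narrows the list to Warning events, sorted by creation time. The wrapper name warning_events and its default namespace are assumptions made for illustration; --field-selector and --sort-by are standard kubectl flags.

```shell
# Sketch only: lists Warning events in one namespace, oldest first.
# `warning_events` is a hypothetical helper; it falls back to the "default" namespace.
warning_events() {
  namespace=${1:-default}
  kubectl get events -n "$namespace" \
    --field-selector type=Warning \
    --sort-by=.metadata.creationTimestamp
}

# Example (requires access to a cluster):
#   warning_events kube-system
```

Filtering on the server side with a field selector keeps the output short enough to spot ‘BackOff’ entries quickly.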
There are many causes of the CrashLoopBackOff error. Listed below are a few common ones:
- Misconfiguration of the container — check for typos or misconfigured values in the configuration files.
- Out of memory or resources — check the resource limits are correctly specified. This can be caused by a sudden or unexpected increase in traffic or activity. Check the “resources: limits” section of the configuration file.
- Two or more containers are configured to use the same port, which will cause the error if they’re in the same pod.
- The pods are attempting to connect to a file or database that is locked due to other pods using it.
- The pods may be referencing non-existent resources or packages, such as scripts that cannot be found in the container or on a persistent storage volume.
- General error deploying the software — any bugs and exceptions specific to your software.
- Command line arguments may be incorrect or missing.
- The liveness probes are not configured correctly — check the configuration files.
- Incorrectly specified permissions or not enough permissions have been granted.
- The filesystem or folder the pod is trying to write to is read-only.
- Connection issues: the networking configuration is incorrect, or DNS is unreachable. kube-dns may not be running, and the container cannot contact the external service.
- Incorrect environment variables are set. Use the env command to inspect the environment variables.
- An identity assigned to the pod is incorrect or inaccessible, for example in Azure Kubernetes Service, where a managed identity from Azure Active Directory may be incorrectly assigned or cannot be accessed.
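The exit code of the last terminated container can help narrow down which of the causes above applies. The following is a minimal sketch that maps a few conventional exit codes to likely explanations. The helper name explain_exit_code is hypothetical, and the hints are general shell and signal conventions rather than a definitive diagnosis.

```shell
# Sketch only: maps common container exit codes to likely causes.
# The exit code of the last crash can be read with (not run here):
#   kubectl get pod <pod name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; the process finished, so check the command and restartPolicy" ;;
    1)   echo "general application error; check the application logs" ;;
    126) echo "command found but not executable; check file permissions" ;;
    127) echo "command not found; check the command, arguments, and image contents" ;;
    137) echo "killed by SIGKILL (128 + 9); often an out-of-memory kill" ;;
    143) echo "terminated by SIGTERM (128 + 15)" ;;
    *)   echo "application-specific exit code; check the application logs" ;;
  esac
}

explain_exit_code 137
```

An exit code of 137, for instance, points towards the resource limits, while 127 points towards a missing script or an incorrect command.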
The CrashLoopBackOff status is a notification that the pod is being restarted due to an error and is waiting for the current ‘backoff’ time to elapse before it tries to start again.
Running through the steps detailed above should help you get to the root cause of the status and rectify the problem.
If you need any assistance with managing your Kubernetes projects, take a look at Spacelift. It brings with it a GitOps flow, so your Kubernetes Deployments are synced with your Kubernetes Stacks, and pull requests show you a preview of what they’re planning to change. It also has an extensive selection of policies, which lets you automate compliance checks and build complex multi-stack workflows. You can check it for free by creating a trial account.