In this article, we will examine the common OOMKilled error in Kubernetes, also denoted by exit code 137, learn what it means and its common causes, and, more importantly, learn how to fix it.
What is exit code 137?
Exit code 137 indicates that a process was forcibly terminated using signal 9 (SIGKILL). In Unix/Linux systems, when a process exits due to a signal, its exit code equals 128 plus the signal number. Since SIGKILL is signal 9, the resulting exit code is 128 + 9 = 137. This typically happens when the system or a user forcibly kills a process, often due to resource constraints like running out of memory.
In Kubernetes, when a container uses more memory than its assigned memory limit, the Linux Out-Of-Memory (OOM) killer forcibly stops the process. This results in the container exiting with code 137, which corresponds to SIGKILL (signal 9) + 128. The container is then marked as OOMKilled in the Pod's status.
This often happens when:
- The container exceeds its memory limit (as set in resources.limits.memory), triggering an OOMKill.
- The node runs critically low on memory, and the kernel OOMKiller terminates one or more containers.
To resolve it, either optimize memory usage or increase the container's memory limit. Monitoring with tools like Prometheus or using kubectl describe pod can help identify patterns.
The status of your pods will show ‘OOMKilled’ if they encounter the error, which you can view using the command:
kubectl get pods
Check out also how to fix exit code 127 in Kubernetes.
The Out-Of-Memory Killer (OOMKiller) is a mechanism in the Linux kernel (not native Kubernetes) that is responsible for preventing a system from running out of memory by killing processes that consume too much memory. When the system runs out of memory, the kernel invokes the OOMKiller to choose a process to kill in order to free up memory and keep the system running.
The OOMKiller works by selecting the process that consumes the most memory and is considered to be the least essential to the system’s operation. This selection process is based on several factors, including the process’s memory usage, its priority level, and the amount of time it has been running.
Once the OOMKiller selects a process, it sends a SIGKILL (signal 9), which cannot be caught or ignored and does not allow for a graceful shutdown. The kernel terminates the process immediately and frees up its memory.
Note: A pod whose container is killed due to a memory issue is not necessarily evicted from the node. If the pod's restartPolicy is set to "Always", the kubelet will instead try to restart the container.
The OOMKiller is a last-resort mechanism that is only invoked when the system is in danger of running out of memory. While it can help to prevent a system from crashing due to memory exhaustion, it is important to note that killing processes can result in data loss and system instability. As such, it is recommended to configure your system to avoid OOM situations, for example, by monitoring memory usage, setting resource limits, and optimizing memory usage in your applications.
Going under the hood, the Linux kernel maintains an oom_score for each process running on the host. The higher the score, the higher the chance that the process will be killed.
An oom_score_adj value allows users to customize the OOM process and define when processes should be terminated. Kubernetes uses the oom_score_adj value when defining a Quality of Service (QoS) class for a pod.
There are three QoS classes that can be assigned to a pod, each with a matching value for oom_score_adj:
- Guaranteed: -997
- BestEffort: 1000
- Burstable: min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999)
Because pods with the Guaranteed QoS class have the lowest value of -997, they are the last to be killed on a node that is running out of memory. BestEffort pods are the first to be killed because they have the highest value of 1000.
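For reference, a pod only receives the Guaranteed class when every container's CPU and memory requests equal its limits. A minimal sketch (the pod name, container name, and image are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example      # illustrative name
spec:
  containers:
  - name: app                   # illustrative container name
    image: nginx                # illustrative image
    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "250m"             # requests equal to limits => Guaranteed QoS
        memory: "256Mi"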
To see the QoS class of a pod, run the following command:
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
Run kubectl exec -it <podname> -- /bin/bash to connect to the pod.
To see the oom_score, run ps -ef to get a list of all the processes and note their process IDs. Then run cat /proc/$PID/oom_score to see the score.
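If you prefer a one-liner, the same information can be gathered without an interactive shell; a rough sketch, assuming the container image ships /bin/sh (the pod name is a placeholder):
kubectl exec <podname> -- /bin/sh -c 'for p in /proc/[0-9]*; do echo "$p $(cat $p/oom_score)"; done'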
Read more about the kubectl exec command.
Here are the common causes that can bring up a 137 exit code:
- Memory limits are too low – The container's memory limit is insufficient for its actual workload, especially during peak operations or spikes in data processing.
- Memory leaks – Applications that allocate memory without releasing it (e.g., due to coding bugs in languages like Java, Node.js, or Python) will gradually consume more memory, eventually hitting the limit.
- Inefficient resource configuration – Missing or misaligned requests.memory and limits.memory settings can lead to overcommitment at the node level, increasing the chance of OOMKills during contention.
- Large buffers or in-memory caches – Applications that use in-memory caches (e.g., Redis, Elasticsearch) or handle large payloads (e.g., file uploads, large JSONs) may exceed memory limits unexpectedly.
- Java heap size misconfiguration – JVM-based applications may allocate more heap than the pod limit allows unless explicitly capped with -Xmx settings, leading to OOMKills (see the sketch below).
Regular monitoring with tools like Prometheus and Grafana, combined with memory profiling and autoscaling policies, can help prevent OOMKilled errors by ensuring appropriate memory provisioning.
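To illustrate the last cause in the list above, the JVM heap can be capped relative to the container's memory limit, for example via JAVA_TOOL_OPTIONS. This is only a sketch, with an illustrative container name, image, and percentage:
containers:
- name: app                     # illustrative container name
  image: registry.example.com/my-java-app:latest   # illustrative image
  env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0"   # keep the heap well below the container limit; an explicit -Xmx also works
  resources:
    limits:
      memory: "1Gi"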
Follow the steps below to diagnose the OOMKilled error:
Step 1: Check the pod logs
The first step in diagnosing an OOMKilled error is to check the pod logs to see if there are any error messages that indicate a memory issue.
Run kubectl describe pod <pod-name>. Look for OOMKilled under Last State in the container status and relevant messages in Events.
kubectl describe pod <podname>
State: Running
Started: Fri, 12 May 2023 11:14:13 +0200
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
...
Use kubectl logs <pod-name> to get logs, or inspect logs using container runtime tools (like crictl logs) if you have direct node access.
Check out also how to view Kubernetes pod logs files with kubectl.
Step 2: Analyze resource limits
Inspect the pod definition and compare the memory limit to actual usage from monitoring tools:
resources:
  limits:
    memory: "512Mi"
Step 3: Check usage over time
Use Kubernetes monitoring tools such as Prometheus and Grafana, or kubectl top pod, to see memory usage patterns. This can help you identify which containers are consuming too much memory and triggering the OOMKilled error.
Look for spikes approaching or exceeding the limit.
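For example, kubectl top pod shows point-in-time usage per container, while a PromQL query can chart usage over a longer window, assuming cAdvisor metrics are scraped by Prometheus (the pod name is a placeholder):
kubectl top pod <pod-name> --containers

# Example PromQL, assuming cAdvisor metrics are available in Prometheus:
# max_over_time(container_memory_working_set_bytes{pod="<pod-name>", container!=""}[1h])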
Step 4: Tune memory management
If justified, increase memory limits, optimize code for lower usage, or implement retries if the operation can be broken into smaller chunks.
Below are the common causes of the OOMKilled Kubernetes error and their resolutions.
1. Increase memory limits in resource configuration
When containers exceed their memory limits, Kubernetes kills them with SIGKILL, resulting in Exit Code 137. If your pod is running close to or beyond its assigned memory (limits.memory), increasing this limit ensures the container has enough room to operate without being terminated. Update your pod's or deployment's resource specification:
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"
Ensure that the new limits are realistic for the container's workload. Monitor memory usage using metrics-server, Prometheus, or kubectl top to establish an appropriate baseline. Be cautious not to over-allocate memory, as this can starve other pods on the same node.
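If the workload is managed by a Deployment, the new limit can also be applied in place with a JSON patch; a sketch in which the deployment name and the container index are placeholders:
kubectl patch deployment <deployment-name> --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "1Gi"}]'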
2. Identify and fix memory leaks in the application
If increasing memory limits is only a temporary fix, the root cause might be a memory leak in the application code.
Use profiling tools specific to your application's language (e.g., Valgrind, Go pprof, VisualVM) to inspect heap usage over time. Analyze memory retention and allocation patterns to pinpoint and resolve leaks.
Memory leaks can occur gradually, making them harder to detect without sustained monitoring. Consider adding periodic memory consumption logging and alert thresholds, especially for long-running containers. Fixing these issues at the application level ensures consistent performance and stability regardless of Kubernetes configuration.
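As one concrete workflow for a Go service, assuming the application already exposes the standard net/http/pprof endpoints on port 6060 (not shown here), you can port-forward to the pod and pull a heap profile:
kubectl port-forward pod/<pod-name> 6060:6060

# In another terminal:
go tool pprof http://localhost:6060/debug/pprof/heap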
3. Use a liveness probe with graceful shutdown handling
Under memory pressure, a pod whose liveness or readiness probes fail can also end up killed with SIGKILL. To mitigate this, configure a proper livenessProbe that allows enough time for the app to respond, and include a preStop hook or signal trap to allow graceful shutdowns:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 10"]
This ensures the app can finish current tasks or flush memory before termination. Make sure your app listens to SIGTERM and releases memory cleanly. Proper termination logic avoids abrupt kills and reduces the likelihood of 137 exit codes caused by probe misbehavior.
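To complement the preStop hook above, here is a hedged livenessProbe sketch; the path, port, and timings are illustrative and should be tuned to how your application behaves under load:
livenessProbe:
  httpGet:
    path: /healthz              # illustrative health endpoint
    port: 8080                  # illustrative port
  initialDelaySeconds: 30       # give the app time to start before probing
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3           # tolerate brief slowdowns before restarting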
4. Evict pods from overcommitted nodes
On nodes running many pods, system memory can become overcommitted. The kernel may then invoke the Out-of-Memory Killer (OOMKiller), terminating the containers with the highest OOM scores.
Use node allocatable monitoring and implement pod anti-affinity rules or taints to balance memory-intensive pods across nodes better:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: memory-heavy
      topologyKey: "kubernetes.io/hostname"
This avoids concentrating high-memory pods on a single node. Additionally, enforce resource quotas at the namespace level and tune the kubelet eviction policy if the node frequently runs low on memory.
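As an example of a namespace-level guardrail, a ResourceQuota caps the total memory that all pods in a namespace can request and consume; the name, namespace, and values below are illustrative:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-quota               # illustrative name
  namespace: team-a             # illustrative namespace
spec:
  hard:
    requests.memory: "8Gi"      # total memory requests allowed in the namespace
    limits.memory: "16Gi"       # total memory limits allowed in the namespace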
Check out also how to fix CreateContainerConfigError and ImagePullBackOff error in Kubernetes.
There are a couple of ways in which you can prevent OOMKilled errors:
- Set appropriate memory limits: The maximum amount of memory a container is allowed to use shouldn't be lower than your default workflow memory consumption. To determine it, use metrics and monitoring to establish the typical memory usage of your application. Overestimating can lead to higher costs due to inefficient resource utilization (in this case, you must expand your nodes), but underestimating leads to frequent OOMKilled errors.
- Leverage horizontal pod autoscaling: For applications that can be scaled horizontally, it is best practice to use the Kubernetes Horizontal Pod Autoscaler (HPA) to automatically increase the number of pod replicas when memory demand is high (a minimal example follows this list).
- Ensure node resource allocation: Ensure your node has enough resources to handle workloads. This means avoiding overcommitment and ensuring node autoscaling is configured when applicable.
- Optimize application memory usage: Monitor your application and refactor it, if possible, to reduce memory consumption.
- Avoid memory leaks in your application: On the application side, you should regularly check and fix memory leaks.
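As mentioned in the autoscaling point above, here is a minimal sketch of a memory-based HPA using the autoscaling/v2 API; the names, replica counts, and threshold are illustrative:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa                 # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app                   # illustrative Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75  # scale out when average memory utilization exceeds 75% of requests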
If you need help managing your Kubernetes projects, consider Spacelift. It brings with it a GitOps flow, so your Kubernetes Deployments are synced with your Kubernetes Stacks, and pull requests show you a preview of what they’re planning to change.
With Spacelift, you get:
- Policies to control what kind of resources engineers can create, what parameters they can have, how many approvals you need for a run, what kind of task you execute, what happens when a pull request is open, and where to send your notifications
- Stack dependencies to build multi-infrastructure automation workflows with dependencies, having the ability to build a workflow that can combine Terraform with Kubernetes, Ansible, and other infrastructure-as-code (IaC) tools such as OpenTofu, Pulumi, and CloudFormation
- Self-service infrastructure via Blueprints that lets you declare YAML templates to easily configure all the aspects related to your stacks. These templates translate into a form that your engineers can easily use, even if they don’t have any experience with infrastructure as code tools. Blueprints also integrate natively with ServiceNow, enabling your developers to do what matters: developing application code while not sacrificing control
- Creature comforts such as contexts (reusable containers for your environment variables, files, and hooks), and the ability to run arbitrary code
- Drift detection and optional remediation
If you want to learn more about Spacelift, create a free account today or book a demo with one of our engineers.
OOM Killed in Kubernetes with Exit Code 137 means a container was terminated because it exceeded its memory limit.
To avoid the OOMKilled error, it is recommended to monitor memory usage in Kubernetes pods and containers, set resource limits to prevent containers from consuming too much memory, and optimize application code to reduce memory consumption.
Additionally, consider increasing the memory resources allocated to the pod or using horizontal pod autoscaling to scale up the number of pods in response to increased workload demands.