
Exit Code 137 – Fixing OOMKilled Kubernetes Error



In this article, we will examine the common OOMKilled error in Kubernetes, also denoted by exit code 137: what it means, what commonly causes it, and, most importantly, how to fix it.

We will cover:

  1. What is the OOMKilled Kubernetes Error (exit code 137)?
  2. How does the OOMKiller mechanism work?
  3. Exit Code 137 common causes
  4. OOMKilled (exit code 137) diagnosis
  5. How to troubleshoot and fix exit code 137
  6. How to prevent OOMKilled errors

What is exit code 137?

Exit code 137 indicates that a process was forcibly terminated using signal 9 (SIGKILL). In Unix/Linux systems, when a process exits due to a signal, its exit code equals 128 plus the signal number. Since SIGKILL is signal 9, the resulting exit code is 128 + 9 = 137. This typically happens when the system or a user forcibly kills a process, often due to resource constraints like running out of memory.
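
You can reproduce this arithmetic in any Linux shell. A minimal sketch, assuming bash:

sleep 300 &          # start a long-running process in the background
kill -9 $!           # send SIGKILL (signal 9) to it
wait $!              # wait reports the exit status of the killed job
echo $?              # prints 137 (128 + 9)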

What is the OOMKilled Kubernetes error (exit code 137)?

In Kubernetes, when a container uses more memory than its assigned memory limit, the Linux Out-Of-Memory (OOM) killer forcibly stops the process. This results in the container exiting with code 137, which corresponds to SIGKILL (signal 9) + 128. The container is then marked as OOMKilled in the Pod’s status.

This often happens when:

  • The container exceeds its memory limit (as set in resources.limits.memory), triggering an OOMKill.
  • The node runs critically low on memory, and the kernel OOMKiller terminates one or more containers.

To resolve it, either optimize memory usage or increase the container’s memory limit. Monitoring with tools like Prometheus or using kubectl describe pod can help identify patterns.

The status of your pods will show ‘OOMKilled’ if they encounter the error, which you can view using the command:

kubectl get pods
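
If a container was recently OOMKilled, the output looks something like this (the pod name, restart count, and age are illustrative):

NAME                        READY   STATUS      RESTARTS   AGE
web-app-5f7d8c9b4d-abcde    0/1     OOMKilled   2          12m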

How does the OOMKiller mechanism work?

The Out-Of-Memory Killer (OOMKiller) is a mechanism in the Linux kernel (not native Kubernetes) that is responsible for preventing a system from running out of memory by killing processes that consume too much memory. When the system runs out of memory, the kernel invokes the OOMKiller to choose a process to kill in order to free up memory and keep the system running.

The OOMKiller works by selecting the process that consumes the most memory and is considered to be the least essential to the system’s operation. This selection process is based on several factors, including the process’s memory usage, its priority level, and the amount of time it has been running.

Once the OOMKiller selects a process, it sends SIGKILL (signal 9), which cannot be caught or ignored, so there is no opportunity for a graceful shutdown. The kernel terminates the process immediately and frees up its memory.

Note: A pod whose container is killed due to a memory issue is not necessarily evicted from the node. If the pod’s restartPolicy is set to “Always”, the kubelet will instead try to restart the killed container.

The OOMKiller is a last-resort mechanism that is only invoked when the system is in danger of running out of memory. While it can help to prevent a system from crashing due to memory exhaustion, it is important to note that killing processes can result in data loss and system instability. As such, it is recommended to configure your system to avoid OOM situations, for example, by monitoring memory usage, setting resource limits, and optimizing memory usage in your applications.

Going under the hood, the Linux kernel maintains an oom_score for each process running on the host. The higher the score, the greater the chance that the process will be killed.

An oom_score_adj value allows users to adjust the OOM score and thereby influence which processes are terminated first. Kubernetes sets the oom_score_adj value according to the Quality of Service (QoS) class assigned to a pod.

There are three QoS classes that can be assigned to a pod, each with a matching value for oom_score_adj:

  • Guaranteed: -997
  • BestEffort: 1000
  • Burstable: min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999)

Because Guaranteed pods have the lowest oom_score_adj value (-997), they are the last to be killed on a node that is running out of memory. BestEffort pods, with the highest value (1000), are the first to be killed.
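
A pod gets the Guaranteed class only when every container’s memory and CPU requests equal its limits; omitting resources entirely yields BestEffort, and anything in between is Burstable. A minimal sketch (the pod name, image, and values are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo      # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25          # any image; nginx is only an example
      resources:
        requests:                # requests == limits => Guaranteed (oom_score_adj -997)
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "256Mi"
          cpu: "250m"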

To see the QoS class of a pod, run the following command:

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'

Run kubectl exec -it <podname> -- /bin/bash to connect to the pod.

To see the oom_score, run ps -ef to get a list of all the processes and note the process ID you are interested in. Then run cat /proc/$PID/oom_score to see the score.
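
Putting those steps together (PID 1 is used here only as an example; it is typically the container’s main process):

kubectl exec -it <podname> -- /bin/sh
ps -ef                        # list processes and note the PID you care about
cat /proc/1/oom_score         # the kernel's current score for PID 1
cat /proc/1/oom_score_adj     # the adjustment Kubernetes set for the pod's QoS class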

Read more about the kubectl exec command.

Common causes of exit code 137

Here are the common causes that can lead to a 137 exit code:

  • Memory limits are too low – The container’s memory limit is insufficient for its actual workload, especially during peak operations or spikes in data processing.
  • Memory leaks – Applications that allocate memory without releasing it (e.g., due to coding bugs in languages like Java, Node.js, or Python) will gradually consume more memory, eventually hitting the limit.
  • Inefficient resource configuration – Missing or misaligned requests.memory and limits.memory settings can lead to overcommitment at the node level, increasing the chance of OOMKills during contention.
  • Large buffers or in-memory caches – Applications that use in-memory caches (e.g., Redis, Elasticsearch) or handle large payloads (e.g., file uploads, large JSONs) may exceed memory limits unexpectedly.
  • Java heap size misconfiguration – JVM-based applications may allocate more heap than the pod limit allows, unless explicitly capped with -Xmx settings, leading to OOMKills.

Regular monitoring with tools like Prometheus and Grafana, combined with memory profiling and autoscaling policies, can help prevent OOMKilled errors by ensuring appropriate memory provisioning.
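
For the JVM case in particular, one common mitigation is to cap the heap relative to the container’s memory limit so the whole process stays below it. A sketch, assuming a containerized Java service (the container name, image, env var value, and limit are illustrative):

containers:
  - name: java-app                          # hypothetical container
    image: registry.example.com/java-app:1.0
    env:
      - name: JAVA_TOOL_OPTIONS             # read automatically by most JVMs at startup
        value: "-XX:MaxRAMPercentage=75.0"  # keep the heap well under the 1Gi limit
    resources:
      requests:
        memory: "768Mi"
      limits:
        memory: "1Gi"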

OOMKilled (Exit Code 137) diagnosis

Follow the steps below to diagnose the OOMKilled error:

Step 1: Check the pod logs

The first step in diagnosing an OOMKilled error is to check the pod logs to see if there are any error messages that indicate a memory issue.

Run kubectl describe pod <pod-name>. Look for OOMKilled under Last State in the container status and relevant messages in Events.

kubectl describe pod <podname>
    State:          Running
      Started:      Fri, 12 May 2023 11:14:13 +0200
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      ...

Use kubectl logs <pod-name> to get logs, or inspect logs using container runtime tools (like crictl logs) if you have direct node access.
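
If you have direct node access, the kernel log also records every OOM kill (the exact wording varies by kernel version and distribution):

dmesg -T | grep -i "killed process"
# or, on systemd-based nodes:
journalctl -k | grep -i -E "out of memory|oom-killer"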

Check out also how to view Kubernetes pod logs files with kubectl.

Step 2: Analyze resource limits

Inspect the pod definition and compare the memory limit to actual usage from monitoring tools:

resources:
  limits:
    memory: "512Mi"
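
If you prefer to read the configured requests and limits from the running pod rather than the manifest, something like this works (the jsonpath prints a raw map, which is enough for a quick comparison):

kubectl get pod <pod-name> -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources}{"\n"}{end}'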

Step 3: Check usage over time

Use Kubernetes monitoring tools such as Prometheus and Grafana, or kubectl top pod to see memory usage patterns. This can help you identify which containers are consuming too much memory and triggering the OOMKilled error.

Look for spikes approaching or exceeding the limit.
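
A couple of starting points, assuming metrics-server is installed and Prometheus scrapes the kubelet’s cAdvisor metrics:

# Point-in-time usage per container (requires metrics-server)
kubectl top pod <pod-name> --containers

# PromQL: working-set memory for the pod's containers, to chart against the configured limit
container_memory_working_set_bytes{namespace="<namespace>", pod="<pod-name>", container!=""}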

Step 4: Tune memory management

If justified, increase memory limits, optimize code for lower usage, or implement retries if the operation can be broken into smaller chunks.

How to troubleshoot and fix Exit Code 137

Below are the common causes of the OOMKilled Kubernetes error and their resolutions.

1. Increase memory limits in resource configuration

When containers exceed their memory limits, Kubernetes kills them with SIGKILL, resulting in Exit Code 137. If your pod is running close to or beyond its assigned memory (limits.memory), increasing this limit ensures the container has enough room to operate without being terminated. Update your pod’s or deployment’s resource specification:

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

Ensure that the new limits are realistic for the container’s workload. Monitor memory usage using metrics-server, Prometheus, or kubectl top to establish an appropriate baseline. Be cautious not to over-allocate memory, as this can starve other pods on the same node.

2. Identify and fix memory leaks in the application

If increasing memory limits is only a temporary fix, the root cause might be a memory leak in the application code. 

Use profiling tools specific to your application’s language (e.g., Valgrind, Go pprof, VisualVM) to inspect heap usage over time. Analyze memory retention and allocation patterns to pinpoint and resolve leaks.

Memory leaks can occur gradually, making them harder to detect without sustained monitoring. Consider adding periodic memory consumption logging and alert thresholds, especially for long-running containers. Fixing these issues at the application level ensures consistent performance and stability regardless of Kubernetes configuration.
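
As one concrete example, if the service happens to be written in Go and already exposes net/http/pprof on port 6060 (both are assumptions for illustration), you can profile the heap in place:

kubectl port-forward pod/<pod-name> 6060:6060      # forward the pprof port locally

# In another terminal: snapshot the heap; repeat a few minutes later and compare the top allocators
go tool pprof -top http://localhost:6060/debug/pprof/heap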

3. Use a liveness probe with graceful shutdown handling

Under memory pressure, a container can fail its liveness or readiness probes and be killed before it has had a chance to shut down cleanly. To mitigate this, configure a livenessProbe that allows enough time for the app to respond, and include a preStop hook or signal trap to allow graceful shutdowns:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 10"]

This ensures the app can finish current tasks or flush memory before termination. Make sure your app listens to SIGTERM and releases memory cleanly. Proper termination logic avoids abrupt kills and reduces the likelihood of 137 exit codes caused by probe misbehavior.
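
For completeness, a liveness probe that gives the application breathing room might look like this (the endpoint, port, and timings are illustrative and should match your app):

livenessProbe:
  httpGet:
    path: /healthz            # assumes the app serves a health endpoint here
    port: 8080
  initialDelaySeconds: 15     # let the app finish starting before the first probe
  periodSeconds: 10
  timeoutSeconds: 5           # tolerate brief slowness instead of failing immediately
  failureThreshold: 3         # only restart after several consecutive failures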

4. Evict pods from overcommitted nodes

On nodes running many pods, system memory can become overcommitted. The Linux kernel may then invoke the Out-of-Memory Killer (OOMKiller), terminating the containers with the highest memory consumption.

Use node allocatable monitoring, and implement pod anti-affinity rules or taints to spread memory-intensive pods more evenly across nodes:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: memory-heavy
        topologyKey: "kubernetes.io/hostname"

This avoids concentrating high-memory pods on a single node. Additionally, enforce resource quotas at the namespace level, and tune the kubelet eviction policy if the node frequently runs low on memory.
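
A namespace-level quota is one way to keep aggregate memory demand within what the nodes can serve. A sketch with illustrative names and values:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: memory-quota           # hypothetical name
  namespace: team-a            # hypothetical namespace
spec:
  hard:
    requests.memory: "8Gi"     # total memory the namespace's pods may request
    limits.memory: "16Gi"      # total memory limits across all pods in the namespace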

Check out also how to fix CreateContainerConfigError and ImagePullBackOff error in Kubernetes.

How to prevent OOMKilled errors

There are a couple of ways in which you can prevent OOMKilled errors:

  1. Set appropriate memory limits: The maximum amount of memory a container is allowed to use shouldn’t be lower than the memory your workload typically consumes.
    For that, you will need to use metrics and monitoring to determine the typical memory usage of your application. Overestimating leads to higher costs through inefficient resource utilization (and may force you to add nodes), while underestimating leads to frequent OOMKilled errors.
  2. Leverage horizontal pod autoscaling: For applications that can be scaled horizontally, it is best practice to use the Kubernetes Horizontal Pod Autoscaler (HPA) to automatically increase the number of pod replicas when memory demand is high (see the example after this list).
  3. Ensure node resource allocation: Ensure your node has enough resources to handle workloads. This means avoiding overcommitment and ensuring node autoscaling is configured when applicable.
  4. Optimize application memory usage: Monitor your application and refactor it, if possible, to reduce memory consumption.
  5. Avoid memory leaks in your application: On the application side, you should regularly check and fix memory leaks.
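
To illustrate point 2, a memory-based HPA using the autoscaling/v2 API might look like this (the Deployment name, replica bounds, and threshold are illustrative; utilization is measured against the pods’ memory requests):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app              # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # add replicas once average usage passes 70% of requested memory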

Managing Kubernetes with Spacelift

If you need help managing your Kubernetes projects, consider Spacelift. It brings with it a GitOps flow, so your Kubernetes Deployments are synced with your Kubernetes Stacks, and pull requests show you a preview of what they’re planning to change. 

With Spacelift, you get:

  • Policies to control what kind of resources engineers can create, what parameters they can have, how many approvals you need for a run, what kind of task you execute, what happens when a pull request is open, and where to send your notifications
  • Stack dependencies to build multi-infrastructure automation workflows, with the ability to combine Terraform with Kubernetes, Ansible, and other infrastructure-as-code (IaC) tools such as OpenTofu, Pulumi, and CloudFormation
  • Self-service infrastructure via Blueprints that lets you declare YAML templates to easily configure all the aspects related to your stacks. These templates translate into a form that your engineers can easily use, even if they don’t have any experience with infrastructure as code tools. Blueprints also integrate natively with ServiceNow, enabling your developers to do what matters: developing application code while not sacrificing control
  • Creature comforts such as contexts (reusable containers for your environment variables, files, and hooks), and the ability to run arbitrary code
  • Drift detection and optional remediation

If you want to learn more about Spacelift, create a free account today or book a demo with one of our engineers.

Key points

OOMKilled in Kubernetes (exit code 137) means a container was terminated with SIGKILL because it exceeded its memory limit or because the node ran out of memory.

To avoid the OOMKilled error, it is recommended to monitor memory usage in Kubernetes pods and containers, set resource limits to prevent containers from consuming too much memory, and optimize application code to reduce memory consumption.

Additionally, consider increasing the memory resources allocated to the pod or using horizontal pod autoscaling to scale up the number of pods in response to increased workload demands.
