What Are Kubernetes Jobs? Use Cases, Types & How to Run

Kubernetes Jobs run one-off tasks in your cluster by automatically creating Pods and managing their lifecycles until they terminate successfully. A Job is complete only once a specified number of its Pods have terminated successfully.

In this guide, we’ll explain why Jobs are needed, how they work, and the different types supported by Kubernetes. We’ll also show some simple examples of how to create your own Jobs in your clusters.

  1. What is a Kubernetes Job?
  2. Kubernetes Job use cases
  3. Types of Kubernetes Jobs
  4. How to create a Kubernetes Job
  5. Running Jobs on a schedule with CronJobs

What is a Kubernetes Job?

Jobs are Kubernetes resources used to execute one-off tasks that must be reliably run to completion. A Job creates one or more Pods from a template and waits for at least a specified number of them to terminate successfully. The Job is then marked as complete.

Jobs enable scenarios where you need to run extra tasks separately from your main application. For example, many apps have asynchronous processes, such as background database migrations, that need to be completed after a new deployment is launched. You can use a Job to run these processes without blocking your app’s main Pods from starting up.

You can use Jobs to implement work queues, message processing systems, and other types of background or standalone activity for the apps deployed in your Kubernetes cluster. They allow you to easily run tasks independently of the long-lived Pods that serve your main application, preserving the proper separation of concerns for your containers.

Kubernetes Jobs vs Pods

Pods are the basic compute unit used in Kubernetes: they’re a group of one or more containers running on a Node in your cluster. Jobs are an abstraction used to start Pods from a template and then wait for them to terminate when their container processes exit.

Jobs are useful because a bare Pod has no controller managing its lifecycle. If the Node that’s running the Pod fails, the Pod won’t be rescheduled anywhere else in your cluster. Using a Job makes it far more likely that a process runs to completion: Kubernetes keeps creating replacement Pods after failures until the desired number of Pods has terminated successfully or the Job’s retry limit is reached.

Kubernetes Job use cases

Jobs are suitable for any situation where you want to run a specific task in your cluster and be sure that it completes. Unlike application Pods that run long-lived server processes indefinitely, the container task that’s executed by your Jobs should be designed to operate once and then exit.

Typical Job use cases include:

  • Seeding app deployments — Jobs can be used to run app provisioning tasks such as database migrations. They’re a more versatile alternative to init containers when your task doesn’t need to block app startup.
  • On-demand actions — Some container images include utilities that can be used to run app maintenance tasks; executing these within a Job ensures they’ll complete reliably.
  • Work queues, batch processing, and message systems — Jobs can be used to help implement various background processing mechanisms that perform discrete actions separately from your app’s main Pods. This could be to clear a cache, flush a mail spool, or ingest data. Your app could use the Kubernetes API to create new Jobs when key events occur.
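
As a rough sketch of the seeding use case, a migration task could be wrapped in a Job like the one below. The Job name, image, and command are hypothetical placeholders rather than values from a real project, so substitute whatever your app actually ships.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                                  # hypothetical name
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: registry.example.com/my-app:1.2.3  # hypothetical image
          command: ["/app/bin/migrate", "--up"]     # hypothetical migration command
      # Never restarts containers in place; a failed Pod is replaced by a new one
      restartPolicy: Never
```

Because the Job runs separately from your Deployment, the app’s main Pods can start without waiting for the migration to finish.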

Jobs can also function as a basic workflow engine for general process automation, but they’re not primarily designed for this scenario. They don’t support multiple steps, dependencies, or automatically connected inputs and outputs, so you should consider a dedicated job processing solution like Argo Workflows if you require these features.

Types of Kubernetes Jobs

Kubernetes Jobs can be used to implement three main types of processes:

  1. Non-parallel processes
  2. Multiple tasks in parallel (work queue)
  3. Multiple tasks in parallel (fixed completion count)

Non-parallel processes are simple Jobs that run one Pod and wait for it to complete. There’s no parallelism involved.

Multiple tasks in parallel (work queue) start and run several Pods in parallel. They’re used when your process consists of several tasks, but none are dependent on each other. This pattern allows the implementation of work queue systems, but this requires the use of an external service that coordinates what each Pod works on. The Job is complete once any one of the Pods terminates with success and all its peers have also exited.

Multiple tasks in parallel (fixed completion count) start and run several Pods in parallel. The Job continues running until a specified number of successful Pod completions have occurred. The Job is then marked as complete.

A Job’s type is determined based on the values you assign to the spec.completions and spec.parallelism fields in its manifest. The spec.completions field is only relevant to the Fixed Completion Count Job type, whereas spec.parallelism controls the number of Pods that will run in parallel during the Job.

Here’s a reference table that explains how to set these fields for the different job types:

Job Type                                        | spec.completions                 | spec.parallelism
Non-Parallel                                    | Must be 1 or unset               | Must be 1 or unset
Multiple Tasks in Parallel (Work Queue)         | Must be unset                    | Number of Pods to run in parallel
Multiple Tasks in Parallel (Fixed Completions)  | Number of Pod completions needed | Optional: number of Pods to run in parallel

When both fields are left unset, each defaults to a value of 1, resulting in a Non-Parallel Job being created.
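
To make the work queue row of the table concrete, here is a minimal sketch of a Job that sets spec.parallelism but leaves spec.completions unset. The Job name and the echo command are illustrative assumptions; a real worker would pull tasks from an external queue and exit successfully once the queue is drained.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-queue-job        # illustrative name
spec:
  # completions is deliberately left unset, which selects the work queue type
  parallelism: 3              # run up to three worker Pods at once
  template:
    spec:
      containers:
        - name: worker
          image: busybox:latest
          # Placeholder for a process that consumes items from an external queue
          command: ["/bin/sh", "-c", "echo 'Processing queue items'"]
      restartPolicy: OnFailure
```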

How to create a Kubernetes Job

Jobs are created from YAML manifest files in the same way as other Kubernetes objects. You must set the kind field to Job to indicate that your manifest contains a Job object.

The following example defines a Job that runs a simple command inside a container running the busybox:latest image:

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure

Within the manifest, the spec.template field provides the configuration for the Pods that will be created by the Job. The template can accept the same fields as a regular Pod manifest.

The spec.template.spec.restartPolicy field is required for Pods created by Jobs. It must be set to either OnFailure or Never, because the usual Pod default of Always isn’t permitted in a Job:

  • OnFailure: If a container in the Pod fails, the kubelet restarts the container within the same Pod after an increasing back-off delay.
  • Never: This policy prevents individual containers from being restarted. The entire Pod will be marked as failed instead, causing the Job controller to start a new Pod that replaces it.
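
For comparison with the example above, the following sketch uses the Never policy. The Job name and the deliberately failing command are assumptions chosen to make the behavior easy to observe in a test cluster.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job-never        # illustrative name
spec:
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          # exit 1 simulates a failing task so that the retry behavior is visible
          command: ["/bin/sh", "-c", "echo 'Simulated failure'; exit 1"]
      # Each failure marks the whole Pod as failed; the Job controller starts a fresh Pod
      restartPolicy: Never
```

With this policy, each failed Pod is left in place rather than restarted, which can make it easier to inspect the logs of every attempt.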

Once you’ve written your Job manifest, you can add it to your cluster using kubectl:

$ kubectl apply -f job.yaml
job.batch/demo-job created

You can monitor your Jobs with the kubectl get jobs command:

$ kubectl get jobs
NAME       COMPLETIONS   DURATION   AGE
demo-job   1/1           6s         5m

The COMPLETIONS field is the most important output column—in this example, you can see that one Pod has been created and it has already completed successfully. To retrieve the Pod’s logs, use the kubectl logs command:

$ kubectl logs job/demo-job
Running job

You don’t need to respond to individual Pod failures manually. Kubernetes will keep retrying failed Pods until the required number of successful completions has been achieved or the Job’s back-off limit is exceeded.

Controlling Job parallelism

As explained above, parallelism is controlled by the spec.completions and spec.parallelism manifest fields.

The following Job requires three Pods to terminate successfully:

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  completions: 3
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure

If you apply this Job to your cluster, you’ll see that Kubernetes now starts three Pods in sequence:

$ kubectl get jobs
NAME       COMPLETIONS   DURATION   AGE
demo-job   3/3           12s        12s

Additionally setting the spec.parallelism field allows the Pods to be started and run concurrently, reducing the overall Job execution time. In the following example, up to three Pods can run together:

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  completions: 3
  parallelism: 3
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure

Handling failures

In an ideal environment, every Job would complete successfully. However, this won’t always happen in practice, so it’s important to correctly configure your Jobs to gracefully handle failures. There are a few best practices to follow:

  • Make your tasks resilient to being restarted: Kubernetes will recreate your Pods if they fail, so it’s important your tasks can withstand being restarted in a different Pod. This applies even for single-Pod non-parallel Jobs, as on rare occasions Kubernetes may still have to start the Pod multiple times.
  • Ensure your tasks support concurrency: If you’re using one of the parallel execution Job modes, then the tasks running in your Pods should be resilient to the possibility of other instances being live at the same time.
  • Define explicit failure conditions: It’s possible for Jobs to get stuck, such as if your app depends on a connection to an external service that’s unavailable when the Job runs. You can control this by setting the spec.backoffLimit field in your Job manifest. It instructs Kubernetes to stop retrying and mark the Job as failed once the number of retries exceeds the defined back-off limit. The field has a default value of 6.

These measures will help ensure your Jobs operate reliably with predictable results.
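
The explicit failure conditions described above might look like the following sketch. The values are arbitrary examples. The spec.activeDeadlineSeconds field is not covered elsewhere in this guide; it’s included here as an optional extra that caps the Job’s total running time.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  backoffLimit: 3              # give up after 3 retries instead of the default 6
  activeDeadlineSeconds: 120   # optional extra: fail the Job if it's still active after 2 minutes
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure
```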

Suspending and resuming Jobs

Kubernetes allows you to suspend and resume Jobs by patching the manifests added to your cluster. To suspend a Job, set its spec.suspend field to true, then reapply the manifest:

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  suspend: true
  template:
    spec:
      ...

You can resume the Job by either removing the spec.suspend field or reverting its value to false.

When a Job is suspended, any of its Pods that are still running will be terminated. No more Pods will be started until you resume the Job.

Cleaning up old Jobs

Kubernetes doesn’t automatically delete old Jobs or their Pods. You can delete them manually using the kubectl delete job command:

$ kubectl delete job <job-name>

The deletion will cascade down to include the Pods that the Job created. You can optionally disable this behavior by setting the --cascade=orphan option. This will preserve the Pods while removing the Job object.

$ kubectl delete job <job-name> --cascade=orphan

To have Kubernetes delete old Jobs for you, you can set the spec.ttlSecondsAfterFinished field in your manifest files. Stable since Kubernetes v1.23, this field marks the Job as eligible for automatic removal once it has finished (whether it completed or failed) and a set number of seconds have passed. The following example will clean up the Job approximately five minutes after it finishes:

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  ttlSecondsAfterFinished: 300
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure

Set spec.ttlSecondsAfterFinished to 0 to delete Jobs immediately after they finish.

Running Jobs on a schedule with CronJobs

Jobs only run a task once. Once the specified number of Pods has terminated successfully, the Job completes and does not repeat.

Kubernetes CronJobs are a separate mechanism for starting Jobs on a recurring schedule. You can use familiar cron syntax to regularly create a new Job from a template. This is ideal for backup routines, maintenance operations, and periodic batch processing of data.

The following CronJob manifest defines a Job that automatically runs once every hour:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cronjob
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: demo-job
              image: busybox:latest
              command: ["/bin/sh", "-c", "echo 'Scheduled job';"]
          restartPolicy: OnFailure
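
The manifest above relies on the CronJob defaults. As a hedged sketch, the variant below adds a few optional fields that this guide doesn’t otherwise cover; the chosen values are illustrative assumptions, not recommendations.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cronjob
spec:
  schedule: "0 * * * *"
  concurrencyPolicy: Forbid        # skip a run if the previous Job is still active (default is Allow)
  successfulJobsHistoryLimit: 3    # number of completed Jobs to keep (3 is the default)
  failedJobsHistoryLimit: 1        # number of failed Jobs to keep (1 is the default)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: demo-job
              image: busybox:latest
              command: ["/bin/sh", "-c", "echo 'Scheduled job';"]
          restartPolicy: OnFailure
```

Setting concurrencyPolicy to Forbid can be a sensible choice when a scheduled task isn’t safe to run alongside a previous, still-running instance of itself.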

Key points

We’ve looked at Kubernetes Jobs and how they allow you to reliably run one-off tasks in your cluster. Jobs can be used to implement queue systems, background processing logic, and seeding mechanisms for your apps. They can also be wrapped in CronJobs when you need to run a task on a recurring schedule.

Want to learn more about Kubernetes and its features? Check out the other topics on the Spacelift blog, such as our guide to using ConfigMaps to supply config values to your Pods and Jobs.

And take a look at how Spacelift helps you manage the complexities and compliance challenges of using Kubernetes. Anything that can be run via kubectl can be run within a Spacelift stack. Find out more about how Spacelift works with Kubernetes.
