Kubernetes Jobs run one-off tasks in your cluster by automatically creating Pods and managing their lifecycles until they terminate successfully. A Job only completes once a specified number of its Pods have terminated successfully.
In this guide, we’ll explain why Jobs are needed, how they work, and the different types supported by Kubernetes. We’ll also show some simple examples of how to create your own Jobs in your clusters.
Jobs are Kubernetes resources used to execute one-off tasks that must be reliably run to completion. A Job creates one or more Pods from a template and waits for at least a specified number of them to terminate successfully. The Job is then marked as complete.
Jobs enable scenarios where you need to run extra tasks separately from your main application. For example, many apps have asynchronous processes, such as background database migrations, that need to be completed after a new deployment is launched. You can use a Job to run these processes without blocking your app’s main Pods from starting up.
You can use Jobs to implement work queues, message processing systems, and other types of background or standalone activity for the apps deployed in your Kubernetes cluster. They allow you to easily run tasks independently of the long-lived Pods that serve your main application, preserving the proper separation of concerns for your containers.
Kubernetes Jobs vs Pods
Pods are the basic compute unit used in Kubernetes: they’re a group of one or more containers running on a Node in your cluster. Jobs are an abstraction used to start Pods from a template and then wait for them to terminate when their container processes exit.
Jobs are useful because starting a bare Pod doesn’t provide any lifecycle management capabilities. If the Node that’s running the Pod fails, then the Pod won’t be rescheduled anywhere else in your cluster. Using a Job lets you be sure a process will execute to completion, as Kubernetes will keep rescheduling Pods upon failures until the desired number of Pods has terminated successfully.
Jobs are suitable for any situation where you want to run a specific task in your cluster and be sure that it completes. Unlike application Pods that run long-lived server processes indefinitely, the container task that’s executed by your Jobs should be designed to operate once and then exit.
Typical Job use cases include:
- Seeding app deployments — Jobs can be used to run app provisioning tasks such as database migrations. They’re a more versatile alternative to init containers when your task doesn’t need to block app startup.
- On-demand actions — Some container images include utilities that can be used to run app maintenance tasks; executing these within a Job ensures they’ll complete reliably.
- Work queues, batch processing, and message systems — Jobs can be used to help implement various background processing mechanisms that perform discrete actions separately from your app’s main Pods. This could be to clear a cache, flush a mail spool, or ingest data. Your app could use the Kubernetes API to create new Jobs when key events occur.
Jobs can also function as a basic workflow engine for general process automation, but they’re not primarily designed for this scenario. They don’t support multiple steps, dependencies, or automatically connected inputs and outputs, so you should consider a dedicated job processing solution like Argo Workflows if you require these features.
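As a sketch of the event-driven pattern above, an app can build a Job manifest programmatically and submit it to the Kubernetes API via a client library. The helper below constructs the manifest as a plain dict; the `cache-clear` name and command are illustrative, not part of any real API:

```python
# Hedged sketch: building a batch/v1 Job manifest as a plain dict, which an
# app could submit to the Kubernetes API (e.g. via a client library) when a
# key event occurs. All names and commands here are illustrative.
def make_job_manifest(name, image, command):
    """Return a minimal batch/v1 Job manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                    # Required for Job Pod templates: OnFailure or Never
                    "restartPolicy": "OnFailure",
                }
            }
        },
    }

job = make_job_manifest(
    "cache-clear", "busybox:latest",
    ["/bin/sh", "-c", "echo 'Clearing cache'"],
)
print(job["kind"], job["metadata"]["name"])
```

Keeping the manifest as data like this makes it easy to vary the image or command per event before handing it to whatever client performs the actual API call.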
Kubernetes Jobs can be used to implement three main types of processes:
- Non-parallel processes
- Multiple tasks in parallel (work queue)
- Multiple tasks in parallel (fixed completion count)
Non-parallel processes are simple Jobs that run one Pod and wait for it to complete. There’s no parallelism involved.
Multiple tasks in parallel (work queue) start and run several Pods in parallel. They’re used when your process consists of several tasks, but none are dependent on each other. This pattern allows the implementation of work queue systems, but this requires the use of an external service that coordinates what each Pod works on. The Job is complete once any one of the Pods terminates with success and all its peers have also exited.
Multiple tasks in parallel (fixed completion count) start and run several Pods in parallel. The Job continues running until a specified number of successful Pod completions have occurred. The Job is then marked as complete.
A Job’s type is determined by the values you assign to the `spec.completions` and `spec.parallelism` fields in its manifest. The `spec.completions` field is only relevant to the fixed completion count Job type, whereas `spec.parallelism` controls the number of Pods that will run in parallel during the Job.
Here’s a reference table that explains how to set these fields for the different job types:
| Job Type | `spec.completions` | `spec.parallelism` |
|---|---|---|
| Non-Parallel | Must be 1 or unset | Must be 1 or unset |
| Multiple Tasks in Parallel (Work Queue) | Must be unset | Number of Pods to run in parallel |
| Multiple Tasks in Parallel (Fixed Completions) | Number of Pod completions needed | Optional: number of Pods to run in parallel |

Both fields default to a value of `1` when not set, resulting in a non-parallel Job being created.
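For illustration, a work-queue-style Job sets only `spec.parallelism` and leaves `spec.completions` unset. This is a sketch: the names are illustrative, and a real worker image would pull tasks from an external queue service rather than echo a message:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker        # illustrative name
spec:
  parallelism: 2            # completions left unset: work-queue pattern
  template:
    spec:
      containers:
        - name: worker
          image: busybox:latest   # stand-in; a real worker consumes from a queue
          command: ["/bin/sh", "-c", "echo 'Processing queue items'"]
      restartPolicy: OnFailure
```

In this pattern, each worker Pod should exit successfully once it detects the shared queue is empty, at which point the Job completes.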
Jobs are created from YAML manifest files in the same way as other Kubernetes objects. You must set the `kind` field to `Job` to indicate that your manifest contains a Job object.

The following example defines a Job that runs a simple command inside a container running the `busybox:latest` image:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure
```
Within the manifest, the `spec.template` field provides the configuration for the Pods that will be created by the Job. The template accepts the same fields as a regular Pod manifest.

The `spec.template.spec.restartPolicy` field is required for Pods created by Jobs. It can be set to either `OnFailure` or `Never`:

- `OnFailure` — If a container in the Pod fails, the container is restarted in place within the same Pod after an increasing back-off delay.
- `Never` — This policy prevents individual containers from being restarted. The entire Pod is marked as failed instead, causing the Job controller to start a new Pod that replaces it.
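For comparison, here is the same Pod template using `Never` (a sketch; the rest of the Job manifest is unchanged). With this policy, each failure produces a fresh replacement Pod rather than a restarted container, so failed Pods remain visible for inspecting logs:

```yaml
spec:
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: Never   # failed Pods are replaced, not restarted in place
```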
Once you’ve written your Job manifest, you can add it to your cluster using Kubectl:

```shell
$ kubectl apply -f job.yaml
job.batch/demo-job created
```
You can monitor your Jobs with Kubectl’s `get jobs` command:

```shell
$ kubectl get jobs
NAME       COMPLETIONS   DURATION   AGE
demo-job   1/1           6s         5m
```

The `COMPLETIONS` field is the most important output column: in this example, you can see that one Pod has been created and it has already completed successfully. To retrieve the Pod’s logs, use the `kubectl logs` command:

```shell
$ kubectl logs job/demo-job
Running job
```
You don’t need to respond to individual Pod failures manually. Kubernetes will keep creating new Pods until the required number of successful completions has been achieved.
Controlling Job parallelism
As explained above, parallelism is controlled by the `spec.completions` and `spec.parallelism` manifest fields.
The following Job requires three Pods to terminate successfully:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  completions: 3
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure
```
If you apply this Job to your cluster, you’ll see that Kubernetes now starts three Pods in sequence:
```shell
$ kubectl get jobs
NAME       COMPLETIONS   DURATION   AGE
demo-job   3/3           12s        12s
```
Additionally setting the `spec.parallelism` field allows the Pods to be started and run concurrently, reducing the overall Job execution time. In the following example, up to three Pods can run together:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  completions: 3
  parallelism: 3
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure
```
Handling failures
In an ideal environment, every Job would complete successfully. However, this won’t always happen in practice, so it’s important to correctly configure your Jobs to gracefully handle failures. There are a few best practices to follow:
- Make your tasks resilient to being restarted — Kubernetes will recreate your Pods if they fail, so it’s important your tasks can withstand being restarted in a different Pod. This applies even for single-Pod non-parallel Jobs, as on rare occasions Kubernetes may still have to start the Pod multiple times.
- Ensure your tasks support concurrency — If you’re using one of the parallel execution Job modes, then the tasks running in your Pods should be resilient to the possibility of other instances being live at the same time.
- Define explicit failure conditions — It’s possible for Jobs to get stuck, such as when your app depends on a connection to an external service that’s unavailable while the Job runs. You can guard against this by setting the `spec.backoffLimit` field in your Job manifest: it instructs Kubernetes to mark the Job as failed once the number of failed Pods exceeds the defined backoff limit. The field defaults to `6`.
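As a sketch, the manifest below combines `spec.backoffLimit` with `spec.activeDeadlineSeconds`, a related Job field that fails the Job once a total wall-clock duration has elapsed regardless of retries. The specific values are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  backoffLimit: 3             # fail the Job after 3 failed Pods
  activeDeadlineSeconds: 600  # fail the Job if it runs longer than 10 minutes
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure
```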
These measures will help ensure your Jobs operate reliably with predictable results.
Suspending and resuming jobs
Kubernetes allows you to suspend and resume Jobs by patching the manifests added to your cluster. To suspend a Job, set its `spec.suspend` field to `true`, then reapply the manifest:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  suspend: true
  template:
    spec:
      ...
```
You can resume the Job by either removing the `spec.suspend` field or reverting its value to `false`.

When a Job is suspended, any running Pods will be terminated. No more Pods will be started until you resume the Job.
Cleaning up old jobs
Kubernetes doesn’t automatically delete old Jobs or their Pods. You can delete them manually using the `kubectl delete job` command:

```shell
$ kubectl delete job <job-name>
```

The deletion will cascade down to include the Pods that the Job created. You can optionally disable this behavior by setting the `--cascade=orphan` option, which preserves the Pods while removing the Job object:

```shell
$ kubectl delete job <job-name> --cascade=orphan
```
To have Kubernetes delete old Jobs for you, you can set the `spec.ttlSecondsAfterFinished` field in your manifest files. Available since Kubernetes v1.23, this field marks the Job as eligible for automatic removal once it’s completed and a set number of seconds have passed. The following example will clean up the Job approximately five minutes after it completes:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  ttlSecondsAfterFinished: 300
  template:
    spec:
      containers:
        - name: demo-job
          image: busybox:latest
          command: ["/bin/sh", "-c", "echo 'Running job';"]
      restartPolicy: OnFailure
```
Set `spec.ttlSecondsAfterFinished` to `0` to delete Jobs immediately after they finish.
Jobs only run a task once. Once the specified number of Pods has terminated successfully, the Job completes and does not repeat.
Kubernetes CronJobs are a separate mechanism for starting Jobs on a recurring schedule. You can use familiar cron syntax to regularly create a new Job from a template. This is ideal for backup routines, maintenance operations, and periodic batch processing of data.
The following CronJob manifest defines a Job that automatically runs once every hour:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: demo-cronjob
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: demo-job
              image: busybox:latest
              command: ["/bin/sh", "-c", "echo 'Scheduled job';"]
          restartPolicy: OnFailure
```
We’ve looked at Kubernetes Jobs and how they allow you to reliably run one-off tasks in your cluster. Jobs can be used to implement queue systems, background processing logic, and seeding mechanisms for your apps. They can also be wrapped as CronJobs to run tasks on a recurring schedule conveniently.
Want to learn more about Kubernetes and its features? Check out the other topics on the Spacelift blog, such as our guide to using ConfigMaps to supply config values to your Pods and Jobs.
And take a look at how Spacelift helps you manage the complexities and compliance challenges of using Kubernetes. Anything that can be run via kubectl can be run within a Spacelift stack. Find out more about how Spacelift works with Kubernetes.
The Most Flexible CI/CD Automation Tool
Spacelift is an alternative to using homegrown solutions on top of a generic CI. It helps overcome common state management issues and adds several must-have capabilities for infrastructure management.