Argo Workflows is a Kubernetes-native workflow execution engine. Workflows are defined as a series of steps, run sequentially or in parallel, each of which executes in a container.
Argo Workflows is designed to be lightweight, scalable, and easy to use, yet still offers robust functionality for demanding workflow configurations. It’s a popular way to implement containerized workflow pipelines without manually configuring job processing infrastructure.
This article will explain what Argo Workflows is and how it works, then guide you through setting up a simple workflow that you can run in your own cluster.
Argo Workflows is a Kubernetes-native workflow engine that enables the orchestration of parallel jobs using custom resource definitions (CRDs). It allows you to define workflows as a series of steps or tasks written in YAML that execute containerized processes in a specific sequence or in parallel.
Each step in the workflow is run as a Kubernetes pod, making it highly scalable and cloud-native. Argo supports advanced patterns like DAGs (Directed Acyclic Graphs) and loops, making it suitable for CI/CD pipelines, data processing, and machine learning workflows.
Because it runs entirely within Kubernetes, Argo Workflows integrates seamlessly with other Kubernetes resources and supports automation, retry logic, and output passing between steps.
Argo Workflows’ key features include:
- DAG and Step-based workflows: Supports both directed acyclic graphs and step-based workflows to define complex job dependencies
- Conditional execution: Allows workflow steps to run based on dynamic conditions, enabling intelligent branching logic
- Timeouts and retries: Configure automatic retries and timeouts for individual steps or entire workflows to improve fault tolerance
- Suspend and resume: Pause workflows at specific steps and resume them later, useful for manual approvals or external dependencies
- Cron scheduling: Supports cron syntax to schedule recurring workflows without needing external tools
- Web UI: Offers a built-in interface to visualize, monitor, and manage workflows and their relationships in real time
- REST and gRPC API: Provides programmatic access to workflow operations through a flexible API layer supporting both HTTP and gRPC
- Artifact and parameter passing: Share data between workflow steps easily using parameters or artifact storage
- Scalability: Designed to handle high-volume, parallel workloads across distributed environments
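To give a flavor of how some of these features appear in a manifest, here is a small, illustrative template fragment. It assumes a workflow-level parameter named environment has been declared; the retryStrategy, activeDeadlineSeconds, and when fields are standard parts of the Workflow spec:
templates:
  - name: flaky-step
    # Retry up to three times, and fail the step if it runs longer than five minutes
    retryStrategy:
      limit: 3
    activeDeadlineSeconds: 300
    container:
      image: alpine
      command: [sh, -c]
      args: ["wget -q -O- https://example.com"]
  - name: gated-steps
    steps:
      - - name: run-in-production-only
          template: flaky-step
          # Conditional execution: only runs when the assumed "environment" parameter is "production"
          when: "{{workflow.parameters.environment}} == production"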
What is the difference between Argo Workflows and Kubernetes jobs?
Kubernetes Jobs are objects that create a set number of Pods, run a command in them, and wait for a successful termination. Kubernetes also offers CronJobs, which allow you to automatically create Jobs on a schedule. However, these built-in resources are mainly suitable for simple self-contained jobs, such as creating a backup of your application.
Kubernetes Jobs are designed for simple, standalone batch processes that run to completion. Argo Workflows, on the other hand, is a more advanced tool for orchestrating multi-step workflows, where each step can depend on the output of a previous one.
Workflows in Argo can therefore be much more complex. They provide pipeline-like functionality for processes with multiple steps and dependencies, where the output of each step can be used as the input of the next, such as in data processing pipelines where multiple applications are involved in producing the final output.
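For comparison, a plain Kubernetes Job is a single, self-contained unit of work. The minimal sketch below (the backup-job name is just illustrative) runs one container to completion, with no built-in way to express dependencies on other Jobs:
apiVersion: batch/v1
kind: Job
metadata:
  name: backup-job
spec:
  backoffLimit: 2            # retry the pod up to two times on failure
  template:
    spec:
      restartPolicy: Never   # let the Job controller, not the kubelet, handle retries
      containers:
        - name: backup
          image: alpine
          command: ["sh", "-c", "echo 'creating backup...' && sleep 5"]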
What is the difference between Argo Workflows and Argo CD?
The Argo project provides a range of open-source tools that support Kubernetes and GitOps operations. Argo Workflows and Argo CD are two such tools.
Argo CD is a declarative GitOps-powered CI/CD pipeline tool for deploying apps into your Kubernetes clusters. It connects to your source repositories and automatically syncs them into Kubernetes when changes occur.
Argo Workflows has a different purpose: It’s designed to run workflows in Kubernetes, independently of your code repositories. It focuses on providing mechanisms for modeling process-based operations in Kubernetes, including job orchestration and scheduling. This differs from Argo CD’s narrower focus on software delivery workflows.
In essence, Argo Workflows handles job orchestration, while Argo CD manages application deployment and lifecycle. Together, they can be used to build robust CI/CD pipelines, but they solve different parts of the problem.
Argo Workflows is controlled using the Workflow Kubernetes CRD that it provides. Creating a Workflow object in your cluster will automatically run the process it defines. Importantly, Argo stores the run’s state within the object, so each instance represents not only a new workflow, but also a run of that workflow.
Below, you can find the architecture diagram for Argo Workflows.
Argo Workflows templates
Argo Workflows uses YAML templates to define steps and tasks in a Kubernetes-native way. These templates are reusable, composable units that define what each step in a workflow does. There are six supported template types:
- Container template — This starts a new container in your Kubernetes cluster. Containers are configured using a regular Kubernetes container manifest spec, so you can use any image and command in your workflow.
- name: run-container
container:
image: alpine
command: ["echo"]
args: ["Hello from Argo!"]
- Script template — This starts a container in your Kubernetes cluster and automatically runs a specified script using a process in that container. For example, you can pass a JavaScript snippet directly to a container that’s running a Node.js image.
- name: run-script
script:
image: python:3.9
command: [python]
source: |
print("Hello from script!")
- Resource template — Resource templates are used to perform actions on Kubernetes objects. You provide a Kubernetes object manifest and instruct Argo whether to get, create, apply, delete, replace, or patch it in your cluster.
- name: create-pod
resource:
action: create
manifest: |
apiVersion: v1
kind: Pod
metadata:
name: my-pod
spec:
containers:
- name: main
image: alpine
command: ["sleep", "60"]
- Suspend template — When encountered within a workflow, this template makes Argo suspend further execution until you either manually resume the process or a specified time elapses.
- name: wait-for-approval
suspend: {}
- Steps template — Steps templates define sequential stages in your workflow. Each stage contains a list of steps that run in parallel, and execution only moves to the next stage once every step in the current one has completed.
- name: step-group
steps:
- - name: step1
template: run-script
- - name: step2
template: run-container
- DAG template — DAG (directed acyclic graph) templates are how you configure dependencies between tasks. They’re a more versatile alternative to steps, as they allow each task to start running as soon as its named dependencies have completed.
- name: dag-example
dag:
tasks:
- name: A
template: run-script
- name: B
dependencies: [A]
template: run-container
These core concepts are all you need to understand to begin modeling your processes in Argo Workflows.
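To see how these pieces fit together, here is an illustrative, complete Workflow manifest that reuses the run-container, run-script, and dag-example fragments shown above. The entrypoint field tells Argo which template to start with:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: template-demo-
spec:
  entrypoint: dag-example        # execution starts with the DAG template
  templates:
    - name: dag-example
      dag:
        tasks:
          - name: A
            template: run-script
          - name: B
            dependencies: [A]
            template: run-container
    - name: run-script
      script:
        image: python:3.9
        command: [python]
        source: |
          print("Hello from script!")
    - name: run-container
      container:
        image: alpine
        command: ["echo"]
        args: ["Hello from Argo!"]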
Argo Workflows can be used in various scenarios where a chain of jobs must be run in order, with correct dependency resolution. Although you can adapt the tool to run any workflow, it’s particularly useful in the following scenarios:
1. Batch processing and data ingestion
Data is rarely delivered to organizations in a consumption-ready format. Collecting data and transforming it into the required structure is often a complex process that involves multiple apps. Argo Workflows lets you perform the entire procedure in your Kubernetes cluster.
2. Machine learning model training
Similarly, training new ML models can be laborious. You need to gather your training data, prepare it for evaluation, feed it into your model, and then collect the results before you can assess your model’s effectiveness. Argo Workflows can automate this process in Kubernetes, reducing costs and overheads compared to virtualized or proprietary cloud solutions.
3. Infrastructure automation
Argo Workflows can also be used to automate infrastructure processes, such as setting up cloud accounts and then provisioning new resources. It’s a good generalized alternative to regular IaC solutions when you have complex dependency chains or need to schedule infrastructure interactions.
All these workflows require multiple stages to achieve the final output. Argo Workflows lets you accurately model the entire process as a sequence of independent steps, where the output from one job becomes the next job’s input.
Below are some simple examples of Argo Workflows in YAML format.
Example 1: Hello World
Here is a very basic workflow that prints “hello world”.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: hello-world-
spec:
entrypoint: whalesay
templates:
- name: whalesay
container:
image: docker/whalesay
command: [cowsay]
args: ["hello world"]
Example 2: Sequential Steps
The workflow below runs two steps in sequence: one prints “A”, the next prints “B”.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: sequential-steps-
spec:
entrypoint: sequential-example
templates:
- name: sequential-example
steps:
- - name: step-a
template: echo
arguments:
parameters:
- name: message
value: "A"
- - name: step-b
template: echo
arguments:
parameters:
- name: message
value: "B"
- name: echo
inputs:
parameters:
- name: message
container:
image: alpine:latest
command: [sh, -c]
args: ["echo {{inputs.parameters.message}}"]
Example 3: DAG Workflow
This workflow runs tasks using a Directed Acyclic Graph (DAG) structure.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: dag-example-
spec:
entrypoint: dag-example
templates:
- name: dag-example
dag:
tasks:
- name: A
template: echo
arguments:
parameters:
- name: message
value: "Task A"
- name: B
dependencies: [A]
template: echo
arguments:
parameters:
- name: message
value: "Task B (after A)"
- name: C
dependencies: [A]
template: echo
arguments:
parameters:
- name: message
value: "Task C (after A)"
- name: D
dependencies: [B, C]
template: echo
arguments:
parameters:
- name: message
value: "Task D (after B and C)"
- name: echo
inputs:
parameters:
- name: message
container:
image: alpine
command: [sh, -c]
args: ["echo {{inputs.parameters.message}}"]
Now, let’s look at a simple example of how you can implement a workflow in your cluster. We’ll configure a workflow that models a simple data processing pipeline.
To follow along with this guide, you’ll need kubectl installed and configured with an active cluster connection.
Step 1: Install Argo Workflows
The first step is to install Argo Workflows in your cluster.
Use kubectl to create a new namespace for the deployment:
$ kubectl create namespace argo
namespace/argo created
Next, head to the Argo Workflows GitHub releases page and check the latest release version. We’re using v3.5.0 for this guide.
Substitute the version number into the following command to install Argo Workflows in your Kubernetes cluster:
$ kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v<VERSION>/install.yaml
Next, run the following command to bypass the Argo Server’s authentication requirements for testing purposes. It switches the server to its server auth mode, letting you access the Argo web UI without supplying credentials; setting up proper authentication is out of scope for this tutorial. This shortcut is acceptable here because the web UI isn’t exposed publicly by default.
$ kubectl patch deployment \
argo-server \
--namespace argo \
--type='json' \
-p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": [
"server",
"--auth-mode=server"
]}]'
deployment.apps/argo-server patched
Next, you must set up a Kubernetes RBAC role binding that allows Argo’s workflows to interact with the resources in your Kubernetes cluster. The following command grants the default service account in the argo namespace, which workflow pods run under, admin-level privileges within that namespace:
$ kubectl create rolebinding argo-default-admin --clusterrole=admin --serviceaccount=argo:default -n argo
For real-world use, you should avoid binding Argo’s service account to the admin cluster role. Instead, create your own precisely scoped roles that let Argo access only the namespaces and resources your workflows require.
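As a rough illustration, a tighter alternative to the admin binding above could look like the following Role and RoleBinding. The exact rules depend on your Argo version and on what your workflow steps do; recent Argo versions, for example, require the workflow executor to create and patch WorkflowTaskResult objects:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-workflow-executor
  namespace: argo
rules:
  # Lets the executor report step results back to the controller (Argo v3.4+)
  - apiGroups: ["argoproj.io"]
    resources: ["workflowtaskresults"]
    verbs: ["create", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-workflow-executor
  namespace: argo
subjects:
  - kind: ServiceAccount
    name: default
    namespace: argo
roleRef:
  kind: Role
  name: argo-workflow-executor
  apiGroup: rbac.authorization.k8s.io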
Finally, use kubectl port-forward to open a port-forwarding session to your Argo Server deployment. This will allow you to access the web UI in your browser.
$ kubectl -n argo port-forward deployment/argo-server 2746:2746
Visit https://localhost:2746 to load the web UI.
You’ll need to manually approve a browser security prompt—Argo defaults to using a self-signed TLS certificate that your browser won’t be able to verify.
Step 2: Install the Argo CLI
Argo has a CLI that provides a convenient interface for submitting, monitoring, and managing your workflows. You can download the CLI from the project’s GitHub releases page. Use the version that matches the Argo release installed in your Kubernetes cluster.
The following steps can be used to install the CLI on Linux systems—remember to substitute the correct version number into the first command:
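# Assumes a Linux amd64 build of the CLI; adjust the asset name for your platform
$ curl -sLO https://github.com/argoproj/argo-workflows/releases/download/v<VERSION>/argo-linux-amd64.gz
$ gunzip argo-linux-amd64.gz
$ chmod +x argo-linux-amd64
$ sudo mv ./argo-linux-amd64 /usr/local/bin/argo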
This downloads the archive, extracts the binary inside, makes it executable, and moves it into your PATH. Check that the CLI is functioning by testing a command:
$ argo version
argo: v3.5.0
BuildDate: 2023-10-13T14:43:06Z
Now you’re ready to create a workflow.
Step 3: Create a Workflow
Argo Workflows are defined like any other Kubernetes object: you write a YAML manifest (using the Workflow CRD), then apply it to your cluster. Argo’s workflow controller will then automatically run the workflow.
Here’s a minimal Workflow that defines a basic data processing sequence. The first step outputs a JSON array of user IDs ([1, 2, 3]); the second step then transforms this data into an array of objects providing each user’s id and name.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
name: demo-workflow
spec:
entrypoint: get-user-data
templates:
- name: get-user-data
steps:
- - name: get-user-ids
template: get-user-ids
- - name: transform-user-ids-to-objects
template: transform-user-ids-to-objects
arguments:
parameters:
- name: user-ids
value: "{{steps.get-user-ids.outputs.parameters.user-ids}}"
- name: get-user-ids
container:
image: busybox:latest
command: ["sh", "-c"]
args: ["echo '[1,2,3]' > /tmp/user-ids"]
outputs:
parameters:
- name: user-ids
valueFrom:
path: /tmp/user-ids
- name: transform-user-ids-to-objects
inputs:
parameters:
- name: user-ids
container:
image: alpine:latest
command: ["sh", "-c"]
args:
- |
apk add jq
echo "{{inputs.parameters.user-ids}}" | jq ".[] | {id: ., name: \"User \(.)\"}" > /tmp/users
outputs:
parameters:
- name: users
valueFrom:
path: /tmp/users
By inspecting the workflow’s definition, you can see the two sequential steps:
- The get-user-ids step runs a container that writes a file to /tmp/user-ids. The content of this file is defined as an output parameter of the template, allowing it to be referenced by later steps.
- The transform-user-ids-to-objects step receives the user IDs parameter from the previous step. It uses the jq JSON processor to transform the data, then produces a new output.
The top-level get-user-data template is designated as the workflow’s starting point by the spec.entrypoint field.
Step 4: Run the Workflow
To run your workflow, you can use the Argo CLI’s submit command:
$ argo submit -n argo --watch workflow.yaml
The -n flag specifies the namespace to run the workflow in, while --watch means the workflow’s progress will be emitted to your terminal:
The output shows how the two steps in the workflow were executed in sequence.
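If you close the watch session, the CLI’s get command prints the workflow’s status tree again on demand, using the workflow name set in the manifest above:
$ argo get -n argo demo-workflow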
Step 5: View the workflow in Argo UI
You can also view the workflow within the Argo web UI:
Clicking the workflow will show its details, including the dependency graph of the job’s steps:
Clicking one of the steps, such as the final transform-user-ids-to-objects job, will reveal a flyout with its details. These include the input and output parameters. Here you can see that the job successfully transformed the input array into a collection of objects.
You’ve run your first workflow in Argo Workflows! To repeat the workflow run, click the “Resubmit” button in the web UI or run argo resubmit -n argo demo-workflow in your terminal. Argo will then start a new run through the workflow.
Argo Workflows is an effective solution for running multi-step processes in Kubernetes. However, if you decide it’s unsuitable for your requirements, check out some of these alternatives instead:
- Apache Airflow – Apache’s Airflow project is a popular workflow system that supports DAG-based tasks and precise scheduling. It’s an extensible Python project that supports several different providers and job executors, including Kubernetes.
- Conductor – This microservices-oriented workflow orchestrator, developed by Netflix, lets you implement job workers in multiple programming languages, is enterprise-ready, and is scalable to millions of parallel processes.
- Prefect – Prefect is a dedicated workflow orchestration platform that allows you to schedule, run, retry, and debug processes from local development through to production. Prefect emphasizes control and observability; you can express workflows as pure Python code, without having to learn templates or DAG relationships.
- Google Cloud Workflows – Workflow functionality is also available as a commercial service from major cloud providers. Google Cloud Workflows is Google’s option for creating automated, scalable pipelines that consume and output data. Existing Google Cloud customers may find this is a quick way to set up processes, without needing to learn Kubernetes or other new technologies.
The best option for you will depend on the level of control you require and the environments you’re targeting. Because Argo Workflows is Kubernetes-native, it’s a great option for developers and technical teams already using containers. However, the processing requirements of DataOps teams and business execs could be better served by a more approachable, domain-specific platform.
What is the difference between Argo Workflows and AWS Step Functions?
Argo Workflows is a Kubernetes-native workflow engine designed for containerized jobs, making it ideal for DevOps and ML pipelines running in Kubernetes clusters. It uses YAML to define workflows as a series of steps or DAGs.
AWS Step Functions, on the other hand, is a fully managed AWS service for orchestrating serverless functions and AWS services using a state machine model. It’s tightly integrated with the AWS ecosystem and often used for application workflows in cloud-native architectures.
In summary, Argo Workflows is best suited for Kubernetes-centric environments, while Step Functions is optimized for orchestrating AWS services in a serverless architecture.
Spacelift is an IaC management platform that uses GitOps to automate CI/CD for your infrastructure components. It supports OpenTofu, Terraform, Terragrunt, CloudFormation, Pulumi, Kubernetes, and Ansible.
The power of Spacelift lies in its fully automated, hands-off approach. Once you’ve created a Spacelift stack for your project, changes to the IaC files in your repository will automatically be applied to your infrastructure.
Spacelift’s pull request integrations keep everyone informed of what will change by displaying which resources are going to be affected by new merges. Spacelift also allows you to enforce policies and automated compliance checks that prevent dangerous oversights from occurring.
Spacelift includes drift detection capabilities that periodically check your infrastructure for discrepancies compared to your repository’s state. It can then launch reconciliation jobs to restore the correct state, ensuring your infrastructure operates predictably and reliably.
IaC and immutable infrastructure are really important concepts to Kin. They chose Terraform as their platform and very quickly adopted a full-blown GitOps workflow. When you shift to treating infrastructure like a software project, you need all of the same components that a software project would have. That means having a CI/CD platform in place, and most aren’t suited to the demands of IaC. Kin discovered that Spacelift was purpose-built to fill that gap.
With Spacelift, you get:
- Policies to control what kind of resources engineers can create, what parameters they can have, how many approvals you need for a run, what kind of task you execute, what happens when a pull request is open, and where to send your notifications
- Stack dependencies to build multi-infrastructure automation workflows, such as a workflow that provisions your EC2 instances using Terraform and then configures them with Ansible
- Self-service infrastructure via Blueprints, or Spacelift’s Kubernetes operator, enabling your developers to do what matters – developing application code while not sacrificing control
- Creature comforts such as contexts (reusable containers for your environment variables, files, and hooks), and the ability to run arbitrary code
- Drift detection and optional remediation
If you want to learn more about Spacelift, create a free account today or book a demo with one of our engineers.
We’ve explored Argo Workflows, a powerful job execution engine that runs jobs using containers in your Kubernetes cluster. Argo offers parallel execution, DAG support for dependencies, and a range of features for collecting job output, generating reports, and monitoring execution activity.
Argo Workflows is a general-purpose tool that’s best used to implement complex processes outside your software development lifecycle. If you mostly need to automate app deployments into Kubernetes, check out our guide to Argo CD instead.
The most flexible CI/CD automation tool
Spacelift is an alternative to using homegrown solutions on top of a generic CI. It helps overcome common state management issues and adds several must-have capabilities for infrastructure management.