Argo Workflows is a Kubernetes-native workflow execution engine. Workflows are defined as a series of steps, each of which runs in a container, and steps can execute sequentially or in parallel.
Argo Workflows is designed to be lightweight, scalable, and easy to use, yet still offer robust functionality for demanding workflow configurations. It’s a popular way to implement containerized workflow pipelines, without having to manually configure job processing infrastructure yourself.
This article will explain what Argo Workflows is and how it works, then guide you through setting up a simple workflow that you can run in your own cluster.
Argo Workflows is a workflow engine for Kubernetes clusters. It allows you to easily orchestrate parallel jobs in your cluster, where each job needs to be run in a separate container. Argo Workflows models workflows as a directed acyclic graph (DAG), facilitating quick execution of complex workflows where jobs have multiple inter-dependencies.
Once installed in your cluster, Argo Workflows provides a Kubernetes CRD—the Workflow resource—that you can use to define and configure your workflows. Argo then executes your jobs by creating containers inside your cluster. This is simpler than traditional job engines, which typically use heavy virtual machines to run each job step.
Workflow features supported by Argo Workflows include conditional steps, timeouts, retries, suspend and resume, and scheduling using cron syntax. Argo also comes with a web UI that you can use to visualize and manage workflows and their dependencies. A separate API server that supports both HTTP (REST) and gRPC enables programmatic interactions with your workflows.
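To illustrate how these features are expressed, here is a rough sketch of a cron-scheduled workflow with a retry limit and a timeout. The names, schedule, and image are illustrative rather than taken from this guide:

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"              # standard cron syntax: run at 02:00 every day
  workflowSpec:
    entrypoint: report
    templates:
      - name: report
        retryStrategy:
          limit: 3                   # retry a failed step up to three times
        activeDeadlineSeconds: 600   # time the step out after ten minutes
        container:
          image: busybox:latest
          command: ["sh", "-c"]
          args: ["echo 'generating report'"]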
What is the difference between Argo Workflows and Kubernetes jobs?
Kubernetes Jobs are objects that create a set number of Pods, run a command in them, and wait for a successful termination. Kubernetes also offers CronJobs, which allow you to automatically create Jobs on a schedule. However, these built-in resources are mainly suitable for simple self-contained jobs, such as creating a backup of your application.
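For comparison, a self-contained backup task of that kind could be expressed as a plain Kubernetes Job. This is only a sketch; the image and command are illustrative:

apiVersion: batch/v1
kind: Job
metadata:
  name: database-backup
spec:
  backoffLimit: 2                  # retry the Pod up to twice if it fails
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: backup
          image: postgres:16
          command: ["sh", "-c", "pg_dump mydb > /backups/mydb.sql"]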
Jobs in Argo Workflows can be much more complex. They provide pipeline-like functionality for workflow processes that have multiple steps and dependencies. The output of each step can be used as the input of the next, such as in data processing pipelines where multiple applications are involved in producing the final output.
What is the difference between Argo Workflows and Argo CD?
The Argo project provides a range of open source tools that support Kubernetes and GitOps operations. Argo Workflows is one of these tools, while Argo CD is another.
Argo CD is a declarative GitOps-powered CI/CD pipeline tool for deploying apps into your Kubernetes clusters. It connects to your source repositories and automatically syncs them into Kubernetes when changes occur. Argo Workflows has a different purpose: it’s designed to run workflows in Kubernetes, independently of your code repositories. It focuses on providing mechanisms for modeling process-based operations in Kubernetes, including job orchestration and scheduling. This differs from Argo CD’s narrower focus on software delivery workflows.
Argo Workflows is controlled using the Workflow Kubernetes CRD that it provides. Creating a Workflow object in your cluster will automatically run the process it defines. Importantly, Argo stores the run’s state within the object, so each instance not only represents a new workflow, but also a run of that workflow.
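For example, once a workflow has finished, you can read its result straight from the object with kubectl. The workflow name below is illustrative; the command prints a phase such as Running, Succeeded, or Failed:

$ kubectl get workflow my-workflow -n argo -o jsonpath='{.status.phase}'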
Below, you can find the architecture diagram for Argo Workflows.
Workflows in Argo are composed of templates that determine which operations will be carried out. There are six supported template types:
- Container — Starts a new container in your Kubernetes cluster. Containers are configured using a regular Kubernetes container manifest spec, so you can use any image and command in your workflow.
- Script — Starts a container in your Kubernetes cluster and automatically runs a specified script using a process in that container. For example, you can pass a JavaScript snippet directly to a container that’s running a Node.js image.
- Resource — Resource templates are used to perform actions on Kubernetes objects. You provide a Kubernetes object manifest and instruct Argo whether to get, create, apply, delete, replace, or patch it in your cluster.
- Suspend — When encountered within a workflow, this template instruction will make Argo suspend further execution until you either manually resume the process or a specified time elapses.
- Steps — Step templates define sequential stages in your workflow. Each step contains a list of nested templates that will execute in parallel. Execution only moves to the next step once all the nested templates have completed.
- DAG — DAG (directed acyclic graph) templates are how you configure dependencies between tasks. They’re a more versatile alternative to steps as they allow tasks to start running as soon as a specific named dependency has completed.
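For instance, a DAG template entry along the following lines, placed in a workflow’s templates list, lets tasks a and b run in parallel while task c waits for both to complete. The task names are illustrative, and do-work stands in for a container template defined elsewhere:

- name: diamond-example
  dag:
    tasks:
      - name: a
        template: do-work
      - name: b
        template: do-work
      - name: c
        template: do-work
        dependencies: [a, b]       # c only starts once a and b have finished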
These core concepts are all you need to understand to begin modeling your processes in Argo Workflows.
Argo Workflows can be used in a variety of scenarios where a chain of jobs must be run in order, with correct dependency resolution. Although you can adapt the tool to run any workflow, it’s particularly useful in the following scenarios:
1. Batch processing and data ingest
Data is rarely delivered to organizations in a format that’s ready to consume. Collecting data, then transforming it into the required structure is often a complex process that includes multiple apps. Argo Workflows lets you perform the entire procedure in your Kubernetes cluster.
2. Machine learning model training
Similarly, training new ML models can be laborious. You need to gather your training data, prepare it for evaluation, feed it into your model, and then collect the results before you can assess your model’s effectiveness. Argo Workflows can automate this process in Kubernetes, reducing costs and overheads compared to virtualized or proprietary cloud solutions.
3. Infrastructure automation
Argo Workflows can also be used to automate infrastructure processes, such as setting up cloud accounts, then provisioning new resources in them. It’s a good generalized alternative to regular IaC solutions when you have complex dependency chains or need to schedule infrastructure interactions.
All these workflows require multiple stages to achieve the final output. Argo Workflows lets you accurately model the entire process as a sequence of independent steps, where the output from one job becomes the next job’s input.
Now, let’s look at a simple example of how you can implement a workflow in your cluster. We’ll configure a workflow that models a small data processing pipeline.
To follow along with this guide, you’ll need Kubectl installed and configured with an active cluster connection.
Step 1: Install Argo Workflows
The first step is to install Argo Workflows in your cluster.
Use Kubectl to create a new namespace for the deployment:
$ kubectl create namespace argo
namespace/argo created
Next, head to the Argo Workflows GitHub releases page and check the latest release version. We’re using v3.5.0 for this guide.
Substitute the version number into the following command to install Argo Workflows in your Kubernetes cluster:
$ kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v<VERSION>/install.yaml
Next, run the following command to bypass the Argo Server’s authentication requirements for testing purposes. This will allow you to access the Argo web UI without setting up proper authentication, which is beyond the scope of this tutorial. Switching to the server authentication mode is acceptable for this scenario because the web UI isn’t exposed publicly by default.
$ kubectl patch deployment \
argo-server \
--namespace argo \
--type='json' \
-p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": [
"server",
"--auth-mode=server"
]}]'
deployment.apps/argo-server patched
Next, you must set up a Kubernetes RBAC role binding that allows Argo to interact with the resources in your Kubernetes cluster. The following command gives the default service account in the argo namespace, which Argo’s workflow Pods run under, admin-level privileges over objects in that namespace:
$ kubectl create rolebinding argo-default-admin --clusterrole=admin --serviceaccount=argo:default -n argo
For real-world use, you should avoid assigning Argo’s service account the admin cluster role. Instead, create your own precisely scoped roles that let Argo access the specific namespaces and resources you require in your workflows.
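As a rough sketch of what a narrower role could look like, the manifest below grants only the permissions workflow Pods typically need to report their results. The exact resources depend on your Argo version and executor, so treat this as an illustration and check the Argo documentation for your release:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-workflow-minimal
  namespace: argo
rules:
  - apiGroups: ["argoproj.io"]
    resources: ["workflowtaskresults"]   # used by the executor to record step outputs
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "patch"]

You would then bind this Role to the service account your workflows run under, instead of using the admin ClusterRole.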
Finally, use kubectl port-forward to open a port-forwarding session to your Argo Server deployment. This will allow you to access the web UI in your browser.
$ kubectl -n argo port-forward deployment/argo-server 2746:2746
Visit https://localhost:2746 to load the web UI. You’ll need to manually approve a browser security prompt—Argo defaults to using a self-signed TLS certificate that your browser won’t be able to verify.
Step 2: Install the Argo CLI
Argo has a CLI, which provides a convenient interface for submitting, monitoring, and recording your workflows. You can download the CLI from GitHub Releases. Use the version that matches the Argo release installed in your Kubernetes cluster.
The following steps can be used to install the CLI on Linux systems—remember to substitute the correct version number into the first command:
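Based on the assets published with each release, the installation commands look roughly like this on an amd64 machine (adjust the file name for other architectures):

$ curl -sLO https://github.com/argoproj/argo-workflows/releases/download/v<VERSION>/argo-linux-amd64.gz
$ gunzip argo-linux-amd64.gz
$ chmod +x argo-linux-amd64
$ sudo mv argo-linux-amd64 /usr/local/bin/argo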
This downloads the archive, extracts the binary inside, makes it executable, and moves it into your PATH. Check that the CLI is functioning by testing a command:
$ argo version
argo: v3.5.0
BuildDate: 2023-10-13T14:43:06Z
Now you’re ready to create a workflow.
Step 3: Create a Workflow
Argo Workflows are defined like any other Kubernetes object: you write a YAML manifest (using the Workflow CRD), then apply it to your cluster. Argo’s workflow controller will then automatically run the workflow.
Here’s a minimal Workflow that defines a basic data processing sequence. The first job outputs a JSON array of user IDs ([1, 2, 3]); the second step then transforms this data into a collection of objects providing each user’s id and name.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: demo-workflow
spec:
  entrypoint: get-user-data
  templates:
    - name: get-user-data
      steps:
        - - name: get-user-ids
            template: get-user-ids
        - - name: transform-user-ids-to-objects
            template: transform-user-ids-to-objects
            arguments:
              parameters:
                - name: user-ids
                  value: "{{steps.get-user-ids.outputs.parameters.user-ids}}"
    - name: get-user-ids
      container:
        image: busybox:latest
        command: ["sh", "-c"]
        args: ["echo '[1,2,3]' > /tmp/user-ids"]
      outputs:
        parameters:
          - name: user-ids
            valueFrom:
              path: /tmp/user-ids
    - name: transform-user-ids-to-objects
      inputs:
        parameters:
          - name: user-ids
      container:
        image: alpine:latest
        command: ["sh", "-c"]
        args:
          - |
            apk add jq
            echo "{{inputs.parameters.user-ids}}" | jq ".[] | {id: ., name: \"User \(.)\"}" > /tmp/users
      outputs:
        parameters:
          - name: users
            valueFrom:
              path: /tmp/users
By inspecting the workflow’s definition, you can see the two sequential steps:
- The get-user-ids step runs a container that writes a file to /tmp/user-ids. The content of this file is defined as an output parameter from the template, allowing it to be referenced by later jobs.
- The transform-user-ids-to-objects step receives the user IDs parameter from the previous step. It uses the jq JSON processor to transform the data, then produces a new output.
The top-level get-user-data template is designated as the workflow’s starting point by the spec.entrypoint field.
To run your workflow, you can use the Argo CLI’s submit command:
$ argo submit -n argo --watch workflow.yaml
The -n flag specifies the namespace to run the workflow in, while --watch means the workflow’s progress will be emitted to your terminal:
The output shows how the two steps in the workflow were executed in sequence.
You can also view the workflow within the Argo web UI:
Clicking the workflow will show its details, including the dependency graph of the job’s steps:
Clicking one of the steps, such as the final transform-user-ids-to-objects job, will reveal a flyout with its details. These include the input and output parameters. Here you can see that the job successfully transformed the input array into a collection of objects.
You’ve run your first workflow in Argo Workflows! To repeat the workflow run, click the “Resubmit” button in the web UI or run argo resubmit -n argo demo-workflow in your terminal. Argo will then start a new run through the workflow.
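If you prefer to stay in the terminal, the CLI can also be used to check on runs. For example, these standard commands list the workflows in the namespace, show a workflow’s step tree and status, and print its logs:

$ argo list -n argo
$ argo get -n argo demo-workflow
$ argo logs -n argo demo-workflow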
Argo Workflows is an effective solution for running multi-step processes in Kubernetes. However, if you decide it’s unsuitable for your requirements, you could check out some of these alternatives instead:
Apache Airflow
Apache’s Airflow project is a popular workflow system that supports DAG-based tasks and precise scheduling. It’s an extensible Python project with support for several different providers and job executors, including Kubernetes.
Conductor
A microservices-oriented workflow orchestrator that’s developed by Netflix. It lets you implement job workers in multiple programming languages, is ready for enterprise use, and scales to millions of parallel processes.
Prefect
Prefect is a dedicated workflow orchestration platform that allows you to schedule, run, retry, and debug processes from local development through to production. Prefect emphasizes control and observability; you can express workflows as pure Python code, without having to learn templates or DAG relationships.
Google Cloud Workflows
Workflow functionality is also available as a commercial service from major cloud providers. Google Cloud Workflows is Google’s option for creating automated scalable pipelines that consume and output data. Existing Google Cloud customers may find this is a quick way to set up processes, without needing to learn Kubernetes or other new technologies.
The best option for you will depend on the level of control you require, as well as the environments you’re targeting. Because Argo Workflows is Kubernetes-native, it’s a great option for developers and technical teams who are already using containers. But the processing requirements of DataOps teams and business execs could be better served by a more approachable, domain-specific platform.
We’ve explored Argo Workflows, a powerful job execution engine that runs jobs using containers in your Kubernetes cluster. Argo offers parallel execution, DAG support for dependencies, and a range of features for collecting job output, generating reports, and monitoring execution activity.
Argo Workflows is a general-purpose tool that’s best used to implement complex processes outside your software development lifecycle. If you mostly need to automate app deployments into Kubernetes, then you might want to check out our guide to Argo CD instead.
Try Spacelift’s CI/CD platform to collaborate on infrastructure using multiple IaC providers, including Kubernetes, Ansible, and Terraform. Spacelift lets you visualize your resources, prevent drift, and help developers ship fast within precise policy-driven guardrails.
The Most Flexible CI/CD Automation Tool
Spacelift is an alternative to using homegrown solutions on top of a generic CI. It helps overcome common state management issues and adds several must-have capabilities for infrastructure management.