YAML Tutorial : A Complete Language Guide with Examples

What is YAML?

YAML is one of the most popular data serialization languages. Its popularity stems from its simplicity and from the fact that it is easy for humans to read and write.

In addition to being a powerful format for writing configuration files, it finds its uses in data persistence, internet messaging, cross-language data sharing, and many more places.

YAML is a recursive acronym that stands for YAML Ain’t Markup Language. It is designed with flexibility and accessibility in mind, so parsers exist for most modern programming languages and it is widely used for cross-language data sharing.

YAML files have the extension .yaml or .yml.

All of these factors contribute to YAML’s popularity as a configuration language in the DevOps domain, where it is widely used with well-known tools such as Kubernetes, Ansible, Docker Compose, and many others.

What Does A Regular YAML File Look Like?

---
# A sample yaml file
company: spacelift
domain:
 - devops
 - devsecops
tutorial:
  - yaml:
      name: "YAML Ain't Markup Language"
      type: awesome
      born: 2001
  - json:
      name: JavaScript Object Notation
      type: great
      born: 2001
  - xml:
      name: Extensible Markup Language
      type: good
      born: 1996
author: omkarbirade
published: true

Basic YAML Syntax

YAML primarily uses three kinds of nodes:

  1. Maps/Dictionaries (YAML calls it mapping):
    The content of a mapping node is an unordered set of key/value node pairs, with the restriction that each of the keys is unique. YAML places no further restrictions on the nodes.
  2. Arrays/Lists (YAML calls them sequences):
    The content of a sequence node is an ordered series of zero or more nodes. In particular, a sequence may contain the same node more than once. It could even contain itself.
  3. Literals (strings, numbers, booleans, etc.; YAML calls them scalars):
    The content of a scalar node is an opaque datum that can be presented as a series of zero or more Unicode characters.

Let us try and identify where these appear in the sample YAML file we saw earlier.

---
# key: value [mapping]
company: spacelift
# key: value is an array [sequence]
domain:
 - devops
 - devsecops
tutorial:
  - yaml:
      name: "YAML Ain't Markup Language" #string [literal]
      type: awesome #string [literal]
      born: 2001 #number [literal]
  - json:
      name: JavaScript Object Notation #string [literal]
      type: great #string [literal]
      born: 2001 #number [literal]
  - xml:
      name: Extensible Markup Language #string [literal]
      type: good #string [literal]
      born: 1996 #number [literal]
author: omkarbirade
published: true

Indentation

A YAML file relies on whitespace and indentation to indicate nesting. The hierarchy is visible through a Python-like indentation style. Note that tab characters cannot be used for indentation in YAML files; only spaces are allowed. The number of spaces per level doesn’t matter as long as entries at the same level line up.

tutorial:  #nesting level 1
  - yaml:  #nesting level 2 (2 spaces used for indentation)
      name: "YAML Ain't Markup Language" #string [literal] #nesting level 3 (6 spaces used for indentation)
      type: awesome #string [literal]
      born: 2001 #number [literal]
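
To make the rule about consistent indentation concrete, here is a minimal sketch. The keys server, client, host and port are made up for illustration. Each mapping picks its own indentation width, which is valid as long as the entries inside one mapping line up with each other.

```yaml
# Hypothetical keys, for illustration only
server:
  host: localhost      # 2 spaces under "server"
  port: 8080           # siblings must line up with "host"
client:
    host: example.com  # 4 spaces under "client" is also valid
    port: 443
```

Mixing widths like this is legal but harder to read, so it is usually easier to pick one width, commonly two spaces, and stick to it.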

Mapping

Mappings associate keys with values, and the pairs are unordered. Maps can be nested by increasing the indentation, and a new key can be added to an outer map by returning to that map’s indentation level.

name: "YAML Ain't Markup Language" #mapping
type: awesome
born: 2001
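
The nesting described above can be sketched as follows. The keys and values are hypothetical; the point is only how indentation opens a nested mapping and how returning to a shallower level continues the outer one. The last line shows the same kind of data in the one-line flow style.

```yaml
tutorial:                 # root-level key whose value is a nested mapping
  title: YAML basics
  author:                 # one level deeper
    name: omkarbirade
    role: writer
published: true           # back at the root level, so a sibling of "tutorial"
summary: {title: YAML basics, published: true}   # flow style on one line
```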

Sequences

Sequences in YAML are written with a hyphen (-) followed by a space in front of each item. They are ordered and can be embedded inside a map using indentation.

languages:
#Sequence 
  - YAML
  - JAVA
  - XML
  - Python
  - C

Tip: Remember that the order matters with sequences but not with mappings.
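
Two further forms are worth knowing; the sketch below uses made-up keys. A sequence can be written on one line in flow style, and the items of a sequence can themselves be mappings, which is the shape most configuration files use for lists of objects.

```yaml
languages_block:
  - YAML
  - Java
languages_flow: [YAML, Java]   # same content as the block form above
servers:                       # a sequence whose items are mappings
  - name: web
    port: 80
  - name: db
    port: 5432
```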

Literals — Strings

String literals do not need to be quoted. Quotes are only required when the value could be mistaken for something else, for example when it starts with a special character.

Here is an example where the string has to be quoted because it starts with &, the character that introduces an anchor. (An & in the middle of a plain string, as in YAML & JSON, is harmless.)

message1: &YAML and JSON # misread: &YAML is taken as an anchor, so the value is just "and JSON"
message2: "&YAML and JSON" # works as the string is quoted
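
Special characters are not the only reason to quote. An unquoted value that looks like a number or a boolean is resolved as one by most parsers, so quoting is also how you keep such values as text. A small sketch with hypothetical keys:

```yaml
plain: hello world          # string, no quotes needed
version_number: 1.10        # resolved as the float 1.1, probably not what was meant
version_string: "1.10"      # quoted, stays the string "1.10"
flag: true                  # boolean
flag_as_text: "true"        # quoted, stays a string
```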

  1. Folding strings
    Strings can also be written in blocks and interpreted without the newline characters between lines by using the fold operator (>).

message: >
 even though
 it looks like
 this is a multiline message,
 it is actually not

The above YAML snippet is interpreted as below.

message: "even though it looks like this is a multiline message, it is actually not\n"

  2. Block strings
    Strings can be kept as blocks, line breaks included, by using the literal block (pipe, |) character.

message: |
 this is
 a real multiline
 message

This is interpreted with the new lines (\n) as below.

message: "this is\na real multiline\nmessage\n"

  3. Chomp characters
    Multiline strings may end with trailing newlines. The keep (+) and strip (-) chomping operators can be used to either preserve or remove them. They work with both the fold (>) and the pipe (|) characters.

  • Preserving the newline character

message: >+
 This block line
 Will be interpreted as a single
 line with a newline character at the
 end

The above snippet is interpreted as below in JSON.

{
  "message": "This block line Will be interpreted as a single line with a newline character at the end\n"
}

  • Stripping the newline character

message: >-
 This block line
 Will be interpreted as a single
 line without the newline character at the
 end

The above snippet is interpreted as below in JSON.

{
  "message": "This block line Will be interpreted as a single line without the newline character at the end"
}
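
The chomping behaviour is easiest to compare side by side. The sketch below uses the pipe style and made-up keys; the comment above each key states the resulting string. Note that the blank line before next_key belongs to the keep value.

```yaml
# default (clip): exactly one trailing newline -> "text\n"
clip: |
  text
# strip: no trailing newline -> "text"
strip: |-
  text
# keep: every trailing newline is kept -> "text\n\n"
keep: |+
  text

next_key: value
```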

Comments 

YAML also supports comments, unlike JSON. A comment starts with # and runs to the end of the line.

---
# Comments in a YAML file start with the '#' character
company: spacelift
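
One detail that often trips people up: a # only starts a comment when it is at the beginning of a line or preceded by whitespace. A short sketch (the URL and channel name are placeholders):

```yaml
# A full-line comment
company: spacelift                  # trailing comment, note the space before '#'
url: https://example.com/#install   # the first '#' has no space before it, so it is part of the value
channel: "#general"                 # quoted, so '#' is data rather than a comment
```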

Advanced YAML Syntax

Documents

The YAML snippets we have seen so far are each a single document. A single YAML file can hold more than one document. Each document is parsed as if it were a separate YAML file, so the same top-level key can appear in several documents, even though duplicate keys are not allowed within a single mapping.

The beginning of a document is denoted by three hyphens (---).

A YAML file with multiple documents would look like this, where each new document is indicated by ---.

---
# document 1
codename: YAML
name: YAML ain't markup language
release: 2001
---
# document 2
uses:
 - configuration language
 - data persistence
 - internet messaging
 - cross-language data sharing
---
# document 3
company: spacelift
domain:
 - devops
 - devsecops
tutorial:
   - name: yaml
   - type: awesome
   - rank: 1
   - born: 2001
author: omkarbirade
published: true
...

Finally, three dots (...) are used to end a document without starting a new one.
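
As a minimal sketch of the point about duplicate keys, the two hypothetical documents below reuse the same keys. Inside a single document this would be a duplicate-key problem; across documents it is fine because each one is parsed on its own.

```yaml
---
# document 1 (hypothetical staging settings)
env: staging
replicas: 1
---
# document 2: reuses "env" and "replicas", which is fine in a new document
env: production
replicas: 3
...
```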

Before we learn more about YAML, this is a good time to practice writing your own YAML file. You can check it with an online YAML validator.

Now that we have seen an online YAML parser in action, it’s time we learn about schemas and tags.

Schemas and Tags

Let’s take a moment to consider how YAML will interpret the given document. Is the sequence’s first literal a string or a boolean?

literals:
 - true
 - random

You are correct if you answer that the first item on the list is a boolean, and you are also correct if you answer that it is a string. The way it is resolved is determined by the YAML schema that the parser has implemented. But what exactly are schemas?

Schemas can be thought of as the way a parser resolves or understands nodes (values) present in a YAML file. The YAML 1.2 specification describes three schemas:

  1. Failsafe schema: It only understands maps, sequences and strings and is guaranteed to work for any YAML file.
  2. JSON schema: It understands all types supported within JSON including boolean, null, int and float as well as the ones in the failsafe schema.
  3. Core schema: It is an extension of the JSON schema that is more human-readable, supporting the same types but in multiple forms.
    For example, null | Null | NULL will all be resolved to the same null value, and true | True | TRUE will all be resolved to the same boolean value.

Note: It is also possible to create your own custom schemas based on the above default schema.

So coming back to the original question, if the parser supports only the basic schema (FailSafe Schema), the first item will be evaluated as a string. Otherwise, it will be evaluated as a boolean.
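
To illustrate, here is how a parser that implements the core schema would be expected to resolve a few plain and quoted scalars. The key name is made up, and parsers that follow the older YAML 1.1 rules can differ, as the last line notes.

```yaml
core_schema_examples:
  - true         # boolean
  - "true"       # string, because it is quoted
  - 42           # integer
  - 0x2A         # integer written in hexadecimal (42)
  - 3.14         # float
  - null         # null
  - ~            # null as well
  - 2001-01-01   # string in the core schema; some YAML 1.1 parsers produce a date instead
```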

Read more about YAML schemas in the YAML specification.

This leads to the next question: What if we explicitly want a value to be parsed in a specific way?

Let’s say from the same example that we want the first true value to be parsed as a string instead of a boolean, even when the parser uses the JSON or the core schema.

This is where tags come into the picture. Tags can be thought of as types in YAML. 

Even though we did not explicitly mention the tags/types in any of the YAML snippets we saw so far, they are inferred automatically by the YAML parser. For instance, maps have the tag tag:yaml.org,2002:map, sequences have tag:yaml.org,2002:seq, and strings have tag:yaml.org,2002:str. The !! prefix is shorthand for tag:yaml.org,2002:, so !!str means tag:yaml.org,2002:str.

The below snippet works perfectly fine, even when we specify the tags.

---
# A sample yaml file
company: !!str spacelift
domain:
 - !!str devops
 - !!str devsecops
tutorial:
   - name: !!str yaml
   - type: !!str awesome
   - rank: !!int 1
   - born: !!int 2001
author: !!str omkarbirade
published: !!bool true

We can use these tags to explicitly specify a type. For our example, all we have to do is specify the type as a string, and the YAML parser will parse it as a string.

scalars:
 - !!str true
 - random
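
The same technique works for other types. A brief sketch with hypothetical keys, showing a value as the parser would resolve it by default next to the same value with an explicit tag:

```yaml
port_number: 8080          # resolved as an integer
port_text: !!str 8080      # forced to the string "8080"
ratio: !!float 1           # forced to the float 1.0
enabled_text: !!str true   # the string "true", not a boolean
```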

Anchors and Aliases

With a lot of configuration, configuration files can become quite large.

In YAML files, anchors (&) and aliases (*) are used to avoid duplication. When writing large configurations in YAML, it is common for a specific configuration to be repeated. For example, the vars config is repeated for all three services in the following YAML snippet.

---
vars:
   service1:
       config:
           env: prod
           retries: 3
           version: 4.8
   service2:
       config:
           env: prod
           retries: 3
           version: 4.8
   service3:
       config:
           env: prod
           retries: 3
           version: 4.8
...

As more and more things are repeated for large configuration files, this becomes tedious.

Anchors and aliases allow us to rewrite the same snippet without having to repeat any configuration.

Anchors (&) are used to define a chunk of configuration, and aliases are used to refer to that chunk at a different part of the configuration.

---
vars:
   service1:
       config: &service_config
           env: prod
           retries: 3
           version: 4.8
   service2:
       config: *service_config
   service3:
       config: *service_config
...
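
Anchors are not limited to mappings. As a rough sketch with made-up keys and values, a scalar or a sequence can be anchored and reused the same way. The anchor has to appear before the aliases that refer to it.

```yaml
default_region: &region eu-west-1   # anchor on a scalar
allowed_ports: &ports               # anchor on a sequence
  - 80
  - 443
service1:
  region: *region
  ports: *ports
service2:
  region: *region
  ports: *ports
```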

Anchors and aliases here helped us cut down the repeated configuration.

But in practice, configurations won’t be completely identical; they will vary here and there. For instance, what if all the above services are running on different versions? Does this mean we have to rewrite and repeat the whole config again?

This is where overrides with the merge key (<<:) come to the rescue. We can still use aliases and change only what we need.

---
vars:
   service1:
       config: &service_config
           env: prod
           retries: 3
           version: 4.8
   service2:
       config:
           <<: *service_config
           version: 5
   service3:
       config:
           <<: *service_config
           version: 4.2
...
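
Assuming the parser supports the merge key (it comes from YAML 1.1, and most common parsers implement it), the snippet above should load as if it had been written out in full like this. Keys written explicitly next to <<: take precedence over the merged ones.

```yaml
vars:
  service1:
    config:
      env: prod
      retries: 3
      version: 4.8
  service2:
    config:
      env: prod
      retries: 3
      version: 5      # overridden
  service3:
    config:
      env: prod
      retries: 3
      version: 4.2    # overridden
```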

YAML treats : , { , } , [ , ] , , , & , * , # , ? , | , - , < , > , = , ! , % , @ , \ and a few others as special characters. But what if these special characters are actually a part of the data/value? How do we escape them?

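Special characters can be escaped in several ways:

Entity Escapes

These are HTML/XML character references. A YAML parser does not decode them; they only help when the program consuming the value does.

  • space: &#x20;
  • colon: &#58;
  • ampersand: &amp;

Unicode Escapes

These are decoded by the YAML parser, but only inside double-quoted strings.

  • space: "\u0020"
  • single-quote: "\u0027"
  • double quote: "\u0022"

Quoted Escapes

  1. Double quote in a single quote: 'YAML is the "best" configuration language'
  2. Single quote in a double quote: "Yes, the 'best'"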
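
Put together in an actual file, the quoting rules look roughly like this. The keys and values are invented for illustration. The main difference to remember is that backslash escapes are only processed inside double quotes, while single quotes keep everything literal except a doubled ''.

```yaml
single_quoted: 'It''s the "best" format'     # '' is the only escape inside single quotes
double_quoted: "She said \"hi\"\tand left"   # backslash escapes are processed
windows_path: 'C:\temp\new'                  # backslashes stay literal in single quotes
unicode_escape: "caf\u00e9"                  # decoded to café
colon_value: "note: this is one string"      # quoted so ': ' does not start a mapping
```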

YAML vs JSON

How is YAML different from JSON? Let’s try to figure it out.

Check out the below code snippet of Kubernetes configuration written in JSON. Don’t pay attention to what it does; just observe the file.

{
 "description": "APIService represents a server for a particular GroupVersion. Name must be \"version.group\".",
 "properties": {
   "apiVersion": {
     "description": "APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources",
     "type": [
       "string",
       "null"
     ]
   },
   "kind": {
     "description": "Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds",
     "type": [
       "string",
       "null"
     ],
     "enum": [
       "APIService"
     ]
   },
   "metadata": {
     "$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta"
   },
   "spec": {
     "$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.kube-aggregator.pkg.apis.apiregistration.v1beta1.APIServiceSpec",
     "description": "Spec contains information for locating and communicating with a server"
   },
   "status": {
     "$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.kube-aggregator.pkg.apis.apiregistration.v1beta1.APIServiceStatus",
     "description": "Status contains derived information about an API server"
   }
 },
 "type": "object",
 "x-kubernetes-group-version-kind": [
   {
     "group": "apiregistration.k8s.io",
     "kind": "APIService",
     "version": "v1beta1"
   }
 ],
 "$schema": "http://json-schema.org/schema#"
}

Doesn’t it look like a pure JSON file? Let’s see if we can validate it in our YAML parser.

It’s odd that the YAML parser didn’t report the file as invalid. Does this imply that JSON is also YAML?

YAML is, in fact, designed as a superset of JSON (as of YAML 1.2). Virtually all JSON files are valid YAML files, but not the other way around.

Can we combine JSON and YAML? Is it still a valid YAML file? Let’s put this hypothesis to the test. Let us change some of the above snippet to make it look more like the YAML we are familiar with 😉

description: "APIService represents a server for a particular GroupVersion. Name must be \"version.group\"."
"properties": {
 "apiVersion": {
   "description": "APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources",
   "type": [
     "string",
     "null"
   ]
 },
 "kind": {
   "description": "Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds",
   "type": [
     "string",
     "null"
   ],
   "enum": [
     "APIService"
   ]
 },
 "metadata": {
   "$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta"
 },
 "spec": {
   "$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.kube-aggregator.pkg.apis.apiregistration.v1beta1.APIServiceSpec",
   "description": "Spec contains information for locating and communicating with a server"
 },
 "status": {
   "$ref": "https://kubernetesjsonschema.dev/master/_definitions.json#/definitions/io.k8s.kube-aggregator.pkg.apis.apiregistration.v1beta1.APIServiceStatus",
   "description": "Status contains derived information about an API server"
 }
 }
"type": "object"
"x-kubernetes-group-version-kind": [
 {
   "group": "apiregistration.k8s.io",
   "kind": "APIService",
   "version": "v1beta1"
 }
 ]
"$schema": "http://json-schema.org/schema#"

Notice that there isn’t a root JSON wrapper {} anymore. There are just maps at the root level, but most of it is still JSON. Validate the file once more in a YAML parser. It is a valid YAML file, but when we try to validate it in a JSON parser, it is reported as invalid. That’s because the file is no longer JSON, but rather YAML. This demonstrates that YAML is, in fact, a superset of JSON.
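
The same mixing works on a much smaller scale. In the sketch below (keys and values are hypothetical), JSON-style arrays and objects sit inside an ordinary block-style mapping; YAML calls these flow sequences and flow mappings.

```yaml
service:
  name: web
  ports: [80, 443]                              # flow sequence (JSON array syntax)
  labels: {"tier": "frontend", "env": "prod"}   # flow mapping (JSON object syntax)
  labels_block:                                 # the same data in block style
    tier: frontend
    env: prod
```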

Where is YAML Used?

We learned a lot about YAML and saw that it works great as a configuration language. Let us see it in action with some of the most famous tools.

Ansible

Ansible playbooks are used to automate repetitive tasks.

Playbooks are expressed in YAML format and perform any action defined in plays.

Here is a simple Ansible playbook that installs Nginx, applies the specified template to replace the existing default Nginx landing page, and finally enables TCP access on port 80.

To learn more about Ansible playbooks, see our article: Working with Ansible Playbooks – Tips & Tricks with Examples.

---
- hosts: all
  become: yes
  vars:
    page_title: Spacelift
    page_description: Spacelift is a sophisticated CI/CD platform for Terraform, CloudFormation, Pulumi, and Kubernetes.
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: latest

    - name: Apply Page Template
      template:
        src: files/spacelift-intro.j2
        dest: /var/www/html/index.nginx-debian.html

    - name: Allow all access to tcp port 80
      ufw:
        rule: allow
        port: '80'
        proto: tcp

Kubernetes

Kubernetes, also known as K8s, is an open-source system for automating the deployment, scaling, and management of containerized applications.

Kubernetes works based on a state model where it tries to reach the desired state from the current state in a declarative way. Kubernetes uses YAML files to define the Kubernetes object, which is applied to the cluster to create resources like pods, services, and deployments.

Here is a YAML file that describes a deployment that runs Nginx.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
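
Kubernetes manifests also make use of multi-document files: several objects can live in one file, separated by ---. As a hedged sketch, a Service that exposes the pods of the Deployment above could be added as a second document. The name nginx-service and the port numbers are assumptions chosen for illustration.

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service      # hypothetical name
spec:
  selector:
    app: nginx             # matches the pod label used by the Deployment
  ports:
    - protocol: TCP
      port: 80             # port exposed by the Service
      targetPort: 80       # containerPort of the nginx container
```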

Interesting Things About YAML

YAML works great as a configuration language, but it is important to be aware of certain challenges as well when using it.

The curious case of the Norway problem

Imagine listing the abbreviations of all the countries where it snows:

countries:
- GB # Great Britain
- IE # Ireland
- FR # France
- DE # Germany
- NO # Norway

All looks good, right? But when you try to read this YAML file in Python, NO is read as False instead of 'NO'.

>>> from yaml import safe_load  # provided by the PyYAML package
>>> safe_load(the_configuration)  # the_configuration holds the YAML text above
{'countries': ['GB', 'IE', 'FR', 'DE', False]}

So why does this happen?

Remember how the core schema interprets NULL | null the same way? Parsers that follow the older YAML 1.1 rules, such as PyYAML, go further and interpret FALSE | false | NO | no | OFF | off as the same boolean value. So instead of parsing NO as a string, they parse it as a boolean. This can be easily solved by quoting NO.

countries:
- GB # Great Britain
- IE # Ireland
- FR # France
- DE # Germany
- 'NO' # Norway
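
Norway is not the only victim. Under the YAML 1.1 rules that PyYAML and a number of other parsers follow, several everyday words are read as booleans unless they are quoted. A short sketch (the key name is made up):

```yaml
answers:
  - yes      # boolean true under YAML 1.1 rules
  - "yes"    # string
  - off      # boolean false under YAML 1.1 rules
  - "off"    # string
  - NO       # boolean false under YAML 1.1 rules
  - "NO"     # string
```

When a value is meant to be text and could be read as something else, quoting it is the safe default.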

Alternatively, to avoid such surprises altogether, we can use StrictYAML, which parses everything as a string by default.

Source here.

Key Points

Congratulations on completing the article. You are now on your way to becoming a YAML expert. Isn’t YAML fantastic?

YAML is an important language that finds its uses almost everywhere where writing configuration is required. Kubernetes, Ansible, docker-compose, and other tools are excellent examples.

Then you can use Spacelift to mix and match Terraform, Pulumi, AWS CloudFormation, Kubernetes, and Ansible Stacks and have them talk to one another. For example, you can set up Terraform Stacks to provision the required infrastructure (like an ECS/EKS cluster with all its dependencies) and then deploy the following via a Kubernetes Stack. Check it out for free by creating a trial account. Or sign up here if you want to be part of the Ansible integration beta version testing!

Hope you enjoyed reading 🙂
