Infrastructure as code (IaC) is a core concept in the DevOps world today. It has become almost ubiquitous across the industry and is key to modern engineering roles.
In this post, I will explain what IaC is, what its benefits are, some of the tooling options available, and point out some best practices along the way. I will then look at some code examples for each tool to compare and contrast.
Put in the simplest terms, IaC means using code to define the infrastructure that needs to be deployed, in the form of a descriptive model. Similar to code for applications, the code for the infrastructure becomes part of the project and is stored inside your version control system (or VCS).
For example, consider that you have developed a web application. That web application needs to be hosted on infrastructure somewhere to be consumed. Using IaC, you could define where the infrastructure is deployed, such as a public cloud provider like Microsoft Azure, Amazon Web Services (AWS), or Google Cloud, and what type of service your web application will run on, such as an Azure Web App or an AWS S3 bucket. You can then define the settings required for the web app, such as how much server compute power is required (in terms of CPU and memory), how the networking is secured, and how the domain name for your app will be exposed, among many other considerations.
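As a rough illustration of what a "descriptive model" can look like, the sketch below captures the kind of settings mentioned above as plain data in Python. The `WebAppSpec` class and all of its field names and values are hypothetical and do not correspond to the schema of any real IaC tool.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class WebAppSpec:
    """Hypothetical description of the infrastructure a web app needs."""
    cloud: str          # which provider hosts the app
    service: str        # which hosting service runs it
    cpu_cores: int      # compute sizing
    memory_gb: int
    https_only: bool    # one example of a network security setting
    domain_name: str    # how the app is exposed


spec = WebAppSpec(
    cloud="azure",
    service="web-app",
    cpu_cores=2,
    memory_gb=4,
    https_only=True,
    domain_name="shop.example.com",
)

# Serialising the model gives a text file that can be committed to the VCS
# and reviewed like any other code change.
spec_json = json.dumps(asdict(spec), indent=2, sort_keys=True)
print(spec_json)
```

The point is only that the infrastructure is described as data rather than clicked together by hand; real tools add the logic that turns such a description into deployed resources.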
IaC solves many common problems with provisioning infrastructure.
- New environments or infrastructure can be provisioned easily from your IaC configuration code. Infrastructure deployments with IaC are repeatable.
- Manually configured environments are difficult to scale. Environments provisioned using IaC can be deployed and scaled rapidly.
- If you want to make changes to the existing infrastructure that has been deployed with IaC, this can be done in code, and the changes will be tracked.
- When IaC is used with a declarative tool (one where you describe the state you want your environment to be in), you can detect and correct environment drift. If a part of the infrastructure is modified manually outside of the code, it can be brought back in line with the desired state on the next run.
- Changes can be applied multiple times without changing the result beyond the initial application. This is known as idempotence.
- Manual configuration of environments typically introduces mistakes due to human error. With IaC, these can be avoided.
- IaC is a means to achieve consistency across environments and infrastructure. The code can be reused.
- Infrastructure costs are lowered as the time to deploy and effort to manage, administer and maintain environments decrease.
- IaC can be used in Continuous Integration / Continuous Deployment (or CI/CD) pipelines. The main benefit of doing this is to automate your Infrastructure deployments.
- DevOps teams can test applications in production-like environments early in the development cycle.
- Your infrastructure configuration code can be held in your version control system alongside your application source code, commonly in the same repository, so everything is kept together.
- Productivity will increase due to a combination of all the benefits of using IaC.
- As the code is held in your version control system, it gains all the benefits of the VCS. More on that in the next section.
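The drift detection and correction described in the list above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the setting names are hypothetical, and real tools compare far richer resource descriptions against a cloud provider's API.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Return {setting: (actual_value, desired_value)} for every mismatch."""
    return {
        key: (actual.get(key), value)
        for key, value in desired.items()
        if actual.get(key) != value
    }


def reconcile(desired: dict, actual: dict) -> dict:
    """Return a copy of `actual` brought back in line with `desired`."""
    corrected = dict(actual)
    corrected.update(desired)
    return corrected


# Desired state as written in code (hypothetical settings).
desired = {"tier": "Standard", "replication": "LRS", "https_only": True}

# Someone changed a setting by hand outside of the code.
actual = {"tier": "Standard", "replication": "LRS", "https_only": False}

drift = detect_drift(desired, actual)
print(drift)

actual = reconcile(desired, actual)
# A second check finds nothing left to correct.
print(detect_drift(desired, actual))
```

Because the code states the end result rather than the steps, running the reconciliation again after the first correction changes nothing further.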
The adoption of IaC is not without its challenges.
Traditional infrastructure or operations teams within organizations may not be familiar with version control systems or Git, or be comfortable using code editing tools such as Visual Studio Code.
Further to this, there is certainly a learning curve when it comes to the adoption of new technology within an organization. Training will be required and time will be needed to develop the appropriate skills.
Skills in IaC and DevOps are currently highly sought after in the industry, so it may be difficult to hire staff with these skills.
The journey to managing your environments using IaC will typically start with a small deployment of new resources to your chosen cloud platform, and from there once adoption within the organization grows, more of your infrastructure can start to be deployed and managed in code. Eventually, when your organization is mature and comfortable with the principles and operation of your chosen IaC systems and tooling, existing resources can be brought under IaC control.
Storing your IaC in a VCS automatically gives you a raft of additional benefits. Without diving too deeply into them in this article, we will summarise them here.
In summary, a VCS enables developers and organizations to work more efficiently in improving the quality of their product while recording and evaluating a detailed history of their improvements that will lead to a successful end product. In short, using a VCS enables governance, versioning, and increases collaboration.
1. Efficiency
As configuration files are amended incrementally with changes, testing becomes easier as rollback to previous versions is possible. New features can be built up over time.
2. Tracking & Versioning
Track who made the change, and in which version of the file. Comments can be added to each code change (or commit).
3. Collaboration
Multiple people can work on the configuration files at the same time using branching. Changes can be merged together using pull requests.
4. Governance & Compliance
Tracking changes automatically gives you a powerful audit trail, enabling risk management.
5. Management
A management overview becomes possible because you know who authored each configuration change, how long the change took, its timeline, and its impact.
6. Reduces Duplication
Multiple and out-of-date configuration files are reduced.
7. Backup
Users will typically clone the repository their configuration code is held in and work on it locally. The code then exists locally and in the VCS. The VCS itself is also backed up.
There are two approaches to the templates used when writing IaC configuration files. Different tools use different approaches, which are listed in the next section.
Declarative — you define the desired state of the final solution.
The tool or automation platform determines how the goal is reached. The step-by-step executions are handled by the tool and hidden from the user. Declarative tools are the most popular and most dominant in the IaC space. They are most useful when changes or updates need to be made to your solution.
Declarative tools are idempotent because you are defining the required state of the solution. Idempotency refers to a process that can be executed multiple times with the same result.
Imperative — you define the steps to execute in order to reach the desired solution.
An imperative approach allows you to build up multiple layers of commands to reach the end goal. Imperative tools give you more control over how the goal is reached. These tools are most useful when you need to deploy and not update or change the solution in the future.
Imperative approaches may not lead to idempotency. Because the process is a series of steps, the end result may differ depending on the starting point. Consider a process with 10 steps: starting from step 1 would lead to a different result compared to starting from step 6.
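To make the difference concrete, here is a minimal Python sketch. The `add_servers` and `ensure_servers` functions are hypothetical stand-ins for an imperative step and a declarative goal; they are not taken from any real tool.

```python
def add_servers(environment: dict, count: int) -> None:
    """Imperative step: 'create `count` more servers'."""
    environment["servers"] += count


def ensure_servers(environment: dict, count: int) -> None:
    """Declarative goal: 'there should be exactly `count` servers'."""
    environment["servers"] = count


imperative_env = {"servers": 0}
add_servers(imperative_env, 2)
add_servers(imperative_env, 2)      # running the same step again adds two more

declarative_env = {"servers": 0}
ensure_servers(declarative_env, 2)
ensure_servers(declarative_env, 2)  # running again changes nothing

print(imperative_env["servers"], declarative_env["servers"])
```

The imperative version ends up with four servers after two runs, while the declarative version still has two, which is the idempotent behaviour described above.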
So you’ve decided to pursue IaC due to the myriad of benefits it enables. Now you need to decide which tool to use! This will depend on a number of factors, including your engineering capability and current use of cloud platforms.
You should also consider the use of a Continuous Integration / Continuous Delivery (CI/CD) platform and the tools that it supports before deciding. The most flexible platforms, such as Spacelift, support multiple tools, including Terraform, Ansible, Pulumi, and CloudFormation. For example, with Spacelift, you can set up Terraform Stacks to provision the required infrastructure (like an ECS/EKS cluster with all its dependencies) and then connect them to a CloudFormation Stack that transactionally deploys your services there. Trigger policies and the Spacelift provider's run resources handle workflow orchestration, and Contexts export Terraform outputs as CloudFormation input parameters.
First, we will outline three cloud-agnostic options, Terraform, OpenTofu, and Pulumi, before moving on to the Microsoft Azure-specific ARM templates and Bicep. Lastly, we will briefly discuss the AWS options, CloudFormation and CDK, as well as other tools such as Ansible, Chef, Salt, and Puppet.
Terraform
Terraform takes the declarative approach. It uses its own language called HashiCorp Configuration Language (HCL), which most people consider easy to pick up and very human-readable. Terraform itself is written in Go.
Terraform can be used with almost any cloud provider, as well as on-premises infrastructure, through its provider mechanism, of which there are thousands available. It is for this reason that many choose Terraform over other options. If you want to use the same language and formatting to configure infrastructure across multiple cloud platforms and infrastructure, it is a natural choice.
In my opinion, even if you only want to use it for one cloud platform (e.g., Microsoft Azure), it is still the best option, due mainly to the maturity of the language and its supporting ecosystem.
Terraform maintains a state file. The state file describes the existing state of the infrastructure and allows Terraform to query, build, maintain, and change the infrastructure as defined in your configuration files. Maintaining the state file introduces challenges around its security, but many solutions exist to the problem.
Dependency mapping between resources is done in the background automatically by Terraform and is largely hidden from the user, but can be controlled if required. Dependency mapping refers to the order in which resources are created. Consider the creation of a virtual machine: you must first have a VNET, subnet, disk, and key vault for disk encryption in place before it can be created successfully. Terraform makes sure resources are created in the correct order. If the order needs to be set manually, the depends_on meta-argument can be used in the configuration to achieve this.
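The idea of ordering resources by their dependencies can be sketched with a topological sort. The example below uses Python's standard `graphlib` module (Python 3.9+); the resource names and the dependency map are illustrative assumptions based on the virtual machine example, and this is not a description of how Terraform builds its graph internally.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
dependencies = {
    "vnet": set(),
    "subnet": {"vnet"},
    "disk": set(),
    "key_vault": set(),
    "virtual_machine": {"subnet", "disk", "key_vault"},
}

# static_order() yields every resource after all of its dependencies.
creation_order = list(TopologicalSorter(dependencies).static_order())
print(creation_order)
```

Whatever order the independent resources come out in, the subnet always follows the VNET and the virtual machine always comes last.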
Terraform has a raft of great features, including workspaces, which allow you to deploy to different environments, and the ability to run a plan that shows what will change without making any alterations.
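Conceptually, a plan is a comparison between what the configuration asks for and what the state records as already existing. The Python sketch below is a simplified model of that idea; the `plan` function, resource names, and settings are hypothetical, and a real plan also consults the provider for the live state of each resource.

```python
def plan(config: dict, state: dict) -> list:
    """Compare desired config with recorded state; return (action, name) pairs."""
    actions = []
    for name in sorted(config.keys() | state.keys()):
        if name not in state:
            actions.append(("create", name))
        elif name not in config:
            actions.append(("delete", name))
        elif config[name] != state[name]:
            actions.append(("update", name))
    return actions


# What the configuration files ask for.
config = {
    "storage_account": {"tier": "Standard", "replication": "GRS"},
    "container_logs": {"access": "private"},
}
# What the (toy) state file says currently exists.
state = {
    "storage_account": {"tier": "Standard", "replication": "LRS"},
    "old_container": {"access": "private"},
}

planned_actions = plan(config, state)
for action, name in planned_actions:
    print(f"{action}: {name}")
# Nothing has been changed at this point; `state` is untouched.
```

Reviewing this list before applying it is what lets a team catch an unintended delete or update before it reaches a real environment.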
Terraform does have a slight delay in terms of the features it supports. For example, when a new feature or service is released in Azure, this capability must be added to the azurerm provider by the team that maintains it before the feature can be referenced using Terraform. However, the delay is usually very small, and almost all features that are available to a user in the Azure portal are also available through the Terraform configuration language. If your team lives on the cutting edge and utilizes lots of preview services, this might be an issue. This is not the case for all providers: with the aws Terraform provider, some new features have been implemented before they were available in the AWS-native CloudFormation!
Learn more about how you can automate your infrastructure provisioning with Terraform.
OpenTofu
OpenTofu is an open-source version of Terraform that will expand on Terraform’s existing concepts and offerings. It is a viable alternative to HashiCorp’s Terraform, being forked from Terraform version 1.5.6. OpenTofu retained all the features and functionalities that had made Terraform popular among developers, while also introducing improvements and enhancements. The project is part of the Linux Foundation, with the ultimate goal of joining the Cloud Native Computing Foundation (CNCF).
There are no differences between Terraform (versions prior to 1.5.6) and OpenTofu, but this will change as new versions emerge. Initially, it works exactly the same as Terraform, with OpenTofu being a drop-in replacement for it. OpenTofu is not going to have its own providers and modules, but it is going to use its own registry for them.
The community dictates the direction of OpenTofu, which will offer greater flexibility in the development of new features, considering what is important for the users. There are a couple of interesting feature requests already in the repository, and other features are planned, such as support for OCI provider registries and state encryption.
OpenTofu works with your existing Terraform state file so you won’t have any issues when you are migrating to it.
Contributing to OpenTofu can be easily done by checking the Contribution Guide. The easiest way you can contribute is by opening an issue, and every major change will be done through an RFC.
OpenTofu is the future of the Terraform ecosystem and having a truly open-source project to support all your IaC needs is the main priority.
Pulumi
Pulumi is another declarative IaC tool. Instead of having its own language, it allows teams to use an existing programming language, something which can be highly desirable if your team already has strong preferences or skills. Pulumi supports TypeScript, JavaScript, Python, Go, and C#.
Like Terraform, Pulumi also supports multiple clouds, infrastructure, and providers.
Note that any existing Terraform or ARM configuration files you have can be converted into Pulumi files.
Pulumi is harder for coding novices to get to grips with compared to Terraform or Bicep. If no prior coding skills exist within your team, it will take time to learn.
As Pulumi configuration files are coded with the programming language of your choice, this means you can use the same testing tools as you use for your main code, another major advantage.
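As a rough sketch of what that can look like, the example below unit-tests a small helper with Python's built-in `unittest` module. The `storage_account_args` helper and its production rule are hypothetical, and the snippet deliberately avoids the Pulumi SDK so that it runs on its own; in a real project the returned settings would be passed to a Pulumi resource.

```python
import unittest


def storage_account_args(environment: str) -> dict:
    """Hypothetical helper that builds the settings passed to a resource."""
    return {
        "account_tier": "Standard",
        # Assumed rule for illustration: production gets geo-redundant storage.
        "account_replication_type": "GRS" if environment == "prod" else "LRS",
    }


class StorageAccountArgsTest(unittest.TestCase):
    def test_prod_is_geo_redundant(self):
        args = storage_account_args("prod")
        self.assertEqual(args["account_replication_type"], "GRS")

    def test_dev_is_locally_redundant(self):
        args = storage_account_args("dev")
        self.assertEqual(args["account_replication_type"], "LRS")


# Run the tests programmatically so the snippet works when executed as a script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(StorageAccountArgsTest)
result = unittest.TextTestRunner().run(suite)
```

Keeping decisions like this in ordinary functions means they can be checked by the same test runner and CI jobs as the application code.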
Microsoft Azure ARM Templates
ARM templates are the native templating option for all resources in Azure. ARM templates are written in JSON format.
Getting started using ARM templates is easy as ARM templates can be downloaded for any existing resource straight from the Azure Portal and can be modified as needed. These templates can allow you to redeploy resources easily. Because ARM templates are the native option in Azure, they are well documented and supported by Microsoft. They can also be used for any resource in Azure from the day of release, an advantage over Terraform.
However, when writing ARM templates from scratch, there is a lot of ‘boilerplate’ code that is required, something that tools like Terraform and Bicep abstract away from the user, making the code easier to both read and write. JSON is picky about formatting and can be a chore to write by hand. Because of the additional code required for ARM templates compared to other options, the configuration files can get very large, complex, and hard to follow, and any syntax errors can become tricky to troubleshoot. There is no concept of ‘state’ when using ARM templates, so any changes that are applied can be breaking. In addition, ARM templates have no direct equivalent of Terraform's plan; the closest option is the what-if operation, which previews the changes a deployment would make, so confirming that what you are deploying will result in the desired outcome takes extra care.
Read more about using Azure for Infrastructure as Code.
Microsoft Azure Bicep
Azure Bicep is a declarative abstraction language used to simplify the JSON ARM templates that are used to provision infrastructure in Azure. Bicep is a domain-specific language (DSL) for Azure, so if you want to use the same language across multiple clouds, this isn’t the option for you.
Bicep aims to take away some of the unnecessary JSON boilerplate, leaving simpler, cleaner code, similar to HashiCorp's HCL. You can use modules, something not really available with ARM templates, allowing you to simplify complex deployments. Bicep can query the state directly from Azure, so there is no need to manage a state file, which is a great advantage over Terraform.
Bicep is still in its relatively early days, having come to prominence in the last couple of years, and so the community around it is not as large as that of Terraform. However, Bicep is fully supported by Azure Support teams, meaning you can raise a ticket through the Azure portal for Bicep-related issues, similar to how you would raise a ticket for any Azure resource problems you might encounter. In theory, anything you can do with ARM templates you can do with Azure Bicep (and more), although some limitations still exist due to the ongoing development of the tool.
Bicep is part of the Azure CLI. Note that you can convert existing ARM templates into Bicep files. As an example, running the command below in Cloud Shell will convert your .json ARM template into a .bicep file:
az bicep decompile -f ./arm-example.json
Amazon Web Services (AWS)
CloudFormation is the native AWS IaC tool and takes the declarative approach. CloudFormation templates are written in JSON or YAML format. CloudFormation is a managed AWS service and can check the infrastructure it has provisioned to detect whether it still matches the described state. No state file management is required, which is a benefit over Terraform. Similar to modules in Terraform, CloudFormation uses a concept called ‘nested stacks’, which enables templates to be called from other templates. Interestingly, it is possible to provision resources on other clouds or non-AWS resources using CloudFormation, as the custom resources feature enables this. However, this involves additional templating and is certainly not as easy as using Terraform. CloudFormation support is included in AWS support plans.
Similar to Pulumi but not cloud-agnostic, the AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to define your cloud application resources using familiar programming languages. CDK is generally considered to take an imperative approach in how the code is written, although that code is synthesized into declarative CloudFormation templates before deployment.
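CDK applications are synthesized into CloudFormation templates before deployment, which is how imperative-looking code ends up as a declarative document. The toy Python sketch below imitates that idea without using the CDK library itself; the `synthesize` function and the bucket identifiers are hypothetical.

```python
import json


def synthesize(bucket_ids: list) -> dict:
    """Build a CloudFormation-style template from ordinary Python code."""
    resources = {}
    for bucket_id in bucket_ids:   # an imperative loop...
        resources[bucket_id] = {"Type": "AWS::S3::Bucket"}
    return {                       # ...that emits a declarative document
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": resources,
    }


template = synthesize(["LogsBucket", "AssetsBucket"])
print(json.dumps(template, indent=2))
```

The loop is only a convenience for the author; what is eventually deployed is the generated template, which simply lists the resources that should exist.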
Ansible, Salt, Chef, and Puppet
These are generally considered as configuration management as code tools, but can also be used to provision and manage infrastructure.
Puppet takes the declarative approach. Chef takes the imperative approach. Ansible and Salt are mostly declarative but offer some support for imperative commands.
See a more detailed list of the most useful IaC deployment tools.
In this section, we will compare the code required to create a storage account in Azure using Terraform, Pulumi, ARM templates, and Bicep. All four examples produce the same result in Azure: a storage account with a container called ‘logs’.
Terraform
Notice how easy HCL is to read, making it arguably the most accessible of the four options for a beginner.
# Assumes a resource group named "example" is defined elsewhere in the configuration.
resource "azurerm_storage_account" "storage_account" {
  name                     = "storageaccountname" # 3-24 lowercase letters and numbers, globally unique
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
resource "azurerm_storage_container" "storage_account_container" {
  name                  = "logs"
  # Refers to the storage account declared above.
  storage_account_name  = azurerm_storage_account.storage_account.name
  container_access_type = "private"
}
See a more detailed guide on how to manage Infrastructure as Code (IaC) with Terraform.
Pulumi
The Pulumi example can be coded in many different programming languages. Here I have shown the Python equivalent, as it is a commonly used and understood language across infrastructure teams. In Python, it comes in at a similar length to the Terraform equivalent but is harder for the beginner to read.
import pulumi
import pulumi_azure as azure
# Assumes `resource_group` is an azure.core.ResourceGroup defined elsewhere in the program.
storage_account = azure.storage.Account("storageaccount",
    resource_group_name=resource_group.name,
    location=resource_group.location,
    account_tier="Standard",
    account_replication_type="LRS")
# Setting `name` explicitly so the container is called "logs".
storage_account_container = azure.storage.Container("storageAccountContainer",
    name="logs",
    storage_account_name=storage_account.name,
    container_access_type="private")
Existing Terraform can be converted to Pulumi files using tf2pulumi, a great resource that lets you view the code side-by-side and convert it to TypeScript, Python, Go, or C#.
ARM Templates
The ARM template example is much more lengthy and verbose, highlighting the issues with ARM templates that Bicep aims to resolve.
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "metadata": {
    "_generator": {
      "name": "bicep",
      "version": "0.4.1008.15138",
      "templateHash": "8636947863337745424"
    }
  },
  "parameters": {
    "storageAccountName": {
      "type": "string"
    },
    "containerName": {
      "type": "string",
      "defaultValue": "logs"
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]"
    }
  },
  "functions": [],
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2019-06-01",
      "name": "[parameters('storageAccountName')]",
      "location": "[parameters('location')]",
      "sku": {
        "name": "Standard_LRS"
      },
      "kind": "StorageV2",
      "properties": {
        "accessTier": "Hot"
      }
    },
    {
      "type": "Microsoft.Storage/storageAccounts/blobServices/containers",
      "apiVersion": "2019-06-01",
      "name": "[format('{0}/default/{1}', parameters('storageAccountName'), parameters('containerName'))]",
      "dependsOn": [
        "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
      ]
    }
  ]
}
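The container's name in the template above is built with a format expression, which can be hard to read at first glance. As a small worked example, the Python below mirrors that expression with ordinary string formatting; the storage account name used here is a hypothetical parameter value.

```python
# ARM template expression from the container resource:
#   format('{0}/default/{1}', parameters('storageAccountName'), parameters('containerName'))
storage_account_name = "mystorageaccount"  # hypothetical parameter value
container_name = "logs"                    # the template's default value

resolved_name = "{0}/default/{1}".format(storage_account_name, container_name)
print(resolved_name)
```

The resulting name places the container under the storage account's default blob service.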
Bicep
The Bicep example is much more concise and easier to read than the ARM equivalent, although not quite as short as the Terraform.
param storageAccountName string
param containerName string = 'logs'
param location string = resourceGroup().location
resource sa 'Microsoft.Storage/storageAccounts@2019-06-01' = {
  name: storageAccountName
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  properties: {
    accessTier: 'Hot'
  }
}
resource container 'Microsoft.Storage/storageAccounts/blobServices/containers@2019-06-01' = {
  name: '${sa.name}/default/${containerName}'
}
The Bicep playground is a great resource that allows you to directly compare ARM templates and Bicep configuration files. Use the ‘Sample Template’ button to select a template to compare side-by-side. You will notice that the Bicep files are shorter, remove the ARM boilerplate, and as a result are easier to use.
Automating your infrastructure using IaC can pay dividends, enabling your team to deliver more and be more agile, saving you time and money.
Selecting the correct IaC tooling for your team can be a daunting task as there are plenty of options available. Comparing the pros and cons of each, along with an evaluation of your requirements and available skill sets in your team, should give you a good starting point.
If you are just starting out with IaC, I would personally recommend looking at Terraform first as it is a highly sought-after skill in the industry today, and is the prevalent IaC tool. There are lots of learning materials available on the web and the community support is great.
Continuous Integration and Deployment for your IaC
Spacelift allows you to automate, audit, secure, and continuously deliver your infrastructure. It helps overcome common state management issues and adds several must-have features for infrastructure management.