Terraform has emerged as a powerful tool for putting the principles of infrastructure as code (IaC) into practice, managing and provisioning cloud infrastructure resources across multiple platforms. By treating infrastructure as code, Terraform aligns infrastructure management with software development best practices, driving efficiency and reliability in modern cloud environments.
In this article, we will cover:
- What is Infrastructure as Code with Terraform?
- ClickOps vs. IaC
- Key Terraform components
- What is the structure of an IaC Terraform project?
- What are the benefits of using Terraform for IaC?
- What are the disadvantages of using Terraform for IaC?
- Example: Using Terraform for Kubernetes deployment on Azure
- Enhancing your IaC workflow with Spacelift
What is Infrastructure as Code with Terraform?
IaC is the process that enables engineers to define, configure, and manage infrastructure through code. This approach treats underlying infrastructure components, such as networks, virtual machines, storage, databases, and others, in the same way developers treat their application code.
When it comes to managing cloud resources, Terraform is the de facto standard. It is an IaC tool that allows engineers to define their infrastructure resources as code, using a declarative configuration language called HashiCorp Configuration Language (HCL). It is provider-agnostic, meaning that it supports many cloud providers (AWS, Azure, GCP, OCI, and many more), and the language in which you write the code doesn’t change from one provider to another.
By using Terraform, you are not limited to cloud providers. You can write Terraform automations for Kubernetes, Helm, Artifactory, Aviatrix, and Spacelift, to name a few. You can even define your own provider. As long as your product exposes an API, a Terraform provider can be built on top of it.
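As a rough illustration of mixing a cloud provider with a non-cloud one in a single configuration, the sketch below declares the AWS and Kubernetes providers side by side. The version constraints and the kubeconfig path are assumptions chosen for the example, not recommendations.

```hcl
terraform {
  required_providers {
    # A cloud provider
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint, adjust to your needs
    }
    # A non-cloud provider that talks to the Kubernetes API
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0" # assumed constraint, adjust to your needs
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "kubernetes" {
  # Assumes a local kubeconfig file exists at this path
  config_path = "~/.kube/config"
}
```

Both providers are configured in the same language, which is the point of the provider model: only the block names and arguments differ.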
Read more: Terraform Use Cases for Your Infrastructure as Code
ClickOps vs. IaC
ClickOps is the term used to describe the manual management of IT infrastructure from the UI, by clicking through the portal to create, edit, or delete resources. This process might be enough for a small architecture, but starting like this raises a big issue: you cannot easily scale and replicate your configuration.
Imagine you are creating an EC2 instance inside AWS. Using ClickOps, you will create it much faster than you would normally through IaC. But what happens if you need to create ten EC2 instances? What about 100 EC2 instances?
Let’s suppose an engineer takes approximately two minutes to create an EC2 instance through the portal. For 100 instances, this adds up to about 200 minutes, a little over three hours, and maybe that is acceptable for some. But these 100 instances will also need to reside in a network. They will require security groups, maybe some EBS storage, and other items that will also take considerable time to configure.
Doing this manually is very error-prone because it is hard to stay attentive across such a large number of repetitive steps. By using Terraform, you can easily define all of these components as code, validate the code, plan what will happen, and ultimately deploy all resources in one go. Apart from that, you can easily scale and replicate your configuration without spending too much time.
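To make the 100-instance scenario concrete, here is a minimal sketch that uses the count meta-argument. The variable names, the instance type, and the tag format are hypothetical; the AMI ID has to be supplied by you.

```hcl
variable "ami_id" {
  description = "AMI to launch (hypothetical input, supply a real AMI ID)"
  type        = string
}

variable "instance_count" {
  description = "How many identical instances to create"
  type        = number
  default     = 100
}

resource "aws_instance" "web" {
  # One resource block produces instance_count instances
  count         = var.instance_count
  ami           = var.ami_id
  instance_type = "t3.micro" # assumed size for the example

  tags = {
    # count.index runs from 0 to instance_count - 1
    Name = "web-${count.index}"
  }
}
```

Going from ten instances to 100 is then a one-line change to instance_count rather than hours of clicking.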
Terraform is stateful: it tracks your infrastructure by comparing the configuration you have defined with a state file it writes after each Terraform run. This introduces a layer of complexity because you need to manage this state file.
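One common way to manage the state file is to store it in a remote backend instead of on an engineer's laptop. The sketch below assumes an existing S3 bucket; the bucket name and key are hypothetical.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state"   # hypothetical bucket, must already exist
    key    = "network/terraform.tfstate" # path of the state object inside the bucket
    region = "us-east-1"
  }
}
```

After adding or changing a backend block, terraform init has to be run again so that Terraform can configure the backend.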
If organizations use a combination of IaC and ClickOps, they will introduce drift and sometimes even break their infrastructure resources. So if you are getting into IaC, forget about ClickOps.
Key Terraform components
Providers
In essence, Terraform providers function as plugins that enable Terraform to communicate with specific infrastructure resources. These providers serve as a bridge between Terraform and the underlying infrastructure, converting Terraform configurations into relevant API calls and permitting Terraform to handle resources across numerous environments.
Example provider:
provider "aws" {
  region = "us-east-1"
}
Take a look at Terraform Providers Overview.
Resources
In Terraform, resources represent the infrastructure elements that can be managed, such as virtual machines, virtual networks, DNS entries, pods, and more.
Each resource is identified by a specific type, like aws_instance or kubernetes_pod, and has a range of configurable attributes, such as instance size or type. Resources are the building blocks of Terraform, and every configuration that creates a piece of infrastructure has to contain at least one resource.
Example resource:
resource "azurerm_resource_group" "this" {
  name     = "rg1"
  location = "West Europe"
}
Variables
Terraform variables are similar to variables in any other programming language. They are used to organize your code better, make it easier to change its behavior, and make your configuration reusable. In Terraform, a variable's type constraint can be a primitive (string, number, bool), a collection (list, set, map), or a structural type (object, tuple). The any keyword accepts a value of any type, and null is a value that represents absence rather than a type.
A variable’s default cannot reference resources or data sources, so you cannot use variables to hold expression values built from them. For that, you should use a local variable.
Example variable:
variable "vpc_name" {
  description = "name of the vpc"
  type        = string
  default     = "vpc1"
}
Outputs
An output serves as a convenient method for displaying the value of a particular data source, resource, local, or variable once Terraform has completed implementing infrastructure modifications. They can also be used to expose attributes from a Terraform module.
Example output:
output "vpc_name" {
  value = var.vpc_name
}
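To illustrate how outputs expose attributes from a module, the following sketch shows a small module and the root configuration that consumes it. The module path, resource names, and CIDR range are hypothetical; the comments mark which file each part would live in.

```hcl
# modules/network/main.tf (hypothetical module)
resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

# modules/network/outputs.tf
output "vpc_id" {
  description = "ID of the VPC created by this module"
  value       = aws_vpc.this.id
}

# main.tf in the root configuration
module "network" {
  source = "./modules/network"
}

# The module's output is read as module.<name>.<output>
output "network_vpc_id" {
  value = module.network.vpc_id
}
```

Only values declared as outputs inside the module are visible to the caller, so outputs double as the module's public interface.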
Data sources
A data source is a Terraform object that retrieves data from an external source and can be used in resources as arguments when they are created or updated, or you can use them in locals to manipulate the data you receive.
Example data source:
data "aws_ami" "ubuntu" {
  most_recent = true
  filter {
    name   = "name"
    values = ["ubuntu*"]
  }
}
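A data source becomes useful once its attributes are referenced elsewhere. As a sketch, assuming the aws_ami data source from this section is part of the same configuration, it could feed a resource argument and a local value like this. The instance type and the resource name are hypothetical.

```hcl
resource "aws_instance" "ubuntu" {
  # The AMI ID is looked up at plan time instead of being hard-coded
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro" # assumed size for the example
}

locals {
  # Data source attributes can also be reshaped in locals
  ami_summary = "${data.aws_ami.ubuntu.name} (${data.aws_ami.ubuntu.id})"
}
```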
Locals
In Terraform, a local variable assigns a name to an expression and makes it easy for the user to utilize it. Typically, you don’t want to write a complex expression multiple times throughout your code in any programming language. You usually define a variable that holds it or a function that implements it.
Local variables work similarly, and they can be used in conjunction with resource and data source attributes to build complex expressions.
Example local:
locals {
  a_list = [for i in range(10) : i]
}
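Because locals, unlike variable defaults, may reference resources, they are a natural place for values derived from them. The sketch below combines an input variable with resource attributes; the variable name, local names, and tag keys are hypothetical.

```hcl
variable "environment" {
  type    = string
  default = "dev"
}

resource "azurerm_resource_group" "this" {
  name     = "rg1"
  location = "West Europe"
}

locals {
  # Built from a variable and a resource attribute
  name_prefix = "${var.environment}-${azurerm_resource_group.this.name}"

  # Written once here, reusable on every resource that accepts tags
  common_tags = {
    environment    = var.environment
    resource_group = azurerm_resource_group.this.name
    location       = azurerm_resource_group.this.location
  }
}
```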
Provisioners
Terraform provisioners exist inside a resource and are used to either run a local command or a remote command or to copy a file from your local environment to a remote virtual machine.
They are considered a last-resort option because they are not part of Terraform’s declarative model. You should typically use cloud-init to run different scripts on your VM during the bootstrap phase, or if you can, use a configuration management tool like Ansible for this.
Example provisioner:
resource "null_resource" "this" {
  provisioner "local-exec" {
    command = "ls -l"
  }
}
Learn more about Terraform Provisioners and why you should avoid them.
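As a sketch of the cloud-init alternative mentioned above, the instance below passes a cloud-config document through user_data instead of using a provisioner. It assumes the chosen AMI ships with cloud-init (most Ubuntu images do); the variable name, instance type, and installed package are hypothetical.

```hcl
variable "ami_id" {
  description = "AMI with cloud-init installed (hypothetical input)"
  type        = string
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro" # assumed size for the example

  # cloud-init reads this document on first boot
  user_data = <<-EOF
    #cloud-config
    packages:
      - nginx
    runcmd:
      - systemctl enable --now nginx
  EOF
}
```

The bootstrap logic becomes an ordinary resource argument, so it appears in the plan like any other attribute.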
Note: New versions of Terraform are released under the BUSL license, but every version up to and including 1.5.x remains open source. OpenTofu is an open-source fork of Terraform that expands on Terraform’s existing concepts and offerings. It is a viable alternative to HashiCorp’s Terraform, having been forked from Terraform version 1.5.6.
What is the structure of an IaC Terraform project?
A basic Terraform Infrastructure as Code project contains the following files:
.
├── main.tf
├── variables.tf
└── outputs.tf
- The main.tf file typically contains the core resource declarations and providers. This is where you would define your EC2 instance configurations and how to authenticate to the AWS provider.
- The variables.tf file defines the input variables that allow engineers to customize the project. They are referenced in the main.tf and can easily change the behavior of a particular resource, making it easy to reuse the automations.
- In the outputs.tf file, engineers define the outputs that are returned to the console after a terraform apply. As a best practice, they shouldn’t return all the data from a resource, only the important fields that are of interest.
Documentation is really important, so having a README.md file inside your Terraform repository that explains how to use the automation (including descriptions of variables and outputs) really helps in understanding what has been implemented. You can leverage a tool such as terraform-docs to generate the descriptions of variables and outputs.
This is just a basic structure, but it can be customized depending on the complexity of the automation and, of course, the requirements of the organizations.
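To show how the three files fit together, here is a minimal sketch of their contents in a single listing, with comments marking the file each part would live in. The variable names, default instance type, and region are assumptions for the example.

```hcl
# variables.tf
variable "ami_id" {
  description = "AMI ID to launch (hypothetical input)"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.micro"
}

# main.tf
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type
}

# outputs.tf
output "instance_id" {
  description = "ID of the EC2 instance"
  value       = aws_instance.this.id
}
```

Terraform loads every .tf file in the directory as one configuration, so the split is purely an organizational convention.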
The core Terraform workflow is designed to provide a consistent and repeatable process for managing your IaC. This workflow contains the following steps:
- Write: Author infrastructure as code by writing Terraform configuration files that define the desired state of your infrastructure resources across various cloud providers or on-premises environments.
- Init: Initialize the working directory by downloading the necessary provider plugins specified in the configuration files. This is done using the terraform init command.
- Plan: Preview the changes that Terraform will make to your infrastructure based on the current configuration files and the infrastructure state stored in Terraform’s state file. The terraform plan command shows you these changes without actually applying them.
- Apply: Execute the plan and provision or modify the infrastructure resources according to the configuration files. The terraform apply command applies the changes to your infrastructure after prompting you to confirm the plan.
What are the benefits of using Terraform for IaC?
The benefits of managing Infrastructure as Code with Terraform include:
1. Consistency
All the resources presented above are the building blocks for creating Terraform automations. The declarative syntax that Terraform offers is easy to understand and use and enables engineers to get up to speed quickly with these concepts.
2. Efficiency
Terraform’s init/plan/apply workflow helps prevent unintended changes to your infrastructure by previewing changes before they are actually made. The initialization part also ensures the required providers are downloaded, so you don’t need to push them to your VCS repository to be able to take advantage of them.
3. Community
Nowadays, more and more people are using Terraform. The community is large and very active, and many resources are available, like documentation and tutorials, that can be easily leveraged.
4. Repeatability
The code you build with Terraform can be packaged as a module and shared across your organization easily. There are many other features available that keep your code DRY, such as for_each, count, functions, conditional expressions, for expressions, and dynamic blocks.
5. Support for multiple cloud providers
Because Terraform is cloud-agnostic, you can use it to build multi-cloud automations without having to combine multiple tools.
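As one illustration of the DRY features listed under Repeatability, the sketch below uses a dynamic block to generate one ingress rule per port instead of repeating the block by hand. The variable name, the port list, and the wide-open CIDR range are hypothetical and only meant for demonstration.

```hcl
variable "ingress_ports" {
  description = "TCP ports to open (hypothetical values)"
  type        = list(number)
  default     = [80, 443]
}

resource "aws_security_group" "web" {
  name = "web-sg"

  # One ingress block is rendered for each element of var.ingress_ports
  dynamic "ingress" {
    for_each = var.ingress_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"] # open to the world, for demonstration only
    }
  }
}
```

Adding a port then means appending a number to the list, not copying another block.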
What are the disadvantages of using Terraform for IaC?
While Terraform offers significant advantages for IaC, it is also important to acknowledge its limitations:
- Learning curve — Terraform has a steep learning curve, especially for beginners or those new to IaC concepts. Understanding the declarative syntax, writing modules, and managing state files can be challenging initially.
- State file management — Terraform’s state file is crucial for tracking infrastructure state. However, managing it in a team setting or with remote backends requires careful consideration to avoid conflicts and potential data loss, which can lead to unintended changes or resource recreation.
- Scaling challenges — Managing multiple environments and scaling can lead to complications in more complex infrastructures. As you manage more resources in Terraform, issues with one resource can potentially affect many others.
Example: Using Terraform for Kubernetes deployment on Azure
Let’s now see a practical example of configuring IaC with Terraform.
The example code can be found here.
We are using two resources:
- azurerm_resource_group → used to create resource groups inside of Azure
- azurerm_kubernetes_cluster → used to create the Kubernetes cluster inside of Azure
For this example, the cluster management itself is free; you pay only for the underlying nodes of the cluster.
On both resources, we are using for_each to create the number of resources of that particular type we want.
resource "azurerm_resource_group" "this" {
  for_each = var.resource_groups
  name     = each.key
  location = each.value.location
}
resource "azurerm_kubernetes_cluster" "this" {
  for_each            = var.kube_params
  name                = each.key
  location            = azurerm_resource_group.this[each.value.rg_name].location
  resource_group_name = azurerm_resource_group.this[each.value.rg_name].name
  …
}
If you look at the above resources, you will see how the link between them is created on the Kubernetes one, specifically at the location and resource_group_name parameters. This way, we ensure the resource group is created first, and we access its location and name attributes inside of the Kubernetes cluster.
Thus, the cluster and the resource group in which it will be created will reside in the same location.
To declare the variables, we are using map(object) types to take full advantage of for_each, and we are also marking some attributes as optional, with default values, to make the code easier to use.
variable "kube_params" {
  description = "AKS parameters"
  type = map(object({
    rg_name             = string
    dns_prefix          = string
    np_name             = string
    tags                = optional(map(string), {})
    vm_size             = optional(string, "Standard_B2s")
    client_id           = optional(string, null)
    client_secret       = optional(string, null)
    enable_auto_scaling = optional(bool, false)
    max_count           = optional(number, 1)
    ….
To keep it simple, we are providing the values of these variables through the default argument, but you can also use terraform.tfvars, a *.auto.tfvars file, or environment variables to pass these values.
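As a sketch of the terraform.tfvars option, the same values used in this example could be written as follows. The variable names come from the configuration in this article; placing them in this file rather than in defaults is the only assumption.

```hcl
# terraform.tfvars (loaded automatically by Terraform)
resource_groups = {
  rg1 = {
    location = "westus"
  }
}

kube_params = {
  aks1 = {
    rg_name    = "rg1"
    dns_prefix = "kube"
    np_name    = "np1"
  }
}
```

Values set in this file take precedence over the defaults declared in the variable blocks.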
You can even use a local variable to specify the values, or if you want to take it to the next level, you can use a local variable that reads the contents of a YAML file with values by using the file and yamldecode functions.
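The YAML approach could look roughly like this. The file name kube_params.yaml and the output name are hypothetical, and the file's assumed contents are shown in the comment.

```hcl
# kube_params.yaml (hypothetical file next to the configuration) might contain:
#   aks1:
#     rg_name: rg1
#     dns_prefix: kube
#     np_name: np1

locals {
  # file() reads the raw text, yamldecode() turns it into a Terraform map
  kube_params = yamldecode(file("${path.module}/kube_params.yaml"))
}

output "cluster_names" {
  # keys() returns the top-level YAML keys, here the cluster names
  value = keys(local.kube_params)
}
```

One trade-off to keep in mind: the optional() defaults declared in a variable's type are not applied to values read this way, so every attribute the resources use has to be present in the YAML or handled explicitly.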
Right now, the default values of the two variables look like this:
# default for var.resource_groups
default = {
  rg1 = {
    location = "westus"
  }
}
# default for var.kube_params
default = {
  aks1 = {
    rg_name    = "rg1"
    dns_prefix = "kube"
    np_name    = "np1"
  }
}
For this configuration, we have declared two outputs: one will show a map containing name and location pairs for the resource groups, and the other will have some details related to the Kubernetes cluster in the following format name => id, fqdn.
output "resource_groups" {
  description = "Resource Group Outputs"
  value       = { for rg in azurerm_resource_group.this : rg.name => rg.location }
}
output "aks" {
  description = "AKS Outputs"
  value       = { for kube in azurerm_kubernetes_cluster.this : kube.name => { "id" : kube.id, "fqdn" : kube.fqdn } }
}
To run this code, we have initialized the working directory using terraform init.
Initializing the backend...
Initializing provider plugins...
- Finding latest version of hashicorp/azurerm...
- Installing hashicorp/azurerm v3.49.0...
- Installed hashicorp/azurerm v3.49.0 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
After that, we can apply the code using the terraform apply command.
azurerm_resource_group.this["rg1"]: Creating...
azurerm_resource_group.this["rg1"]: Creation complete after 3s [id=/subscriptions/subid/resourceGroups/rg1]
azurerm_kubernetes_cluster.this["aks1"]: Creating...
azurerm_kubernetes_cluster.this["aks1"]: Still creating... [10s elapsed]
azurerm_kubernetes_cluster.this["aks1"]: Still creating... [20s elapsed]
azurerm_kubernetes_cluster.this["aks1"]: Still creating... [30s elapsed]
…
azurerm_kubernetes_cluster.this["aks1"]: Still creating... [3m50s elapsed]
azurerm_kubernetes_cluster.this["aks1"]: Creation complete after 3m52s [id=/subscriptions/subid/resourceGroups/rg1/providers/Microsoft.ContainerService/managedClusters/aks1]
Apply complete! Resources: 2 added, 0 changed, 0 destroyed.
Outputs:
aks = {
"aks1" = {
"fqdn" = "kube-pocuwyy7.hcp.westus.azmk8s.io"
"id" = "/subscriptions/subid/resourceGroups/rg1/providers/Microsoft.ContainerService/managedClusters/aks1"
}
}
resource_groups = {
"rg1" = "westus"
}
You can now see the cluster in the Azure Portal.
Enhancing your IaC workflow with Spacelift
Terraform is really powerful, but to achieve an end-to-end secure GitOps approach, you need to use a product that can run your Terraform workflows.
Enter Spacelift. Not only does Spacelift take care of your Terraform workflows, but it can also help build workflows for Kubernetes, Pulumi, and CloudFormation. Spacelift is GitOps native, and by taking advantage of stack dependencies, you can build really sophisticated workflows.
Your workflows will most likely require policies to ensure the necessary guardrails for your infrastructure. Notifications when something goes wrong are just as important, and Spacelift’s built-in features cover both.
Integrations with major cloud providers let you avoid static credentials, which can be easily copied if you are not careful with them.
Integrating security tools in your workflows can be done easily by using Custom Inputs. With this feature, you not only integrate the tools, but you can also run policies on their results to ensure engineers are not introducing vulnerabilities with their code.
If Terraform modules make your code DRY, check out Spacelift’s Blueprints feature, which really takes reusability to the next level.
Let’s reuse the above example and create a stack for it in Spacelift. We will also apply a policy that ensures that people are not changing the size of the VM. I would suggest creating your own repository that holds the above code to integrate everything.
First, go to stacks and select create a new stack. Add a name for the stack, and optionally you can add labels and a description.
After that, click continue, and on the Integrate VCS tab, select the repository. You can leave everything else as a default.
In the configure backend tab, you can select the backend (in our case, it will be Terraform), the Terraform version, whether or not Spacelift manages your state, and if you want smart sanitization enabled.
Select continue and in the define behavior tab, let’s leave everything as a default.
Now your stack has been created. You need to handle authentication to Azure before running the stack. There are multiple ways to do this, and Spacelift’s documentation explains them thoroughly.
After you are done with this, you can start running your code. When you trigger a run, you will see the output of terraform plan. If you want to create the resources in this plan, you will need to confirm it. Otherwise, you can discard it and make other changes to your code.
After confirming the run, an apply job gets triggered.
The subscription ID is masked directly in the job output, which also makes demonstrations easier because you do not have to worry about leaking sensitive information.
The workflow can be easily extended at each step by adding commands before and after phases, changing the runner image, integrating security tools, adding policies, and more.
Infrastructure as Code has become a standard nowadays in the DevOps and Cloud Engineering world. It makes it easier to create, update and replicate your infrastructure while mitigating human errors. Terraform is very powerful, and the community around it is really big.
Choosing Terraform as your Infrastructure as Code tool and enhancing your workflow with Spacelift to achieve GitOps reduces your time to market, makes the experience more secure, and helps you easily integrate with third-party products.