Upgrading Our Infrastructure with OpenTofu

This is a guest author article written by Matt Velez, Site Reliability Engineer at TrueCar, a Spacelift customer.

A company like TrueCar has countless servers, databases, Lambdas, and other infrastructure to run its operations. It’s the job of our Site Reliability Engineering team to make sure all those individual parts are set up and working properly.

We first started using Spacelift in August 2022, and it’s been a great help. It also made it easier for us to transition away from Terraform: in August 2023, HashiCorp changed the Terraform license to the Business Source License, which is source-available rather than open source. This prompted us to consider migrating away from it.

Spacelift had already positioned OpenTofu as the alternative. The fact that Spacelift is one of the primary drivers of the fork certainly helped our decision, and staying on the same Terraform version indefinitely was unsustainable for us.

We have over 700 stacks in Spacelift, each of which had to be moved over to OpenTofu. OpenTofu’s documentation was a great help with this, describing the order for executing this migration:

1. Get all Terraform modules onto Terraform version 1.5.5.
2. Migrate to OpenTofu 1.6.2.
3. Migrate to the latest OpenTofu version (1.9.0 at the time).

We launched a months-long task of taking every single Terraform stack we managed through Spacelift, upgrading them all to Terraform 1.5.5 where necessary (some used versions as early as Terraform 0.11), and then moving them onto OpenTofu.

For reference, we keep our Terraform/OpenTofu code in a monorepo, which we also discussed in our case study.

There were a few challenges with this:

- Most of our stacks were managed separately from the monorepo, meaning each change required two different PRs (one to the actual Terraform code and another in the monorepo to point to that newer code commit). We moved everything into the monorepo to reduce incremental PRs.
- We had to juggle multiple versions of Terraform and OpenTofu. A version manager was invaluable for this: I personally like mise.
- We chose to run the migrations for over 700 separate stacks with a script, leveraging mise to use different executables where necessary. I’ll include a version of this below.
- We had to ensure that when we opened a PR with OpenTofu 1.9.0 code, Spacelift didn’t run Terraform 1.5.5 or earlier on the stacks, thus ruining my work. Thankfully, Spacelift’s runtime configuration made this pretty easy.
- Finally, and perhaps most importantly, we needed to break Terraform support so that other engineers didn’t accidentally run Terraform 1.9.0 instead of OpenTofu.
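
The three-hop order from the documentation and the version juggling described above could be sketched with mise roughly as follows. This is a minimal sketch, not our actual tooling: the MIGRATION_HOPS array and the hop_binary and run_hop helpers are hypothetical names, and it assumes the mise tool names terraform and opentofu.

```bash
#!/usr/bin/env bash
# Hypothetical list of hops, in the order the OpenTofu docs recommend.
MIGRATION_HOPS=('terraform@1.5.5' 'opentofu@1.6.2' 'opentofu@1.9.0')

# Map a mise tool spec such as "opentofu@1.6.2" to the binary it provides.
hop_binary() {
  case "${1%%@*}" in
    terraform) echo 'terraform' ;;
    opentofu) echo 'tofu' ;;
    *)
      echo "Unknown tool: $1" >&2
      return 1
      ;;
  esac
}

# Run one command with the exact tool version for a given hop.
run_hop() {
  local hop="$1" bin
  shift
  bin="$(hop_binary "$hop")" || return 1
  mise exec "$hop" -- "$bin" "$@"
}

# Example (not executed here): install every hop, then print each version.
#   mise install "${MIGRATION_HOPS[@]}"
#   for hop in "${MIGRATION_HOPS[@]}"; do run_hop "$hop" version; done
```

Keeping the hops in one array means the order lives in a single place, and nothing depends on whichever terraform or tofu binary happens to be first on the PATH.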

After a significant amount of time and dozens of PRs, TrueCar is now running entirely on OpenTofu 1.9.0. We haven’t used every new feature the fork offers, but one has proven to be critically valuable…

Breaking Terraform

It defeats the purpose of the migration if it’s possible for someone to accidentally run Terraform on an OpenTofu state and revert the changes. This was at the forefront of our minds as we discussed migrating our Terraform monolith. Thankfully, it ended up being one of the easier parts of the migration, all due to a new feature OpenTofu introduced in version 1.8.

Unlike Terraform, OpenTofu supports early variable/local evaluation. Once I read about this in the release notes, I immediately thought of a use case: our state file configuration.

It’s pretty common to store Terraform/OpenTofu states remotely, so that a team working concurrently doesn’t constantly overwrite their colleagues’ work. TrueCar keeps its states encrypted in S3, for instance. Previously, our state configs looked something like this:

terraform {
 backend "s3" {
   bucket     = "[Bucket Name]"
   key        = "[module name]/[environment].tfstate"
   profile    = "[AWS account]"
   region     = "[AWS region]"
   encrypt    = true
   kms_key_id = "[Key ARN]"
 }
}

For example, the key for a stack managing nginx in production would look like: nginx/prod.tfstate

We set up the state for each new stack by copying this file and changing the key. This led to the occasional conflict if someone forgot to change a key (I’m certainly guilty of this), but thankfully, it was just a minor annoyance.

With OpenTofu, we saw a chance to simplify this while breaking Terraform support at the same time.

Because the key was the only attribute that changed between state configs, and most were simply reflective of where they lived in the monolith, we could figure that out programmatically every time OpenTofu ran for a stack. Of course, some stacks used a different format, but adding override variables could cover those edge cases.

Now, our state config looks like this (named .state.tf.base in the monolith root directory):

variable "state_repo_override" {
 type    = string
 default = null
}
variable "state_name_override" {
 type    = string
 default = null
}
locals {
 repo_index         = fileexists("../.state.tf.base") ? 0 : fileexists("../../.state.tf.base") ? 1 : fileexists("../../../.state.tf.base") ? 2 : null
 reversed_path_list = reverse(split("/", path.cwd))
 state_repo         = coalesce(var.state_repo_override, local.reversed_path_list[local.repo_index])
 state_name         = coalesce(var.state_name_override, local.reversed_path_list[0]) # Same as the current directory.
}
terraform {
 backend "s3" {
   bucket     = "[Bucket Name]"
   key        = "${local.state_repo}/${local.state_name}.tfstate"
   profile    = "[AWS account]"
   region     = "[AWS region]"
   encrypt    = true
   kms_key_id = "[Key ARN]"
 }
}

Different stacks could live anywhere from one to three directories deep in our monolith, so the repo_index local accounts for it, based on how far the state config is from its template. Once we have that, it’s trivial to come up with the bucket key: split the current directory path into its components, reverse the list, and pick the entries at repo_index (the repo) and 0 (the current directory).
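
To preview which key a given directory would map to without running OpenTofu, the same derivation can be mirrored in shell. This is only a sketch: derive_state_key is a hypothetical helper, the example path is made up, and the override variables are not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: preview the backend key for a stack directory.
#   $1: absolute path of the stack directory (what path.cwd would be)
#   $2: repo_index (0, 1, or 2), as picked by the fileexists() chain
derive_state_key() {
  local cwd="$1" repo_index="$2"
  local -a parts
  IFS='/' read -r -a parts <<<"$cwd"
  local last=$((${#parts[@]} - 1))
  # reversed_path_list[0] is the name of the current directory.
  local state_name="${parts[last]}"
  # reversed_path_list[repo_index] counts back from the current directory.
  local state_repo="${parts[last - repo_index]}"
  printf '%s/%s.tfstate\n' "$state_repo" "$state_name"
}

derive_state_key '/home/me/monolith/nginx/prod' 1 # prints nginx/prod.tfstate
```

Note that with repo_index 0 the repo and the name are the same directory, so a stack living directly under the root would get a key like nginx/nginx.tfstate unless an override is set.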

This has proven robust enough to use across our monolith: we symlink it everywhere we previously had a bespoke state config. Because OpenTofu now figures out the key automatically, it removes any user error (apart from symlinking it in the first place, of course).

And, tying it back to the original topic, Terraform doesn’t support using locals this way:

$ terraform init
Initializing the backend...
Initializing modules...
- config in ../_shared
╷
│ Error: Variables not allowed
│
│   on state.tf line 22, in terraform:
│   22:     key        = "${local.state_repo}/${local.state_name}.tfstate"
│
│ Variables may not be used here.
╵
╷
│ Error: Variables not allowed
│
│   on state.tf line 22, in terraform:
│   22:     key        = "${local.state_repo}/${local.state_name}.tfstate"
│
│ Variables may not be used here.

So the risk of Terraform accidentally overwriting an OpenTofu state has been resolved!

OpenTofu offers many more features we can take advantage of (I particularly like provider iteration and migrating resource types), which we’ll look into integrating in the future.

Spacelift integration

This part of the migration was straightforward and painless. Since OpenTofu 1.6, Spacelift has fully supported using the fork for its stacks. Switching over is as easy as selecting it in the settings.

We manage our stacks through the Spacelift OpenTofu provider, and ensuring our stacks were permanently changed to use OpenTofu merely required changing the terraform_workflow_tool attribute:

resource "spacelift_stack" "stacker" {
 # ...
 terraform_workflow_tool = "OPEN_TOFU"
}

Migration script

This script should help others migrating from Terraform to OpenTofu.

We wanted the script to cover as much as possible, while also providing regular feedback to the user so they can quickly resolve any issues as they arise. To avoid accidentally modifying the infrastructure, simply running the script causes a dry run via tofu plan; you need to add apply at the end in order to actually make changes.
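
The dry-run-by-default behavior comes down to one check on the second argument. A minimal sketch of that pattern, where migration_mode and the migrate.sh file name are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: anything other than the literal word "apply"
# (including no argument at all) falls back to the safe plan mode.
migration_mode() {
  if [ "${1:-}" = 'apply' ]; then
    echo 'apply'
  else
    echo 'plan'
  fi
}

# Example invocations, assuming the script is saved as migrate.sh:
#   ./migrate.sh my-repo          # dry run: tofu plan only
#   ./migrate.sh my-repo apply    # actually migrates the stacks
```

Defaulting to the non-destructive mode means a mistyped argument results in a plan, never an unintended apply.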

You’ll need to either install the requisite OpenTofu versions through mise, or modify the script to point to different binaries.

One potential drawback for your use case is that the script doesn’t print the OpenTofu output to the console. We found this annoying, so we simply cd’d to the relevant directory and ran OpenTofu manually, as necessary. You may want to add this output back in if you’d prefer.
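
If you do want the output on the console, one option is to echo each line to stderr while still capturing it for the success checks. This is a sketch under that assumption; run_and_capture is a hypothetical helper and is not part of the script below.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, show its stdout live on stderr,
# and pass the same lines through on stdout so $(...) can capture them.
run_and_capture() {
  local line
  "$@" | while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line" >&2 # Visible on the console.
    printf '%s\n' "$line"     # Captured by the caller.
  done
}

# Example (not executed here), using the otf function from the script:
#   init_output="$(run_and_capture otf "$version" init -no-color)"
```

The captured text is the same as before, so the existing string comparisons keep working unchanged.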

A few other assumptions are hardcoded in, like using the state config template mentioned in Breaking Terraform and deriving the stack directories from the directory layout on my laptop.
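
If the fixed path offset doesn’t match your machine, one alternative is to locate the template and take the first directory below it. This is a sketch, assuming .state.tf.base sits in the monolith root as described in Breaking Terraform; find_state_root and state_repo_for are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: walk up at most three levels from a stack
# directory and print the directory that contains .state.tf.base.
find_state_root() {
  local dir="$1" depth
  for depth in 1 2 3; do
    dir="$(dirname -- "$dir")"
    if [ -f "$dir/.state.tf.base" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print the first path component below that root,
# which is what the default state_repo would be for the stack.
state_repo_for() {
  local stack_dir="$1" root rel
  root="$(find_state_root "$stack_dir")" || return 1
  rel="${stack_dir#"$root"/}"
  printf '%s\n' "${rel%%/*}"
}

# Example (not executed here): compare against the key parsed from state.tf.
#   [ "$state_repo" != "$(state_repo_for "$(pwd)")" ] && echo 'override needed'
```

This removes the dependency on how many directories sit above the checkout, at the cost of a few extra filesystem checks per stack.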

Lastly, the script doesn’t include filling in the Spacelift runtime configuration, as we simply found it easier to do manually. You may want to add it in if you’d prefer.

#!/usr/bin/env bash
if [ $# -eq 0 ]; then
 echo 'Must add directory!'
 echo "$0 [repo] ['apply' if applying, dry run (plan) if not]"
 exit 1
fi
REPO_DIR="$1"
[ "$2" = 'apply' ] && APPLY=true || APPLY=false
OTF_VERSIONS=('1.6.2' '1.9.0')
REPOS_WITHOUT_INIT=()
REPOS_WITH_CHANGES=()
REPOS_WITH_STATE_ERRORS=()
ROOT_DIR="$(dirname -- "$(readlink -f -- "$0")")"
otf() { "$HOME/.local/share/mise/installs/opentofu/$1/bin/tofu" "${@:2}"; }
for dir in $(find "$(pwd)/$REPO_DIR" -type f -name state.tf -exec dirname {} \;); do
 cd "$dir" || exit
 echo "Current directory: $dir"
 rm -rf .terraform .terraform.lock.hcl
 for version in "${OTF_VERSIONS[@]}"; do
   echo "Using OpenTofu $version."
   if [ "$version" = '1.6.2' ]; then
     init_output="$(otf "$version" init -no-color)"
   else
     init_output="$(otf "$version" init -reconfigure -no-color)" # Updates existing .terraform config, doesn't touch the remote state.
   fi
   if [[ "$init_output" == *"OpenTofu has been successfully initialized!"* ]]; then
     echo 'Init successful, continuing...'
   else
     echo "!!! Init unsuccessful on: $dir, skipping..."
     REPOS_WITHOUT_INIT+=("$dir")
     continue 2 # Go to next repo, skip the state file changes.
   fi
   if [ "$APPLY" = true ]; then
     output="$(yes no | otf "$version" apply -no-color)" # Ensures we skip if there are changes.
   else
     output="$(otf "$version" plan -no-color)"
   fi
   if [[ "$output" == *"No changes. Your infrastructure matches the configuration."* ]]; then
     echo 'Execution successful, continuing...'
   else
     echo "!!! Execution had changes on: $dir, skipping..."
     REPOS_WITH_CHANGES+=("$dir")
     continue 2 # Go to next repo, skip the state file changes.
   fi
 done
 if [ "$APPLY" = true ]; then
   echo 'Extracting state repo and state name variables from key...'
   read -r state_repo state_name <<<"$(sed -rn 's|^\s+key\s+= "(.*)/(.*)\.tfstate"$|\1 \2|p' state.tf)"
   state_overrides=false
   echo 'Replacing state.tf...'
   rm -f state.tf
   if [ -f ../.state.tf.base ]; then
     ln -s ../.state.tf.base state.tf
   elif [ -f ../../.state.tf.base ]; then
     ln -s ../../.state.tf.base state.tf
   elif [ -f ../../../.state.tf.base ]; then
     ln -s ../../../.state.tf.base state.tf
   fi
   # This offset is hardcoded to my personal directory path, please edit if yours is different.
   if [ "$state_repo" != "$(pwd | cut -d '/' -f 6)" ]; then
     echo "state_repo_override = \"$state_repo\"" >>terraform.tfvars
     state_overrides=true
   fi
   if [ "$state_name" != "$(basename "$(pwd)")" ]; then
     echo "state_name_override = \"$state_name\"" >>terraform.tfvars
     state_overrides=true
   fi
   [ "$state_overrides" = true ] && tofu fmt -list=false
   if [[ "$(tofu init -reconfigure -no-color)" == *"OpenTofu has been successfully initialized!"* ]]; then
     echo 'State test successful, continuing...'
   else
     echo "!!! State test failed on: $dir, skipping..."
     REPOS_WITH_STATE_ERRORS+=("$dir")
   fi
 fi
 rm -rf .terraform .terraform.lock.hcl
done
if [ "$APPLY" = true ]; then
 cd "$ROOT_DIR" || exit
 latest_otf_version="${OTF_VERSIONS[-1]}"
 otf_version_file=".opentofu-latest-${latest_otf_version:0:3}" # Only necessary for the version we're ending on.
 if ! [ -e "$otf_version_file" ] || [ "$(cat "$otf_version_file")" != "$latest_otf_version" ]; then
   echo "Setting OpenTofu version pin to $latest_otf_version..."
   echo "$latest_otf_version" >|"$otf_version_file"
 fi
 if ! [ "$REPO_DIR/.terraform-version" -ef "$otf_version_file" ]; then # If the symlink *doesn't* target the latest OTF version.
   echo "Setting $REPO_DIR's OpenTofu pin..."
   cd "$REPO_DIR" || exit
   rm -f .terraform-version
   ln -s "../$otf_version_file" .terraform-version
 fi
fi
if [ ${#REPOS_WITHOUT_INIT[@]} -gt 0 ]; then
 echo '--------------------'
 echo '!!! Repos with failed inits:'
 printf '%s\n' "${REPOS_WITHOUT_INIT[@]}"
fi
if [ ${#REPOS_WITH_CHANGES[@]} -gt 0 ]; then
 echo '--------------------'
 echo '!!! Repos with changes:'
 printf '%s\n' "${REPOS_WITH_CHANGES[@]}"
fi
if [ ${#REPOS_WITH_STATE_ERRORS[@]} -gt 0 ]; then
 echo '--------------------'
 echo '!!! Repos with state override errors:'
 printf '%s\n' "${REPOS_WITH_STATE_ERRORS[@]}"
fi

Closing thoughts

Spacelift’s leadership and support of OpenTofu have been invaluable in helping us move away from Terraform. We look forward to OpenTofu’s continued development. Spacelift has helped us every step of the way, and we’re in a great position to continue iterating on TrueCar’s infrastructure as the company’s needs develop.

Solve your infrastructure challenges

Spacelift is a flexible orchestration solution for IaC development. It delivers enhanced collaboration, automation, and controls to simplify and accelerate the provisioning of cloud-based infrastructures.
