
How to Deploy an AWS ECS Cluster with Terraform [Tutorial]

Deploying an AWS ECS Cluster of EC2 Instances with Terraform

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service from Amazon Web Services (AWS). It runs Docker containers on a cluster backed by EC2 instances or, optionally, on AWS Fargate.

The more AWS ECS clusters we deploy, the more complex infrastructure management becomes. Deploying and scaling a large cluster in particular is time-consuming and involves many repetitive tasks, which leads to human error, configuration drift, and an increased risk of security breaches.

Terraform is a great fit for simplifying the deployment of an AWS ECS cluster. It offers an automated way to manage ECS clusters, making the deployment process consistent and repeatable.

In this post, we will focus on how to set up an ECS cluster of EC2 instances using Terraform.

We will cover

  1. How to deploy a service on ECS clusters
  2. ECS deployment with Terraform – Overview
  3. How to set up ECS with Terraform – Example

How to deploy a service on ECS clusters

There are two main ways of deploying a service on ECS clusters – Fargate and EC2 instances – depending on the underlying infrastructure used to run the service's container workloads.

AWS Fargate is a more cloud-native approach where the compute instances are automatically managed by AWS.

Running the ECS service on EC2 instances provides more control over the infrastructure. This also requires taking additional steps to set up the EC2 instances and auto-scaling group, networking, etc.

An ECS service generally consists of the components shown in the diagram below.

[Diagram: components of an ECS service]

When an ECS Cluster is created, before any service is deployed, we have to provide the cluster with capacity providers.

In this blog post, the capacity provider is backed by EC2 instances whose scaling is managed by an auto-scaling group (ASG). The service can then be deployed once these capacity-providing EC2 instances are registered with the ECS cluster.

An ECS service consists of multiple tasks. Each task is created based on the task definition provided by the service.

A task definition is a template that describes the source of the application image, resources required in terms of CPU and memory units, container and host port mapping, and other critical information.

Apart from the task definition, the ECS service also specifies the number of tasks (the desired count) to run. ECS creates the tasks automatically and places them on target EC2 instances, where the containers run. The running containers serve incoming requests from the Application Load Balancer (ALB), which is deployed in front of the EC2 instances in a VPC.

Why use ECS over EC2?

ECS and EC2 are both AWS services for running applications in the cloud. Although at a high level they both host workloads, they serve different purposes and suit different types of workloads.

Some of the reasons for choosing ECS over EC2 are described below:

  • Containerization: ECS is designed for running containers either on Fargate or EC2 instances, which provide a scalable way to package and deploy applications. Containers offer better resource utilization, faster startup times, and improved isolation compared to traditional virtual machines used in EC2. If you’re using containerized applications, ECS provides a managed environment optimized for running and orchestrating containers.
  • Scalability and Elasticity: ECS allows you to easily scale your applications based on demand. It integrates with AWS Auto Scaling, which can automatically adjust the number of containers based on predefined scaling policies and placement constraints.
  • Orchestration: ECS provides built-in orchestration capabilities through the Fargate and EC2 launch types. With Fargate, we don’t have to provision or manage any EC2 instances, as AWS takes care of the infrastructure for us, which lets us focus solely on our applications. The EC2 launch type offers more control and flexibility if we prefer managing the underlying EC2 instances ourselves.
  • Monitoring: ECS provides a centralized management console and CLI tools for deploying and managing containers. It also integrates with AWS CloudWatch, allowing us to collect and analyze metrics, monitor logs, and set up alarms for our containerized applications. EC2 instances require separate management and monitoring configurations.
  • Cost Optimization: ECS can help optimize costs by allowing us to run containers on-demand without the need for permanent EC2 instances. With AWS Fargate, we pay only for the resources allocated to our containers while they are running, which is more cost-effective compared to maintaining a fleet of EC2 instances. Read more about AWS cost optimization.

It’s important to note that the choice between ECS and EC2 depends on our specific requirements, workload characteristics, and familiarity with containerization. While ECS offers benefits for container-based applications, EC2 still provides more flexibility for running traditional workloads that don’t require containerization.

ECS deployment with Terraform - Overview

The diagram below shows the outcome of ECS deployment using Terraform.

It shows how an ECS cluster is set up on EC2 instances spread across multiple availability zones within a VPC. It also includes several details like Application Load Balancers (ALB), auto-scaling group (ASG), ECS Capacity provider, ECS service, etc.

Note that there are other aspects not represented in the diagram, like ECR, Route tables, task definition, etc., which we will also cover.

[Diagram: target architecture of the ECS deployment with Terraform]

We will break this infrastructure down into three parts. The list below provides a high-level overview of all the steps we will take to host an ECS cluster on EC2 instances. As we proceed, we will also write the corresponding Terraform configuration.

  1. VPC setup – In this part, we will explain and implement the basic VPC and networking setup required. We will implement subnets, security groups, and route tables so that we can access the hosted service from the internet and SSH into our EC2 instances.
  2. EC2 setup – In this part, we will explain and implement the auto-scaling group, application load balancer, and EC2 instances across the two availability zones. We will also cover the setup required to host ECS container instances on our EC2 machines in detail.
  3. ECS setup – Finally, we create the ECS cluster step by step: the cluster itself, capacity providers, the task definition, and the service. We will also look at the application image that will run on this cluster, along with the various Terraform resource attributes that enable the end-to-end deployment of the service.

Note: The final Terraform configuration is saved in this GitHub repository.

Before moving ahead, it is highly recommended to follow these steps hands-on.

How to set up ECS with Terraform - Example

Before we start, you should have Terraform installed locally. We will not go through the details of setting up the providers, AWS credentials, and AWS CLI.

Let’s go through the steps to deploy ECS using Terraform.

1. Setting up the VPC

To begin with, let us first establish our isolated network by defining the VPC and related components. This is important because we need to run multiple instances of our application container so that the load balancer can distribute requests evenly across them.

The diagram below displays the target VPC design we would achieve using Terraform.

[Diagram: target VPC design]

Create a file named vpc.tf in the project repository and add the code below.

You can optionally include all the Terraform code in the same file (e.g., in main.tf). However, I prefer to keep separate files to manage the IaC easily. 

Step 1: Define our VPC

The code below creates a VPC with a given CIDR range. You are free to choose a CIDR range.

As an example, we use 10.0.0.0/16 in this case and name our VPC as “main”.

resource "aws_vpc" "main" {
 cidr_block           = var.vpc_cidr
 enable_dns_hostnames = true
 tags = {
   name = "main"
 }
}
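
The vpc_cidr variable referenced above is not defined in the snippet itself. A minimal definition, for example in a variables.tf file (the file name and description are our own choice), could look like this:

variable "vpc_cidr" {
 description = "CIDR range for the VPC"
 type        = string
 default     = "10.0.0.0/16"
}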

Step 2: Add 2 subnets

Create two subnets in different availability zones to place our EC2 instances in. We use the “cidrsubnet” Terraform function to derive each subnet's CIDR from the VPC's CIDR. For example, with a VPC CIDR of 10.0.0.0/16, cidrsubnet(aws_vpc.main.cidr_block, 8, 1) evaluates to 10.0.1.0/24 and cidrsubnet(aws_vpc.main.cidr_block, 8, 2) to 10.0.2.0/24.

Note that the two subnets are placed in different availability zones of the Frankfurt (eu-central-1) region.

resource "aws_subnet" "subnet" {
 vpc_id                  = aws_vpc.main.id
 cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, 1)
 map_public_ip_on_launch = true
 availability_zone       = "eu-central-1a"
}

resource "aws_subnet" "subnet2" {
 vpc_id                  = aws_vpc.main.id
 cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, 2)
 map_public_ip_on_launch = true
 availability_zone       = "eu-central-1b"
}

Step 3: Create internet gateway (IGW)

resource "aws_internet_gateway" "internet_gateway" {
 vpc_id = aws_vpc.main.id
 tags = {
   Name = "internet_gateway"
 }
}

Step 4: Create a route table and associate it with the subnets

This route table sends traffic destined for the internet to the internet gateway, making both subnets public.

resource "aws_route_table" "route_table" {
 vpc_id = aws_vpc.main.id
 route {
   cidr_block = "0.0.0.0/0"
   gateway_id = aws_internet_gateway.internet_gateway.id
 }
}

resource "aws_route_table_association" "subnet_route" {
 subnet_id      = aws_subnet.subnet.id
 route_table_id = aws_route_table.route_table.id
}

resource "aws_route_table_association" "subnet2_route" {
 subnet_id      = aws_subnet.subnet2.id
 route_table_id = aws_route_table.route_table.id
}

Step 5: Create a security group along with ingress and egress rules

The ingress and egress rules of this security group allow inbound and outbound traffic on any port with any protocol. This is not a best practice and is acceptable only for working through this example; tighter rules should be used in production (an illustrative sketch follows the code below).

resource "aws_security_group" "security_group" {
 name   = "ecs-security-group"
 vpc_id = aws_vpc.main.id

 ingress {
   from_port   = 0
   to_port     = 0
   protocol    = -1
   self        = "false"
   cidr_blocks = ["0.0.0.0/0"]
   description = "any"
 }

 egress {
   from_port   = 0
   to_port     = 0
   protocol    = "-1"
   cidr_blocks = ["0.0.0.0/0"]
 }
}
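
For production, you would typically restrict ingress to only the ports you actually need, for example HTTP from anywhere and SSH from a trusted IP only. The sketch below is purely illustrative and is not used in the rest of this tutorial; the SSH CIDR is a placeholder to replace with your own IP.

resource "aws_security_group" "restricted_example" {
 name   = "ecs-security-group-restricted"
 vpc_id = aws_vpc.main.id

 ingress {
   from_port   = 80
   to_port     = 80
   protocol    = "tcp"
   cidr_blocks = ["0.0.0.0/0"]
   description = "HTTP from anywhere"
 }

 ingress {
   from_port   = 22
   to_port     = 22
   protocol    = "tcp"
   cidr_blocks = ["203.0.113.10/32"] # replace with your own public IP
   description = "SSH from a trusted IP only"
 }

 egress {
   from_port   = 0
   to_port     = 0
   protocol    = "-1"
   cidr_blocks = ["0.0.0.0/0"]
 }
}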

At this point, this is all we need as far as the VPC and networking is concerned. You can go ahead and provision these resources now or proceed with the next section.

The VPC setup is quite straightforward. We will not go deep into exploring the VPC creation using Terraform, as that is not the topic of this blog post. For more information on that, check out How to Build AWS VPC using Terraform.

This gives us the networking basics to proceed to the next step.

2. Configuring the EC2 instances

In this section, we are focusing on provisioning the auto-scaling group, defining the EC2 instance template used to host the ECS containers, and provisioning the application load balancer in the VPC created in the previous section.

Below you can see the updated diagram.

 

[Diagram: VPC with auto-scaling group, EC2 instances, and ALB]

Create another file named ec2.tf, and follow the steps below. 

Step 1: Create an EC2 launch template

A launch template, as the name suggests, defines the template used by the auto-scaling group to provision and maintain a desired/required number of EC2 instances in a cluster.

Launch templates define various characteristics of the EC2 instance:

  1. Image: We use an Amazon Linux AMI built for the x86_64 (AMD64) CPU architecture. For the instances to run ECS tasks, the AMI should be an ECS-optimized one, so the ECS container agent is pre-installed.
  2. Type: Size of the instance. We use “t3.micro”.
    The size of the instance is driven by the system resources the container consumes. In our case, we are hosting the “Docker Getting Started” image, which does not consume a lot of resources. More about the image is covered later in the post.
  3. Key name: The name of the key pair used to SSH into these instances from our local machines. You can create a key pair with a separate Terraform resource block or, as in this case, use one you have already generated.
  4. Security group: The security group associated with the EC2 instances. We associate the same security group we created in the previous section.
  5. IAM instance profile: This is very important. Without it, the EC2 instances will not be able to register with the ECS service. “ecsInstanceRole” is a role AWS typically creates the first time you use the ECS console; if it does not exist in your account, create it yourself (a Terraform sketch for this follows the ecs.sh contents below). If you use a custom role instead, make sure it can access the ECS service.
  6. User data: This is also very important. The “ecs.sh” file contains a command that writes an environment variable to the “/etc/ecs/ecs.config” file on each EC2 instance that is created. Without it, the ECS agent will not know which cluster to join, and the ECS service will not be able to deploy and run containers on our EC2 instances.

resource "aws_launch_template" "ecs_lt" {
 name_prefix   = "ecs-template"
 image_id      = "ami-062c116e449466e7f"
 instance_type = "t3.micro"

 key_name               = "ec2ecsglog"
 vpc_security_group_ids = [aws_security_group.security_group.id]
 iam_instance_profile {
   name = "ecsInstanceRole"
 }

 block_device_mappings {
   device_name = "/dev/xvda"
   ebs {
     volume_size = 30
     volume_type = "gp2"
   }
 }

 tag_specifications {
   resource_type = "instance"
   tags = {
     Name = "ecs-instance"
   }
 }

 user_data = filebase64("${path.module}/ecs.sh")
}

The contents of the ecs.sh file are shown below. Note that the cluster name set here must match the name of the ECS cluster we create later (“my-ecs-cluster”), so the ECS agent on each instance registers with the right cluster.

#!/bin/bash
echo ECS_CLUSTER=my-ecs-cluster >> /etc/ecs/ecs.config
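
If the ecsInstanceRole and its instance profile do not already exist in your account, you can also create them with Terraform. The sketch below is our own addition rather than part of the original configuration; the resource names are arbitrary, and the launch template would then reference aws_iam_instance_profile.ecs_instance_profile.name instead of the hardcoded name.

# Role assumed by EC2 instances so the ECS agent can register them as container instances
resource "aws_iam_role" "ecs_instance_role" {
 name = "ecsInstanceRole"
 assume_role_policy = jsonencode({
   Version = "2012-10-17"
   Statement = [{
     Action    = "sts:AssumeRole"
     Effect    = "Allow"
     Principal = { Service = "ec2.amazonaws.com" }
   }]
 })
}

# AWS-managed policy required by the ECS container agent
resource "aws_iam_role_policy_attachment" "ecs_instance_role_policy" {
 role       = aws_iam_role.ecs_instance_role.name
 policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

# Instance profile referenced by the launch template
resource "aws_iam_instance_profile" "ecs_instance_profile" {
 name = "ecsInstanceRole"
 role = aws_iam_role.ecs_instance_role.name
}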

Step 2: Create an auto-scaling group (ASG)

Create an ASG and associate it with the launch template created in the last step.

The ASG automatically manages the horizontal scaling of EC2 instances according to the ECS service's needs, within the limits defined in this resource block.

resource "aws_autoscaling_group" "ecs_asg" {
 vpc_zone_identifier = [aws_subnet.subnet.id, aws_subnet.subnet2.id]
 desired_capacity    = 2
 max_size            = 3
 min_size            = 1

 launch_template {
   id      = aws_launch_template.ecs_lt.id
   version = "$Latest"
 }

 tag {
   key                 = "AmazonECSManaged"
   value               = true
   propagate_at_launch = true
 }
}

Note that apart from the desired, minimum, and maximum capacity, we have also specified the “vpc_zone_identifier” attribute. This restricts the ASG to provisioning instances in the subnets we created, and therefore in their availability zones.

A region may have more than two availability zones, but since we reference only our two subnets, the ASG uses only the two corresponding AZs.

Step 3: Configure Application Load Balancer (ALB)

An ALB is required to test our implementation in the end. In a way, it is optional as far as the discussion of this blog post is concerned, but there is no fun in doing the hard work and not being able to witness it in the real world.

See How to Manage Application Load Balancer (ALB) with Terraform.

Create the ALB, its listener, and the target group as defined in the code samples below.

resource "aws_lb" "ecs_alb" {
 name               = "ecs-alb"
 internal           = false
 load_balancer_type = "application"
 security_groups    = [aws_security_group.security_group.id]
 subnets            = [aws_subnet.subnet.id, aws_subnet.subnet2.id]

 tags = {
   Name = "ecs-alb"
 }
}

resource "aws_lb_listener" "ecs_alb_listener" {
 load_balancer_arn = aws_lb.ecs_alb.arn
 port              = 80
 protocol          = "HTTP"

 default_action {
   type             = "forward"
   target_group_arn = aws_lb_target_group.ecs_tg.arn
 }
}

resource "aws_lb_target_group" "ecs_tg" {
 name        = "ecs-target-group"
 port        = 80
 protocol    = "HTTP"
 target_type = "ip"
 vpc_id      = aws_vpc.main.id

 health_check {
   path = "/"
 }
}

Note that we are still using the same VPC, subnets, and security group we created in the previous section. The rest of the ALB configuration is straightforward and basic.

Again, at this point, you can choose to create these resources with the terraform plan and apply commands, or move on to the final section, where we create the ECS cluster.

3. Configuring the ECS cluster

With the networking and EC2 infrastructure defined or provisioned, we finally have arrived at provisioning an ECS cluster and hosting a web application on the same.

In this section, we will achieve the target architecture represented in the target deployment section as well as below.

[Diagram: target ECS cluster architecture]

Step 1: Create and push the application image

We will deploy the “docker/getting-started” application, which usually ships with every Docker Desktop installation. Pull this image locally, and then push it to any accessible image repository of your choice.

To keep this simple, I have created a public Amazon Elastic Container Registry (ECR) repository, tagged the image appropriately, and pushed it there.

Note that if you build the image locally on a Mac with an M1 (or any other ARM-based) processor, the image's CPU architecture will differ from that of the Amazon Linux AMI (x86_64) we used in the EC2 setup section, and the resulting image will not run there. As a workaround, build and push the image manually from an Amazon Linux AMI-based EC2 instance.
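
For reference, the pull, tag, and push can look roughly like the commands below. The repository alias and name are placeholders; replace them with your own public ECR repository (public ECR authentication always goes through the us-east-1 region).

# Pull the sample image locally
docker pull docker/getting-started

# Authenticate Docker against the public ECR registry
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws

# Tag and push the image to your own public repository
docker tag docker/getting-started:latest public.ecr.aws/<alias>/<repository>:latest
docker push public.ecr.aws/<alias>/<repository>:latest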

Step 2: Provision ECS cluster

Optionally, create a new configuration file and add the code below. In my example, I have created a main.tf file to add the ECS-related configuration. We begin by provisioning an ECS cluster using the “aws_ecs_cluster” resource.

resource "aws_ecs_cluster" "ecs_cluster" {
 name = "my-ecs-cluster"
}

This is a simple resource block with just a name attribute. This does not do much, but this is where provisioning an ECS cluster begins.

Step 3: Create capacity providers

Next, we create a couple of resources to provision capacity providers for the ECS cluster created in the previous step.

The “aws_ecs_capacity_provider” resource creates a capacity provider backed by the auto-scaling group, while “aws_ecs_cluster_capacity_providers” attaches that capacity provider to the ECS cluster created in Step 2.

resource "aws_ecs_capacity_provider" "ecs_capacity_provider" {
 name = "test1"

 auto_scaling_group_provider {
   auto_scaling_group_arn = aws_autoscaling_group.ecs_asg.arn

   managed_scaling {
     maximum_scaling_step_size = 1000
     minimum_scaling_step_size = 1
     status                    = "ENABLED"
     target_capacity           = 3
   }
 }
}

resource "aws_ecs_cluster_capacity_providers" "example" {
 cluster_name = aws_ecs_cluster.ecs_cluster.name

 capacity_providers = [aws_ecs_capacity_provider.ecs_capacity_provider.name]

 default_capacity_provider_strategy {
   base              = 1
   weight            = 100
   capacity_provider = aws_ecs_capacity_provider.ecs_capacity_provider.name
 }
}

Step 4: Create ECS task definition with Terraform

As described in the ECS overview section before, we now define the task definition/template of the container tasks to be run on the ECS cluster using the image we pushed in Step 1.

Some of the important points to note here are:

  1. We have defined the network mode to be “awsvpc”. This tells the ECS cluster to use the VPC networking we have defined in the “VPC setup” section.
  2. We have provided the task definition with ecsTaskExecutionRole.
  3. Defined the CPU requirement as 256 CPU units.
  4. The runtime platform is an important attribute. Since we are using an Amazon Linux AMI for our EC2 instances, the operating_system_family is specified as “LINUX” and the cpu_architecture is set to “X86_64”. If this information is incorrect, the ECS tasks enter a constant restart loop.
  5. Container definitions: This is where we define the resource requirements of the container to be run in the task. We have provided the following attributes:
    • Name of the container instance
    • Image URL of the application image
    • CPU capacity units
    • Memory capacity units
    • Container and host port mappings
resource "aws_ecs_task_definition" "ecs_task_definition" {
 family             = "my-ecs-task"
 network_mode       = "awsvpc"
 execution_role_arn = "arn:aws:iam::532199187081:role/ecsTaskExecutionRole"
 cpu                = 256
 runtime_platform {
   operating_system_family = "LINUX"
   cpu_architecture        = "X86_64"
 }
 container_definitions = jsonencode([
   {
     name      = "dockergs"
     image     = "public.ecr.aws/f9n5f1l7/dgs:latest"
     cpu       = 256
     memory    = 512
     essential = true
     portMappings = [
       {
         containerPort = 80
         hostPort      = 80
         protocol      = "tcp"
       }
     ]
   }
 ])
}
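
The execution_role_arn above is hardcoded with the author's AWS account ID. As an optional alternative, assuming the ecsTaskExecutionRole already exists in your account, you can look the role up with a data source and reference its ARN instead:

# Look up the existing ecsTaskExecutionRole instead of hardcoding the account ID
data "aws_iam_role" "ecs_task_execution_role" {
 name = "ecsTaskExecutionRole"
}

# Then, in the task definition:
# execution_role_arn = data.aws_iam_role.ecs_task_execution_role.arn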

Step 5: Create the ECS service

This is the last step, where we provision the service that runs on the ECS cluster. This is where all the resources we created come together to run the application service.

The attributes defined here are:

  1. Name: name of the ECS service
  2. Cluster: Reference to the ECS cluster record we created in Step 2.
  3. Task definition: Reference to the task definition template created in Step 4.
  4. Desired count: We have specified that we want to run two tasks of this task definition on our ECS cluster.
  5. Network configuration: We have specified the subnets and security group we created in the VPC setup section.
  6. Placement constraints: We have specified that each task should run on a distinct EC2 instance instead of using the residual capacity of an instance that already runs a task. This is not necessarily a best practice, but it helps prove the concept.
  7. Capacity provider strategy: We have provided the reference to the capacity provider created in Step 3.
  8. Load balancer: Reference to the load balancer we created in the EC2 setup section.
resource "aws_ecs_service" "ecs_service" {
 name            = "my-ecs-service"
 cluster         = aws_ecs_cluster.ecs_cluster.id
 task_definition = aws_ecs_task_definition.ecs_task_definition.arn
 desired_count   = 2

 network_configuration {
   subnets         = [aws_subnet.subnet.id, aws_subnet.subnet2.id]
   security_groups = [aws_security_group.security_group.id]
 }

 force_new_deployment = true
 placement_constraints {
   type = "distinctInstance"
 }

 triggers = {
   redeployment = timestamp()
 }

 capacity_provider_strategy {
   capacity_provider = aws_ecs_capacity_provider.ecs_capacity_provider.name
   weight            = 100
 }

 load_balancer {
   target_group_arn = aws_lb_target_group.ecs_tg.arn
   container_name   = "dockergs"
   container_port   = 80
 }

 depends_on = [aws_autoscaling_group.ecs_asg]
}

Run terraform plan and apply commands to provision all the infrastructure defined so far.
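
If you have not initialized the working directory yet, a typical run looks like this:

terraform init    # download the AWS provider
terraform plan    # review the resources Terraform is about to create
terraform apply   # provision the VPC, EC2, and ECS resources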

In the next section, we will describe the steps to test if the deployment was successful or not.

4. Testing the ECS deployment with Terraform

Once the Terraform code has been applied successfully, as confirmed by the terminal output, log in to the AWS account and check that the resources below exist.

It should have created the VPC, associated subnets, route table, and internet gateway, as shown in the resource map below.

[Screenshot: VPC resource map]

It should also have created two EC2 instances, as defined in the launch template, in the appropriate VPC subnets and with the appropriate security group assigned.

Confirm this by navigating to the EC2 dashboard.

[Screenshot: EC2 instances]

Navigate to EC2 > Load balancers. It should have created the Application load balancer (ALB), with appropriate VPC and subnet association, as shown in the screenshot below.

[Screenshot: Application Load Balancer]

Navigate to EC2 > Autoscaling groups.

It should have created the autoscaling group with the attribute values we supplied in the resource configuration.

[Screenshot: auto-scaling group]

Navigate to the Amazon ECS service. A cluster named “my-ecs-cluster” should have been created, as shown below.

[Screenshot: ECS cluster]

The ECS cluster should have the capacity provider (backed by the ASG) correctly associated, with the EC2 instances managed by that ASG registered as container instances, as shown in the screenshot below.

[Screenshot: ECS capacity provider and registered container instances]

Under the “Services” tab, a service with the configuration below should be created.

[Screenshot: ECS service]

As defined in the configuration, the service should start two tasks and thus run two containers on separate EC2 instances, as shown in the screenshot below.

[Screenshot: ECS tasks running on separate EC2 instances]

Finally, to test whether the application is being served as expected, navigate to EC2 > Load balancers.

Copy the ALB's DNS name and open it in a new browser tab. It should serve the “Docker Getting Started” homepage, as shown in the screenshot below.

[Screenshot: Docker Getting Started homepage served via the ALB]
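
Instead of copying the DNS name from the console, you can optionally expose it as a Terraform output. This is a small addition of our own, not part of the original configuration:

output "alb_dns_name" {
 description = "DNS name of the ALB serving the ECS service"
 value       = aws_lb.ecs_alb.dns_name
}

After terraform apply, the URL is then available via terraform output alb_dns_name.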

Key points

This article covered creating and managing an AWS ECS cluster with Terraform, including a complete working example and a Terraform-defined ECS task definition.

We also encourage you to explore how Spacelift makes it easy to work with Terraform. If you need help managing your Terraform infrastructure, building more complex workflows based on Terraform, or managing AWS credentials per run instead of using a static pair on your local machine, Spacelift is a fantastic tool for this. You can try it for free by creating a trial account.

Note: New versions of Terraform will be placed under the BUSL license, but everything created before version 1.5.x stays open-source. OpenTofu is an open-source version of Terraform that will expand on Terraform’s existing concepts and offerings. It is a viable alternative to HashiCorp’s Terraform, being forked from Terraform version 1.5.6. OpenTofu retained all the features and functionalities that had made Terraform popular among developers while also introducing improvements and enhancements. OpenTofu is not going to have its own providers and modules, but it is going to use its own registry for them.
