Scaling AWS infrastructure is essential for accommodating the increasing demands placed on your environment. It helps maintain high service reliability and optimizes AWS resource utilization. Fortunately, the process isn't as complex as it might first appear.
AWS offers excellent service capabilities for scaling infrastructure components and cloud-hosted applications. In this article, we will explore the tools and features Amazon provides that are well suited for scaling existing AWS infrastructure.
Before scaling the infrastructure horizontally or vertically, we first need to analyze it. This helps us identify bottlenecks or performance issues that can limit the effectiveness of scaling efforts. We can also identify underutilized resources for optimization and evaluate the infrastructure’s scalability requirements.
Monitor the infrastructure to gain insights and gather key metrics
Monitoring the AWS infrastructure is essential for gathering key insights and metrics. It helps optimize performance, identify potential issues, and ensure the infrastructure’s security and compliance. With the information gathered, we can make data-driven decisions to enhance resource utilization and reduce costs.
Amazon CloudWatch is the native AWS service for monitoring AWS infrastructure and informing scaling decisions. In addition to CloudWatch, we can use third-party tools like Splunk, New Relic, and Datadog to monitor AWS infrastructure. Each of these tools has different capabilities, and the right choice depends on the type of AWS infrastructure we run.
Identify bottlenecks
Infrastructure bottlenecks are points where limited capacity degrades the whole system, causing service disruptions such as:
- Network congestion due to insufficient bandwidth
- Loss of user engagement and data
- Delays in infrastructure resource deployments
- Physical failure of the servers, routers, databases, apps, etc.
- Outdated hardware or software components
Bottlenecks can occur at any point in the infrastructure, resulting in slower processing time and reduced productivity. Addressing infrastructure bottlenecks is crucial to reducing system downtime.
Tools like AWS Trusted Advisor, New Relic, Datadog, or CloudWatch collect and analyze the metrics needed to spot these bottlenecks. Once the source is identified, we can apply remediations such as hardware upgrades, software optimization, or provisioning additional capacity.
Leverage AWS CloudWatch
AWS CloudWatch is a powerful AWS monitoring and logging service with all the features to collect and analyze data on infrastructure resource utilization, software performance, and network traffic. Use the methods below to leverage AWS CloudWatch to scale any AWS infrastructure.
- Monitoring logs: Use CloudWatch's real-time monitoring and alerting capabilities to collect logs, resource utilization reports, and security metrics. Review the logs to identify potential issues that might impact resource availability and performance.
- Key metrics: AWS CloudWatch has a centralized dashboard to view all existing AWS resources and applications. Define key performance indicators (KPIs) and other metrics that are relevant to the service. Monitor the thresholds of AWS resources for scaling and create scaling policies accordingly with the help of the defined metrics.
- CloudWatch alarms: Use the AWS Management Console or CloudWatch API to create and configure alarms in CloudWatch that notify us whenever metrics reach predefined threshold levels.
- Infrastructure scaling policies: Create separate infrastructure scaling policies that define the corrective actions to take when an alarm is triggered. The policies can be configured with the AWS Auto Scaling service or the Amazon EC2 Auto Scaling service, as shown in the sketch below.
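As an illustration, the alarm-plus-policy pairing described above can be expressed in Terraform. This is a minimal sketch, not a complete configuration: the Auto Scaling group name web-asg, the CPU threshold, and the cooldown are placeholder assumptions.

```
# Simple scaling policy that adds one instance to an existing
# Auto Scaling group (referenced by a placeholder name).
resource "aws_autoscaling_policy" "scale_out" {
  name                   = "scale-out-on-high-cpu"
  autoscaling_group_name = "web-asg" # placeholder ASG name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 300
}

# CloudWatch alarm that fires when average CPU stays above 80%
# for two consecutive periods and triggers the policy above.
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "web-asg-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120
  statistic           = "Average"
  threshold           = 80

  dimensions = {
    AutoScalingGroupName = "web-asg" # placeholder ASG name
  }

  alarm_actions = [aws_autoscaling_policy.scale_out.arn]
}
```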
Scaling AWS web servers ensures proper handling of web traffic with improved application performance and availability. It distributes the workload to handle an increasing number of incoming requests without affecting the availability of websites and web apps.
AWS vertical scaling (scaling up/down)
Vertical scaling in AWS involves increasing or decreasing the capacity of a single resource, such as upgrading an instance to a more powerful one or reducing its capacity.
- Scaling up: Increasing the power of an individual resource. For example:
- Moving from a t3.medium EC2 instance to an m5.large instance with more CPU and memory
- Adding more storage to an EBS volume
- Using a more powerful RDS database instance
- Scaling down: Decreasing the power of a resource when demand decreases. For example:
- Downgrading an EC2 instance from m5.large to t3.small
- Reducing the allocated storage or instance type for an RDS database
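In Terraform terms, vertical scaling is often just an edit to the instance_type argument of an existing resource. A minimal sketch, assuming a placeholder AMI ID:

```
resource "aws_instance" "app_server" {
  ami           = "ami-0c94855ba95c71c99" # placeholder AMI ID
  instance_type = "m5.large"              # previously "t3.medium" before scaling up
}
```

Applying a change to instance_type stops and restarts the instance, which is the brief downtime trade-off of vertical scaling highlighted in the comparison further down.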
AWS horizontal scaling (scaling out/in)
Horizontal scaling of AWS infrastructure involves adding or removing multiple resources to distribute the load across a larger system. It’s typically implemented by adding or removing instances or servers.
- Scaling out: Adding more instances to handle the increased load. For example:
- Adding more EC2 instances to an Auto Scaling Group
- Adding more nodes to an RDS Aurora cluster
- Increasing the number of containers in a Kubernetes cluster
- Scaling in: Removing instances or nodes when demand decreases. For example:
- Reducing the number of EC2 instances in an Auto Scaling Group
- Removing nodes from a distributed system like Amazon ElastiCache
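In Terraform, horizontal scaling is typically modeled with an Auto Scaling group whose minimum, maximum, and desired sizes bound how far it can scale out or in. A minimal sketch, assuming placeholder AMI and subnet IDs:

```
resource "aws_launch_template" "web" {
  name_prefix   = "web-"
  image_id      = "ami-0c94855ba95c71c99" # placeholder AMI ID
  instance_type = "t3.micro"
}

resource "aws_autoscaling_group" "web" {
  name                = "web-asg"
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2
  vpc_zone_identifier = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder subnet IDs

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }
}
```

Scaling out or in then becomes a matter of changing desired_capacity, or letting scaling policies adjust it automatically, as described in the Auto Scaling section below.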
Horizontal vs. vertical scaling in AWS
Let’s see how these broad ways of scaling web servers compare:
|  | Vertical scaling | Horizontal scaling |
| --- | --- | --- |
| Definition | Increasing/decreasing the size or capacity of a single resource. | Adding/removing multiple resources to distribute the workload. |
| Key actions | Upgrade or downgrade instance type or size. | Add or remove instances or servers. |
| Scalability limit | Limited by the maximum capacity of the resource. | Theoretically unlimited with proper design. |
| Downtime | May require downtime for upgrades. | Can be achieved without downtime. |
| Complexity | Simple to implement and manage. | More complex; requires distributed systems. |
| Application suitability | Best for monolithic or single-node applications. | Ideal for distributed or stateless applications. |
| Fault tolerance | Lower, as it depends on a single resource. | Higher, due to distributed architecture. |
| AWS example | Upgrading an EC2 instance type or RDS class. | Auto Scaling EC2 instances or adding Aurora replicas. |
To summarize, vertical scaling enhances the capacity of a single resource to handle increased demand, while horizontal scaling expands capacity by adding multiple resources, enabling distributed load management.
Choose the type of scaling based on your AWS infrastructure setup and application requirements. For resource-intensive applications with occasional or predictable traffic peaks, vertical scaling can be a quick solution. However, for handling long-term growth, high availability, and unpredictable traffic patterns, horizontal scaling provides a more robust and scalable approach.
What is AWS Auto Scaling?
AWS Auto Scaling is a service that automatically increases or decreases the number of compute resources, such as EC2 instances, based on the current demand for your application.
For example, if a web server's memory usage exceeds 90%, Amazon EC2 Auto Scaling will dynamically add a new server instance. It will also remove the extra instance once memory utilization falls back below the threshold value.
The primary goal of Auto Scaling is to ensure that the application performs well and remains cost-efficient by scaling resources up or down as needed.
It is also possible to schedule Auto Scaling actions for web servers. This provides the flexibility to scale AWS infrastructure components in or out during defined time windows, with capacity returning to normal once the schedule ends.
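For example, scheduled scaling actions can be declared in Terraform with aws_autoscaling_schedule. This is a sketch under assumed values: the group name, times, and sizes are placeholders.

```
# Scale the group out at the start of the business day.
resource "aws_autoscaling_schedule" "business_hours" {
  scheduled_action_name  = "business-hours-scale-out"
  autoscaling_group_name = "web-asg"        # placeholder ASG name
  recurrence             = "0 8 * * 1-5"    # 08:00 UTC, Monday to Friday
  min_size               = 4
  max_size               = 12
  desired_capacity       = 6
}

# Scale the group back in after hours.
resource "aws_autoscaling_schedule" "after_hours" {
  scheduled_action_name  = "after-hours-scale-in"
  autoscaling_group_name = "web-asg"        # placeholder ASG name
  recurrence             = "0 20 * * 1-5"   # 20:00 UTC, Monday to Friday
  min_size               = 2
  max_size               = 12
  desired_capacity       = 2
}
```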
Read more about deploying the AWS auto-scaling group with Terraform.
Key features of AWS Auto Scaling
Here are the key features of AWS Auto Scaling:
- Automatic scaling: Automatically adjusts the capacity of your AWS resources (e.g., EC2 instances, DynamoDB tables, Aurora databases) based on predefined conditions or actual demand.
- Dynamic scaling: Adjusts resources in real-time based on changing demand patterns. For example, it can add or remove EC2 instances based on CPU utilization or application metrics.
- Predictive scaling: Uses machine learning to predict future traffic patterns and adjusts capacity proactively to handle expected increases or decreases in demand.
- Resource scaling: Ensures that you’re not over-provisioning or under-provisioning resources, helping to optimize costs while maintaining performance.
- Scalability across multiple services: Supports various AWS resources, such as:
- Amazon EC2 instances
- Spot Fleet instances
- Amazon ECS (Elastic Container Service) tasks
- DynamoDB tables and indexes
- Aurora replicas
- Health monitoring: Automatically replaces unhealthy instances or resources to maintain application availability.
- Scaling policies: Offers several scaling policies:
- Target tracking scaling: Automatically adjusts capacity to maintain a target utilization metric (e.g., CPU usage); a Terraform sketch of this policy type follows this list.
- Step scaling: Scales resources in steps based on alarms triggered by CloudWatch metrics.
- Scheduled scaling: Increases or decreases capacity at specific times.
- Cost-optimization: Helps reduce costs by dynamically allocating just the right amount of resources, avoiding both underutilization and overprovisioning.
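For instance, a target tracking policy can be written in Terraform as follows. This is a minimal sketch; the group name and the 50% CPU target are placeholder assumptions.

```
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "keep-average-cpu-near-50"
  autoscaling_group_name = "web-asg" # placeholder ASG name
  policy_type            = "TargetTrackingScaling"

  # Add or remove instances automatically to keep average CPU around 50%.
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50.0
  }
}
```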
Auto Scaling use cases
Generally speaking, AWS Auto Scaling makes it easier to build highly available and scalable applications while minimizing operational overhead and costs. It can also be useful in the following scenarios:
- Handling unpredictable traffic spikes, such as during flash sales or marketing campaigns.
- Scaling applications to meet seasonal demand changes (e.g., holiday shopping).
- Maintaining consistent performance for applications with cyclical workloads (e.g., daily or weekly traffic patterns).
- Dynamically scaling compute resources to handle large-scale data processing tasks, such as rendering videos or running analytics pipelines, and scaling them back down once the jobs are completed.
- Automatically scaling development or testing instances up during working hours and down during off-hours to minimize costs while maintaining developer availability.
AWS Auto Scaling is more agile, cost-efficient, and automated compared to traditional scaling methods, making it ideal for dynamic and modern application environments. Traditional scaling methods, while still useful in some on-premises scenarios, lack the automation and flexibility needed to handle rapidly changing workloads.
Scaling with Application Load Balancer (ALB)
Application Load Balancer (ALB) is an AWS service that distributes application traffic across multiple targets, such as EC2 instances or Lambda functions. The main AWS compute services used behind an ALB are:
- EC2 instances
- EKS (Elastic Kubernetes Service)
- ECS (Elastic Container Service)
ALB is suited for handling HTTP and HTTPS traffic. It takes only a few minutes to set up an ALB in front of web servers and balance the traffic load between AWS EC2 instances.
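Below is a minimal Terraform sketch of an ALB forwarding HTTP traffic to EC2 targets. The subnet and VPC IDs are placeholders, and security groups and health checks are omitted for brevity.

```
resource "aws_lb" "web" {
  name               = "web-alb"
  load_balancer_type = "application"
  subnets            = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder subnet IDs
}

# Target group that the ALB forwards HTTP traffic to.
resource "aws_lb_target_group" "web" {
  name     = "web-targets"
  port     = 80
  protocol = "HTTP"
  vpc_id   = "vpc-0123456789abcdef0" # placeholder VPC ID
}

# Listener that routes incoming requests on port 80 to the target group.
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.web.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}
```

EC2 instances can then be registered with aws_lb_target_group_attachment resources, or the target group can be attached to an Auto Scaling group so that instances register and deregister automatically as the group scales.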
Read more: How to Manage Application Load Balancer (ALB) with Terraform.
Amazon Relational Database Service (RDS) is a collection of AWS-managed services that simplify database setup, scaling, and management in the cloud. Amazon RDS supports popular relational database engines, such as MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server, and offers excellent scalability features.
Scaling AWS with Amazon RDS Multi-AZ
Amazon RDS Multi-AZ enhances the availability of the Amazon RDS database instances, making them ideal for handling production workloads. Below are some important reasons for using RDS Multi-AZ to scale AWS infrastructure.
- Automatic failover: This feature ensures the high availability of AWS databases by performing automatic database failovers, typically within a minute or two, with no manual intervention and no data loss.
- Protect database performance: Backups are taken from the standby instance, so I/O activity on the primary is not suspended while the backup runs.
- Enhanced durability: Amazon RDS Multi-AZ uses synchronous replication to keep the data on the standby database instance in step with the primary instance.
- Increased availability: It allows us to deploy a standby database instance in another AZ and achieve excellent fault tolerance during instance failure.
The Multi-AZ feature of AWS RDS places a standby database instance in another availability zone to ensure high availability during hardware failures. Enabling this through the RDS dashboard is straightforward.
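The same can be expressed in Terraform, where enabling Multi-AZ is a single argument on the DB instance. A minimal sketch with placeholder sizing and the password supplied through a sensitive input variable:

```
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "primary" {
  identifier          = "app-db"
  engine              = "mysql"
  instance_class      = "db.m5.large"
  allocated_storage   = 100
  username            = "admin"
  password            = var.db_password
  multi_az            = true # provisions a synchronous standby in another AZ
  skip_final_snapshot = true
}
```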
Learn how to create an AWS RDS Instance using Terraform.
Scaling AWS with RDS Read replicas
Amazon RDS read replicas are copies of the primary database instance with the same engine and data. As secondary database instances, read replicas enhance read performance by elastically scaling read capacity out beyond the primary instance.
Replicas are kept up to date through asynchronous replication from the primary instance. Traffic that only needs to read from the database can be routed directly to the read replicas, reducing the primary database instance's workload.
Read replicas are available in Amazon Aurora and AWS RDS for MariaDB, MySQL, Oracle, PostgreSQL, and SQL Server.
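Building on the primary instance from the Multi-AZ sketch above, a read replica is simply another aws_db_instance that points at its source. The identifier and instance class here are placeholders.

```
# Read replica created from the primary instance defined earlier.
resource "aws_db_instance" "read_replica" {
  identifier          = "app-db-replica-1"
  replicate_source_db = aws_db_instance.primary.identifier
  instance_class      = "db.m5.large"
  skip_final_snapshot = true
}
```

Read-only application traffic can then be pointed at the replica's own endpoint, while writes continue to go to the primary.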
Scaling AWS with Aurora
Amazon Aurora offers high availability and performance at a global scale with full PostgreSQL and MySQL compatibility. It combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open-source databases. Amazon Aurora is a good fit for:
- Modernizing the operations of enterprise applications like ERP, CRM, etc.
- Supporting reliable, multi-tenant SaaS applications with database flexibility.
- Developing and deploying distributed applications at scale across different regions.
- Scaling serverlessly and near-instantaneously to reduce operational expenses.
Compared to standard RDS engines, Amazon Aurora has additional built-in DR (disaster recovery) and HA (high availability) capabilities. We can also migrate from commercial database engines such as SQL Server or Oracle to Aurora with relative ease, making it a strong option for scaling database workloads in AWS infrastructure.
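A minimal Terraform sketch of an Aurora MySQL cluster with one writer and one reader follows; adding Aurora replicas is then just a matter of raising the instance count. The identifiers, instance class, and the db_password variable (as in the RDS sketch above) are assumptions.

```
resource "aws_rds_cluster" "app" {
  cluster_identifier  = "app-aurora-cluster"
  engine              = "aurora-mysql"
  master_username     = "admin"
  master_password     = var.db_password # sensitive input variable, as defined earlier
  skip_final_snapshot = true
}

# Two cluster instances: one writer plus one Aurora replica.
resource "aws_rds_cluster_instance" "app" {
  count              = 2
  identifier         = "app-aurora-${count.index}"
  cluster_identifier = aws_rds_cluster.app.id
  engine             = aws_rds_cluster.app.engine
  instance_class     = "db.r6g.large"
}
```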
Event-driven architecture (EDA) is a design pattern in which decoupled components of a system communicate with each other by producing and consuming events. In AWS, this architecture leverages managed services to build scalable, reliable, and loosely coupled systems. It is also a useful strategy for scaling AWS infrastructure, because:
- It can handle asynchronous communication between AWS services that are distributed across multiple servers and regions.
- Its main components are events generated by various sources, such as users, system components, and external components.
- It promotes loose service coupling and reduces dependencies across AWS resources without affecting the rest of the system.
- It improves fault tolerance, because components communicate asynchronously and failures are isolated rather than cascading through the system.
AWS provides several services, such as Lambda and Simple Queue Service (SQS), that enable EDA events to trigger the execution of specific code.
SQS to implement loose coupling
SQS is a fully managed message queuing service that enables us to decouple and scale various microservices, serverless applications, and distributed systems within the AWS infrastructure. We can use SQS to decouple the system components so that they can work and scale independently.
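Creating a queue in Terraform takes only a few lines. A minimal sketch; the queue name and timeouts are illustrative:

```
resource "aws_sqs_queue" "orders" {
  name                       = "orders-queue"
  visibility_timeout_seconds = 60    # how long a consumer has to process a message
  message_retention_seconds  = 86400 # keep unprocessed messages for one day
}
```

Producers write to the queue and consumers read from it at their own pace, so either side can be scaled independently.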
Leverage serverless architecture using Lambda
AWS Lambda is Amazon's serverless computing service, which lets us run code without provisioning or managing any servers. Lambda functions scale automatically with incoming traffic, so we don't need to do capacity planning for their execution.
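For example, a Lambda function can consume the orders queue from the SQS sketch above through an event source mapping, which lets Lambda scale its concurrency with the queue backlog. This is a sketch under assumptions: the function package, names, and handler are placeholders, and the IAM policy granting the role SQS access (e.g., the AWSLambdaSQSQueueExecutionRole managed policy) is omitted for brevity.

```
# Minimal execution role that Lambda can assume.
resource "aws_iam_role" "lambda_exec" {
  name = "order-processor-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_lambda_function" "order_processor" {
  function_name = "order-processor"
  role          = aws_iam_role.lambda_exec.arn
  handler       = "app.handler"         # entry point inside the package
  runtime       = "python3.12"
  filename      = "order_processor.zip" # packaged function code, assumed to exist
}

# Lambda polls the queue and invokes the function with batches of messages.
resource "aws_lambda_event_source_mapping" "orders" {
  event_source_arn = aws_sqs_queue.orders.arn
  function_name    = aws_lambda_function.order_processor.arn
  batch_size       = 10
}
```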
Some of the key benefits of scaling your AWS infrastructure are summarized below.
- Improved performance and availability: Distributing workloads across multiple AWS instances or servers ensures that applications remain responsive and available.
- Cost optimization: Scaling allows us to use AWS resources more efficiently, avoiding overprovisioning, which results in unused cloud resources. Learn more about AWS Cost Optimization.
- Increased flexibility: AWS provides enough capabilities to scale up or scale down the infrastructure to match the demand.
- Reduced downtime: Scaling ensures that all infrastructure components are up and running even during unexpected spikes in traffic, reducing the risk of service downtime and outages.
- Automatic scaling: AWS offers auto-scaling capabilities so that the infrastructure scales automatically based on predefined policies.
- Geographic scalability: AWS allows infrastructure scaling across different regions globally. Hence, we can deploy resources in multiple regions to reduce latency.
Does your organization have extra compliance concerns? Spacelift has you covered with the possibility of self-hosting it in AWS. You can also read about Spacelift integration with AWS, with the new Cloud Integrations section and update to support account-level AWS integrations.
Terraform enables us to create, change, and scale AWS infrastructure by defining resources declaratively, using the for_each or count meta-arguments to create multiple resources from a single block.

Here's a Terraform code example that creates multiple EC2 instances using count:

```
resource "aws_instance" "my_web_server" {
  count         = 3
  ami           = "ami-0c94855ba95c71c99"
  instance_type = "t2.micro"

  tags = {
    Name = "My web server ${count.index + 1}"
  }
}
```
In the example above, the number of EC2 instances to create is declared with the count meta-argument. Exposing this number as an input variable lets us adjust the instance count dynamically, as shown next.
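For example, the hardcoded count can be replaced with an input variable (instance_count is an illustrative name):

```
variable "instance_count" {
  description = "Number of web server instances to create"
  type        = number
  default     = 3
}

resource "aws_instance" "my_web_server" {
  count         = var.instance_count
  ami           = "ami-0c94855ba95c71c99"
  instance_type = "t2.micro"

  tags = {
    Name = "My web server ${count.index + 1}"
  }
}
```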
Similarly, we can also use the for_each construct to create multiple resources with similar configurations dynamically. The example below implements an input variable, bucket_names, along with for_each to create multiple S3 buckets:

```
variable "bucket_names" {
  type = set(string)
  default = [
    "example-bucket-1",
    "example-bucket-2",
    "example-bucket-3"
  ]
}

resource "aws_s3_bucket" "example_buckets" {
  for_each = var.bucket_names
  bucket   = each.value

  tags = {
    Name        = "${each.value} Bucket"
    Environment = "Production"
  }
}
```
Learn more about best practices when managing Terraform at scale.
There are various ways to scale Kubernetes clusters to manage varying demands. The kubectl CLI, Kubernetes dashboard, and Kubernetes API are some of the most commonly used tools for this purpose.
Scaling a Kubernetes cluster manually with these tools requires ongoing monitoring to identify changes in demand and act on them. The Horizontal Pod Autoscaler (HPA) automates pod-level scaling by adjusting replica counts based on observed metrics, but it does not add or remove cluster nodes, so node capacity still needs to be scaled separately.
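As an illustration, an HPA can also be declared with the Terraform Kubernetes provider. This is a minimal sketch that assumes the provider is configured against the cluster and that a Deployment named web already exists:

```
resource "kubernetes_horizontal_pod_autoscaler_v2" "web" {
  metadata {
    name = "web"
  }

  spec {
    min_replicas = 2
    max_replicas = 10

    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "web"
    }

    # Keep average CPU utilization across the pods around 60%.
    metric {
      type = "Resource"
      resource {
        name = "cpu"
        target {
          type                = "Utilization"
          average_utilization = 60
        }
      }
    }
  }
}
```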
There are a couple of auto scalers that are worth considering – Cluster Autoscaler and Karpenter.
Cluster Autoscaler
The Kubernetes Cluster Autoscaler is a component, usually deployed inside the cluster, that watches for pods that cannot be scheduled due to insufficient capacity and adds nodes so they can run.
It monitors resource utilization and workload. Based on this information, it provisions additional nodes when more pods need to be scheduled and deprovisions them when demand subsides. This way, it adjusts cloud resources automatically to optimize cost.
Karpenter
Karpenter is an open-source node provisioning and autoscaling project for Kubernetes. It works natively with the Kubernetes API, which makes it easy for organizations running their workloads on Kubernetes to adopt.
Its ability to scale nodes up and down automatically lets organizations run their workloads cost-effectively and on demand. Karpenter uses the Kubernetes API to manage nodes and workloads, making it easy to deploy and use.
- Leverage Auto Scaling: AWS Auto Scaling allows dynamic adjustments of resources like EC2 instances and ECS tasks based on demand. Use target tracking, step scaling, and predictive scaling policies to handle traffic variations while maintaining performance and cost efficiency.
- Design for stateless applications: Stateless applications store state externally in services like DynamoDB, S3, or ElastiCache. Combined with Elastic Load Balancers (ELB), this design enables seamless horizontal scaling by eliminating dependencies between application instances.
- Monitor and optimize resource utilization: AWS CloudWatch provides real-time insights into resource usage metrics, such as CPU and memory utilization. Tools like AWS Compute Optimizer and Cost Explorer help identify underutilized resources, ensuring cost-effective scaling.
- Adopt purpose-built databases and caching: Use workload-specific databases (e.g., DynamoDB for NoSQL or RDS for relational data) to optimize performance. Integrate Amazon ElastiCache for caching frequently accessed data to reduce database load and latency.
- Enable fault tolerance and high availability: Multi-AZ deployments for services like RDS and DynamoDB ensure high availability. Health checks in Auto Scaling groups and load balancers automatically detect and replace failed instances, maintaining application uptime.
- Adopt Infrastructure as Code (IaC): Tools like AWS CloudFormation, OpenTofu, and Terraform allow you to define and manage infrastructure configurations as code. This ensures consistent, scalable, and reproducible deployments, simplifying updates and scaling adjustments.
Spacelift is an infrastructure orchestration platform that increases your infrastructure deployment speed without sacrificing control.
With Spacelift, you can provision, configure, and govern with one or more automated workflows that orchestrate Terraform, OpenTofu, Terragrunt, Pulumi, CloudFormation, Ansible, and Kubernetes.
You don’t need to define all the prerequisite steps for installing and configuring the infrastructure tool you are using, nor the deployment and security steps, as they are all available in the default workflow.
Spacelift offers a unique set of infrastructure orchestration capabilities, such as:
- Policies (based on Open Policy Agent) — You can control how many approvals you need for runs, the kind of resources you can create, and the kind of parameters these resources can have, and you can also control the behavior when a pull request is open or merged.
- Multi-IaC workflows — Combine Terraform with Kubernetes, Ansible, and other IaC tools such as OpenTofu, Pulumi, and CloudFormation, create dependencies among them, and share outputs
- Build self-service infrastructure — You can use Blueprints to build self-service infrastructure; simply complete a form to provision infrastructure based on Terraform and other supported tools.
- Integrations with any third-party tools — You can integrate with your favorite third-party tools and even build policies for them. For example, you can Integrate security tools in your workflows using Custom Inputs.
- Drift detection and remediation
Spacelift enables you to create private workers inside your infrastructure, which helps you execute Spacelift-related workflows on your end. The documentation provides more information on configuring private workers.
If you want to learn more about what you can do with Spacelift, check out this article, create a free account today, or book a demo with one of our engineers.
Scaling AWS infrastructure is essential to ensure that web applications, servers, and databases can handle increased traffic and workload demands. By analyzing the infrastructure and leveraging AWS services like Aurora, CloudWatch, and Auto Scaling groups, we can effectively scale our web servers and databases.
Additionally, event-driven architecture and serverless technologies such as Lambda functions and SQS can help us implement loose coupling and improve scalability. We can leverage Terraform's for_each/count loops to dynamically create or destroy multiple resources. In this post, we also discussed how Karpenter and the Kubernetes Cluster Autoscaler automate the scaling of AWS infrastructure.
Solve your infrastructure challenges
Spacelift is a flexible orchestration solution for IaC development. It delivers enhanced collaboration, automation, and controls to simplify and accelerate the provisioning of cloud-based infrastructures.