Cloud adoption has become the de facto standard today. The ability to scale resources up and down on demand and to pay only for what you consume, along with convenience, are some of the major reasons for its adoption.
While cloud computing has many benefits, enterprises sometimes end up paying more for the services than necessary because of poor planning, architecture, and optimization, and inadequate monitoring of resource consumption.
In this post, we explore cloud cost optimization strategies and best practices for cost-efficient utilization of cloud resources.
In general, IaaS cloud pricing models are based on three fundamental aspects – compute, storage/database, and network utilization. AWS, for example, provides variations in its services so that we can optimize costs based on our business needs.
- Compute – AWS provides Amazon Elastic Compute Cloud (EC2), Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), AWS Lambda, etc. These services differ in attributes like availability, managed/shared/dedicated tenancy, nature of the workload, etc., all of which can affect costs.
- Example compute cost calculation: $ per second/minute/hour/…/day
- Storage/databases – AWS provides Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS), Amazon Elastic File System (EFS), Amazon S3 Glacier, etc., as storage solutions, and database services such as Amazon RDS, Amazon DynamoDB, and Amazon Aurora. Having several storage classes gives customers the flexibility to move less frequently accessed data to a cheaper class and save costs.
- Example storage/database cost calculation: $ per MB/GB/TB
- Network – Costs associated with network bandwidth consumption mainly depend on two aspects – the speed of the connection and the amount of data transferred (inbound/outbound).
- Example network consumption cost calculation: $ per amount of data transferred / number of API requests
The examples above provide a high-level overview of how the cost calculations are done based on the three core infrastructure components.
Each service has its own complex pricing model, which makes it difficult to arrive at a single figure for the cost of any given cloud architecture.
Thankfully, AWS provides the AWS Pricing Calculator to create estimates for each service based on the configuration options chosen.
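As a rough illustration of how the three per-unit prices combine into a bill, here is a minimal Python sketch. The rates are placeholder values chosen for easy arithmetic, not actual AWS prices, and `Rates` and `estimate_monthly_cost` are hypothetical names. Real estimates should come from the pricing calculator.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Placeholder unit prices (assumed for illustration, not real AWS rates)."""
    compute_per_hour: float      # $ per instance-hour
    storage_per_gb_month: float  # $ per GB stored per month
    egress_per_gb: float         # $ per GB transferred out


def estimate_monthly_cost(rates: Rates, instance_hours: float,
                          storage_gb: float, egress_gb: float) -> dict:
    """Break a monthly estimate into compute, storage, and network parts."""
    breakdown = {
        "compute": rates.compute_per_hour * instance_hours,
        "storage": rates.storage_per_gb_month * storage_gb,
        "network": rates.egress_per_gb * egress_gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Two instances running a 730-hour month, 500 GB stored, 200 GB sent out.
example_rates = Rates(compute_per_hour=0.10, storage_per_gb_month=0.02,
                      egress_per_gb=0.09)
for part, cost in estimate_monthly_cost(example_rates, 2 * 730, 500, 200).items():
    print(f"{part}: ${cost:.2f}")
```

Even this toy model shows why a breakdown matters: which of the three components dominates depends entirely on the workload.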
The AWS Well-Architected Framework provides a set of best practices and guidelines for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. Cost optimization is one of the six pillars of this framework. It aims to achieve the desired outcomes at a justified cost.
In this section, we discuss design principles that guide us in optimizing costs.
Note that this may not be an exhaustive list of principles but a suggested approach toward cloud cost optimization.
Monitor resource consumption
Optimization efforts are always preceded by tracking and monitoring efforts. Monitoring provides the context and understanding of why certain costs need to be optimized, and it sheds light on areas where overspending takes place. Several tools provided by AWS and by third-party services help uncover unnecessary expenditure.
- CloudWatch Metrics – provides various metrics and the ability to set alarms when a certain threshold is met. With CloudWatch Metrics, we monitor compute resource consumption, database storage capacities, and various other metrics.
- AWS Cost Explorer – provides a detailed analysis of cloud spend by service.
- Third-party tools – specialized services that monitor AWS cloud costs and report on cloud resource consumption and the price associated with it.
Adopt a consumption-based pricing model
Instead of reserving resources to serve uneven or occasional traffic, adopt consumption-based resource allocation by leveraging auto-scaling capabilities. This helps reduce costs by avoiding overprovisioning of resources and paying for unused resources.
This also ties in with the degree of cloud adoption. Cloud-native services offer better support for cost optimization and finer control over cloud resource utilization.
For example, consider moving from EC2 instances to AWS Fargate or Lambda functions. These services create additional compute capacity only when it is needed, and they do so more quickly.
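To make consumption-based allocation more concrete, the sketch below shows a simplified proportional scaling rule in Python: capacity is adjusted so that average utilization moves toward a target. This is an illustrative approximation, not the exact algorithm any AWS auto-scaling feature uses. The function name, parameters, and limits are all assumptions.

```python
import math


def desired_capacity(current_instances: int, avg_utilization: float,
                     target_utilization: float,
                     min_instances: int = 1, max_instances: int = 10) -> int:
    """Return an instance count that would bring utilization near the target.

    Simplified proportional rule (an assumption, not AWS's exact algorithm):
    desired = ceil(current * observed / target), clamped to [min, max].
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = math.ceil(current_instances * avg_utilization / target_utilization)
    return max(min_instances, min(max_instances, raw))


print(desired_capacity(4, 90.0, 60.0))  # traffic spike: scale out
print(desired_capacity(4, 15.0, 60.0))  # quiet period: scale in
```

The minimum and maximum bounds are where cost control happens: the maximum caps spend during spikes, and the minimum decides how much idle capacity we pay for.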
Use the right compute instances
It is possible to classify workloads as critical and non-critical. Depending on this, provisioning compute instances with general configuration on shared resources for less critical workloads may make sense. Limiting the provisioning of dedicated servers only to critical workloads helps optimize costs.
To identify and provision appropriate compute resources based on utilization and criticality, the steps are summarized below:
- Analyze the resource usage with AWS CloudWatch to identify which resources are underutilized.
- Identify the optimal resource levels that meet your business needs.
- Resize resources, either by modifying them or by adjusting the number of running instances.
- Automate the above process to maintain the right sizing of the resources.
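The first two steps above could be sketched as follows in Python. The thresholds (20% and 80%) and the `classify_instances` helper are assumptions for illustration. In practice the samples would come from a source such as the CloudWatch `CPUUtilization` metric, and CPU alone is rarely enough to decide.

```python
from statistics import mean


def classify_instances(cpu_samples: dict[str, list[float]],
                       low: float = 20.0, high: float = 80.0) -> dict[str, str]:
    """Label each instance from its CPU utilization samples (in percent)."""
    labels = {}
    for instance_id, samples in cpu_samples.items():
        if not samples:
            labels[instance_id] = "no-data"
            continue
        avg, peak = mean(samples), max(samples)
        if avg < low and peak < high:
            # Consistently idle and never close to saturation.
            labels[instance_id] = "downsize"
        elif avg > high:
            labels[instance_id] = "upsize"
        else:
            labels[instance_id] = "keep"
    return labels


samples = {
    "i-idle": [5.0, 10.0, 8.0],
    "i-busy": [85.0, 90.0, 95.0],
    "i-bursty": [5.0, 5.0, 95.0],
}
print(classify_instances(samples))
```

Checking the peak as well as the average keeps bursty instances from being downsized just because they are idle most of the time.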
The table below summarizes various types of EC2 instances and associated features.
|Type||Description||Price||When to use|
|On-Demand Instances||Pay for compute capacity only while the instance is running.||$$$$||Dev, test, and prod environments.|
|Reserved Instances||Pay a discounted rate in exchange for committing to instance usage for 1 or 3 years.||$$$||Prod environments.|
|Spot Instances||Use spare AWS compute capacity at a discount; AWS can reclaim the instances when it needs the capacity back.||$$||Flexible workloads that have no critical deadline and can handle interruptions.|
|Dedicated Hosts||Pay for dedicated physical servers to run instances. We get control of the physical server.||$$$$$$||Applications with licensing requirements, compliance requirements, or a need for control over the underlying hardware.|
|Dedicated Instances||Run instances on a dedicated physical server. We don’t have control over the underlying physical server.||$$$$$||Applications with licensing or compliance requirements.|
Additionally, opting for an AWS Savings Plan for EC2 can give you significant cost savings (up to 72%) in exchange for a 1- or 3-year commitment.
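Whether a commitment pays off depends on how many hours the instance actually runs. The Python sketch below works through that arithmetic under a simplifying assumption: the commitment bills every hour of the period at a discounted rate, while on-demand bills only the hours used. The rate and the 40% discount are placeholder values, and both helper names are hypothetical.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of hours an instance must run for a commitment to pay off.

    Assumes the commitment bills every hour at (1 - discount) times the
    on-demand rate, while on-demand bills only for the hours actually used.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def cheaper_option(on_demand_rate: float, discount: float,
                   hours_used: float, hours_in_period: float) -> str:
    """Compare on-demand and committed cost for one instance over a period."""
    on_demand_cost = on_demand_rate * hours_used
    committed_cost = on_demand_rate * (1 - discount) * hours_in_period
    return "commitment" if committed_cost < on_demand_cost else "on-demand"


print(f"{break_even_utilization(0.40):.0%}")   # share of hours needed to break even
print(cheaper_option(0.10, 0.40, 730, 730))   # always-on production instance
print(cheaper_option(0.10, 0.40, 200, 730))   # instance used part-time
```

This is why the table above suggests reserved capacity for production environments: steady, always-on workloads clear the break-even point easily, while part-time workloads often do not.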
Use the right storage classes
Amazon EC2 instances often use Elastic Block Store (EBS) volumes to persist data. The price of a volume depends on its type, size, provisioned performance (such as I/O), and region.
The table below summarizes various types of EBS volumes along with the use case.
|EBS Type||Description||Price||When to use|
|General purpose SSD||This is the default volume type for Amazon EC2. Being SSD-backed, it offers high performance (IOPS).||$$$||Workloads that require a balance of price and performance. Can be used for boot volumes or small and mid-sized databases.|
|Provisioned IOPS SSD||This is the highest-performance and most expensive SSD volume.||$$$$||High-performance workloads that need very low latency and very high throughput, e.g., heavy transactional databases and high-performance computing.|
|Throughput Optimized HDD||This is a regular HDD with a lower cost compared to SSD drives but specially designed for throughput-intensive workloads.||$$||Sequential workloads such as ETL, data warehouses, log processing, and data analytics where you need high throughput.|
|Cold HDD||This is the lowest-cost HDD for less frequently accessed workloads.||$||Infrequent access workloads, such as backups, disaster recovery, and long-term archival storage.|
Similarly, there are various classes of S3 storage that offer lower prices by compromising on aspects like availability, redundancy, and retrieval of historical data. Depending on the consumption patterns, move data to colder storage solutions, which in turn reduces the storage costs over a period of time.
Leverage AWS Cost Explorer for analysis
AWS Cost Explorer is a cost analysis tool that breaks costs down by service at a granular level. Use Cost Explorer to track spending over time and identify areas for cost optimization.
Generate custom reports to gain insights into cost drivers, usage patterns, and trends. As a best practice, always tag cloud resources at provisioning time for billing purposes. Cost Explorer is where those tagging efforts culminate in meaningful reports and actions.
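To illustrate why tagging matters for reporting, here is a small Python sketch that groups cost line items by a tag. The record shape (`cost` and `tags` keys) is a made-up simplification, not the format Cost Explorer returns, and `cost_by_tag` is a hypothetical helper.

```python
from collections import defaultdict


def cost_by_tag(line_items: list[dict], tag_key: str) -> dict[str, float]:
    """Sum cost per value of `tag_key`; items without the tag go to 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        tag_value = item.get("tags", {}).get(tag_key, "untagged")
        totals[tag_value] += item["cost"]
    return dict(totals)


items = [
    {"cost": 10.0, "tags": {"team": "data"}},
    {"cost": 5.5, "tags": {"team": "web"}},
    {"cost": 4.5, "tags": {"team": "data"}},
    {"cost": 2.0},  # resource provisioned without tags
]
print(cost_by_tag(items, "team"))
```

The `untagged` bucket is worth watching: the larger it is, the less of the bill can be attributed to an owner.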
Acknowledging the fact that there are a vast number of AWS services, in this section, we explore some tips to optimize costs for some of the commonly used services.
Amazon S3 is an object storage service that lets users implement use cases like data backup and restore, data archival, data lakes, and storage for other enterprise applications. S3 stores data in buckets, which are similar to directories on a local system. The fundamental unit of data stored in S3 is called an object, which is similar to a file.
Storage classes are one of the primary factors that affect Amazon S3 pricing. Each object is associated with a storage class, so it is important to identify which storage class should be assigned to each object.
|Storage Class||Description||Price||When to use|
|S3 Standard||The general-purpose, default storage class.||$$$$$$$||Data that is accessed very frequently and needs low latency and high durability.|
|S3 Standard-IA||For infrequently accessed objects. Up to 48% lower cost than S3 Standard.||$$$$$$||Data that is accessed infrequently but needs the same low latency and high throughput as S3 Standard when it is accessed.|
|S3 One Zone-IA||Same as S3 Standard-IA, but stores data in only one Availability Zone. Up to 20% lower cost than S3 Standard-IA.||$$$$$||Infrequently accessed data that does not need multi-AZ availability and resilience.|
|S3 Intelligent-Tiering||Monitors data access patterns and moves objects between access tiers automatically.||$$$$||Data with unpredictable access patterns, where we want to optimize costs rather than keep everything in S3 Standard.|
|S3 Glacier Instant Retrieval||Up to 68% lower cost than S3 Standard-IA.||$$$||Long-term archives that are rarely accessed but require retrieval in milliseconds.|
|S3 Glacier Flexible Retrieval||Up to 10% lower cost than S3 Glacier Instant Retrieval.||$$||Data accessed only once or twice a year, where retrieval times of minutes to hours are acceptable.|
|S3 Glacier Deep Archive||The cheapest storage class, meant for preserving data for 7 to 10 years or longer. Retrieval can take up to 12 hours.||$||Data accessed once or twice a year at most, kept to satisfy very long-term retention policies.|
Additionally, when unsure about the access pattern and volume of the data, use either S3 Storage Class Analysis (S3 analytics) or S3 Storage Lens to get insights into the usage. This helps identify which data should be in which storage class, and when.
We can also use S3 lifecycle rules to automate the transition of data between storage classes based on these insights.
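As an illustration of what such a lifecycle rule might look like, the Python sketch below builds a configuration dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The helper name, the `logs/` prefix, and the 30/90/365-day thresholds are assumptions; the right values depend on the access patterns observed. The code only builds the dictionary and does not call AWS.

```python
def build_lifecycle_rule(prefix: str, ia_after_days: int,
                         glacier_after_days: int, expire_after_days: int) -> dict:
    """Build a lifecycle configuration that tiers objects down, then expires them."""
    if not 0 < ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("expected 0 < ia < glacier < expire (in days)")
    return {
        "Rules": [{
            "ID": f"tier-down-{prefix.strip('/') or 'all'}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": expire_after_days},
        }]
    }


config = build_lifecycle_rule("logs/", 30, 90, 365)
print(config["Rules"][0]["ID"])
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config)
```

Keeping the rule as plain data makes it easy to review and version before it is applied to a bucket.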
Some practical tips for optimizing the usage of EBS are listed below.
- Unattached volumes – After terminating EC2 instances, you might not need the volumes that were attached to them. An unattached volume costs the same as it did while attached to an EC2 instance, so remove such volumes when they are no longer needed. The best practice is to back the volume up as a snapshot first and then delete it.
- Snapshot policies – If an EBS volume hosts a database, frequent backups can produce many incremental snapshots. These snapshots are stored in Amazon S3 and billed as EBS snapshot storage, so the cost grows as they accumulate. Define a retention policy that deletes older snapshots to keep this cost down.
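Both tips can be sketched as simple selection logic in Python. The dictionaries below mimic a few fields of the EC2 `describe_volumes` and `describe_snapshots` responses (`VolumeId`, `State`, `SnapshotId`, `StartTime`). The helper names and the 30-day retention period are assumptions, and the sketch only selects candidates; it deletes nothing.

```python
from datetime import datetime, timedelta, timezone


def unattached_volumes(volumes: list[dict]) -> list[str]:
    """IDs of volumes in the 'available' state, i.e. not attached to an instance."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def expired_snapshots(snapshots: list[dict], retention_days: int,
                      now: datetime) -> list[str]:
    """IDs of snapshots started before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
volumes = [
    {"VolumeId": "vol-1", "State": "in-use"},
    {"VolumeId": "vol-2", "State": "available"},
]
snapshots = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2024, 5, 25, tzinfo=timezone.utc)},
]
print(unattached_volumes(volumes))
print(expired_snapshots(snapshots, retention_days=30, now=now))
```

Separating selection from deletion leaves room for a review step, which is useful when a snapshot is the only remaining copy of the data.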
Optimizing data transfer charges
Services like Amazon EC2, Amazon RDS, and Amazon S3 do not charge for inbound data transfer. However, outbound data transfer is chargeable. Always monitor the amount of data transferred from these resources to the public internet and limit it where it makes sense.
If the outbound data is static and repetitive, consider using Amazon CloudFront. Amazon CloudFront is a CDN service designed for caching and serving static content, which can lower outbound costs compared to serving the same content directly from EC2 instances.
AWS also charges for data transfer between regions and between availability zones. These are important considerations when designing and architecting systems to be deployed in the cloud.
Optimizing RDS costs
Amazon RDS costs can be optimized by:
- Right-sizing the instance by choosing appropriate CPU, memory, and storage.
- Using read replicas to offload read traffic from the primary database instance.
- Writing efficient queries and improving database schemas, which reduces the work needed to process and retrieve data.
- Using reserved instances to get discounts wherever possible.
- Using database engine features such as backups, Multi-AZ deployments, and read replicas appropriately.
Optimize DynamoDB costs
Amazon DynamoDB costs can be optimized by:
- Choosing the appropriate capacity mode (on-demand, provisioned, or reserved capacity).
- Choosing the right table class (Standard or Standard-Infrequent Access).
- Using the TTL (Time to Live) feature to automatically delete expired data.
- Reviewing secondary index usage and deleting any unused or unnecessary indexes.
- Reviewing backup retention periods.
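For the TTL point above, DynamoDB expects the expiry time to be stored on each item as a Number attribute holding a Unix epoch timestamp in seconds. The Python sketch below shows one way to stamp items before writing them. The attribute name `expires_at`, the helper name, and the 7-day lifetime are assumptions; the attribute name must match whatever is configured as the TTL attribute on the table.

```python
import time
from typing import Optional


def with_ttl(item: dict, ttl_days: int, ttl_attribute: str = "expires_at",
             now: Optional[float] = None) -> dict:
    """Return a copy of `item` with an expiry timestamp in epoch seconds."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at) + ttl_days * 24 * 60 * 60
    return {**item, ttl_attribute: expires}


session = {"pk": "session#123", "user": "alice"}
print(with_ttl(session, ttl_days=7, now=1_700_000_000))
```

Passing `now` explicitly keeps the function deterministic in tests; in normal use it defaults to the current time.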
Infrastructure as code (IaC) is a way to manage and provision our resources using code. Terraform, Pulumi, and AWS CDK are some examples of IaC tools. Adopting IaC for managing cloud resources has several benefits, a few of which are given below:
- IaC allows us to automate the provisioning, configuration, and management of our resources. This automation reduces the time and effort required to manage our infrastructure, resulting in cost savings.
- IaC makes it possible to version our infrastructure.
- Templates help us standardize and ensure that our infrastructure is consistent across environments. This reduces errors and increases efficiency, resulting in cost savings.
Serverless architecture is a way to run our applications without managing the underlying infrastructure. The infrastructure management is owned by AWS. AWS services typically used in serverless designs are AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon SNS, AWS Step Functions, Amazon S3, Amazon Cognito, etc.
Leveraging serverless helps us reduce operational costs significantly. Moreover, with serverless computing, we are charged only for the time our code takes to execute, which is a more cost-effective pricing model than paying for idle instances or containers.
This does not mean that serverless is always a better solution than using EC2 instances or containers. It is a paradigm that requires rethinking the infrastructure and redesigning solutions.
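A back-of-the-envelope comparison in Python shows why the answer depends on traffic. The sketch models the two billing dimensions Lambda charges for, requests and duration in GB-seconds, against an always-on instance. All prices are placeholder values, not current AWS rates, and the model ignores free tiers and everything other than compute. The function names are hypothetical.

```python
def lambda_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                        price_per_gb_second: float,
                        price_per_million_requests: float) -> float:
    """Duration charge (GB-seconds) plus request charge for one month."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return gb_seconds * price_per_gb_second + request_cost


def instance_monthly_cost(hourly_rate: float, hours: float = 730) -> float:
    """An always-on instance is billed for every hour, busy or idle."""
    return hourly_rate * hours


# Placeholder prices chosen for easy arithmetic (assumptions, not AWS rates).
GB_SECOND, PER_MILLION, HOURLY = 0.00002, 0.20, 0.10

for invocations in (1_000_000, 500_000_000):
    serverless = lambda_monthly_cost(invocations, 200, 512, GB_SECOND, PER_MILLION)
    print(f"{invocations:>11,} invocations: "
          f"serverless ${serverless:,.2f} vs instance ${instance_monthly_cost(HOURLY):,.2f}")
```

Under these assumed prices, the low-traffic case favors serverless by a wide margin, while sustained high traffic can make an always-on instance cheaper.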
- Identify the type of EC2 instance based on the requirement. Provision reserved instances for production (or sub-production) environments.
- Opt for an AWS Savings Plan, which also provides substantial discounts (up to 72%) on Amazon EC2, AWS Fargate, and AWS Lambda.
- Shut down or terminate dev and test EC2 instances when they are not in use (after working hours and on weekends).
- Automate the right-sizing of resources by leveraging auto-scaling and CloudWatch events.
- Use appropriate S3 storage classes for the objects. Use S3 lifecycle rules to transition rarely accessed data to lower-cost tiers.
- Delete very old EBS snapshots, and define a lifecycle policy for snapshots.
- Make sure to destroy unattached EBS volumes, taking snapshots before destroying them.
- Release unattached Elastic IP addresses after terminating EC2 instances.
- Make cloud optimization part of your DevOps lifecycle.
- Use serverless solutions for low-traffic applications and short-duration tasks.
- Minimize data transfer between AZs and regions. Keep services in the same region and AZ where availability requirements allow.
- Provision Amazon CloudFront distributions for repetitive and static outbound data.
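One of the practices above, shutting down dev and test instances outside working hours, comes down to a simple schedule check. The Python sketch below shows that decision logic only. The 08:00 to 19:00 window and the function name are assumptions, and a real setup would run something like this on a schedule and then start or stop the instances.

```python
from datetime import datetime


def should_be_running(moment: datetime, start_hour: int = 8,
                      end_hour: int = 19) -> bool:
    """True during working hours on weekdays (Mon-Fri), False otherwise."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= moment.hour < end_hour


print(should_be_running(datetime(2024, 1, 8, 10, 0)))   # Monday morning
print(should_be_running(datetime(2024, 1, 8, 22, 0)))   # Monday night
print(should_be_running(datetime(2024, 1, 6, 10, 0)))   # Saturday
```

With this window, an instance runs 55 of the 168 hours in a week, so roughly two-thirds of its on-demand compute hours are avoided.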
Below are a few tools that are used for cost optimization:
- AWS Trusted Advisor – It evaluates the AWS account and provides recommendations to optimize infrastructure, improve security, reduce costs, and monitor service quotas.
- AWS Cost Anomaly Detection – By leveraging machine learning, it identifies unusual spending trends and their underlying causes, empowering teams to respond promptly.
- AWS Cost Explorer – It provides an interactive interface to visualize AWS costs and usage over time, enabling users to view, analyze, and optimize spending.
- AWS Compute Optimizer – It provides recommendations to optimize EC2 instance usage based on historical usage patterns.
- AWS Pricing Calculator – It helps us estimate our AWS costs by specifying the service configuration in detail.
- AWS Auto Scaling – It allows us to automatically adjust the capacity of AWS resources based on the demand of our application.
- Amazon CloudWatch – It captures and presents real-time logs, metrics, and event data on automated dashboards to optimize infrastructure and application management.
- AWS Lambda Power Tuning – An open-source tool, powered by AWS Step Functions, that helps optimize the cost and performance of AWS Lambda functions by invoking a function with different memory (power) configurations and comparing the resulting cost and speed.
- Infracost – A third-party service that allows users to calculate the cost of cloud resources before they are deployed, which is a key insight from the cost optimization perspective. It is integrated with Spacelift and is available in the free tier.
Overspending on AWS resources is a concern for many enterprises. In this post, we covered several approaches to limiting the AWS bill by taking advantage of the flexibility in resource consumption and the way AWS pricing is tied to it.
We also delved into some of the best practices and tools that help with cloud cost optimization. Note that the points discussed here are not exhaustive, and use cases differ vastly between organizations. Cost optimization is not a one-time activity: these principles need to be applied continuously to reach an optimum cost for the infrastructure supporting any business service.