AWS CloudFormation Drift Detection & Remediation Guide

Infrastructure as code (IaC) has become the gold standard for managing and deploying cloud resources. 

AWS CloudFormation is an IaC service that helps users automate, scale, and manage their environments efficiently. GitOps, in turn, has become a standard way of ensuring that the IaC configuration stored in code repositories is what actually gets deployed to the correct systems.

Although combining IaC and GitOps seems straightforward, it can lead to provisioning inconsistencies and complicated operations. 

In this blog, we will analyze infrastructure drift, explain how you can detect drift with CloudFormation, discuss remediation strategies, and examine best practices.

What we’ll cover:

  1. Understanding infrastructure drift
  2. What is CloudFormation drift detection?
  3. How do you resolve CloudFormation drift?
  4. Common challenges in drift detection
  5. Best practices for drift prevention and remediation

Understanding infrastructure drift

Infrastructure drift occurs when the actual live configuration of your AWS resources differs from their expected configuration as defined in CloudFormation templates. Examples of drift include a virtual machine whose instance size no longer matches the template, or a property that still appears in the template after it has been deleted from the live resource.

In CloudFormation, stacks are collections of AWS resources that you can manage as a single unit. A stack is considered drifted if one or more of its resources have drifted from their template definitions. This deviation creates inconsistencies that can affect your infrastructure’s predictability and management.

Common causes of drift

In the world of infrastructure management, drift typically happens when users make changes outside of CloudFormation’s workflow. A few examples of these include:

  1. Manual user changes: Users may bypass IaC automations and modify resources directly through the AWS Management Console or via the AWS CLI. This could be due to an operational event, some action that needs to be performed immediately to address an issue, or even an accident. Examples include manually adjusting networking rules, security groups, or autoscaling configurations.
  2. Third-party tools and AI agents: DevOps and platform teams often set up external automation tools or scripts that modify AWS resources independently of CloudFormation’s lifecycle. With the rise of AI agents, it’s even easier to set up agentic flows that respond to specific events and attempt to auto-remediate issues. Although these solutions reduce operational overhead, they can also introduce infrastructure drift.
  3. Delayed or failed deployments: The IaC code merged into your branches declares your intent to configure your environment, but your infrastructure remains in a drifted state until that code has reached production. Drift is introduced accidentally when deployments fail, get deprioritized, or are simply forgotten.

What is CloudFormation drift detection?

Let’s look at how drift detection works in CloudFormation.

CloudFormation drift detection is a feature that checks whether resources in a CloudFormation stack match the configuration defined in its template. The service analyzes each resource that supports drift detection and identifies any properties that differ from their template definitions, including deleted resources.

When you initiate drift detection, CloudFormation performs the following process:

  1. Resource analysis: Examines each resource in the stack that supports drift detection
  2. Property comparison: Compares actual resource properties with expected template values
  3. Status assignment: Marks each resource as IN_SYNC or MODIFIED based on the comparison results. Resources that don’t support drift detection get a drift status of NOT_CHECKED, and resources that have been deleted are marked as DELETED. The stack as a whole is reported as DRIFTED if at least one of its resources has drifted.
  4. Difference classification: For each drifted resource property, CloudFormation assigns the ADD value for additions, REMOVE for property removals, and NOT_EQUAL for properties whose values differ.
  5. Report generation: Provides detailed information about drift status and specific property differences
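
To make the report structure more concrete, here is a minimal Python sketch that summarizes drift results. It assumes each entry is shaped like an item of `StackResourceDrifts` in the `DescribeStackResourceDrifts` API response, where modified resources carry the status `MODIFIED`. The function name `summarize_drifts` and the returned dictionary layout are our own illustrative choices, not part of any AWS SDK.

```python
from collections import Counter
from typing import Any


def summarize_drifts(resource_drifts: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize entries shaped like DescribeStackResourceDrifts results.

    Hypothetical helper: only the input field names come from the API.
    """
    # Count how many resources ended up in each drift status.
    status_counts = Counter(d["StackResourceDriftStatus"] for d in resource_drifts)

    # Flatten property differences into (resource, property path, difference type).
    differences = []
    for drift in resource_drifts:
        for diff in drift.get("PropertyDifferences", []):
            differences.append(
                (drift["LogicalResourceId"], diff["PropertyPath"], diff["DifferenceType"])
            )

    # A stack counts as drifted if any resource was modified or deleted.
    stack_drifted = any(status_counts[s] > 0 for s in ("MODIFIED", "DELETED"))
    return {
        "stack_drifted": stack_drifted,
        "status_counts": dict(status_counts),
        "differences": differences,
    }
```

In practice, you would feed this function the `StackResourceDrifts` list returned by a CloudFormation client call rather than hand-written data.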

The operation can take several minutes, depending on the number of resources in your stack. 
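
Because detection runs asynchronously, a script has to start the operation and then poll for its result. The sketch below shows one way to do that in Python. It assumes `client` behaves like a boto3 CloudFormation client (for example `boto3.client("cloudformation")`), which provides `detect_stack_drift` and `describe_stack_drift_detection_status`. The function name, polling interval, and timeout are illustrative assumptions.

```python
import time
from typing import Any


def wait_for_drift_detection(
    client: Any,
    stack_name: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> dict[str, Any]:
    """Start stack drift detection and poll until it finishes.

    `client` is assumed to expose the boto3 CloudFormation client methods
    used below; it is passed in so the function can be tested with a stub.
    """
    detection_id = client.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]
    deadline = time.monotonic() + timeout_seconds

    while True:
        status = client.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        # DETECTION_COMPLETE and DETECTION_FAILED are both terminal states.
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Drift detection for {stack_name} did not finish in time")
        time.sleep(poll_seconds)
```

The returned dictionary includes `StackDriftStatus` when detection completes, so the caller can decide whether to fetch per-resource details.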

Advanced capabilities of CloudFormation drift detection:

  • Stack-level detection: You can detect drift across an entire stack, which provides a comprehensive view of all resource deviations. To initiate this, select your stack, choose Stack actions, and then Detect drift.

  • Individual resource detection: Target specific resources for drift detection when you need to verify particular components without scanning the entire stack.
  • StackSets support for multi-region, multi-account detection: CloudFormation extends drift detection to StackSets, allowing you to identify drift across multiple accounts and regions. In this case, CloudFormation performs drift detection on each stack instance in the StackSet.
  • Integration with AWS Config: AWS Config is a service that helps with AWS configuration auditing, assessment, and evaluation on live environments. You can leverage AWS Config Rules to automate drift detection and create compliance checks that trigger when drift occurs. Check the cloudformation-stack-drift-detection-check managed rule for more details on how to set this up.
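
As an illustration of individual resource detection, the following Python sketch checks a single resource and lists its differing properties. It assumes `client` behaves like a boto3 CloudFormation client, which offers `detect_stack_resource_drift`. The helper name `check_single_resource` and the output format are hypothetical.

```python
from typing import Any


def check_single_resource(
    client: Any, stack_name: str, logical_resource_id: str
) -> tuple[str, list[str]]:
    """Detect drift on one resource and describe each differing property.

    `client` is assumed to be a boto3 CloudFormation client or a stub with
    the same `detect_stack_resource_drift` method.
    """
    response = client.detect_stack_resource_drift(
        StackName=stack_name, LogicalResourceId=logical_resource_id
    )
    drift = response["StackResourceDrift"]

    # Each difference carries a type (ADD, REMOVE, NOT_EQUAL) and both values.
    changes = [
        f'{d["DifferenceType"]} {d["PropertyPath"]}: '
        f'expected {d.get("ExpectedValue")}, actual {d.get("ActualValue")}'
        for d in drift.get("PropertyDifferences", [])
    ]
    return drift["StackResourceDriftStatus"], changes
```

This keeps the check cheap when you only care about one component, such as a security group that was edited during an incident.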

How do you resolve CloudFormation drift?

To resolve CloudFormation drift, you need to update the stack to match the expected template configuration or manually revert out-of-band changes in the underlying resources.

Manual remediation

The most straightforward approach involves reverting drifted resources to their template specifications through the console, CLI, or APIs. This method works well for one-off changes or when precise control over the remediation process is needed.

Template updates

Sometimes the out-of-band change is the one you want to keep. In that case, the best approach is to update your CloudFormation template to match the live configuration and then perform a stack update, so that the template and the resources agree again.

Resource import

If you made changes to a resource property that requires replacement, you might accidentally recreate the resource during the next stack update.

When you want to retain the existing resource, you can detach it from the stack and use the import feature to bring it back with a template that matches its current configuration, which resolves the drift status without recreating the resource.

This process involves:

  1. Adding DeletionPolicy: Retain to the resource and updating the stack
  2. Removing the resource from the template and performing a stack update, which detaches the resource without deleting it
  3. Updating the template to match the actual resource state
  4. Importing the resource back into the stack with its current configuration
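
The final import step is performed through a change set of type `IMPORT`. The Python sketch below only builds the arguments for that call, so you can inspect them before sending anything to AWS. It assumes the result will be passed to the boto3 CloudFormation client's `create_change_set` method. The helper name and the change set naming scheme are hypothetical.

```python
from typing import Any


def build_import_change_set_args(
    stack_name: str,
    template_body: str,
    logical_id: str,
    resource_type: str,
    identifier: dict[str, str],
) -> dict[str, Any]:
    """Build keyword arguments for an IMPORT change set.

    Hypothetical helper: the keys mirror the CreateChangeSet API, while the
    change set name is an arbitrary convention chosen for this example.
    """
    return {
        "StackName": stack_name,
        "ChangeSetName": f"import-{logical_id}",
        "ChangeSetType": "IMPORT",
        # The template must already describe the resource as it exists today.
        "TemplateBody": template_body,
        "ResourcesToImport": [
            {
                "ResourceType": resource_type,
                "LogicalResourceId": logical_id,
                # Identifier keys depend on the type, e.g. BucketName for S3.
                "ResourceIdentifier": identifier,
            }
        ],
    }
```

With a real client, you would call `client.create_change_set(**args)`, review the change set, and then execute it.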

Automated remediation

You can implement automated solutions using services such as Amazon EventBridge and AWS Lambda functions to detect and remediate drift automatically. This approach monitors for drift events and triggers remediation workflows without manual intervention. 
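
A minimal sketch of the Lambda side of such a workflow might look like the Python handler below. It assumes the function is triggered by an EventBridge rule matching CloudFormation drift detection status change events, and that the event `detail` contains a `stack-id` plus a `status-details` object with `detection-status` and `stack-drift-status` keys. Verify these field names against an event captured in your own account. The `notify` parameter and the returned dictionary are hypothetical, and the actual remediation logic is left out.

```python
from typing import Any, Callable


def handler(
    event: dict[str, Any],
    context: Any = None,
    notify: Callable[[str], None] = print,
) -> dict[str, Any]:
    """Decide what to do with a drift detection status change event.

    The event field names are assumptions about the EventBridge payload;
    `notify` is a hypothetical hook for alerting (it defaults to print).
    """
    detail = event.get("detail", {})
    status = detail.get("status-details", {})

    # Ignore events for detections that are still running or that failed.
    if status.get("detection-status") != "DETECTION_COMPLETE":
        return {"action": "ignored"}

    stack_id = detail.get("stack-id")
    if status.get("stack-drift-status") != "DRIFTED":
        return {"action": "none", "stack_id": stack_id}

    # A real workflow would start remediation here, e.g. a stack update.
    notify(f"Drift detected on {stack_id}")
    return {"action": "remediate", "stack_id": stack_id}
```

Keeping the decision logic free of AWS calls makes it easy to test locally before wiring it to a real remediation step.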

Automated remediation can also be combined with agentic AI using services such as Amazon Bedrock AgentCore.

Common challenges in drift detection

Keep the following limitations in mind, as they can lead to false positives, delayed detection, or increased resource usage:

  1. Resource support limitations: Not all AWS resources support drift detection. Resources that don’t support drift detection get a drift status of NOT_CHECKED. Check the Resource type support documentation for a list of resources that support drift detection.
  2. Nested stack complexity: CloudFormation doesn’t detect drift on nested stacks automatically. Managing drift across nested stacks requires individual detection operations on each stack, increasing operational overhead.
  3. False drift detection: Functionally equivalent resource properties can appear different in the template and actual state. AWS provides property transforms to handle these scenarios. The AWS documentation on preventing false drift detection results for resource types contains more information.
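
To reduce the nested stack overhead, you can collect every stack in a hierarchy first and then run detection on each one. The Python sketch below walks the tree by looking for resources of type `AWS::CloudFormation::Stack`. It assumes `client` behaves like a boto3 CloudFormation client with a `list_stack_resources` method, and that the physical ID of a nested stack resource can be used as its stack name. The function name is hypothetical.

```python
from typing import Any


def collect_stack_ids(client: Any, stack_name: str) -> list[str]:
    """Return the given stack followed by all of its nested stacks.

    `client` is assumed to expose boto3's `list_stack_resources`; nested
    stacks are identified by the AWS::CloudFormation::Stack resource type.
    """
    found = [stack_name]
    kwargs: dict[str, Any] = {"StackName": stack_name}

    while True:
        page = client.list_stack_resources(**kwargs)
        for resource in page["StackResourceSummaries"]:
            nested_id = resource.get("PhysicalResourceId")
            if resource["ResourceType"] == "AWS::CloudFormation::Stack" and nested_id:
                # Recurse so that deeper levels of nesting are included too.
                found.extend(collect_stack_ids(client, nested_id))

        # Follow pagination until the API stops returning a token.
        token = page.get("NextToken")
        if not token:
            return found
        kwargs["NextToken"] = token
```

Each returned identifier can then be passed to a separate drift detection operation.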

Best practices for drift prevention and remediation

Combining preventive controls with automated remediation reduces operational risk and ensures infrastructure remains predictable and secure.

  1. Establish strict IaC policies and workflows: Set clear guidelines that route all infrastructure changes through CloudFormation templates. This prevents the manual and accidental modifications that cause drift.
  2. Set up automated drift detection: Use AWS Config rules to continuously monitor for drift and trigger alerts when deviations occur. Configure EventBridge to respond to drift detection results and notify your team immediately.
  3. Use change sets for updates: CloudFormation best practices recommend always creating and reviewing change sets before applying updates, so you understand exactly what will change. This practice helps prevent unintended modifications.
  4. Regular drift detection scans: Implement scheduled drift detection checks to proactively identify deviations before they cause operational issues.
  5. Document remediation procedures: Create clear runbooks for common drift scenarios. This ensures consistent handling across your team and quick remediation when issues appear.
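
As one example of turning the change set review into a repeatable check, the Python sketch below flags changes that remove or may replace a resource. It assumes the input is shaped like the `Changes` list in a `DescribeChangeSet` API response, where each entry has a `ResourceChange` with `Action`, `LogicalResourceId`, and `Replacement` fields. The function name and the decision to treat `Conditional` replacement as risky are our own assumptions.

```python
from typing import Any


def risky_changes(changes: list[dict[str, Any]]) -> list[str]:
    """Flag change set entries that remove or may replace a resource.

    Hypothetical review helper; input mirrors DescribeChangeSet 'Changes'.
    """
    flagged = []
    for change in changes:
        resource_change = change.get("ResourceChange", {})
        logical_id = resource_change.get("LogicalResourceId", "<unknown>")
        action = resource_change.get("Action")
        replacement = resource_change.get("Replacement", "False")

        if action == "Remove":
            flagged.append(f"{logical_id}: will be removed")
        elif action == "Modify" and replacement in ("True", "Conditional"):
            # Replacement creates a new physical resource, so flag it.
            flagged.append(f"{logical_id}: may be replaced ({replacement})")
    return flagged
```

A pipeline could fail the deployment, or require manual approval, whenever this list is not empty.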

How to detect and remediate drift with Spacelift

The most important remedy for drift is to prevent it as much as possible. 

Spacelift helps you avoid drift with features such as:

  • Policy as code minimizes human error and unauthorized changes.
  • Spaces ensure RBAC is implemented and partial admin rights can be granted to avoid frustration.
  • Custom inputs allow you to integrate security vulnerability scanning tools into your workflows to minimize vulnerabilities and define custom policies for them.
  • Blueprints enable self-service infrastructure that helps avoid drift by ensuring all resources respect the governance mechanisms you have implemented. User-friendly templates mitigate frustration.
  • Contexts are reusable containers for your variables, files, and lifecycle hooks that can be attached to multiple configurations, minimizing the potential for error considerably.
  • Cloud integrations: Dynamic and short-lived credentials for AWS, Azure, and GCP ensure that no accounts or roles are targeted by mistake.

In some cases, drift cannot be avoided, so you also need a way to detect it. Spacelift offers a drift detection and remediation mechanism for exactly this purpose.

You simply define a schedule to check your infrastructure for drift automatically. This periodic scan checks the current state of your infrastructure against the IaC configuration.

Regardless of how many configurations you have, the Spacelift Resources View and dashboard features enable instant visibility of any resources that have drifted. Instead of checking every configuration for drift detection results, you simply check a single screen to see the status.

If you enable drift remediation, a remediation job will start returning your resources to the state you had in your IaC code as soon as drift is detected. 

Alternatively, you can disable auto-deploy, check the remediation plan, and then either apply the plan to return the infrastructure to its previous state or discard the remediation run, make the changes to your code, and push them to your VCS. By checking the plan, you will know exactly what has changed, making it easy to decide whether to adopt those changes in your code or simply revert them.

To take your infrastructure automation to the next level, create a Spacelift account today or book a demo with one of our engineers.

Key points

Even with strong governance, infrastructure drift is inevitable and will happen due to operational processes, accidents, and emergency responses. In this blog, we discussed what IaC drift is, how it happens, and what you can do about it. 

Planning for drift detection and remediation is more effective than trying to prevent it entirely. Implement automated drift detection and alerting to catch deviations quickly. Manual detection processes don’t scale effectively across large infrastructures, so automate where possible.

Frequently asked questions

  • What types of resources support CloudFormation drift detection?

    CloudFormation drift detection supports most AWS resource types, but not all. It works with core services like EC2 instances, S3 buckets, IAM roles, Lambda functions, CloudWatch alarms, and many more. However, some newer or less commonly used resource types may not be supported. Unsupported resources will be marked as “NOT_CHECKED” during drift status checks.

  • What is the difference between drift in Terraform and CloudFormation?

    With Terraform drift, the actual infrastructure differs from the state file, whereas CloudFormation drift means the stack resources no longer match the template configuration.

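    Terraform surfaces drift whenever you run terraform plan (or terraform plan -refresh-only), but detecting it continuously means scheduling those runs yourself or relying on external tools, whereas CloudFormation exposes drift detection as a dedicated built-in operation.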

  • Why is it important to detect drift in CloudFormation stacks?

    Detecting drift in CloudFormation stacks is essential to ensure the deployed infrastructure matches the defined stack template. Drift can indicate unauthorized changes, manual edits outside of CloudFormation, or configuration inconsistencies.

  • What if a resource shows NOT_CHECKED?

    In AWS CloudFormation, a resource with status NOT_CHECKED usually appears after a stack drift detection operation. It means CloudFormation skipped evaluating that resource for drift, either because the resource type does not support drift detection or because the resource is a nested stack, which needs its own drift detection operation.
