5 Infrastructure as Code Security Issues & How to Fix Them

Infrastructure as code promises repeatable deployments and faster delivery, yet it introduces security patterns that can quietly undermine both goals. The most frequent problems surface during audits or incident reviews, when teams realize they cannot explain who changed what or why production no longer matches the repository. 

This article addresses five frequently recurring issues and outlines practical ways to reduce risk with tools you likely already use. The guidance is vendor-neutral and applies whether you rely on Terraform, Spacelift, or another platform that supports policy and automation.

Key questions every IaC team should answer

Before diving into solutions, confirm that your current setup can answer these questions without delay:

  • Who changed the production database configuration last Tuesday, and what was the reason?
  • Do any environments show configuration drift compared to the code in version control?
  • Can you list all access controls and identify who can access what?
  • Are there any hard-coded secrets in your infrastructure code repositories?

If you cannot answer every question with confidence, you likely have audit gaps that need attention.

1. Detect and remediate configuration drift

Configuration drift occurs when console changes, emergency hotfixes, or undocumented updates cause the environment to diverge from source control. 

The impact is subtle at first and then becomes visible as open security groups, missing tags, or failing compliance checks. Teams often notice the problem during an incident when they realize that production no longer aligns with the plan they reviewed.

Schedule drift detection that compares the desired state in code with the actual state in the environment. Most platforms can run these checks on a regular cadence, produce a focused list of drifted resources, and link each finding to the code that last managed it. 

For example, in Spacelift, you can enable this by going to Stacks → Scheduling to add a drift-detection cron, then review Resources → filter: Drifted to focus remediation.

Once you have that view, make a conscious choice for every item. Either accept the change and update the code or roll the resource back to the approved configuration.
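As a rough illustration of the triage step, the sketch below lists drifted resources from a Terraform plan exported as JSON. It assumes the plan carries a `resource_drift` array, as recent Terraform versions emit in `terraform show -json` output. The function name and sample data are hypothetical.

```python
def drifted_resources(plan: dict) -> list[tuple[str, list[str]]]:
    """Return (address, actions) for each resource the plan reports as drifted."""
    findings = []
    for entry in plan.get("resource_drift", []):
        actions = entry.get("change", {}).get("actions", [])
        # Entries whose only action is "no-op" carry no real difference.
        if actions and actions != ["no-op"]:
            findings.append((entry["address"], actions))
    return findings


# Hypothetical sample that mirrors the assumed shape of the plan JSON.
sample_plan = {
    "resource_drift": [
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
    ]
}

for address, actions in drifted_resources(sample_plan):
    print(f"DRIFT {address}: {'/'.join(actions)}")
```

In practice you would load the dictionary from an exported plan file with `json.load` and route each finding to the owner recorded in the resource's tags.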

Pair detection with guardrails that reduce the chance of future drift, so you are not chasing the same issues each week.

Checklist:

  • Enable scheduled drift detection across all production stacks.
  • Triage drifted resources regularly, and update code or roll back changes.
  • Add tags and owners so drift alerts route to the right people.

2. Implement policy as code

Without policy as code, the path to production depends on individual judgment and institutional memory. That approach works until someone launches an oversized instance, exposes a storage bucket, or deploys untagged resources that are not reflected in cost and security reports. 

A small set of clear, automated policies prevents these outcomes and keeps reviews focused on meaningful decisions rather than policing basics.

Open Policy Agent is a common choice because it evaluates plans before apply and returns precise messages that developers can act on. If you’re using Spacelift, you can jump-start with reusable OPA templates and attach them to stacks with minimal customization.

Start with a small number of rules that offer high value and low friction. Prevent public storage unless explicitly allowed, restrict instance families to approved options, require tags that drive ownership and budgeting, and avoid IAM users and access keys entirely. Enforce short‑lived, federated credentials (SSO/OIDC) instead.

As adoption grows, expand the library and tune messages so developers understand exactly what failed and how to correct it.

OPA quick path from plan → JSON → policy check (works anywhere OPA runs):

terraform plan -out=tfplan
terraform show -json tfplan > plan.json
# Evaluate against your Rego policies; a non-empty "deny" result indicates failures.
# Assumes the policies live in ./policy and declare "package terraform".
opa exec --bundle policy/ --decision terraform/deny plan.json

Useful starter policies:

  • Block public S3 buckets unless explicitly approved.
  • Enforce instance type rules to prevent the creation of oversized or unsupported instances.
  • Require mandatory tags such as owner, environment, and cost center.
  • Require short‑lived, federated credentials (e.g., SSO/OIDC), and deny IAM access key creation.
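
To make the logic of two of these starter rules concrete, here is a minimal sketch written in Python rather than Rego. It is an illustration, not a replacement for OPA. It walks the `resource_changes` array of a Terraform plan JSON and returns deny messages for missing tags and unapproved instance types. The tag keys and the instance-family allow-list are assumptions chosen for the example.

```python
REQUIRED_TAGS = {"owner", "environment", "cost_center"}  # assumed tag keys
ALLOWED_INSTANCE_PREFIXES = ("t3.", "m5.")               # assumed allow-list


def deny(plan: dict) -> list[str]:
    """Return one human-readable message per policy violation in the plan."""
    messages = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        # Only evaluate resources that are being created or updated.
        if "create" not in actions and "update" not in actions:
            continue
        after = change.get("after") or {}
        address = rc.get("address", "<unknown>")
        # Check tags only on resources that expose a "tags" attribute.
        if "tags" in after:
            missing = REQUIRED_TAGS - set(after.get("tags") or {})
            if missing:
                messages.append(f"{address}: missing required tags {sorted(missing)}")
        if rc.get("type") == "aws_instance":
            instance_type = after.get("instance_type") or ""
            if not instance_type.startswith(ALLOWED_INSTANCE_PREFIXES):
                messages.append(f"{address}: instance type '{instance_type}' is not approved")
    return messages


# Hypothetical plan fragment: one non-compliant and one compliant instance.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_instance.api",
            "type": "aws_instance",
            "change": {
                "actions": ["create"],
                "after": {"instance_type": "x1e.32xlarge", "tags": {"owner": "payments"}},
            },
        },
        {
            "address": "aws_instance.web",
            "type": "aws_instance",
            "change": {
                "actions": ["create"],
                "after": {
                    "instance_type": "t3.small",
                    "tags": {"owner": "web", "environment": "dev", "cost_center": "42"},
                },
            },
        },
    ]
}

for message in deny(sample_plan):
    print("DENY", message)
```

Each message names the resource and the exact rule it broke, which is the property that lets developers fix a failure without asking a reviewer.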

3. Build complete audit trails

Auditors and incident responders ask similar questions, albeit with varying urgency. They need to know who changed a resource, what changed, how the change was approved, and whether related policies were evaluated.

A complete audit trail answers those questions without stitching together screenshots and chat logs. Centralize events from the IaC platform, cloud providers, version control, and ticketing systems, so that you can reconstruct change history with confidence.

Good audit trails do more than satisfy compliance. They make everyday work easier because engineers can trace a resource back to the plan, see the diff, and understand the reason for the change. 

Export logs to durable storage, retain them for the required retention period, and provide scoped read-only access to the people who need it. The result is faster audits and fewer meetings that pull engineers away from delivery.
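
One possible shape for a centralized audit record is sketched below. It captures who changed a resource, what changed, who approved it, and which policies were evaluated, then serializes each event as one JSON line. All field names are assumptions for illustration and do not reflect any platform's actual log schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    actor: str            # who made the change
    resource: str         # what was changed
    action: str           # e.g. "create", "update", "delete"
    commit: str           # version-control revision behind the change
    approved_by: str      # who approved the run
    policy_results: dict  # policy name -> decision
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(event: AuditEvent) -> str:
    """Serialize one event as a single JSON line for append-only storage."""
    return json.dumps(asdict(event), sort_keys=True)


def changes_to(resource: str, lines: list[str]) -> list[dict]:
    """Reconstruct the change history of one resource from stored lines."""
    events = (json.loads(line) for line in lines)
    return [event for event in events if event["resource"] == resource]


# Hypothetical event; every value here is invented for the example.
log = [
    to_json_line(
        AuditEvent(
            actor="alice",
            resource="aws_db_instance.prod",
            action="update",
            commit="3f2a9c1",
            approved_by="bob",
            policy_results={"require-tags": "allow"},
        )
    )
]

for event in changes_to("aws_db_instance.prod", log):
    print(event["timestamp"], event["actor"], event["action"], event["approved_by"])
```

Keeping the actor, approval, and policy decisions in a single record is what lets you answer a "who changed this and why" question with one query.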

Checklist:

  • Enable platform-level audit logging for runs, approvals, and policy decisions.
  • Retain logs in accordance with your compliance policy and export to durable storage.
  • Provide auditors with scoped, read-only access to view logs and artifacts.

4. Enforce role-based access control

Overbroad permissions increase the cost of mistakes and slow down investigations. A simple role model solves much of the problem. Provide read-only roles for viewing plans, logs, and resources, contributor roles for triggering runs in lower environments, and tightly scoped administrative roles for approving changes or overriding policies. 

If your platform supports environment scoping, give developers freedom to experiment in development while protecting production. 

Larger organizations often benefit from advanced access control that breaks actions into granular permissions, so you can define roles that match real responsibilities rather than broad categories.

In Spacelift, use Spaces to separate Dev/Staging/Prod and apply different role sets. With Advanced Access Control (AAC), you can compose granular permissions such as approve run, override policy, or manage secrets.

Checklist:

  • Define standard roles for read-only, contributor, and administrator responsibilities.
  • Scope access by environment, so development and staging remain open while production stays restricted.
  • Require just-in-time elevation for emergency production changes with strong approval and logging.
  • Map granular permissions to tasks such as approve run, override policy, and manage secrets.
  • Review role assignments quarterly and remove access that is no longer needed.
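
The role model in this checklist can be sketched as a small permission check. This is a generic illustration, not any platform's API. The role names, permission names, users, and environments are all hypothetical.

```python
from enum import Enum


class Permission(Enum):
    VIEW = "view"
    TRIGGER_RUN = "trigger_run"
    APPROVE_RUN = "approve_run"
    OVERRIDE_POLICY = "override_policy"
    MANAGE_SECRETS = "manage_secrets"


# Standard roles expressed as sets of granular permissions.
ROLES = {
    "reader": {Permission.VIEW},
    "contributor": {Permission.VIEW, Permission.TRIGGER_RUN},
    "admin": set(Permission),
}

# user -> environment -> role; anything not listed is denied by default.
ASSIGNMENTS = {
    "dana": {"dev": "contributor", "prod": "reader"},
    "lee": {"prod": "admin"},
}


def is_allowed(user: str, environment: str, permission: Permission) -> bool:
    """Least privilege: allow only if the user's role in this environment grants it."""
    role = ASSIGNMENTS.get(user, {}).get(environment)
    return role is not None and permission in ROLES[role]


print(is_allowed("dana", "dev", Permission.TRIGGER_RUN))
print(is_allowed("dana", "prod", Permission.TRIGGER_RUN))
```

Because assignments are keyed by environment, the same person can trigger runs in development while staying read-only in production.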

5. Remove hard-coded secrets from repositories

Hard-coded secrets are still one of the fastest routes to compromise. Credentials slip into repositories during local testing, quick fixes, or automated changes that alter repository settings. Attackers monitor public sources for these leaks and act quickly when they find them. 

Use a dedicated secrets manager, such as HashiCorp Vault or a cloud-native equivalent, so that code references secret values at run time through short-lived tokens instead of embedding them, and rotate the underlying credentials on a defined schedule. Add secret scanning to your CI process so merges fail when sensitive strings appear, and review findings as part of regular hygiene rather than one-time cleanups.
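
To show what a scanning step looks for, here is a deliberately small sketch that flags two common patterns: AWS access key IDs and PEM private key headers. The two patterns are only an illustrative subset of what a dedicated scanner covers. The sample text uses the example key ID from AWS documentation.

```python
import re

# Illustrative subset of patterns; dedicated scanners ship far more rules.
PATTERNS = {
    "AWS access key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Sample file content; the key ID is the placeholder used in AWS documentation.
sample = 'region = "eu-west-1"\naccess_key = "AKIAIOSFODNN7EXAMPLE"\n'

for lineno, label in scan(sample):
    print(f"line {lineno}: possible {label}")
```

A CI job could run a check like this on changed files and exit with a non-zero status when `scan` returns any findings, so the merge is blocked until the value moves to the secrets manager.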

Checklist:

  • Store secrets only in a managed secrets service.
  • Enable automated secret scanning on push and during pull requests.
  • Rotate credentials and prefer short-lived access where possible.

Key points

A steady, incremental approach builds confidence with auditors and reduces incidents in production. Start with visibility, add guardrails, and then refine controls as your team matures.

  • Turn on scheduled drift detection and review results weekly.
  • Add a minimal policy set and expand it quarter by quarter.
  • Centralize audit logs and validate that you can answer common audit questions.
  • Define and enforce roles by environment with least privilege as the default.
  • Avoid IAM users and access keys, and instead adopt short‑lived, federated credentials.
  • Remove hard-coded secrets, and deploy continuous secret scanning.

If you want an IaC platform that bakes in drift detection, policy as code, audit logging, and granular RBAC, create a free Spacelift account today or book a demo with one of our engineers.
