Managing Infrastructure as Code at Scale

Scalability and IaC are not always a match made in heaven. Several areas within the development environment, along with the associated tooling, must be seriously considered to manage IaC effectively and keep it scalable.

This blog post will touch upon a few critical areas that can negatively impact IaC when provisioning and deploying complex, large-scale infrastructures. 

IaC Code Platforms

The first area to understand is IaC code platforms. Terraform is the de facto standard and by far the most widely used IaC coding platform, although several alternatives, such as Pulumi, CloudFormation, and Ansible, are available. Each of them has its relative strengths and weaknesses. While we are huge fans of Terraform, we are also pragmatic about its shortcomings, which limit a developer's ability to manage IaC cleanly at scale. Below we highlight some of the weaknesses that you need to consider:

  • Resource management: Running the terraform state mv command every time a resource identifier changes is painful and time-consuming. A large codebase complicates this further, as it is challenging to keep the code clean while it evolves over time
  • TF State: Terraform tracks the current state of managed resources in a state file that plan and apply read and update. State is very fragile and can be impacted, for example, by a stale state file or by forgetting to run “terraform remote config” (in older Terraform versions; newer versions configure the backend during `terraform init`)
  • Local machines: Each action is manual and must be performed directly by a developer, which becomes a problem when multiple people work on the same codebase. Developer one applies their changes to the environment and opens a pull request. At the same time, developer two would like to deploy their changes but is blocked because their copy of the codebase does not yet include developer one’s changes. Applying from that out-of-date copy would damage the resources created in the previous deployment. Larger teams will often waste time just waiting and rebasing the codebase before they can apply changes.
  • Security: Devs working with the codebase need access to the Terraform state, which creates a security concern due to how Terraform state works. Sensitive data pulled from an external system such as Vault is stored in the state file in plaintext once it is used in a resource, so a threat actor with state access could run `terraform state pull` to read all the secrets
  • CLI: Typos and other errors are common, resulting in misconfigurations or failed plans and applies
  • Infrastructure Fine-Tuning: It is difficult to interface with subsets of the infrastructure to make small changes dynamically. Terraform works well for bulk deployments but lacks the delicate touch necessary for fine-tuning infrastructures
  • Rollback: Terraform does not support rollback to a previous state due to limitations of Terraform plan, which means that the only option is to roll forward
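
To make the security point above concrete, here is a minimal sketch of how easily secrets can be picked out of state once someone can read it. It assumes a heavily simplified state document with the `resources`, `instances`, and `attributes` layout; the sample resources, the secret value, and the `SENSITIVE_HINTS` naming heuristic are all hypothetical and only serve the illustration.

```python
import json

# Hypothetical, heavily simplified state document. Real state files carry
# more metadata; the resource and attribute names here are illustrative only.
SAMPLE_STATE = json.loads("""
{
  "version": 4,
  "resources": [
    {
      "type": "aws_db_instance",
      "name": "ledger",
      "instances": [
        {"attributes": {"username": "app", "password": "s3cr3t-from-vault"}}
      ]
    },
    {
      "type": "aws_s3_bucket",
      "name": "reports",
      "instances": [
        {"attributes": {"bucket": "abl-reports"}}
      ]
    }
  ]
}
""")

# Assumed naming heuristic; tune it for your own providers.
SENSITIVE_HINTS = ("password", "secret", "token", "private_key")


def find_plaintext_secrets(state):
    """Return (resource address, attribute name) pairs that look sensitive."""
    findings = []
    for resource in state.get("resources", []):
        address = f"{resource['type']}.{resource['name']}"
        for instance in resource.get("instances", []):
            attributes = instance.get("attributes") or {}
            for key, value in attributes.items():
                looks_sensitive = any(hint in key.lower() for hint in SENSITIVE_HINTS)
                # Only non-empty strings are reported as plaintext secrets.
                if looks_sensitive and isinstance(value, str) and value:
                    findings.append((address, key))
    return findings


if __name__ == "__main__":
    for address, key in find_plaintext_secrets(SAMPLE_STATE):
        print(f"{address}: attribute '{key}' is stored in plaintext")
```

A check like this could run against your own state as an audit aid, but it is only a heuristic: the real mitigation is restricting who can read the state in the first place.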

Is CI/CD Really Necessary?

The short answer is yes! IaC development can be significantly enhanced by CI/CD processes and workflows, just as application or tool development can. Using CI/CD to provide workflows for infrastructure provisioning and deployment offers the following benefits:

Consistency:

  • Code is controlled – well documented and optimized
  • Code is modular – reused across multiple projects

Testing:

  • Testing is integrated – cannot be skipped
  • It is far simpler to test code in small bites – issues surface faster in 100 lines of code than in 10,000, with no hunting for a needle in a haystack

Automation:

  • Routing of change approvals – single or multiple sign-offs
  • Establishes a set of controllable mechanisms – gates for events, etc.
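
The testing and approval gates above can be sketched as a small piece of logic. This is a minimal illustration, not how any particular CI/CD product implements it; `ChangeRequest` and `may_deploy` are hypothetical names, and the rule that authors cannot approve their own change is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A proposed infrastructure change awaiting sign-off (hypothetical model)."""
    author: str
    tests_passed: bool = False
    approvers: set = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        # Assumed rule: authors cannot approve their own change.
        if reviewer != self.author:
            self.approvers.add(reviewer)


def may_deploy(change: ChangeRequest, required_approvals: int = 1) -> bool:
    """Gate: tests must have passed and enough distinct reviewers signed off."""
    return change.tests_passed and len(change.approvers) >= required_approvals
```

Because the gate is code rather than convention, testing cannot be skipped and the number of sign-offs is a single parameter that can differ per environment.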

A Complex Infrastructure Example

The diagram below provides an example of a Fintech application and supporting infra implemented in AWS. This solution is used by financial institutions, banks, or specialized lenders to manage their Asset-Based Lending (ABL) portfolio and deliver their services to their clients.

Fintech ABL Application and Infrastructure

In this example infrastructure, each of the infra resources designated with the TF logo either has a Terraform provider or has HCL code available to configure it.

Provisioning and deploying for new clients is complex, as each deployment has to meet the client's individual compliance requirements as well as banking and regulatory rules and standards. And what about a foreign client bank? It must meet completely different banking and regulatory rules and compliance standards. And of course, overarching everything are stringent cybersecurity requirements, which are necessary in many cases because the application has to securely interface with external core banking systems for daily updates on the loan status of each customer.

Imagine duplicating this infrastructure each time a new client is onboarded, using a CI/CD solution that is not optimized for IaC.

IaC and CI/CD

As we mentioned above, a CI/CD solution is strongly recommended for provisioning and deploying IaC-based infrastructures. One of the most widely used CI/CD solutions is Jenkins. However, Jenkins and other generic CI/CD solutions were designed and optimized for application-centric workflows, and they have limitations that prevent scalability when used for IaC-specific development. Some of these issues include:

  • IaC is a stack-based technology, and generic CI/CD tools do not understand stacks
  • No unified toolset or framework for policy management, so checks must be done manually
  • Lack of built-in synchronization, resulting in manual updates and synchronization
  • Inability to provide an automated workflow across multiple repositories and stacks
  • Cannot visualize resources or provide cost tracking, drift detection, and remediation
  • In most cases, SSO and SAML are plugins – just another component to configure, manage, and update
  • Conflicting pull requests, where two developers change the same piece of code at the same time, which can result in code discrepancies

Generic CI/CD solutions check all the boxes for implementing a well-defined application development process, adding end-to-end workflows that allow DevOps teams to deliver better applications faster. They do not do the same for IaC.
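
To illustrate what "understanding stacks" means in the list above, here is a minimal sketch that derives a safe deployment order from stack dependencies. The stack names and their dependencies are hypothetical; the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+), which is simply one convenient way to express the idea.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stacks; each maps to the set of stacks it depends on.
STACK_DEPENDENCIES = {
    "network": set(),
    "security": {"network"},
    "database": {"network", "security"},
    "application": {"database", "security"},
}


def deployment_order(dependencies):
    """Return stack names ordered so every stack follows its dependencies."""
    # TopologicalSorter raises graphlib.CycleError if stacks depend on each
    # other in a loop, which is itself a useful validation.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(deployment_order(STACK_DEPENDENCIES)))
```

A generic pipeline runs jobs in whatever order its author wrote them; a stack-aware tool can compute this order, and re-run only the downstream stacks, whenever a dependency changes.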

 

Accountability and IaC

What do we mean by accountability? It is analogous to a business having a set of accounting books that represent the financial condition of the company at present, but also the financial history. Without this history, it is impossible to trace back and establish accountability, to understand when financial decisions and changes were made, the actions taken, and by whom and for what purpose.

Accountability is also applicable to your infrastructure as it enables you to understand the present state and how you arrived at this point. 

Infrastructure accountability provides checks and balances to ensure that you can account for all infrastructure-related events. From the first line of code to production, you need to know and understand every event that has affected or will affect the infrastructure. This ensures process integrity and, by extension, provides proof of compliance, both internally and externally. Some of the accountability questions that you have to be able to answer are:

  • Code History:
    • What is the history of the code?
    • Who made changes to the code and infrastructure?
    • Do you know if access to your repo is secure? 
  • Change Implementation:
    • Has a change been fully implemented?
    • Can you validate and prove that it was deployed?
    • Are paper-based policies and regulatory compliance requirements being met? Can you prove it?
  • Change Approval Process:
    • What is the change approval process?
    • Are all changes approved, no matter how small or innocuous they seem?
  • Auditability:
    • Are you prepared for an audit?
    • Do you have the supporting materials necessary to support security and compliance audits?

What types of information do you need to answer the above questions?

  • Provenance: The ability to trace every piece of code back to inception and any changes made
  • Compliance: Provide the detailed information needed to ensure compliance with policies and procedures and the ability to provide complete documentation for an audit
  • Access: Who has access, what do they have access to, what are they allowed to do, when are they allowed to do it, why are they doing it
  • Visibility: Understand the state of the infrastructure and resources at any point in time
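
As a rough illustration of the information listed above, the sketch below models an append-only audit trail that records who did what, when, and why, and can answer provenance and access questions. `AuditEvent` and `AuditTrail` are hypothetical names, and a real system would persist events in tamper-evident storage rather than in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    """One immutable entry in a hypothetical infrastructure audit trail."""
    who: str        # identity that acted
    what: str       # resource or stack affected
    action: str     # e.g. "plan", "approve", "apply"
    why: str        # ticket or justification
    when: datetime


class AuditTrail:
    """Append-only log: events can be recorded and queried, never edited."""

    def __init__(self):
        self._events = []

    def record(self, who, what, action, why, when=None):
        event = AuditEvent(who, what, action, why, when or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, what):
        """Provenance: every event that touched a resource, oldest first."""
        return sorted((e for e in self._events if e.what == what), key=lambda e: e.when)

    def actions_by(self, who):
        """Access: everything a given identity did."""
        return [e for e in self._events if e.who == who]
```

With records like these, "can you prove it?" becomes a query rather than an investigation.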

Fintech Example Security Overlay

The Complex Infrastructure Example above gave an overview of a current ABL Fintech solution. The diagram below represents the security that is provisioned and deployed on top of the infrastructure for the application.

Fintech ABL Security Overlay

In many cases, individual clients require VPCs for their implementation, with multiple layers of protection, as they may have different security requirements or their own specific vendor solutions in place. This is especially relevant if financial institutions have product offerings beyond ABL that have strict regulatory or industry compliance requirements based on their geographic location.

The Terraform logo highlights the security-specific resources that are supported in Terraform and can be defined and provisioned with it.

Controlling access, source code, changes, and deployments, and validating all of it, is not achievable without embedded policy guardrails. These guardrails are essential for secure infrastructure provisioning and deployment. Without policies, and the accountability that results from them, the probability of a bank passing an internal or external audit is very low.

Scalability Doesn’t Have to be Difficult

Now that we’ve covered code platforms, CI/CD solutions, and accountability, along with the significant issues that stand between you and IaC scalability nirvana, let’s take a look at what could help you manage IaC at scale without workarounds or a manual or partially automated development process:

Automation: Covers the end-to-end IaC workflow from repo to production

  • Integrated with source code repos such as GitHub to trigger IaC workflows when new or revised code is committed
  • Create a library of pre-configured infrastructure environments, enabling sharing across teams and projects

Policy as Code: provides immediately enforceable guardrails

  • Declare rules around infrastructure, access to code and stacks, workflow, and state changes, and set triggers
  • Access controls for the 4Ws (who, what, when, and why)
  • Multi-layer change approvals, with apply only after approval
  • Audit support and fully integrated SSO (SAML 2.0)
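
A guardrail of the kind described above can be sketched as a function that inspects a plan before it is applied. This is a Python stand-in rather than a real policy language; it assumes input shaped like the `resource_changes` list in the JSON output of `terraform show -json <planfile>`, and the sample addresses and the `PROTECTED_TYPES` rule set are hypothetical.

```python
# Hypothetical excerpt shaped like the "resource_changes" list in the JSON
# output of `terraform show -json <planfile>`; the addresses are made up.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_s3_bucket.reports", "type": "aws_s3_bucket",
         "change": {"actions": ["create"]}},
        {"address": "aws_db_instance.ledger", "type": "aws_db_instance",
         "change": {"actions": ["delete", "create"]}},
    ]
}

# Assumed rule set: resource types that must never be destroyed automatically.
PROTECTED_TYPES = {"aws_db_instance", "aws_kms_key"}


def evaluate_plan(plan, protected_types=PROTECTED_TYPES):
    """Return denial messages; an empty list means the plan may proceed."""
    denials = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as a delete plus a create, so it is caught too.
        if "delete" in actions and rc.get("type") in protected_types:
            denials.append(
                f"{rc['address']}: deleting {rc['type']} requires manual approval"
            )
    return denials


if __name__ == "__main__":
    for message in evaluate_plan(SAMPLE_PLAN):
        print("DENY", message)
```

The value of expressing the rule as code is that it is evaluated on every run, not only when a reviewer remembers to check.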

Collaboration: Adapts to existing workflows without force-fitting or required changes, or can be fully customized to meet goals

  • Coordination of all runs with the ability to review, comment and iterate across teams
  • Prevent conflicting changes and reconcile the differences automatically
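
One simple way to prevent conflicting changes is to let only one person hold a stack at a time. The sketch below shows the idea with an in-memory lock registry; `StackLocks` and `StackLockError` are hypothetical names, and a real service would keep the lock in shared, durable storage so that it works across machines.

```python
class StackLockError(Exception):
    """Raised when a stack is already locked by someone else."""


class StackLocks:
    """In-memory, single-process stand-in for a shared lock service."""

    def __init__(self):
        self._owners = {}  # stack name -> current lock owner

    def acquire(self, stack, owner):
        holder = self._owners.get(stack)
        if holder is not None and holder != owner:
            raise StackLockError(f"{stack} is locked by {holder}")
        self._owners[stack] = owner

    def release(self, stack, owner):
        # Only the current holder may release the lock; other calls are ignored.
        if self._owners.get(stack) == owner:
            del self._owners[stack]
```

In the two-developer scenario described earlier, developer two's run would fail fast with a clear message instead of applying on top of an out-of-date codebase.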

Resource Management: understand resources and their state

  • Plan out changes
  • Drift detection and remediation
  • Cost of infrastructure
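
Drift detection, at its core, is a comparison between what the code declares and what actually exists. The following is a minimal sketch under that assumption; `detect_drift` is a hypothetical helper, and both inputs are plain dictionaries of resource names to attributes rather than anything read from a real provider.

```python
def detect_drift(desired, actual):
    """Compare declared vs. observed attributes per resource.

    Returns {resource: {attribute: (desired_value, actual_value)}} for every
    mismatch. A resource missing on one side is reported under the
    pseudo-attribute "__exists__".
    """
    drift = {}
    for name in sorted(set(desired) | set(actual)):
        if name not in actual:
            drift[name] = {"__exists__": (True, False)}   # declared, not found
        elif name not in desired:
            drift[name] = {"__exists__": (False, True)}   # found, not declared
        else:
            want, have = desired[name], actual[name]
            changed = {
                key: (want.get(key), have.get(key))
                for key in sorted(set(want) | set(have))
                if want.get(key) != have.get(key)
            }
            if changed:
                drift[name] = changed
    return drift
```

Remediation then becomes a decision per entry: re-apply the code to restore the declared value, or update the code to adopt the change that was made by hand.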

In addition to the above, below are some additional vital capabilities that an ideal, purpose-built orchestration and management solution for IaC should also provide at a minimum. Starting from the left of the diagram:

  • Create a pre-built library of infra components 
  • Support of mono or multiple repos with version control
  • Able to support multiple projects simultaneously
  • Integrating and applying policy as code in concert with IaC configurations ensures compliance and security initiatives are met

Conclusion

You can take positive actions to overcome the scalability issues of IaC code platforms, generic CI/CD solutions, and accountability to prevent your infrastructure from inhibiting your future business initiatives and growth.

You should be realistic and find an IaC orchestration and management solution that enables your development process to scale. 

And finally, you need to be aware that proactive management and control over compliance and audit objectives are necessary and can be covered through Policy as Code integrated into your IaC development, security, provisioning, and deployment workflows.

If you are concerned about achieving IaC scalability, try Spacelift! You can overcome code platform issues and enable accountability with a CI/CD solution built from the ground up for IaC. Control the process, visualize resources, and create workflows that work using Spacelift as your IaC orchestration and management solution.
