
Terraform Datadog Provider – How to Manage & Examples

Managing Datadog with Terraform

In this article, we will take a look at Datadog, the cloud-based monitoring and analytics platform, and how to manage it using Terraform. We will look at the Datadog Terraform provider and its options, and walk through a few examples of creating and automating Datadog monitors with Terraform.

We will cover:

  1. What is Datadog?
  2. How to manage Datadog with Terraform
  3. Why use Terraform to create Datadog monitors
  4. Ways Terraform can help you manage Datadog
  5. Automating Datadog monitors with Terraform

What is Datadog?

Datadog is a cloud-based monitoring and analytics platform that provides observability across infrastructure, applications, and logs in real time. It helps organizations gain visibility into their technology stacks and monitor the performance and health of their systems.

Datadog can collect data from various sources such as servers, containers, cloud providers, databases, and more. In fact, Datadog’s own tagline promises that you can ‘See inside any stack, any app, at any scale, anywhere.’

It allows users to monitor key metrics, set alerts, visualize data through dashboards, and collaborate with team members. Datadog also provides features such as APM (application performance monitoring), tracing, and security monitoring.

Datadog is used by companies of all sizes and industries to improve their operations, troubleshoot issues, and optimize performance.

Check out more information on the Datadog website, where you can also sign up for a free trial.

How do I manage Datadog with Terraform?

To manage Datadog with Terraform, you can use the Datadog provider for Terraform. This provider allows you to create and manage Datadog resources, such as monitors, dashboards, and alerts, using Terraform configuration files.

Datadog Terraform Provider

To use the Datadog provider, you first need to supply your Datadog API key and application key so that Terraform can authenticate. You can do this by setting environment variables (the provider reads DD_API_KEY and DD_APP_KEY) or by passing the keys in your Terraform configuration, as in the provider block below.

See your Datadog account settings if you don’t know your API key, and create an application key on the same page if you don’t already have one.

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}
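
The provider block above relies on a provider requirement and two input variables that are not shown. The sketch below is one way to declare them. The variable names come from the provider block; marking them as sensitive is an assumption about how you might want to handle the keys, not something the provider requires.

```hcl
terraform {
  required_providers {
    datadog = {
      # Registry address of the Datadog provider.
      source = "DataDog/datadog"
    }
  }
}

# Assumption: the keys are marked sensitive so that their values are
# redacted in plan and apply output.
variable "datadog_api_key" {
  type      = string
  sensitive = true
}

variable "datadog_app_key" {
  type      = string
  sensitive = true
}
```

The values can then be supplied without writing them to a file, for example by exporting TF_VAR_datadog_api_key and TF_VAR_datadog_app_key before running terraform plan. Alternatively, if api_key and app_key are left out of the provider block, the provider falls back to the DD_API_KEY and DD_APP_KEY environment variables.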

Check out more information on the Datadog provider over on the official Terraform docs website.

You can also check out Datadog integration with Spacelift. Spacelift can send data to Datadog to help you monitor your infrastructure and Spacelift stacks using Datadog’s excellent monitoring and analytics tools. Our integration with Datadog focuses primarily on runs and lets you create dashboards and alerts.

Why use Terraform to create Datadog monitors?

All the benefits of Infrastructure as Code can apply to creating Datadog monitors in Terraform. This includes version-controlled configuration, observability, automation, consistency and repeatability, scalability, and the ability to integrate Datadog configurations with other parts of your infrastructure.

Ways Terraform can help you manage Datadog

Configuring Datadog resources such as dashboards, alerts, and monitors in Terraform can help you manage Datadog in a more efficient and effective way.

One of the major benefits of managing your Datadog deployment with Terraform is the added infrastructure drift detection that Terraform brings when you run a terraform plan or terraform apply.

Any manual changes to your Datadog resources that are not defined in your code can be detected. This helps keep your configuration in the desired state and makes it easier to identify and correct configuration issues.
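
Drift can also run the other way: a monitor that someone created by hand in the Datadog UI is invisible to Terraform until it is imported. As a rough sketch, assuming Terraform 1.5 or later (which supports import blocks), such a monitor could be brought under management as follows. The resource name and the monitor ID 1234567 are hypothetical placeholders, and the resource arguments would need to mirror the real monitor.

```hcl
# Hypothetical example: adopt a monitor that was created manually.
import {
  to = datadog_monitor.manually_created
  id = "1234567" # placeholder; use the numeric ID of the existing monitor
}

resource "datadog_monitor" "manually_created" {
  # Placeholder values; these should match the existing monitor.
  name    = "Manually created monitor"
  type    = "metric alert"
  message = "Placeholder message."
  query   = "avg(last_5m):avg:system.cpu.user{*} > 90"
}
```

Once the import has been applied, terraform plan reports any differences between this configuration and the live monitor, just as it does for resources that Terraform created itself.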

Read more about managing Infrastructure as Code (IaC) with Terraform.

Automating Datadog Monitors with Terraform - Examples

The examples below show how to use the Datadog Terraform provider to set up alerts that monitor various resources on the Microsoft Azure cloud.

Example 1 – Datadog example to monitor an Azure Web App

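resource "datadog_monitor" "webapp_cpu_monitor" {
  name    = "Azure Web App CPU Usage"
  type    = "metric alert"
  message = "The CPU usage of the Azure Web App has exceeded the threshold."

  # Metric alert queries need an evaluation window; last_5m is an assumed value.
  # The metric name is illustrative, so check which metrics your account reports.
  query = "max(last_5m):max:azure.webapp.cpu{*} by {app_name} > 80"

  # The critical threshold must match the value used in the query.
  monitor_thresholds {
    critical = 80
  }

  # Alert if the metric stops reporting for 15 minutes.
  notify_no_data    = true
  no_data_timeframe = 15

  tags = [
    "environment:production",
    "application:azure_webapp"
  ]
}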

Here, a metric alert queries the web app’s CPU usage and raises an alert if it exceeds the critical threshold of 80%.

The no-data timeframe is also set to 15 minutes, which means that if no data is received for the monitor within this timeframe, it will trigger an alert (this might mean the web app has gone offline).
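
Because this monitor can fire for two different reasons (high CPU or missing data), it can help to say which one occurred in the notification. The sketch below is a variation on Example 1 that uses Datadog's conditional message template variables for this. The resource name and the notification handle @ops-team@example.com are made-up placeholders, the metric name is carried over unchanged from Example 1, and the last_5m evaluation window is an assumption.

```hcl
resource "datadog_monitor" "webapp_cpu_monitor_with_context" {
  name = "Azure Web App CPU Usage (with context)"
  type = "metric alert"

  # Each conditional section is only rendered for the matching state.
  # The @-handle is a placeholder for a real notification target.
  message = <<-EOT
    {{#is_alert}}CPU usage of the Azure Web App is above {{threshold}}%.{{/is_alert}}
    {{#is_no_data}}No CPU data received for 15 minutes; the web app may be offline.{{/is_no_data}}
    @ops-team@example.com
  EOT

  query = "max(last_5m):max:azure.webapp.cpu{*} by {app_name} > 80"

  monitor_thresholds {
    critical = 80
  }

  notify_no_data    = true
  no_data_timeframe = 15
}
```

This is meant as an alternative to the monitor in Example 1 rather than something to apply alongside it, since both would alert on the same metric.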

Example 2 – Datadog example to monitor an Azure Storage Account

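resource "datadog_monitor" "storage_account_monitor" {
  name    = "Azure Storage Account Available Space"
  type    = "metric alert"
  message = "The available space in the Azure Storage Account has fallen below the threshold."

  # Available space is derived as 100 minus the used percentage.
  # The last_5m window is an assumed value and the metric name is illustrative.
  query = "avg(last_5m):100 - max:azure.storage_account.percent_used_space{*} by {account_name} < 20"

  # The critical threshold must match the value used in the query.
  monitor_thresholds {
    critical = 20
  }

  # Alert if the metric stops reporting for 15 minutes.
  notify_no_data    = true
  no_data_timeframe = 15

  tags = [
    "environment:production",
    "application:azure_storage_account"
  ]
}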

Similar to the previous example, here we set a metric alert to detect when the space left in the Azure storage account is less than 20%.
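
Since the two monitors above differ only in a handful of values, they lend themselves to being generated from data. The following is a minimal sketch of that idea using for_each. The local map monitor_definitions, its keys, and the resource name generated are hypothetical names introduced for illustration, and the queries reuse the placeholder metrics from the examples with an assumed 5-minute evaluation window.

```hcl
locals {
  # Hypothetical map: one entry per monitor to create.
  monitor_definitions = {
    webapp_cpu = {
      name     = "Azure Web App CPU Usage"
      message  = "The CPU usage of the Azure Web App has exceeded the threshold."
      query    = "max(last_5m):max:azure.webapp.cpu{*} by {app_name} > 80"
      critical = 80
      app_tag  = "azure_webapp"
    }
    storage_space = {
      name     = "Azure Storage Account Available Space"
      message  = "The available space in the Azure Storage Account has fallen below the threshold."
      query    = "avg(last_5m):100 - max:azure.storage_account.percent_used_space{*} by {account_name} < 20"
      critical = 20
      app_tag  = "azure_storage_account"
    }
  }
}

# Creates one monitor per entry in the map above.
resource "datadog_monitor" "generated" {
  for_each = local.monitor_definitions

  name    = each.value.name
  type    = "metric alert"
  message = each.value.message
  query   = each.value.query

  monitor_thresholds {
    critical = each.value.critical
  }

  # Settings shared by every generated monitor.
  notify_no_data    = true
  no_data_timeframe = 15

  tags = [
    "environment:production",
    "application:${each.value.app_tag}",
  ]
}
```

With this layout, adding another monitor means adding one more entry to the map, while the shared settings stay in one place. It would replace the two individual resources rather than sit next to them.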

Example 3 – Datadog example to monitor an Azure AKS cluster and trigger autoscaling

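resource "datadog_monitor" "aks_cpu_monitor" {
  name    = "AKS CPU Usage"
  type    = "metric alert"
  message = "The CPU usage of the AKS cluster has exceeded the threshold. Scaling up the cluster."

  # Datadog scopes use tag:value filters, so the namespace filter is written
  # with the kube_namespace tag and the results are grouped by pod_name.
  # The last_5m window is an assumed value and the metric name is illustrative.
  query = "avg(last_5m):max:kubernetes.container.cpu.usage.total{kube_namespace:default} by {pod_name} > 80"

  # The critical threshold must match the value used in the query.
  monitor_thresholds {
    warning  = 70
    critical = 80
  }

  # Alert if the metric stops reporting for 15 minutes.
  notify_no_data    = true
  no_data_timeframe = 15

  tags = [
    "environment:production",
    "application:aks_cluster"
  ]
}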

resource "azurerm_kubernetes_cluster" "aks_cluster" {
  name                = "my-aks-cluster"
  location            = "uksouth"
  resource_group_name = "my-resource-group"
  dns_prefix          = "my-aks-cluster"

  # Current azurerm provider versions use default_node_pool in place of the
  # legacy agent_pool_profile block. Nodes run Linux by default.
  default_node_pool {
    name            = "agentpool"
    node_count      = 3
    vm_size         = "Standard_DS2_v2"
    os_disk_size_gb = 30
  }

  service_principal {
    client_id     = var.client_id
    client_secret = var.client_secret
  }

  tags = {
    Environment = "production"
    Application = "aks_cluster"
  }
}

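resource "azurerm_monitor_autoscale_setting" "aks_cluster_autoscale" {
  name                = "aks_cluster_autoscale"
  resource_group_name = azurerm_kubernetes_cluster.aks_cluster.resource_group_name
  location            = azurerm_kubernetes_cluster.aks_cluster.location

  # Note: Azure Monitor autoscale is normally applied to resources such as
  # virtual machine scale sets, so verify that your target type is supported.
  target_resource_id = azurerm_kubernetes_cluster.aks_cluster.id

  profile {
    name = "aks_cluster_autoscale_profile"

    # A profile requires capacity bounds; these values are illustrative.
    capacity {
      default = 3
      minimum = 3
      maximum = 5
    }

    # One rule block is generated per entry in var.autoscale_rules.
    dynamic "rule" {
      for_each = var.autoscale_rules
      content {
        metric_trigger {
          metric_name = rule.value.metric_name
          # Variable defaults cannot reference resources, so the ID is set here.
          metric_resource_id = azurerm_kubernetes_cluster.aks_cluster.id
          time_grain         = rule.value.time_grain
          statistic          = rule.value.statistic
          time_window        = rule.value.time_window
          time_aggregation   = rule.value.time_aggregation
          operator           = rule.value.operator
          threshold          = rule.value.threshold
        }
        scale_action {
          direction = rule.value.direction
          type      = rule.value.type
          value     = rule.value.value
          cooldown  = rule.value.cooldown
        }
      }
    }
  }
}

variable "autoscale_rules" {
  type = any
  default = [
    {
      metric_name      = "CpuPercentage"
      time_grain       = "PT1M"
      statistic        = "Average"
      time_window      = "PT10M"
      time_aggregation = "Average"
      operator         = "GreaterThan"
      threshold        = 80
      direction        = "Increase"
      type             = "ChangeCount"
      value            = 1
      cooldown         = "PT10M" # ISO 8601 duration of 10 minutes
    }
  ]
}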

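In this example, we use the Datadog provider to alert when the AKS cluster CPU usage goes over 80% (with a warning at 70%), while the azurerm_monitor_autoscale_setting resource handles scaling on the Azure side. Note that the Datadog monitor does not trigger the autoscaling itself: the monitor and the autoscale rule each watch CPU usage against the same 80% threshold. The azurerm_monitor_autoscale_setting resource includes a profile block containing the autoscaling rules defined in the autoscale_rules variable.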

If you are using the examples for testing, once you have deployed the Datadog monitors and Azure resources, be sure to clean them up using terraform destroy. (Read more about destroying Terraform resources.)

Key Points

Terraform can be used with hundreds of providers, including popular cloud services such as AWS, Azure, and GCP, as well as widely used services such as Kubernetes, VMware, and Datadog. The Datadog provider can be used to create and manage Datadog resources such as monitors, bringing all the benefits of infrastructure-as-code to your Datadog deployment.

For more information on Managing Datadog with Terraform, check out this helpful article on the Datadog pages.

You can also explore how Spacelift makes it easy to work with Terraform. If you need help managing your Terraform infrastructure, building more complex workflows based on Terraform, or managing AWS credentials per run instead of using a static pair on your local machine, Spacelift is a fantastic tool for this. It supports Git workflows, policy as code, programmatic configuration, context sharing, drift detection, and many more great features right out of the box. You can try it for free by creating a trial account.

Note: New versions of Terraform are placed under the BUSL license, but everything created before version 1.5.x stays open source. OpenTofu is an open-source fork of Terraform, created from version 1.5.6, that expands on Terraform’s existing concepts and offerings and is a viable alternative to HashiCorp’s Terraform. It retains the features and functionality that made Terraform popular among developers while also introducing improvements and enhancements. OpenTofu does not have its own providers and modules, but it uses its own registry for them.
