Automate monitoring with the Terraform Datadog provider


Datadog is a cloud monitoring platform that integrates with your infrastructure and gives you real-time visibility into your operations. With the Datadog Terraform provider, you can create custom monitors and dashboards for the resources you already manage, with or without Terraform, and create them automatically for new infrastructure.

In this tutorial, you will deploy a demo Nginx application to a Kubernetes cluster and use Helm to install the Datadog Agent across the cluster. The Datadog Agent reports the cluster health back to your Datadog dashboard. You will then create a monitor for this cluster in Terraform.

Prerequisites

This tutorial assumes you are familiar with the standard Terraform workflow. If you are unfamiliar with Terraform, complete the Get Started tutorials first.

For this tutorial, you will need:

a Datadog trial account
Terraform 1.1+
an EKS cluster

Provision Kubernetes

Complete the provision an EKS cluster tutorial and do not destroy your cluster.

Get Datadog API credentials

Once you have signed up for your Datadog trial, you need to retrieve your API and Application keys.

Log into your Datadog account and navigate to the API Keys section on the Organization Settings page.

Your API key is automatically generated and is obscured for security. Click on the API key to show more information, then click Copy. Save this somewhere safe.

Datadog API Key

To generate an application key, click Application Keys on the Organization Settings page.

Click New Key, type in "Terraform" as your new application key name and click Create Key. Click Copy and save the key somewhere safe.

Datadog Application Key

These keys are the credentials Terraform will use to create monitors and dashboards on your behalf. Together, they give full access to your Datadog account, so treat them like a password and do not share or check them into version control.

Clone the example repository

Ensure that you are not inside the learn-terraform-provision-eks-cluster directory you created in the EKS cluster tutorial. Then clone the configuration for this tutorial.

$ git clone https://github.com/hashicorp-education/learn-terraform-datadog-local

Change into the repository directory.

$ cd learn-terraform-datadog-local

Deploy your Kubernetes application

Open the terraform.tf configuration. This file lists the minimum versions of your Datadog, Helm, AWS, and Kubernetes providers, and the minimum version of Terraform.
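For orientation, a terraform.tf of this kind typically looks like the sketch below. The required_version value follows the Terraform 1.1+ prerequisite; the provider version constraints are illustrative assumptions, not the values pinned in the repository, so rely on the file you cloned.

```hcl
terraform {
  # Matches the "Terraform 1.1+" prerequisite of this tutorial.
  required_version = ">= 1.1"

  required_providers {
    datadog = {
      source  = "DataDog/datadog"
      version = "~> 3.0" # illustrative constraint, an assumption
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0" # illustrative constraint, an assumption
    }
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # illustrative constraint, an assumption
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0" # illustrative constraint, an assumption
    }
  }
}
```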

Open kubernetes.tf in your file editor. This tutorial will walk you through each block.

The kubernetes_namespace block declares your new namespace, which is named after the beacon image that the rest of the tutorial will use.

resource "kubernetes_namespace" "beacon" {
  metadata {
    name = "beacon"
  }
}

Update the configuration to read the Terraform state from your EKS deployment. If you deployed your cluster using HCP Terraform, follow the HCP Terraform steps. If you deployed your cluster using Terraform Community Edition, follow the Terraform Community Edition steps.

HCP Terraform: add the variables for your HCP Terraform organization and workspace to the variables.tf file.

variable "tfc_org" {
  type        = string
  description = "TFC Organization"
}

variable "tfc_workspace" {
  type        = string
  description = "TFC Workspace"
  default     = "learn-terraform-eks"
}

Run the following command, replacing <YOUR_TFC_ORG> with your HCP Terraform organization name.

$ export TF_VAR_tfc_org="<YOUR_TFC_ORG>"

If you did not deploy your EKS cluster using the default workspace name, run the following command. Replace <YOUR_TFC_WORKSPACE> with your HCP Terraform workspace name.

$ export TF_VAR_tfc_workspace="<YOUR_TFC_WORKSPACE>"

Terraform Community Edition: inside your learn-terraform-datadog-local directory, review the kubernetes.tf file. Remove the lines marked with - in the listing below so that the configuration reads your local Terraform state. If necessary, modify the path to point to the terraform.tfstate file you used to provision your EKS cluster.

- // Remove when using Terraform Community Edition
- data "tfe_outputs" "eks" {
-  organization = var.tfc_org
-   workspace = var.tfc_workspace
- }

- /* Uncomment when using Terraform Community Edition
data "terraform_remote_state" "eks" {
  backend = "local"

  config = {
    path = "../learn-terraform-provision-eks-cluster/terraform.tfstate"
  }
}
- */


# Retrieve EKS cluster configuration
data "aws_eks_cluster" "cluster" {
-  /* Uncomment when using Terraform Community Edition
  name = data.terraform_remote_state.eks.outputs.cluster_name
-  */

-  // Remove when using Terraform Community Edition
-  name = data.tfe_outputs.eks.values.cluster_name
}

Notice that Terraform looks up your cluster by the cluster_name output of your learn-terraform-provision-eks-cluster configuration, and the Terraform Kubernetes provider authenticates against that cluster.
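A common way to wire the Kubernetes provider to an EKS cluster is sketched below. It assumes the aws_eks_cluster data source shown above and an installed AWS CLI; the repository's own provider block may differ, so treat this as an illustration of the pattern rather than the tutorial's exact configuration.

```hcl
provider "kubernetes" {
  # Endpoint and CA certificate come from the EKS cluster looked up by name.
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  # Request a short-lived token from the AWS CLI each time Terraform runs.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", data.aws_eks_cluster.cluster.name]
  }
}
```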

The kubernetes_deployment block defines the number of pod replicas, assigns metadata, and defines the container image. This configuration deploys three replicas of the onlydole/beacon:datadog image. This container image is custom-built by HashiCorp employees for this tutorial.

resource "kubernetes_deployment" "beacon" {
  metadata {
    name      = var.application_name
    namespace = kubernetes_namespace.beacon.id
    labels = {
      app = var.application_name
    }
  }

  spec {
    replicas = 3

    selector {
      match_labels = {
        app = var.application_name
      }
    }

    template {
      metadata {
        labels = {
          app = var.application_name
        }
      }

      spec {
        container {
          image = "onlydole/beacon:datadog"
          name  = var.application_name
        }
      }
    }
  }
}

Finally, the kubernetes_service resource exposes the beacon service using a load balancer on port 8080.

resource "kubernetes_service" "beacon" {
  metadata {
    name      = var.application_name
    namespace = kubernetes_namespace.beacon.id
  }
  spec {
    selector = {
      app = kubernetes_deployment.beacon.metadata[0].labels.app
    }
    port {
      port        = 8080
      target_port = 80
    }
    type = "LoadBalancer"
  }
}

Your application_name variable is defined in the variables.tf file and is set to a default value of beacon.
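For reference, a declaration matching that description would look roughly like the sketch below. The description string is an assumption; the name and default come from the text above. The repository already contains this variable, so do not add a second copy.

```hcl
variable "application_name" {
  type        = string
  description = "Name used for the deployment, service, and labels" # assumed wording
  default     = "beacon"
}
```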

Now that you have reviewed the infrastructure, initialize your configuration.

$ terraform init

Apply your configuration. Remember to confirm your apply with a yes.

$ terraform apply

Verify your namespace.

$ kubectl get namespaces
NAME              STATUS   AGE
beacon            Active   10m
## ...

Verify your deployment.

$ kubectl get deployment --namespace=beacon
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
beacon   3/3     3            3           10m

In the next step, you will deploy the Datadog Agent to your Kubernetes cluster as a DaemonSet to start collecting your cluster and application metrics, traces, and logs. To do this, you will use the Helm provider to deploy the datadog/datadog Helm chart.

Deploy the Datadog Agent to your nodes with Helm

Next, deploy the Datadog Helm chart. This chart adds the Datadog Agent to all nodes in your cluster via a DaemonSet.

Copy and paste the configuration below into helm_datadog.tf.

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

resource "helm_release" "datadog_agent" {
  name       = "datadog-agent"
  chart      = "datadog"
  repository = "https://helm.datadoghq.com"
  version    = "3.10.9"
  namespace  = kubernetes_namespace.beacon.id

  set_sensitive {
    name  = "datadog.apiKey"
    value = var.datadog_api_key
  }

  set {
    name  = "datadog.site"
    value = var.datadog_site
  }

  set {
    name  = "datadog.logs.enabled"
    value = true
  }

  set {
    name  = "datadog.logs.containerCollectAll"
    value = true
  }

  set {
    name  = "datadog.leaderElection"
    value = true
  }

  set {
    name  = "datadog.collectEvents"
    value = true
  }

  set {
    name  = "clusterAgent.enabled"
    value = true
  }

  set {
    name  = "clusterAgent.metricsProvider.enabled"
    value = true
  }

  set {
    name  = "networkMonitoring.enabled"
    value = true
  }

  set {
    name  = "systemProbe.enableTCPQueueLength"
    value = true
  }

  set {
    name  = "systemProbe.enableOOMKill"
    value = true
  }

  set {
    name  = "securityAgent.runtime.enabled"
    value = true
  }

  set {
    name  = "datadog.hostVolumeMountPropagation"
    value = "HostToContainer"
  }
}

This tutorial's configuration requires your Datadog API and application keys: the Helm release uses the API key, and the Datadog provider you configure later uses both. Set these values as environment variables in your terminal.

Run the following command, replacing <Your-API-Key> with your Datadog API key you saved earlier.

$ export TF_VAR_datadog_api_key="<Your-API-Key>"

Repeat this process with the application key. Replace <Your-App-Key> with your Datadog application key you saved earlier.

$ export TF_VAR_datadog_app_key="<Your-App-Key>"

Note the URL of the Datadog website and refer to the Getting Started with Datadog Sites documentation to determine the correct values for the datadog_site and datadog_api_url variables. This tutorial defaults to using values for site US1. If you are on a different site, set the datadog_site and datadog_api_url to the values in the Datadog documentation. For example, if you are on site US5, run the following commands.

$ export TF_VAR_datadog_site="us5.datadoghq.com"

$ export TF_VAR_datadog_api_url="https://api.us5.datadoghq.com"

Add the variable declarations for your Datadog keys, site, and API URL to the variables.tf file. Terraform assigns the value of each TF_VAR_ environment variable to the variable declaration of the same name.

variable "datadog_api_key" {
  type        = string
  description = "Datadog API Key"
}

variable "datadog_app_key" {
  type        = string
  description = "Datadog Application Key"
}

variable "datadog_site" {
  type        = string
  description = "Datadog Site Parameter"
  default     = "datadoghq.com"
}

variable "datadog_api_url" {
  type        = string
  description = "Datadog API URL"
  default     = "https://api.datadoghq.com"
}
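Because the two keys act like a password, you may want Terraform to redact them in plan and apply output. A minimal sketch, assuming you edit the two key declarations above rather than adding new ones: the sensitive argument is the only change.

```hcl
variable "datadog_api_key" {
  type        = string
  description = "Datadog API Key"
  sensitive   = true # redacts the value in CLI output
}

variable "datadog_app_key" {
  type        = string
  description = "Datadog Application Key"
  sensitive   = true # redacts the value in CLI output
}
```

Marking a variable as sensitive hides it in CLI output only. The values can still be written to the Terraform state, so protect the state file as well.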

Apply your configuration. Remember to confirm your apply with a yes.

$ terraform apply

In the next section, you will create monitoring criteria for this cluster with the Datadog provider.

Create a metric alert with the Datadog provider

The datadog_monitor resource alerts you when the number of running beacon pods drops to the thresholds you define.

Copy and paste the configuration below to datadog_metrics.tf.

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
  api_url = var.datadog_api_url
}

resource "datadog_monitor" "beacon" {
  name               = "Kubernetes Pod Health"
  type               = "metric alert"
  message            = "Kubernetes Pods are not in an optimal health state. Notify: @operator"
  escalation_message = "Please investigate the Kubernetes Pods, @operator"

  query = "max(last_1m):sum:kubernetes.containers.running{short_image:beacon} <= 1"

  monitor_thresholds {
    ok       = 3
    warning  = 2
    critical = 1
  }

  notify_no_data = true

  tags = ["app:beacon", "env:demo"]
}

The datadog_monitor.beacon resource notifies and escalates when the health of the Kubernetes "beacon" application degrades. The query argument defines what Datadog evaluates: the maximum, over the last minute, of the summed kubernetes.containers.running metric for containers tagged short_image:beacon, compared against the critical value of 1.

If all three pods are operational, your Datadog monitor status reports as "OK".
If one pod goes down, your Datadog monitor status changes to "Warn".
If more than one pod goes down, your Datadog monitor status changes to "Alert".
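The query compares against 1, which is also the critical threshold. The sketch below keeps the two in sync through a local value, assuming you want a single place to change the number. The local name beacon_critical is hypothetical, and the block is a variant of the resource above (escalation_message omitted for brevity), not an addition to it.

```hcl
locals {
  # Hypothetical local: running-container count at or below which the monitor alerts.
  beacon_critical = 1
}

resource "datadog_monitor" "beacon" {
  name    = "Kubernetes Pod Health"
  type    = "metric alert"
  message = "Kubernetes Pods are not in an optimal health state. Notify: @operator"

  # The comparison value is interpolated from the same local as the critical threshold.
  query = "max(last_1m):sum:kubernetes.containers.running{short_image:beacon} <= ${local.beacon_critical}"

  monitor_thresholds {
    ok       = 3
    warning  = 2
    critical = local.beacon_critical
  }

  notify_no_data = true

  tags = ["app:beacon", "env:demo"]
}
```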

Apply your configuration to create a new Datadog monitor. Remember to confirm your apply with a yes.

$ terraform apply

Navigate to the Datadog Monitor page. Your Kubernetes Pod Health monitor is reporting here now.

Datadog Metrics Monitor

Create a synthetic alert with the Datadog provider

A synthetic check allows Datadog to check a specific webpage at intervals of your choice. The datadog_synthetics_test resource can create and manage API and URL performance monitors. If the URL times out or does not return the expected value, Datadog will alert you.

Copy and paste the configuration below into datadog_synthetics.tf.

resource "datadog_synthetics_test" "beacon" {
  type    = "api"
  subtype = "http"

  request_definition {
    method = "GET"
    url    = "http://<Host_URL>"
  }

  assertion {
    type     = "statusCode"
    operator = "is"
    target   = "200"
  }

  locations = ["aws:us-west-2"]
  options_list {
    tick_every          = 900
    min_location_failed = 1
  }

  name    = "Beacon API Check"
  message = "Oh no! Light from the Beacon app is no longer shining!"
  tags    = ["app:beacon", "env:demo"]

  status = "live"
}

Use terraform output to return the endpoint of the Beacon service.

$ terraform output beacon_endpoint

Replace <Host_URL> in datadog_synthetics.tf with the Beacon service address.
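The repository already defines the beacon_endpoint output. As an illustration of where such an address can come from, the sketch below builds a URL from the load balancer hostname and port of the kubernetes_service.beacon resource shown earlier. The output name beacon_url_sketch is hypothetical, and the repository's actual output may be defined differently.

```hcl
output "beacon_url_sketch" {
  description = "Hypothetical output: URL of the beacon load balancer"

  # On EKS the load balancer is published as a hostname in the service status.
  value = format(
    "http://%s:%d",
    kubernetes_service.beacon.status[0].load_balancer[0].ingress[0].hostname,
    kubernetes_service.beacon.spec[0].port[0].port,
  )
}
```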

Apply your configuration to create a new synthetic monitor. Remember to confirm your apply with a yes.

$ terraform apply

Navigate to the Datadog Monitor page to view your "Beacon API Check" monitor.

Datadog Synthetics Monitor

Create a Datadog dashboard

A Datadog dashboard collects graphs and monitors in one place in the Datadog UI, which is useful if you have several monitors and need to group them together for visibility. This configuration uses the datadog_dashboard resource to set up the dashboard for your metrics and synthetics monitors.

Copy and paste the configuration below into datadog_dashboard.tf.

resource "datadog_dashboard" "beacon" {
  title       = "Beacon Service"
  description = "A Datadog Dashboard for the ${kubernetes_deployment.beacon.metadata[0].name} deployment"
  layout_type = "ordered"

  widget {
    hostmap_definition {
      no_group_hosts  = true
      no_metric_hosts = true
      node_type       = "container"
      title           = "Kubernetes Pods"

      request {
        fill {
          q = "avg:process.stat.container.cpu.total_pct{image_name:onlydole/beacon} by {host}"
        }
      }

      style {
        palette      = "hostmap_blues"
        palette_flip = false
      }
    }
  }

  widget {
    timeseries_definition {
      show_legend = false
      title       = "CPU Utilization"

      request {
        display_type = "line"
        q            = "top(avg:kubernetes.cpu.usage.total{image_name:onlydole/beacon} by {short_image,container_id}, 10, 'mean', 'desc')"

        style {
          line_type  = "solid"
          line_width = "normal"
          palette    = "dog_classic"
        }
      }

      yaxis {
        include_zero = true
        max          = "auto"
        min          = "auto"
        scale        = "linear"
      }
    }
  }

  widget {
    alert_graph_definition {
      alert_id = datadog_monitor.beacon.id
      title    = "Kubernetes Node CPU"
      viz_type = "timeseries"
    }
  }

  widget {
    hostmap_definition {
      no_group_hosts  = true
      no_metric_hosts = true
      node_type       = "host"
      title           = "Kubernetes Nodes"

      request {
        fill {
          q = "avg:system.cpu.user{*} by {host}"
        }
      }

      style {
        palette      = "hostmap_blues"
        palette_flip = false
      }
    }
  }

  widget {
    timeseries_definition {
      show_legend = false
      title       = "Memory Utilization"
      request {
        display_type = "line"
        q            = "top(avg:kubernetes.memory.usage{image_name:onlydole/beacon} by {container_name}, 10, 'mean', 'desc')"

        style {
          line_type  = "solid"
          line_width = "normal"
          palette    = "dog_classic"
        }
      }
      yaxis {
        include_zero = true
        max          = "auto"
        min          = "auto"
        scale        = "linear"
      }
    }
  }
}

Apply your configuration to create a new Datadog dashboard for your metrics and synthetics monitors. Remember to confirm your apply with a yes.

$ terraform apply

Navigate to the Datadog Dashboard page. Your Beacon Service dashboard is reporting here now. Click on the Beacon Service dashboard to see all of your monitors reporting.
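If you prefer to find the dashboard from the command line, the datadog_dashboard resource exposes a url attribute. A minimal sketch, assuming the resource name used above; the output name beacon_dashboard_url is hypothetical, and the value may be a path relative to your Datadog site rather than a full URL.

```hcl
output "beacon_dashboard_url" {
  description = "Hypothetical output: location of the Beacon Service dashboard"
  value       = datadog_dashboard.beacon.url
}
```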


Clean up resources

After verifying that the resources were deployed successfully, run terraform destroy to destroy them. Remember to respond to the confirmation prompt with yes.

$ terraform destroy

Note

If you provisioned an EKS cluster for use with this tutorial, destroy it as well.

Next steps

Now that you have successfully created a metric monitor, an endpoint monitor, and a Datadog dashboard, consider reviewing the resources below.

Apache Kafka

Vercel preview environments
