Monitor application health and performance with Consul proxy metrics
Consul helps you securely connect applications running in any environment, at any scale. Consul observability features enhance your service mesh capabilities with enriched metrics, logs, and distributed traces so you can improve performance and debug your distributed services with precision.
Consul proxy metrics give you detailed health and performance information about your service mesh applications. This includes upstream/downstream network traffic metrics, ingress/egress request details, error rates, and additional performance information that you can use to understand your distributed applications. Once you enable proxy metrics in Consul, you do not need to configure or instrument your applications in the service mesh to leverage proxy metrics.
In this tutorial, you will enable proxy metrics for your Consul data plane. You will use Grafana to explore dashboards that provide information regarding health, performance, and operations for your service mesh applications. In the process, you will learn how using these features can provide you with deep insights, reduce operational overhead, and contribute to a more holistic view of your service mesh applications.
Scenario overview
HashiCups is a coffee shop demo application. It has a microservices architecture and uses Consul service mesh to securely connect the services. At the beginning of this tutorial, you will use Terraform to deploy the HashiCups microservices, a self-managed Consul cluster, and an observability suite on Elastic Kubernetes Service (EKS).
The Consul proxy sidecar container can collect Layer 7 (L7) metrics (HTTP status codes, request latency, transaction volume, etc.) for your service mesh applications. Consul can also collect metrics from the Consul management plane and gateways. By configuring the Consul Helm chart, you can have the proxies expose this data for Prometheus to scrape, then visualize it with Grafana.
In this tutorial, you will:
- Deploy the following resources with Terraform:
  - Elastic Kubernetes Service (EKS) cluster
  - A self-managed Consul datacenter on EKS
  - Grafana and Prometheus on EKS
  - HashiCups demo application
- Perform the following Consul data plane procedures:
  - Review and enable proxy metrics features
  - Explore the demo application
  - Explore dashboards with Grafana
Prerequisites
The tutorial assumes that you are familiar with Consul and its core functionality. If you are new to Consul, refer to the Consul Getting Started tutorials collection.
For this tutorial, you will need:
- An AWS account configured for use with Terraform
- (Optional) An HCP account
- aws-cli >= 2.0
- terraform >= 1.0
- consul >= 1.16.0
- consul-k8s >= 1.2.0
- helm >= 3.0
- git >= 2.0
- kubectl > 1.24
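Before you continue, you may want to confirm that your installed tools satisfy the minimums above. The following is a minimal sketch, not part of the tutorial repository: the `installed` versions are hypothetical placeholders that you would replace with the output of each tool's version command, and the comparison assumes plain dotted numeric versions with an optional leading `v`. The `kubectl` entry is left out because the list above gives it a strict lower bound rather than a minimum.

```python
# Minimum versions copied from the prerequisites list above.
MINIMUMS = {
    "aws-cli": "2.0",
    "terraform": "1.0",
    "consul": "1.16.0",
    "consul-k8s": "1.2.0",
    "helm": "3.0",
    "git": "2.0",
}


def version_tuple(text, width=3):
    """Turn '1.16.0' or 'v1.16' into a zero-padded tuple such as (1, 16, 0)."""
    parts = [int(part) for part in text.lstrip("v").split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that 1.9.0 is older than 1.16.0."""
    return version_tuple(installed) >= version_tuple(minimum)


def outdated_tools(installed_versions):
    """Return the names of tools that are missing or below their minimum."""
    return [
        name
        for name, minimum in MINIMUMS.items()
        if name not in installed_versions
        or not meets_minimum(installed_versions[name], minimum)
    ]


# Hypothetical versions, for illustration only.
installed = {
    "aws-cli": "2.13.1",
    "terraform": "1.5.7",
    "consul": "1.9.0",
    "consul-k8s": "1.2.0",
    "helm": "v3.12.0",
    "git": "2.39.2",
}
print(outdated_tools(installed))
```

With these placeholder values, only `consul` is reported, because 1.9.0 is numerically lower than 1.16.0 even though it sorts higher as a string.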
Clone GitHub repository
Clone the GitHub repository containing the configuration files and resources.
$ git clone https://github.com/hashicorp-education/learn-consul-proxy-metrics
Change into the directory that contains the complete configuration files for this tutorial.
$ cd learn-consul-proxy-metrics/self-managed/eks
Review repository contents
This repository contains Terraform configuration to spin up the initial infrastructure and all files to deploy Consul, the demo application, and the observability suite resources.
Here, you will find the following Terraform configuration:
- aws-vpc.tf defines the AWS VPC resources
- eks-cluster.tf defines Amazon EKS cluster deployment resources
- eks-consul.tf defines the self-managed Consul deployment
- eks-hashicups-with-consul.tf defines the HashiCups resources
- eks-observability.tf defines the Prometheus and Grafana resources
- outputs.tf defines outputs you will use to authenticate and connect to your Kubernetes cluster
- providers.tf defines AWS and Kubernetes provider definitions for Terraform
- variables.tf defines variables you can use to customize the tutorial
Additionally, you will find the following directories and subdirectories:
- dashboards contains the JSON configuration files for the example Grafana dashboards
- api-gw contains the Kubernetes configuration files for the Consul API gateway
- config contains the Kubernetes configuration files for the Consul telemetry collector intentions
- hashicups contains the Kubernetes configuration files for HashiCups
- helm contains the Helm charts for Consul, Grafana, and Prometheus
Deploy infrastructure and demo application
With these Terraform configuration files, you are ready to deploy your infrastructure.
Initialize your Terraform configuration to download the necessary providers and modules.
$ terraform init
Initializing the backend...
Initializing provider plugins...
## ...
Terraform has been successfully initialized!
## ...
Then, deploy the resources. Confirm the run by entering yes.
$ terraform apply
## ...
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
## ...
Apply complete! Resources: 97 added, 0 changed, 0 destroyed.
Note
The Terraform deployment could take up to 15 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for the environment to complete initialization.
Connect to your infrastructure
Now that you have deployed the Kubernetes cluster, configure kubectl to interact with it.
$ aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw kubernetes_cluster_id)
Enable Consul proxy metrics
In this section, you will review the parameters that enable Consul proxy metrics, upgrade your Consul installation to apply the new configuration, and restart your service mesh sidecar proxies.
Review the Consul values file
Consul lets you expose metrics for your service mesh applications and sidecars so they may be scraped by a Prometheus service that is outside of your service mesh. Review the highlighted lines in the values file below to see the parameters that enable this feature.
helm/consul-v2.yaml
global:
  ## ...
  # Exposes Prometheus metrics for the Consul service mesh and sidecars.
  metrics:
    enabled: true
    # Enables Consul servers and clients metrics.
    enableAgentMetrics: true
    # Configures the retention time for metrics in Consul servers and clients.
    agentMetricsRetentionTime: "59m"
ui:
  ## ...
  # Enables displaying metrics in the Consul UI.
  metrics:
    enabled: true
    # The metrics provider specification.
    provider: "prometheus"
    # The URL of the prometheus metrics server.
    baseURL: http://prometheus-server.observability.svc.cluster.local
connectInject:
  ## ...
  # Enables metrics for Consul Connect sidecars.
  metrics:
    defaultEnabled: true
Refer to the Consul metrics for Kubernetes documentation to learn more about metrics configuration options and details.
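If you maintain your own values file, a quick check that the metrics switches are all enabled can catch a missed setting before you upgrade. The sketch below is illustrative only: it assumes the values file has already been parsed into a nested Python dictionary (for example with a YAML library of your choice), and the `example` dictionary is a made-up input in which one switch is deliberately turned off. The dotted paths are the boolean settings from the values file excerpt above.

```python
# Boolean settings from the values file excerpt that must be true.
REQUIRED_TRUE = [
    "global.metrics.enabled",
    "global.metrics.enableAgentMetrics",
    "ui.metrics.enabled",
    "connectInject.metrics.defaultEnabled",
]


def lookup(values, dotted_path):
    """Walk a nested dict along a dotted path; return None if a key is missing."""
    node = values
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_metrics_settings(values):
    """Return the required paths that are absent or not set to True."""
    return [path for path in REQUIRED_TRUE if lookup(values, path) is not True]


# Made-up parsed values with sidecar metrics left disabled.
example = {
    "global": {"metrics": {"enabled": True, "enableAgentMetrics": True}},
    "ui": {"metrics": {"enabled": True, "provider": "prometheus"}},
    "connectInject": {"metrics": {"defaultEnabled": False}},
}
print(missing_metrics_settings(example))
```

An empty list means every switch shown in the excerpt is enabled.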
Deploy the updated Consul values file
Update Consul in your Kubernetes cluster with Consul K8S CLI. Confirm the run by entering y.
$ consul-k8s upgrade -config-file=helm/consul-v2.yaml
Refer to the Consul K8S CLI documentation to learn more about additional settings.
Note
The upgrade could take up to 5 minutes to complete. Feel free to explore the next sections of this tutorial while waiting for your updated Consul environment to become available.
Review the official Helm chart values to learn more about these settings.
Restart sidecar proxies
You need to restart your sidecar proxies to retrieve the updated proxy configuration. To do so, redeploy your HashiCups application.
$ kubectl rollout restart deployment --namespace default
deployment.apps/api-gateway restarted
deployment.apps/frontend restarted
deployment.apps/nginx restarted
deployment.apps/payments restarted
deployment.apps/product-api restarted
deployment.apps/product-api-db restarted
deployment.apps/public-api restarted
deployment.apps/traffic-generator restarted
Prometheus will now begin scraping the /metrics endpoint for all proxy sidecars on port 20200. Refer to the Consul metrics for Kubernetes documentation to learn more about changing these default parameters.
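To get a feel for what Prometheus receives from that endpoint, the following sketch parses a small sample in the Prometheus text exposition format and totals requests by response code class. It is a simplified illustration, not the scraper Prometheus uses: the sample text is invented, and the metric name `envoy_cluster_upstream_rq_xx` and label `envoy_response_code_class` are assumptions about typical Envoy output that you should verify against your own sidecar's /metrics response.

```python
import re

# Matches 'name{labels} value' and 'name value' sample lines.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)"
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def requests_by_class(samples, metric="envoy_cluster_upstream_rq_xx"):
    """Sum the assumed per-class request counter across all clusters."""
    totals = {}
    for name, labels, value in samples:
        if name == metric:
            code_class = labels.get("envoy_response_code_class", "?") + "xx"
            totals[code_class] = totals.get(code_class, 0.0) + value
    return totals


# Invented sample data for illustration.
SAMPLE_TEXT = """
# TYPE envoy_cluster_upstream_rq_xx counter
envoy_cluster_upstream_rq_xx{envoy_response_code_class="2",envoy_cluster_name="a"} 120
envoy_cluster_upstream_rq_xx{envoy_response_code_class="5",envoy_cluster_name="a"} 3
envoy_cluster_upstream_rq_xx{envoy_response_code_class="2",envoy_cluster_name="b"} 30
envoy_server_live 1
"""
print(requests_by_class(parse_samples(SAMPLE_TEXT)))
```

The parser handles only the basic line shapes shown in the sample; escaped quotes inside label values, for instance, are out of scope for this sketch.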
Confirm sidecar configuration
Confirm that your sidecar proxy configuration has been successfully updated by viewing the Envoy admin interface. You can connect to the Envoy admin interface by port-forwarding port 19000 from a service that has a sidecar proxy.
$ kubectl port-forward deploy/frontend 19000:19000
Open http://localhost:19000/config_dump in your browser to find the Envoy configuration. Search for 20200, the default endpoint port for Prometheus metrics. You should find two different stanzas that reference this port. One of them is included next for reference.
{
  "name": "envoy_prometheus_metrics_listener",
  "address": {
    "socket_address": {
      "address": "0.0.0.0",
      "port_value": 20200
    }
  }
  ## ...
}
The presence of these stanzas confirms that Consul has configured the Envoy sidecar to expose Prometheus metrics.
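If you save the config dump to a file instead of searching it in the browser, a short script can perform the same search. The sketch below walks any parsed JSON structure and collects the names of objects whose `address.socket_address.port_value` matches a given port. The `sample_dump` dictionary is a heavily simplified, hypothetical stand-in for a real config dump: only the listener stanza shown above comes from the tutorial, and the surrounding keys and the second listener are placeholders.

```python
def find_listeners_on_port(node, port):
    """Recursively collect names of objects bound to the given socket port."""
    found = []
    if isinstance(node, dict):
        address = node.get("address", {})
        if isinstance(address, dict):
            socket_address = address.get("socket_address", {})
            if isinstance(socket_address, dict) and socket_address.get("port_value") == port:
                found.append(node.get("name", "<unnamed>"))
        for value in node.values():
            found.extend(find_listeners_on_port(value, port))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_listeners_on_port(item, port))
    return found


# Simplified, hypothetical stand-in for a parsed config dump.
sample_dump = {
    "configs": [
        {
            "listeners": [
                {
                    "name": "envoy_prometheus_metrics_listener",
                    "address": {
                        "socket_address": {"address": "0.0.0.0", "port_value": 20200}
                    },
                },
                {
                    "name": "example_other_listener",
                    "address": {
                        "socket_address": {"address": "0.0.0.0", "port_value": 20000}
                    },
                },
            ]
        }
    ]
}
print(find_listeners_on_port(sample_dump, 20200))
```

Because the function does not depend on a particular nesting, it should also work on a real dump loaded with `json.load`, though that has not been verified here.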
Explore the demo application
In this section, you will visit your demo application to explore the HashiCups UI.
Retrieve the Consul API Gateway public DNS address. The command uses jq to parse the kubectl output.
$ export CONSUL_APIGW_ADDR=http://$(kubectl get svc/api-gateway -o json | jq -r '.status.loadBalancer.ingress[0].hostname') && echo $CONSUL_APIGW_ADDR
http://a4cc3e77d86854fe4bbcc9c62b8d381d-221509817.us-west-2.elb.amazonaws.com
Open the Consul API Gateway's URL in your browser and explore the HashiCups UI.
Explore health insights dashboard
Consul proxy metrics help you monitor the health of your service mesh applications with information including: requests by status code, upstream/downstream connections, rejected connections, and Envoy cluster state. Most of these metrics are available for any service mesh application and require no additional application configuration.
Navigate to the HashiCups health monitoring Grafana dashboard.
$ export GRAFANA_HEALTH_DASHBOARD=http://$(kubectl get svc/grafana --namespace observability -o json | jq -r '.status.loadBalancer.ingress[0].hostname')/d/data-plane-health/ && echo $GRAFANA_HEALTH_DASHBOARD
http://a20fb6f2d1d3e4be296d05452a378ad2-428040929.us-west-2.elb.amazonaws.com/d/data-plane-health/
Note
The example dashboards take a few minutes to populate with data after the proxy metrics feature is enabled.
Notice that the example dashboard panes provide detailed health insights for HashiCups.
For example, the Upstream Rq by Status Code proxy statistics give you a high-level overview of the HTTP requests throughout your service mesh. The Total active upstream connections graph shows how many upstream hosts are currently receiving requests and returning responses. These graphs can be useful to analyze the health of the upstream hosts in your service mesh and identify any anomalies in behavior.
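You can also ask Prometheus for similar data directly through its HTTP API. The sketch below only builds the request URL for an instant query; it does not send it. The PromQL expression is an assumption about how a requests-by-status-code panel could be computed, not the query the example dashboard uses, and the metric and label names should be checked against your own metrics. The base URL is the in-cluster Prometheus address from the values file, so it resolves only from inside the cluster.

```python
from urllib.parse import urlencode

# In-cluster Prometheus address taken from the Consul values file.
PROMETHEUS_BASE_URL = "http://prometheus-server.observability.svc.cluster.local"

# Assumed PromQL: per-second request rate over 5 minutes, grouped by status class.
REQUESTS_BY_CLASS = (
    "sum by (envoy_response_code_class) "
    "(rate(envoy_cluster_upstream_rq_xx[5m]))"
)


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus instant query endpoint (/api/v1/query)."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


print(instant_query_url(PROMETHEUS_BASE_URL, REQUESTS_BY_CLASS))
```

Building the URL with `urlencode` keeps the spaces, parentheses, and brackets in the expression safely escaped.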
Tip
Consul proxy metrics contain a large set of statistics that you can use to create custom dashboards for monitoring your service mesh applications according to your production environment's unique requirements. Refer to the Envoy proxy statistics overview for a complete list of available metrics.
Explore performance insights dashboard
In addition to monitoring service health, you can use Consul proxy metrics to monitor the performance of your service mesh applications. These metrics include network traffic statistics, CPU/memory usage by pod, data plane latency, and upstream/downstream connection data.
Navigate to the HashiCups performance monitoring Grafana dashboard.
$ export GRAFANA_PERFORMANCE_DASHBOARD=http://$(kubectl get svc/grafana --namespace observability -o json | jq -r '.status.loadBalancer.ingress[0].hostname')/d/data-plane-performance/ && echo $GRAFANA_PERFORMANCE_DASHBOARD
http://a20fb6f2d1d3e4be296d05452a378ad2-428040929.us-west-2.elb.amazonaws.com/d/data-plane-performance/
Note
The example dashboards take a few minutes to populate with data after the proxy metrics feature is enabled.
Notice that the example dashboard panes provide detailed performance insights for HashiCups.
For example, the Dataplane latency proxy statistics help you understand network performance for the respective percentiles of network traffic. In this example, p50 shows you the median (typical) latency and p99.9 shows you the near-worst-case latency for a given period of time. The Memory/CPU Usage % by pod limits panes can be useful to analyze the performance of the pods in your service mesh so you can modify resource allocations for any services that are over-provisioned or under-provisioned.
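To see why latency dashboards plot percentiles rather than a single average, consider the small worked example below. It uses the nearest-rank definition of a percentile on invented latency values; Prometheus and Grafana estimate percentiles from histogram buckets instead, so treat this as an illustration of the concept rather than of how the dashboard computes its numbers.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Round before ceil so floating-point noise cannot push the rank up by one.
    rank = max(1, math.ceil(round(pct * len(ordered) / 100, 9)))
    return ordered[rank - 1]


# Invented latencies in milliseconds: mostly fast, with a few slow outliers.
latencies_ms = [10] * 990 + [50] * 8 + [2000] * 2

mean = sum(latencies_ms) / len(latencies_ms)
print("mean :", mean)
print("p50  :", percentile(latencies_ms, 50))
print("p99  :", percentile(latencies_ms, 99))
print("p99.9:", percentile(latencies_ms, 99.9))
```

In this data set, p50 and p99 are both 10 ms, the mean is pulled up to 14.3 ms by two slow requests, and only p99.9 (2000 ms) reveals how slow those outliers are.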
Clean up resources
Destroy the Terraform resources to clean up your environment. Confirm the destroy operation by entering yes.
$ terraform destroy
## ...
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
## ...
Destroy complete! Resources: 0 added, 0 changed, 97 destroyed.
Note
Due to race conditions with the cloud resources in this tutorial, you may need to run the destroy operation twice to remove all the resources.
Next steps
In this tutorial, you enabled proxy metrics in the Consul service mesh to enhance the health and performance monitoring of your service mesh applications. You did not need to configure or instrument your applications to enable these features, which shortens the time-to-value for your service mesh applications. This integration offers faster incident resolution, increased application understanding, and reduced operational overhead.
For more information about the topics covered in this tutorial, refer to the following resources: