advanced monitoring with prom and grafana
pauldotyu committed Nov 1, 2024
1 parent 8f5ccfa commit 05a3a92
Showing 1 changed file with 139 additions and 3 deletions.
142 changes: 139 additions & 3 deletions workshops/advanced-aks/workshop.md
@@ -888,10 +888,146 @@ I1025 15:04:39.055667 1 main.go:63] "successfully got secret" secret="Hell
## Advanced Monitoring Concepts
### Azure Managed Prometheus
Monitoring your AKS cluster has never been easier. Services like Azure Managed Prometheus and Azure Managed Grafana provide a fully managed monitoring solution for your AKS cluster, all while using industry-standard cloud-native tools. You can always deploy open-source Prometheus and Grafana to your AKS cluster yourself, but with Azure Managed Prometheus and Azure Managed Grafana, you save time and resources by letting Azure manage the infrastructure for you.
- ServiceMonitor
- PodMonitor
To onboard your AKS cluster for monitoring, head over to **Insights** under the **Monitoring** section of your cluster in the Azure portal. From there, click the **Monitor Settings** button and select the appropriate options. More information can be found [here](https://learn.microsoft.com/azure/azure-monitor/containers/kubernetes-monitoring-enable?tabs=cli#enable-full-monitoring-with-azure-portal).
### AKS control plane metrics
As you may know, Kubernetes control plane components are managed by Azure, and there are metrics that Kubernetes administrators would like to monitor, such as those from kube-apiserver, kube-scheduler, kube-controller-manager, and etcd. These metrics typically were not exposed to AKS users... until now. AKS now offers a preview feature that allows you to access these metrics and visualize them in Azure Managed Grafana. More on this preview feature can be found [here](https://learn.microsoft.com/azure/aks/monitor-aks#monitor-aks-control-plane-metrics-preview). Before you set off to enable this, it is important to consider the [prerequisites and limitations](https://learn.microsoft.com/azure/aks/monitor-aks#prerequisites-and-limitations) of this feature while it is in preview.
To get started, run the following command to register the preview feature.
```bash
az feature register \
--namespace "Microsoft.ContainerService" \
--name "AzureMonitorMetricsControlPlanePreview"
```
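Feature registration is asynchronous and can take a few minutes. As a minimal sketch, the helper functions below poll `az feature show` until the state reads `Registered`. The function names `feature_state` and `wait_for_feature` and the 30-second default interval are illustrative assumptions, not part of the Azure CLI.

```bash
# Print the registration state of a preview feature in Microsoft.ContainerService.
# feature_state is a hypothetical helper name used for illustration.
feature_state() {
  az feature show \
    --namespace "Microsoft.ContainerService" \
    --name "$1" \
    --query "properties.state" \
    --output tsv
}

# Poll until the feature reports "Registered".
# $1 = feature name, $2 = seconds between checks (defaults to 30).
wait_for_feature() {
  local name="$1" interval="${2:-30}"
  until [ "$(feature_state "$name")" = "Registered" ]; do
    echo "Waiting for $name to register..."
    sleep "$interval"
  done
  echo "$name is registered"
}
```

With the functions defined in your shell, run `wait_for_feature AzureMonitorMetricsControlPlanePreview` and move on once it reports that the feature is registered.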
Once the feature is registered, refresh the resource provider registration.
```bash
az provider register --namespace Microsoft.ContainerService
```
After the feature is registered, you can enable the feature on your existing AKS cluster by running the following command. New clusters will have this feature enabled by default from this point forward.
```bash
az aks update \
--resource-group myResourceGroup \
--name myAKSCluster
```
<div class="info" data-title="Note">
The AKS cluster must also have been onboarded to Azure Managed Prometheus in order for the data to be collected.
</div>
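If you are not sure whether the cluster has been onboarded, you can inspect its Azure Monitor profile. The sketch below assumes the managed Prometheus setting is surfaced at `azureMonitorProfile.metrics.enabled` in the output of `az aks show`; the function name `managed_prometheus_enabled` is hypothetical.

```bash
# Print the managed Prometheus metrics setting for a cluster.
# $1 = resource group, $2 = cluster name.
# Assumption: the setting lives at azureMonitorProfile.metrics.enabled.
managed_prometheus_enabled() {
  az aks show \
    --resource-group "$1" \
    --name "$2" \
    --query "azureMonitorProfile.metrics.enabled" \
    --output tsv
}
```

For example, run `managed_prometheus_enabled myResourceGroup myAKSCluster`. If the value does not indicate that metrics are enabled, onboard the cluster as described earlier before expecting control plane data.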
With Azure Managed Grafana integrated with Azure Managed Prometheus, you can import [kube-apiserver](https://grafana.com/grafana/dashboards/20331-kubernetes-api-server/) and [etcd](https://grafana.com/grafana/dashboards/20330-kubernetes-etcd/) metrics dashboards.
Run the following command to get the name of your Azure Managed Grafana instance.
```bash
AMG_NAME=$(az grafana list -g myResourceGroup --query "[0].name" -o tsv)
```
Run the following command to import the kube-apiserver and etcd metrics dashboards.
```bash
# make sure the amg extension is installed
az extension add --name amg
# import kube-apiserver dashboard
az grafana dashboard import \
--name $AMG_NAME \
--resource-group myResourceGroup \
--folder 'Azure Managed Prometheus' \
--definition 20331
# import etcd dashboard
az grafana dashboard import \
--name $AMG_NAME \
--resource-group myResourceGroup \
--folder 'Azure Managed Prometheus' \
--definition 20330
```
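If you plan to import more dashboards later, the two commands above can be folded into a small loop. This is only a sketch: `import_dashboards` is a hypothetical helper name, and it reuses exactly the flags shown above.

```bash
# Import one or more Grafana.com dashboard IDs into the
# 'Azure Managed Prometheus' folder of an Azure Managed Grafana instance.
# $1 = Grafana instance name, $2 = resource group, remaining args = dashboard IDs.
import_dashboards() {
  local amg_name="$1" resource_group="$2"
  shift 2
  for id in "$@"; do
    echo "Importing dashboard $id"
    az grafana dashboard import \
      --name "$amg_name" \
      --resource-group "$resource_group" \
      --folder 'Azure Managed Prometheus' \
      --definition "$id"
  done
}
```

For example, `import_dashboards "$AMG_NAME" myResourceGroup 20331 20330` imports the same two dashboards in one call.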
You should now be able to browse to your Azure Managed Grafana instance and see the kube-apiserver and etcd metrics dashboards in the Azure Managed Prometheus folder.
Out of the box, only the etcd and kube-apiserver metrics are collected as part of the [minimal ingestion profile](https://learn.microsoft.com/azure/aks/monitor-aks-reference#minimal-ingestion-profile-for-control-plane-metrics-in-managed-prometheus) for control plane metrics. This profile is designed to balance the cost of monitoring against the value of the data collected. Metrics for the other components mentioned above, such as kube-scheduler and kube-controller-manager, must be enabled manually by deploying a ConfigMap named [ama-metrics-settings-configmap](https://github.com/Azure/prometheus-collector/blob/89e865a73601c0798410016e9beb323f1ecba335/otelcollector/configmaps/ama-metrics-settings-configmap.yaml) in the kube-system namespace.
<div class="info" data-title="Note">
More on the minimal ingestion profile can be found [here](https://learn.microsoft.com/azure/azure-monitor/containers/prometheus-metrics-scrape-configuration-minimal).
</div>
Run the following command to deploy the `ama-metrics-settings-configmap` in the `kube-system` namespace.
```bash
kubectl apply -f https://raw.githubusercontent.com/Azure/prometheus-collector/89e865a73601c0798410016e9beb323f1ecba335/otelcollector/configmaps/ama-metrics-settings-configmap.yaml
```
Now, you can edit the `ama-metrics-settings-configmap` to enable the metrics you want to collect. Run the following command to edit the `ama-metrics-settings-configmap`.
```bash
kubectl edit cm ama-metrics-settings-configmap -n kube-system
```
Toggle any of the metrics you wish to collect to `true`, but keep in mind that the more metrics you collect, the more resources you will consume.
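After saving your changes, you may want to confirm which control plane targets are switched on. The sketch below assumes the relevant settings in the ConfigMap use keys prefixed with `controlplane-`; the function name `show_controlplane_settings` is hypothetical.

```bash
# List the control plane scrape settings from the ConfigMap.
# Assumption: control plane targets use keys that start with "controlplane-".
# The pattern is anchored to the start of the line so that the
# last-applied-configuration annotation is not matched.
show_controlplane_settings() {
  kubectl get configmap ama-metrics-settings-configmap \
    --namespace kube-system \
    --output yaml \
    | grep -E '^[[:space:]]*controlplane-'
}
```

Run `show_controlplane_settings` and check that each target you toggled now reads `true`.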
<div class="info" data-title="Note">
The Azure team does not offer a [pre-built dashboard](https://grafana.com/orgs/azure/dashboards) for some of these metrics, but you can reference the doc on [supported metrics for Azure Managed Prometheus](https://learn.microsoft.com/azure/aks/monitor-aks-reference#supported-metrics-for-microsoftcontainerservicemanagedclusters) and create your own dashboards in Azure Managed Grafana or search for community dashboards on [Grafana.com](https://grafana.com/grafana/dashboards) and import them into Azure Managed Grafana. Just be sure to use the Azure Managed Prometheus data source.
</div>
More on the recording rules used for Prometheus visualization can be found [here](https://learn.microsoft.com/en-us/azure/azure-monitor/containers/prometheus-metrics-scrape-default#prometheus-visualization-recording-rules).
### Custom scrape jobs for Azure Managed Prometheus
Typically, when you want to scrape metrics from a target, you create a scrape job in Prometheus. With Azure Managed Prometheus, you can create custom scrape jobs for your AKS cluster using the PodMonitor and ServiceMonitor custom resource definitions (CRDs) that are automatically created when you onboard your AKS cluster to Azure Managed Prometheus. These CRDs are nearly identical to the open-source Prometheus CRDs, with the only difference being the apiVersion. When you deploy a PodMonitor or ServiceMonitor for Azure Managed Prometheus, you need to specify the apiVersion as `azmonitoring.coreos.com/v1` instead of `monitoring.coreos.com/v1`.
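To make the apiVersion difference concrete, here is a minimal sketch of a PodMonitor written to a local file. The app label `my-app`, the port name `metrics`, the 30-second interval, and the file name are hypothetical placeholders for illustration; this is not the manifest used for the reference app in this workshop.

```bash
# Write a minimal PodMonitor manifest to a local file.
# my-app, the "metrics" port name, and the file name are hypothetical placeholders.
cat > my-app-podmonitor.yaml <<'EOF'
# Note the Azure-specific API group instead of monitoring.coreos.com/v1
apiVersion: azmonitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-app
  namespace: default
spec:
  # Select pods carrying this label
  selector:
    matchLabels:
      app: my-app
  # Scrape the named container port on each selected pod
  podMetricsEndpoints:
    - port: metrics
      path: /metrics
      interval: 30s
EOF
```

You could then apply it with `kubectl apply -f my-app-podmonitor.yaml`, provided your own pods carry the matching label and expose a port with that name.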
We'll go through a quick example of how to deploy a PodMonitor for a reference app that is deployed to your AKS cluster.
Run the following command to deploy a reference app to the cluster to generate some metrics.
```bash
kubectl apply -f https://raw.githubusercontent.com/Azure/prometheus-collector/refs/heads/main/internal/referenceapp/prometheus-reference-app.yaml
```
Run the following command to deploy a PodMonitor for the reference app.
```bash
kubectl apply -f https://raw.githubusercontent.com/Azure/prometheus-collector/refs/heads/main/otelcollector/deploy/example-custom-resources/pod-monitor/pod-monitor-reference-app.yaml
```
Custom resource targets are scraped by pods that start with the name `ama-metrics-*` and the Prometheus Agent web user interface is available on port 9090. So we can port-forward the Prometheus pod to our local machine to access the Prometheus UI and explore all that is configured.
Run the following command to get the name of the Azure Monitor Agent pod.
```bash
AMA_METRICS_POD_NAME=$(kubectl get po -n kube-system -lrsName=ama-metrics -o jsonpath='{.items[0].metadata.name}')
```
Run the following command to port-forward the Prometheus pod to your local machine.
```bash
kubectl port-forward $AMA_METRICS_POD_NAME -n kube-system 9090
```
Open a browser and navigate to `http://localhost:9090` to access the Prometheus UI.
If you click on the **Status** dropdown and select **Targets**, you will see the target for **podMonitor/default/prometheus-reference-app-job/0** and the endpoint that is being scraped.
If you click on the **Status** dropdown and select **Service Discovery**, you will see the scrape jobs with active targets and discovered labels for **podMonitor/default/prometheus-reference-app-job/0**.
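If you prefer the terminal, the same information can be pulled from the Prometheus HTTP API while the port-forward is running in another terminal. This sketch assumes the agent serves the standard `/api/v1/targets` endpoint on the forwarded port; `list_scrape_pools` is a hypothetical helper name, and it uses `grep` rather than a JSON parser to avoid extra dependencies.

```bash
# List the distinct scrape pools that currently have active targets.
# Assumption: the standard Prometheus targets API is reachable on localhost:9090.
list_scrape_pools() {
  curl -s "http://localhost:9090/api/v1/targets?state=active" \
    | grep -o '"scrapePool":"[^"]*"' \
    | sort -u
}
```

Running `list_scrape_pools` should include an entry for the PodMonitor scrape job once the target has been discovered.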
When you are done, you can stop the port-forwarding by pressing `Ctrl+C`.
Give the scrape job a few moments to collect metrics from the reference app. Once you have given it enough time, you can head over to Azure Managed Grafana and click on the **Explore** tab to query the metrics that are being collected.
More on custom scrape jobs can be found [here](https://learn.microsoft.com/azure/azure-monitor/containers/prometheus-metrics-scrape-crd) and [here](https://learn.microsoft.com/azure/azure-monitor/containers/prometheus-metrics-troubleshoot#prometheus-interface).
### AKS Cost Analysis