
docs: enhance Proxy Metrics documentation.#5568

Merged
arkodg merged 9 commits into envoyproxy:main from sudiptob2:docs/5411/proxy-metrics
Mar 27, 2025

Conversation

@sudiptob2
Member

Fixes #5411

Release Notes: No

@codecov

codecov bot commented Mar 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.22%. Comparing base (953ccc1) to head (d0840b2).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5568      +/-   ##
==========================================
- Coverage   65.26%   65.22%   -0.04%     
==========================================
  Files         213      213              
  Lines       34073    34073              
==========================================
- Hits        22237    22224      -13     
- Misses      10501    10511      +10     
- Partials     1335     1338       +3     


Member Author

@sudiptob2 sudiptob2 Mar 21, 2025


Verify OTel-Collector metrics:

I couldn't get this step to work in my environment. When I try to access localhost:19001/metrics, I receive the following error:

```
Forwarding from 127.0.0.1:19001 -> 9000
Forwarding from [::1]:19001 -> 9000
Handling connection for 19001
E0321 01:06:54.145499 38433 portforward.go:424] "Unhandled Error" err="an error occurred forwarding 19001 -> 9000: error forwarding port 9000 to pod f90e62112405273c19bfdfcaabf573a2bf03516a80ec84e2f25637ece8aaa8c9, uid : failed to execute portforward in network namespace "/var/run/netns/cni-0d155390-a9cd-70c5-7184-c52e57222874": failed to connect to localhost:9000 inside namespace "f90e62112405273c19bfdfcaabf573a2bf03516a80ec84e2f25637ece8aaa8c9", IPv4: dial tcp4 127.0.0.1:9000: connect: connection refused IPv6 dial tcp6 [::1]:9000: connect: connection refused"
error: lost connection to pod
```

Contributor


cc @zirain

Contributor


Looks like the steps to enable OTel using EnvoyProxy.Spec.Telemetry.Metrics are missing here.
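For reference, enabling the OTel sink on the data plane goes through the EnvoyProxy resource's telemetry section. A minimal sketch of what that might look like (the resource name and collector address are illustrative assumptions; verify field names against the current EnvoyProxy API reference):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: otel-sink-example        # illustrative name
  namespace: envoy-gateway-system
spec:
  telemetry:
    metrics:
      sinks:
        - type: OpenTelemetry
          openTelemetry:
            # assumed collector address; adjust to your deployment
            host: otel-collector.monitoring.svc.cluster.local
            port: 4317
```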

Member Author


As seen here, OTel metrics are exported for Prometheus to scrape. To view these metrics from the command line, we need to target the pod that's collecting the metrics, which is located in the envoy-gateway-system namespace instead of the monitoring namespace.

So, instead of:

```shell
export OTEL_POD_NAME=$(kubectl get pod -n monitoring --selector=app.kubernetes.io/name=opentelemetry-collector -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward pod/$OTEL_POD_NAME -n monitoring 19001:19001
```

we have to do:

```shell
export ENVOY_POD_NAME=$(kubectl get pod -n envoy-gateway-system --selector=app.kubernetes.io/instance=eg,app.kubernetes.io/name=gateway-helm -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward pod/$ENVOY_POD_NAME -n envoy-gateway-system 19001:19001
```

After that, we can curl the http://localhost:19001/metrics endpoint and see the metrics.

Member


`--selector=app.kubernetes.io/instance=eg,app.kubernetes.io/name=gateway-helm` is not right; I think this should be the pod of envoy-gateway.

Envoy sends metrics with OTel sinks to the OTel collector, and the collector will expose them with the following configuration:

```yaml
metrics:
  exporters:
    - prometheus
  receivers:
    - datadog
    - otlp
```

Member Author


It actually targets the envoy-gateway pod. I can see the pod has the following labels:

```
app.kubernetes.io/instance=eg
app.kubernetes.io/name=gateway-helm
control-plane=envoy-gateway
pod-template-hash=8c7d97d98
```

There is another pod in the envoy-gateway-system namespace, but that does not expose metrics to http://localhost:19001/metrics

Member

@zirain zirain Mar 26, 2025


I would say that it might not be the right approach; we're trying to verify the OTel sink for the data plane, not the controller.

Member Author


We can view metrics in the OTel pod logs if we enable the debug exporter. Updated the PR with that approach. Let me know if this approach looks good.

```shell
helm upgrade eg-addons oci://docker.io/envoyproxy/gateway-addons-helm --version {{< helm-version >}} -n monitoring --reuse-values --set opentelemetry-collector.config.service.pipelines.metrics.exporters='{debug,prometheus}'
export OTEL_POD_NAME=$(kubectl get pod -n monitoring --selector=app.kubernetes.io/name=opentelemetry-collector -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n monitoring -f $OTEL_POD_NAME --tail=100
```

Contributor


can we combine this in L67 instead - " ......set disabled to true. This may be useful when you are only using the OpenTelemetry sink"
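For context, the setting being discussed disables the built-in Prometheus endpoint when only the OpenTelemetry sink is wanted. A hedged sketch of that EnvoyProxy fragment (field names per my reading of the telemetry API, and the collector address is an assumption; verify against the current CRD):

```yaml
spec:
  telemetry:
    metrics:
      prometheus:
        disable: true              # skip the built-in Prometheus endpoint
      sinks:
        - type: OpenTelemetry
          openTelemetry:
            host: otel-collector.monitoring.svc.cluster.local  # assumed address
            port: 4317
```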

Member Author


updated

@sudiptob2 sudiptob2 force-pushed the docs/5411/proxy-metrics branch from 58fd8d1 to 295165b Compare March 21, 2025 17:41
@sudiptob2 sudiptob2 marked this pull request as ready for review March 21, 2025 17:54
@sudiptob2 sudiptob2 requested a review from a team as a code owner March 21, 2025 17:54
Signed-off-by: sudipto baral <sudiptobaral.me@gmail.com>
@sudiptob2 sudiptob2 force-pushed the docs/5411/proxy-metrics branch from 295165b to 85ba303 Compare March 26, 2025 23:40
@sudiptob2 sudiptob2 requested review from arkodg and zirain March 26, 2025 23:47
To install add-ons with OpenTelemetry Collector enabled, use the following command.

```shell
helm install eg-addons oci://docker.io/envoyproxy/gateway-addons-helm --version {{< helm-version >}} --set opentelemetry-collector.enabled=true --set opentelemetry-collector.config.service.pipelines.metrics.exporters='{debug,prometheus}' -n monitoring --create-namespace
```
Contributor


Prefer if this line didn't enable OTel, and a separate helm upgrade command was added for OTel in the OTel section in proxy metrics, because it not only enables OTel, it also enables the debug exporter, which is only used for demo purposes (we should mention that).

Member Author


Thanks, yeah, it was a mistake. I removed the debug exporter from here; later it will be enabled via helm upgrade in the OTel section.

```yaml
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
```
Contributor


Can we link this to the Gateway instead of GatewayClass, since that's a more common approach?

```yaml
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
```
Contributor


as mentioned in the [Prerequisites](#prerequisites).

```yaml
cat <<EOF | kubectl apply -f -
```
Contributor


Can we use tabs here?

{{% tab header="Apply from stdin" %}}

Member Author


Updated with a similar approach now.

Signed-off-by: sudipto baral <sudiptobaral.me@gmail.com>
@sudiptob2 sudiptob2 requested a review from arkodg March 27, 2025 05:48
@sudiptob2 sudiptob2 requested a review from arkodg March 27, 2025 06:02
Contributor

@arkodg arkodg left a comment


LGTM, thanks!

@arkodg arkodg merged commit 55fc01f into envoyproxy:main Mar 27, 2025
28 checks passed
@sudiptob2 sudiptob2 deleted the docs/5411/proxy-metrics branch March 1, 2026 23:26


Development

Successfully merging this pull request may close these issues.

docs: Improve Proxy Metrics page

3 participants