Add example deployments for prometheus / grafana by stevesloka · Pull Request #970 · projectcontour/contour

stevesloka · 2019-04-01T16:44:23Z

Fixes #969 by adding example deployments for Prometheus & Grafana. It updates the ds-hostnet-split example to match the quickstart defined in /docs/prometheus.md so that users can follow along and end up with a working solution.

This also adds sample dashboards to Grafana which provide metrics for both Contour & Envoy.

Signed-off-by: Steve Sloka <slokas@vmware.com>

stevesloka · 2019-04-01T20:23:42Z

Thanks to @jipperinbham, I fixed a quick issue with the statsd forwarder. =)

jipperinbham · 2019-04-01T21:40:38Z

deployment/prometheus/02-prometheus-configmap.yaml

+        target_label: kubernetes_pod_name
+      metric_relabel_configs:
+      - source_labels: [envoy_cluster_name]
+        regex: '(.+?)\/(.+?)\/(.*)'


Need to replace \/ with _ here as well, and in the two spots below.
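
For illustration, a relabel rule using `_` as the separator might look like the sketch below. It assumes the exported `envoy_cluster_name` values have three parts joined by underscores; the `target_label` names (`namespace`, `service`) are hypothetical and not taken from this PR. Only the change of separator in the regex reflects the comment above.

```yaml
# Sketch only: split envoy_cluster_name on "_" instead of "\/".
# Prometheus anchors relabel regexes, so the pattern must match the whole value.
metric_relabel_configs:
- source_labels: [envoy_cluster_name]
  regex: '(.+?)_(.+?)_(.*)'
  target_label: namespace   # hypothetical label name
  replacement: '$1'         # first underscore-separated part
- source_labels: [envoy_cluster_name]
  regex: '(.+?)_(.+?)_(.*)'
  target_label: service     # hypothetical label name
  replacement: '$2'         # second underscore-separated part
```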

davecheney · 2019-04-04T00:38:26Z

I'm not qualified to review this, but I'm glad to see these Prometheus dashboards included.

rata · 2019-04-05T20:26:45Z

deployment/ds-hostnet-split/03-envoy.yaml

+      - name: statsd-sink
+        image: prom/statsd-exporter:v0.6.0
+        command: 
+            - "/bin/statsd_exporter"


Just curious: Envoy 1.10.0 has just been released (about 2 hours ago) and now supports histograms in the Prometheus export (envoyproxy/envoy#5601; you shared it on Slack a few days ago, so I understand you are aware :-D). Does the sidecar add more metrics if Envoy 1.10 is used?

I guess the answer is yes, as you shared that link on Slack, but I really don't know and would like to understand, if you don't mind :). Also, I'm not sure when Contour plans to upgrade to Envoy 1.10, so even if the sidecar does not add more metrics, this can of course still be totally relevant :-)

stevesloka · 2019-04-10

@rata yes, Envoy v1.10 has histograms enabled, so we can remove the statsd forwarder bit. I'm not sure when we'll upgrade to v1.10; we typically wait a bit to let any issues shake out. I'd still like to merge this for v1.9.1, and then we'll update with a future PR once we move to Envoy v1.10.

rata · 2019-04-10

Thanks a lot for the info! :)

davecheney · 2019-04-09T23:18:54Z

@stevesloka @rata @jipperinbham can you please finalise the review of this PR. I'd like it to either land or move back into development by EOD April 12. Thank you.

stevesloka · 2019-04-10T14:51:56Z

I think this is good to merge, @davecheney. There is an issue with how the metrics are visualized, which is tracked in #1008.

rata · 2019-04-10T22:27:51Z

deployment/ds-hostnet-split/03-envoy.yaml

-        prometheus.io/path: "/stats"
-        prometheus.io/format: "prometheus"
+        prometheus.io/statsdport: "9102"
+        prometheus.io/path: "/stats/prometheus"


Am I missing something, or will this scrape only Envoy metrics? statsd, I think, will gather only Envoy metrics, and so will port 8002. Right?

I think metrics for the contour container (on endpoint /metrics and port 8000) are not gathered by this. Am I missing something? Hopefully I am :)

I will try to create a cluster and see if those are missing or not, it's too late here now to try this :-(

stevesloka · 2019-04-10

@rata Contour metrics are scraped on the Contour deployment. This example is applied to the split model, where Contour & Envoy are not colocated in the same pod.
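
As a rough illustration of what scraping on the Contour deployment could look like, here is a minimal sketch of pod-template annotations. Port 8000 and path /metrics are the values mentioned in this thread. The `prometheus.io/scrape` and `prometheus.io/port` keys are an assumption: they are a common convention that only takes effect if the Prometheus scrape config relabels on them, and this fragment is not copied from the PR.

```yaml
# Sketch only: annotations on the Contour Deployment's pod template.
# These keys are a convention, not built into Prometheus; the scrape config
# must map them onto the target address and metrics path.
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"    # assumed opt-in annotation
        prometheus.io/port: "8000"      # Contour metrics port, per this thread
        prometheus.io/path: "/metrics"  # Contour metrics path, per this thread
```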

rata · 2019-04-11

Ohh, I see. Silly me, it was not obvious from just looking quickly at the diff. Thanks! :)

davecheney · 2019-04-14T23:07:29Z

@stevesloka can you merge this or are you blocked on #1008?

alexbrand

I tested this with the ds-hostnet-split deployment of Contour/Envoy and was able to visualize metrics for both components successfully.

I have added one comment that can perhaps be solved in a follow-up PR if we want to land this.

alexbrand · 2019-04-15T14:21:25Z

deployment/prometheus/03-prometheus-deployment.yaml

+  - protocol: TCP
+    name: prometheus
+    port: 9090
+  - protocol: TCP


Should we remove this? It doesn't seem like we should be targeting the alertmanager in the prometheus service.

stevesloka · 2019-04-22

Thanks for the review, @alexbrand. I just dropped the alertmanager reference.

davecheney

LGTM. Please merge this, we can fix #1008 between now and the end of the cycle.

rata

@davecheney I've tried this in a test cluster and could use it to gather all the metrics (envoy, including envoy histograms, contour metrics, etc.). LGTM, thanks for asking :)

As a side note, I think docs or manifests can be improved a little bit. Let me explain what I mean.

With this example, it seems you need to split Envoy and Contour into different pods so they can be monitored (of course that is not the case, but it was confusing for me), because this example takes advantage of the container split to scrape them using annotations on the pod spec. The thing is, the annotations support only one port and path per pod, so if both containers are running in the same pod it is not clear how to scrape both (the annotations in the pod spec don't let you do that by default). So, IMHO, it is not clear from this example how to monitor when Envoy and Contour are co-located in the same pod.

Of course it can be monitored just fine even if Contour and Envoy are co-located in the same pod. But, at least for me, it was not obvious when looking at the PR. I thought the annotations, of which only one set per pod can be used, made it difficult, and that this was why the Contour and Envoy containers were split into different pods for the monitoring example. So it would have helped me, at least, if this was more clearly documented :-)
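
One possible way to scrape both containers in a co-located pod, without relying on the single-port annotations, is to define one scrape job per container. The sketch below is an assumption-heavy illustration, not something this PR ships: the job names and the container names `contour` and `envoy` are hypothetical, and it assumes each container declares its metrics port (8000 and 8002, the ports mentioned in this thread) so that Prometheus pod discovery yields one target per port.

```yaml
# Sketch only: two jobs over the same pods, each keeping one container's port.
# source_labels values are joined with ";" before the regex is applied.
scrape_configs:
- job_name: contour                 # hypothetical job name
  metrics_path: /metrics
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_number]
    regex: contour;8000             # assumed container name and declared port
    action: keep
- job_name: envoy                   # hypothetical job name
  metrics_path: /stats/prometheus
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_number]
    regex: envoy;8002               # assumed container name and declared port
    action: keep
```

The trade-off is that the scrape targets are defined in the Prometheus config rather than on the pods, so the pod annotations are no longer what drives discovery.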

And in fact, I don't think it is totally obvious if you are new to contour and prometheus (like me :)).

So I'd consider (maybe in a different PR, as you prefer, if you think it is relevant) improving the docs so it is clear how to monitor when Contour and Envoy are in the same pod. Or, even better IMHO, just adding the annotations to the YAMLs too, so it works out of the box.

Right now (before and after this PR, this PR doesn't change that) the deployments in deployment/deployment-grpc-v2 and deployment/ds-grpc-v2 have annotations to scrape envoy metrics but not contour ones. We can just add annotations to scrape for contour metrics too.

That would make deployments using the yamls work with prometheus, and have contour and envoy metrics.

Histogram metrics for envoy will be missing (statsd sidecar is needed when using envoy < 1.10), but you can look at this doc and add the sidecar.

And hopefully, when we update to envoy 1.10, the sidecar won't even be needed and all (envoy metrics, including histograms, contour metrics) will work out of the box.

What do you think @stevesloka @davecheney ?

stevesloka · 2019-04-22T14:41:06Z

@rata thanks for the reply! I agree with you that this needs some work. We did most of this work in Gimbal which only uses the split-model deployment because of network perf reasons. My goal with this was to enable folks to start to look at what we had for metrics and stir inspiration (like you've been doing). =)

I think once we can move to Envoy 1.10, we can drop the statsd_forwarder and make all this much simpler. I'd like to merge this, then open a new issue to track the work that needs to be done once we're at Envoy v1.10.

rata · 2019-04-22T15:16:36Z

@stevesloka sounds good to me. Let me know if I can help with the related issues. Thanks again! :-)

stevesloka changed the title ~~Add example deployments for prometheus / grafana~~ wip: Add example deployments for prometheus / grafana Apr 1, 2019

stevesloka force-pushed the metrics branch from 7a92361 to f6e3de1 Compare April 1, 2019 20:21

stevesloka changed the title ~~wip: Add example deployments for prometheus / grafana~~ Add example deployments for prometheus / grafana Apr 1, 2019

jipperinbham reviewed Apr 1, 2019

stevesloka force-pushed the metrics branch from f6e3de1 to a9c8807 Compare April 2, 2019 19:00

davecheney added this to the 0.11.0 milestone Apr 3, 2019

stevesloka mentioned this pull request Apr 5, 2019

prometheus.io/path should be /stats/prometheus #976

Closed

stevesloka force-pushed the metrics branch from a9c8807 to 4cb193c Compare April 5, 2019 15:12

rata reviewed Apr 5, 2019

davecheney modified the milestones: 0.11.0, 0.12.0 Apr 8, 2019

timh removed this from the 0.12.0 milestone Apr 10, 2019

davecheney added this to the 0.12.0 milestone Apr 10, 2019

stevesloka force-pushed the metrics branch from 4cb193c to 80ea367 Compare April 10, 2019 14:47

rata reviewed Apr 10, 2019

alexbrand reviewed Apr 15, 2019

davecheney approved these changes Apr 16, 2019

rata approved these changes Apr 16, 2019

stevesloka force-pushed the metrics branch from 80ea367 to 8075624 Compare April 22, 2019 14:41

Add example deployments for prometheus / grafana & update ds-hostnet-split to match quickstart defined in /docs/prometheus.md

Signed-off-by: Steve Sloka <slokas@vmware.com>

cbd2391

stevesloka force-pushed the metrics branch from 8075624 to cbd2391 Compare April 22, 2019 14:43

stevesloka mentioned this pull request Apr 22, 2019

Update metrics examples for Envoy v1.10 #1035

Closed

stevesloka merged commit 1e4d810 into projectcontour:master Apr 22, 2019

stevesloka deleted the metrics branch April 22, 2019 15:31
