Add example deployments for prometheus / grafana#970
Add example deployments for prometheus / grafana#970stevesloka merged 1 commit intoprojectcontour:masterfrom
Conversation
|
Thanks to @jipperinbham fixed a quick issue with the statsd forwarder. =) |
| target_label: kubernetes_pod_name | ||
| metric_relabel_configs: | ||
| - source_labels: [envoy_cluster_name] | ||
| regex: '(.+?)\/(.+?)\/(.*)' |
There was a problem hiding this comment.
need to replace \/ with _ here as well and the two spots below
|
I'm not qualified to review this but i'm glad to see these prometheus dashboards included. |
| - name: statsd-sink | ||
| image: prom/statsd-exporter:v0.6.0 | ||
| command: | ||
| - "/bin/statsd_exporter" |
There was a problem hiding this comment.
Just curious: envoy 1.10.0 has just been released (like 2 hours ago) and supports histograms in prometheus export now (envoyproxy/envoy#5601 you shared it on slack a few days ago, so I understand you are aware :-D) . Does the sidecar add more metrics if envoy 1.10 is used?
I guess the answer is yes, as you shared that link on slack, but I really don't know and would like to understand if it doesn't bother you :). Also, I'm not sure when contour plans to upgrade to envoy 1.10, so even if the sidecar does not add more metrics, this of course can be totally relevant :-)
There was a problem hiding this comment.
@rata yes Envoy v1.10 has Histograms enabled so we can remove the statsd forwarder bit. I'm not sure when we'll upgrade to v1.10, we typically wait a bit to let any issues shake out. I'd like to still merge this for v1.9.1 and then we'll update with a future PR once we move to envoy v1.10
There was a problem hiding this comment.
Thanks a lot for the info! :)
|
@stevesloka @rata @jipperinbham can you please finalise the review of this PR. I'd like it to either land or move back into development by EOD April 12. Thank you. |
|
I think this is good to merge @davecheney, there is an issue with how the metrics are visualized which is tracked in #1008 |
| prometheus.io/path: "/stats" | ||
| prometheus.io/format: "prometheus" | ||
| prometheus.io/statsdport: "9102" | ||
| prometheus.io/path: "/stats/prometheus" |
There was a problem hiding this comment.
Am I missing something or this will scrape only envoy metrics? stastd, I think, will gather only envoy and so will port 8002. Right?
I think metrics for the contour container (on endpoint /metrics and port 8000) are not gathered by this. Am I missing something? Hopefully I am :)
I will try to create a cluster and see if those are missing or not, it's too late here now to try this :-(
There was a problem hiding this comment.
@rata Contour metrics are scraped on the Contour deployment. This example is applied to the split model where Contour & Envoy are not colocated in the same pod.
There was a problem hiding this comment.
Ohh, I see. Silly me, it was not obvious just looking quickly at the diff. Thanks! :)
|
@stevesloka can you merge this or are you blocked on #1008? |
alexbrand
left a comment
There was a problem hiding this comment.
I tested this with the ds_hostnet_split deployment of contour/envoy and was able to visualize metrics for both components successfully.
I have added one comment that can perhaps be solved in a follow-up PR if we want to land this.
| - protocol: TCP | ||
| name: prometheus | ||
| port: 9090 | ||
| - protocol: TCP |
There was a problem hiding this comment.
Should we remove this? Doesn't seem like we should be targetting the alertmanager in the prometheus service.
There was a problem hiding this comment.
Thanks for the review @alexbrand, I just dropped the alertmanager reference.
davecheney
left a comment
There was a problem hiding this comment.
LGTM. Please merge this, we can fix #1008 between now and the end of the cycle.
There was a problem hiding this comment.
@davecheney I've tried this in a test cluster and could use it to gather all the metrics (envoy, including envoy histograms, contour metrics, etc.). LGTM, thanks for asking :)
As a side note, I think docs or manifests can be improved a little bit. Let me explain what I mean.
With this example, I think it seems you need to split envoy and contour in different pods so they can be monitored (of course it is not the case, but it was confusing for me). Because this example takes advantage of the containers split to scrape them, by using annotations on the pod spec. The thing is, the annotations support only one port and path per pod, but if both containers are running on the same pod is not clear how to scrape both (the annotations in the pod spec don't let you do that, by default). So, IMHO, it is not clear from this example how to monitor when envoy and contour are co-located in the same pod.
Of course it can be monitored just fine even if contour and envoy are co-located in the same pod. But, at least for me, when looking at the PR it was not obvious. I though annotations, that only one per pod can be used, made it difficult and I thought that is why the contour and pod container were split in different pod for the monitoring example. So it would have helped me at least if this was more clearly documented :-)
And in fact, I don't think it is totally obvious if you are new to contour and prometheus (like me :)).
So, I'd consider (maybe on a different PR? As you prefer, if you think it is relevant) improving the doc so it is clear how to monitor when contour and envoy are in the same pod. Or, even better IMHO, just adding the annotations to the yamls too so it works out of the box.
Right now (before and after this PR, this PR doesn't change that) the deployments in deployment/deployment-grpc-v2 and deployment/ds-grpc-v2 have annotations to scrape envoy metrics but not contour ones. We can just add annotations to scrape for contour metrics too.
That would make deployments using the yamls work with prometheus, and have contour and envoy metrics.
Histogram metrics for envoy will be missing (statsd sidecar is needed when using envoy < 1.10), but you can look at this doc and add the sidecar.
And hopefully, when we update to envoy 1.10, the sidecar won't even be needed and all (envoy metrics, including histograms, contour metrics) will work out of the box.
What do you think @stevesloka @davecheney ?
|
@rata thanks for the reply! I agree with you that this needs some work. We did most of this work in Gimbal which only uses the split-model deployment because of network perf reasons. My goal with this was to enable folks to start to look at what we had for metrics and stir inspiration (like you've been doing). =) I think once we can move to Envoy 1.10 then we can drop the statsd_forwarder and make all this much simpler. I'd like to merge this, then open a new issue to track the work that needs to be done once we're at Envoy v1.1.0. |
ds-hostnet-split to match quickstart defined in /docs/prometheus.md Signed-off-by: Steve Sloka <slokas@vmware.com>
|
@stevesloka sounds good to me. Let me know if I can help with the related issues. Thanks again! :-) |
Fixes #969 by adding example deployments for prometheus & grafana. It updates the
ds-hostnet-splitexample to match quickstart defined in /docs/prometheus.md so that users can follow along and end up with a working solution.This also adds sample dashboards to Grafana which provide metrics for both Contour & Envoy.
Signed-off-by: Steve Sloka slokas@vmware.com