Skip to content

Add example deployments for prometheus / grafana#970

Merged
stevesloka merged 1 commit intoprojectcontour:masterfrom
stevesloka:metrics
Apr 22, 2019
Merged

Add example deployments for prometheus / grafana#970
stevesloka merged 1 commit intoprojectcontour:masterfrom
stevesloka:metrics

Conversation

@stevesloka
Copy link
Member

Fixes #969 by adding example deployments for prometheus & grafana. It updates the ds-hostnet-split example to match quickstart defined in /docs/prometheus.md so that users can follow along and end up with a working solution.

This also adds sample dashboards to Grafana which provide metrics for both Contour & Envoy.

Signed-off-by: Steve Sloka slokas@vmware.com

@stevesloka stevesloka changed the title Add example deployments for prometheus / grafana wip: Add example deployments for prometheus / grafana Apr 1, 2019
@stevesloka stevesloka changed the title wip: Add example deployments for prometheus / grafana Add example deployments for prometheus / grafana Apr 1, 2019
@stevesloka
Copy link
Member Author

Thanks to @jipperinbham fixed a quick issue with the statsd forwarder. =)

target_label: kubernetes_pod_name
metric_relabel_configs:
- source_labels: [envoy_cluster_name]
regex: '(.+?)\/(.+?)\/(.*)'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to replace \/ with _ here as well and the two spots below

@davecheney davecheney added this to the 0.11.0 milestone Apr 3, 2019
@davecheney
Copy link
Contributor

I'm not qualified to review this but i'm glad to see these prometheus dashboards included.

- name: statsd-sink
image: prom/statsd-exporter:v0.6.0
command:
- "/bin/statsd_exporter"
Copy link
Contributor

@rata rata Apr 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: envoy 1.10.0 has just been released (like 2 hours ago) and supports histograms in prometheus export now (envoyproxy/envoy#5601 you shared it on slack a few days ago, so I understand you are aware :-D) . Does the sidecar add more metrics if envoy 1.10 is used?

I guess the answer is yes, as you shared that link on slack, but I really don't know and would like to understand if it doesn't bother you :). Also, I'm not sure when contour plans to upgrade to envoy 1.10, so even if the sidecar does not add more metrics, this of course can be totally relevant :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rata yes Envoy v1.10 has Histograms enabled so we can remove the statsd forwarder bit. I'm not sure when we'll upgrade to v1.10, we typically wait a bit to let any issues shake out. I'd like to still merge this for v1.9.1 and then we'll update with a future PR once we move to envoy v1.10

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the info! :)

@davecheney davecheney modified the milestones: 0.11.0, 0.12.0 Apr 8, 2019
@davecheney
Copy link
Contributor

@stevesloka @rata @jipperinbham can you please finalise the review of this PR. I'd like it to either land or move back into development by EOD April 12. Thank you.

@timh timh removed this from the 0.12.0 milestone Apr 10, 2019
@davecheney davecheney added this to the 0.12.0 milestone Apr 10, 2019
@stevesloka
Copy link
Member Author

I think this is good to merge @davecheney, there is an issue with how the metrics are visualized which is tracked in #1008

prometheus.io/path: "/stats"
prometheus.io/format: "prometheus"
prometheus.io/statsdport: "9102"
prometheus.io/path: "/stats/prometheus"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something or this will scrape only envoy metrics? stastd, I think, will gather only envoy and so will port 8002. Right?

I think metrics for the contour container (on endpoint /metrics and port 8000) are not gathered by this. Am I missing something? Hopefully I am :)

I will try to create a cluster and see if those are missing or not, it's too late here now to try this :-(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rata Contour metrics are scraped on the Contour deployment. This example is applied to the split model where Contour & Envoy are not colocated in the same pod.

Copy link
Contributor

@rata rata Apr 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, I see. Silly me, it was not obvious just looking quickly at the diff. Thanks! :)

@davecheney
Copy link
Contributor

@stevesloka can you merge this or are you blocked on #1008?

Copy link
Contributor

@alexbrand alexbrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this with the ds_hostnet_split deployment of contour/envoy and was able to visualize metrics for both components successfully.

I have added one comment that can perhaps be solved in a follow-up PR if we want to land this.

- protocol: TCP
name: prometheus
port: 9090
- protocol: TCP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove this? Doesn't seem like we should be targetting the alertmanager in the prometheus service.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @alexbrand, I just dropped the alertmanager reference.

Copy link
Contributor

@davecheney davecheney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please merge this, we can fix #1008 between now and the end of the cycle.

Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davecheney I've tried this in a test cluster and could use it to gather all the metrics (envoy, including envoy histograms, contour metrics, etc.). LGTM, thanks for asking :)

As a side note, I think docs or manifests can be improved a little bit. Let me explain what I mean.

With this example, I think it seems you need to split envoy and contour in different pods so they can be monitored (of course it is not the case, but it was confusing for me). Because this example takes advantage of the containers split to scrape them, by using annotations on the pod spec. The thing is, the annotations support only one port and path per pod, but if both containers are running on the same pod is not clear how to scrape both (the annotations in the pod spec don't let you do that, by default). So, IMHO, it is not clear from this example how to monitor when envoy and contour are co-located in the same pod.

Of course it can be monitored just fine even if contour and envoy are co-located in the same pod. But, at least for me, when looking at the PR it was not obvious. I though annotations, that only one per pod can be used, made it difficult and I thought that is why the contour and pod container were split in different pod for the monitoring example. So it would have helped me at least if this was more clearly documented :-)

And in fact, I don't think it is totally obvious if you are new to contour and prometheus (like me :)).

So, I'd consider (maybe on a different PR? As you prefer, if you think it is relevant) improving the doc so it is clear how to monitor when contour and envoy are in the same pod. Or, even better IMHO, just adding the annotations to the yamls too so it works out of the box.

Right now (before and after this PR, this PR doesn't change that) the deployments in deployment/deployment-grpc-v2 and deployment/ds-grpc-v2 have annotations to scrape envoy metrics but not contour ones. We can just add annotations to scrape for contour metrics too.

That would make deployments using the yamls work with prometheus, and have contour and envoy metrics.

Histogram metrics for envoy will be missing (statsd sidecar is needed when using envoy < 1.10), but you can look at this doc and add the sidecar.

And hopefully, when we update to envoy 1.10, the sidecar won't even be needed and all (envoy metrics, including histograms, contour metrics) will work out of the box.

What do you think @stevesloka @davecheney ?

@stevesloka
Copy link
Member Author

@rata thanks for the reply! I agree with you that this needs some work. We did most of this work in Gimbal which only uses the split-model deployment because of network perf reasons. My goal with this was to enable folks to start to look at what we had for metrics and stir inspiration (like you've been doing). =)

I think once we can move to Envoy 1.10 then we can drop the statsd_forwarder and make all this much simpler. I'd like to merge this, then open a new issue to track the work that needs to be done once we're at Envoy v1.1.0.

ds-hostnet-split to match quickstart defined in /docs/prometheus.md

Signed-off-by: Steve Sloka <slokas@vmware.com>
@rata
Copy link
Contributor

rata commented Apr 22, 2019

@stevesloka sounds good to me. Let me know if I can help with the related issues. Thanks again! :-)

@stevesloka stevesloka merged commit 1e4d810 into projectcontour:master Apr 22, 2019
@stevesloka stevesloka deleted the metrics branch April 22, 2019 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Example metrics

6 participants