Add Prometheus metrics for the upstream #918

davidor · 2018-10-02T16:40:46Z

Part of #745

codeclimate · 2018-10-02T16:41:37Z

gateway/src/apicast/policy/nginx_metrics/nginx_metrics.lua

@@ -110,4 +112,8 @@ function _M:metrics()
  end
 end

+function _M:log()


unused argument 'self'

mikz · 2018-10-02T16:58:14Z

gateway/src/apicast/policy/nginx_metrics/nginx_metrics.lua

@@ -110,4 +112,8 @@ function _M:metrics()
  end
 end

+function _M.log()
+  upstream_metrics.report(ngx.var.upstream_status, ngx.var.upstream_response_time)


I think we should use labels to differentiate the upstreams. Having just one metric for all upstreams won't be very useful, as you won't see which one is taking longer or failing.

I think it can be helpful to check whether there's a noticeable mismatch between the APIcast and upstream status codes.

I agree that it would be more useful to have a label for the upstreams. However, I'm not sure about the performance impact it might have. The number of upstreams could be very large, and the Prometheus docs recommend against using labels that can have many different values. See the caution note here: https://prometheus.io/docs/practices/naming/#labels and also: https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels

@davidor btw upstream_status can be missing if any policy terminates the request.
We have this code in APIcast Cloud Hosted to measure the difference between upstream_status and ngx.status: https://github.com/3scale/apicast-cloud-hosted/blob/e5da0fc2f9df0d0ea89339ab8bfab3d27d169f78/apicast/policies/cloud_hosted.balancer_blacklist/0.1/balancer_blacklist.lua#L104-L117
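
For illustration, the comparison between the upstream status and the status sent to the client could look roughly like the sketch below. This is not the code from the linked policy: the helper names `last_upstream_status` and `status_mismatch` are hypothetical, and the values are passed in as plain arguments (in a log phase they would presumably come from `ngx.var.upstream_status` and `ngx.status`). It assumes that nginx may report several statuses in one string when more than one upstream was tried, and that the value is missing when no upstream was contacted.

```lua
-- Hypothetical helper, not taken from the linked policy.
-- Returns the last numeric status found in an $upstream_status-style
-- string, or nil when the value is missing or contains no status.
local function last_upstream_status(upstream_status)
  if type(upstream_status) ~= 'string' then return nil end

  local last
  -- When several upstreams are tried, nginx joins their statuses with
  -- ", " or " : ", so keep the final number in the string.
  for code in upstream_status:gmatch('%d+') do
    last = tonumber(code)
  end
  return last
end

-- Hypothetical helper: true only when an upstream status exists and
-- differs from the status returned to the client.
local function status_mismatch(upstream_status, client_status)
  local upstream = last_upstream_status(upstream_status)
  if not upstream then return false end -- request never reached an upstream
  return upstream ~= tonumber(client_status)
end
```

With this sketch, a request terminated by a policy (no upstream status) is not counted as a mismatch; whether that is the desired behavior is a design choice.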

The Prometheus docs say that we should avoid cardinality higher than 100. I don't think we could get there. If we just store the host and port as a label, it should be fine.

But also it is something that can be done later.
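
As a rough sketch of the host-and-port label idea, one option is to build the label value from both parts and cap the number of distinct values, so that cardinality cannot grow without bound. Everything here is an assumption for illustration: `new_upstream_labeler`, the `'other'` and `'unknown'` fallback values, and the cap itself are hypothetical and not part of this PR.

```lua
-- Hypothetical sketch: builds "host:port" label values and caps the
-- number of distinct values it will hand out.
local function new_upstream_labeler(max_labels)
  local seen, count = {}, 0

  return function(host, port)
    if not host or host == '' then return 'unknown' end

    local label = port and (host .. ':' .. tostring(port)) or host
    if seen[label] then return label end

    -- Past the cap, collapse new upstreams into a single bucket.
    if count >= max_labels then return 'other' end

    seen[label] = true
    count = count + 1
    return label
  end
end

-- Usage: the cap of 100 mirrors the limit mentioned in the discussion.
local upstream_label = new_upstream_labeler(100)
print(upstream_label('api.example.com', 443))
```

Note that each nginx worker runs its own Lua VM, so a table like `seen` would be tracked per worker rather than globally.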

I added specs and modified the code to make sure that we cover the cases where the status or the latency are nil or empty.
de684be
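
The nil and empty cases mentioned above can be illustrated with a minimal sketch. It is not the code from commit de684be: `normalize_status`, `normalize_latency`, and the `sink` table (a stand-in for the real Prometheus counter and histogram) are hypothetical names used only to show the guard logic.

```lua
-- Hypothetical normalizers; the merged code may differ.
local function normalize_status(status)
  if status == nil or status == '' then return nil end
  return tostring(status)
end

local function normalize_latency(latency)
  if latency == nil or latency == '' then return nil end
  -- tonumber returns nil for values that are not a single number
  -- (for example "-"), so those are skipped as well.
  return tonumber(latency)
end

-- `sink` stands in for the real metrics: a status counter table and a
-- list of observed latencies.
local function report(sink, status, latency)
  local s = normalize_status(status)
  local l = normalize_latency(latency)

  if s then sink.statuses[s] = (sink.statuses[s] or 0) + 1 end
  if l then sink.latencies[#sink.latencies + 1] = l end
end
```

Status and latency are handled independently here, so a request with a valid status but an unusable latency still increments the status count.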

davidor requested a review from a team as a code owner October 2, 2018 16:40

codeclimate bot reviewed Oct 2, 2018

davidor force-pushed the upstream-metrics branch from fa1a8fb to 38a0595 October 2, 2018 16:44

davidor requested a review from mikz October 2, 2018 16:55

mikz reviewed Oct 2, 2018

mikz approved these changes Oct 2, 2018

davidor added 4 commits October 3, 2018 11:57

metrics: add metrics for the upstream

273875c

spec/metrics: add specs for upstream metrics

de684be

t/prometheus-metrics: test upstream metrics

2966401

CHANGELOG: add upstream metrics

68122b9

davidor force-pushed the upstream-metrics branch from e686e4f to 68122b9 October 3, 2018 09:57

mikz approved these changes Oct 3, 2018

davidor merged commit f7145e6 into master Oct 3, 2018

davidor deleted the upstream-metrics branch October 3, 2018 10:08

davidor added this to the 3.4 milestone Oct 4, 2018
