Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Prometheus metrics for the upstream #918

Merged
merged 4 commits into from
Oct 3, 2018
Merged

Add Prometheus metrics for the upstream #918

merged 4 commits into from
Oct 3, 2018

Conversation

davidor
Copy link
Contributor

@davidor davidor commented Oct 2, 2018

Part of #745

@davidor davidor requested a review from a team as a code owner October 2, 2018 16:40
@@ -110,4 +112,8 @@ function _M:metrics()
end
end

function _M:log()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused argument 'self'

@davidor davidor requested a review from mikz October 2, 2018 16:55
@@ -110,4 +112,8 @@ function _M:metrics()
end
end

function _M.log()
upstream_metrics.report(ngx.var.upstream_status, ngx.var.upstream_response_time)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use labels to differentiate different upstreams. Having just one metric for all upstreams won't make it much useful as you won't see which one it is taking longer or failing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be helpful to check if there's a noticeable mismatch between apicast vs upstream status codes.

I agree that it would be more useful to have a label for the upstreams. However, I'm not sure about the performance impact that it might have. The number of upstreams could be very large and the Prometheus docs recommend against using labels that can have many different values. See the caution note here: https://prometheus.io/docs/practices/naming/#labels . Also: https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidor btw upstream_status can be missing if any policy terminates the request.
We have this code in APIcast Cloud Hosted to measure the difference between upstream_status and ngx.status: https://github.com/3scale/apicast-cloud-hosted/blob/e5da0fc2f9df0d0ea89339ab8bfab3d27d169f78/apicast/policies/cloud_hosted.balancer_blacklist/0.1/balancer_blacklist.lua#L104-L117

The prometheus docs say that we should avoid cardinality higher than 100. I don't think we could get there. If we just store the host and port as a label it should be fine.

But also it is something that can be done later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added specs and modified the code to make sure that we cover the cases where the status or the latency are nil or empty.
de684be

@davidor davidor merged commit f7145e6 into master Oct 3, 2018
@davidor davidor deleted the upstream-metrics branch October 3, 2018 10:08
@davidor davidor added this to the 3.4 milestone Oct 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants