-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Prometheus metrics for the upstream #918
Conversation
@@ -110,4 +112,8 @@ function _M:metrics() | |||
end | |||
end | |||
|
|||
function _M:log() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused argument 'self'
fa1a8fb
to
38a0595
Compare
@@ -110,4 +112,8 @@ function _M:metrics() | |||
end | |||
end | |||
|
|||
function _M.log() | |||
upstream_metrics.report(ngx.var.upstream_status, ngx.var.upstream_response_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use labels to differentiate different upstreams. Having just one metric for all upstreams won't make it much useful as you won't see which one it is taking longer or failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be helpful to check if there's a noticeable mismatch between apicast vs upstream status codes.
I agree that it would be more useful to have a label for the upstreams. However, I'm not sure about the performance impact that it might have. The number of upstreams could be very large and the Prometheus docs recommend against using labels that can have many different values. See the caution note here: https://prometheus.io/docs/practices/naming/#labels . Also: https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidor btw upstream_status can be missing if any policy terminates the request.
We have this code in APIcast Cloud Hosted to measure the difference between upstream_status and ngx.status: https://github.com/3scale/apicast-cloud-hosted/blob/e5da0fc2f9df0d0ea89339ab8bfab3d27d169f78/apicast/policies/cloud_hosted.balancer_blacklist/0.1/balancer_blacklist.lua#L104-L117
The prometheus docs say that we should avoid cardinality higher than 100. I don't think we could get there. If we just store the host and port as a label it should be fine.
But also it is something that can be done later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added specs and modified the code to make sure that we cover the cases where the status or the latency are nil or empty.
de684be
e686e4f
to
68122b9
Compare
Part of #745