Basic prometheus metrics #745
Comments
APIcast policy does several things:
I think the APIcast policy metrics should focus on those operations (and maybe on some others I missed). IMO that does not include the nginx error log, free shdict space (unless monitoring the shdicts APIcast uses), or nginx connections. If we want those metrics, they should live in some other policy (possibly active by default). Error log and shdict space monitoring are very important metrics that should be available somehow. One metric we could add is how many requests (and with what status) were terminated by a policy rather than answered by the upstream. |
Reading between the lines, it sounds like there's a slight difference in where the metrics would be implemented. Joaquim gives a list of metrics and suggests implementing them in the apicast policy. Michal lists a set that is related to apicast policy operations and would make sense to implement there. But it's not clear how to handle the ones that are not apicast policy related. BTW: I think I saw people asking for "#BytesTransferred" in sme-apis as well. |
@andrewdavidmackenzie metrics is a phase implemented by policies. Each policy can expose metrics about its own operation, and in the end they are merged together. So the APIcast policy would expose metrics about itself, and other policies would expose metrics about their functionality. Of course, there could be policies that just expose metrics. |
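The idea of each policy exposing its own metrics, with the results merged at the end, can be illustrated with a minimal Python sketch. This is not APIcast code (APIcast is written in Lua); the class names, the `metrics()` method, the `collect()` helper, and the `policy_requests_terminated_total` metric are all hypothetical and only model the merging idea described above.

```python
class Policy:
    """Hypothetical base class: a policy may expose metrics about itself."""

    def metrics(self):
        return {}


class ApicastPolicy(Policy):
    # Hypothetical metric about the policy's own operation.
    def metrics(self):
        return {"policy_requests_terminated_total": 3}


class NginxMetricsPolicy(Policy):
    # A policy that exists only to expose global (non-3scale) metrics.
    def metrics(self):
        return {"nginx_metric_errors_total": 0}


def collect(chain):
    """Merge the metrics of every policy in the chain into one mapping."""
    merged = {}
    for policy in chain:
        for name, value in policy.metrics().items():
            if name in merged:
                raise ValueError(f"metric {name!r} exposed by two policies")
            merged[name] = value
    return merged


merged = collect([ApicastPolicy(), NginxMetricsPolicy(), Policy()])
print(merged)
```

In this sketch a name clash raises an error, which is one possible design choice; a real implementation might instead namespace metrics per policy.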
OK, cool. My main point was that it sounded to me like Joaquim was asking for some metrics that are not related to any specific policy, but the underlying NGINX? |
Yes. And we can expose them in some other policy or make every policy responsible for monitoring its own free space. But we will need some global non-apicast/3scale metrics anyway, so it is probably better to put them in some nginx metrics policy. |
👍 |
@mikz Yes, it makes sense to have specific apicast policy metrics (related mostly to threescale, and operation mode) and then have another policy for basic metrics. |
Ping @3scale/product |
@davidor I think this is not necessary for the next release, but it is possible it will be done by the ostia team. |
This was raised last week by Product as a last minute request. |
I recommend the base set of metrics to start: 3scale-auth status codes: Total, 2xx, 4xx, 5xx
|
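As a rough illustration of the proposed base set (Total, 2xx, 4xx, 5xx), the following Python sketch buckets response status codes by class. The function names and the sample statuses are assumptions made for the example; this is not how APIcast implements its counters.

```python
from collections import Counter


def status_class(status):
    """Map an HTTP status code to its class label, e.g. 404 -> '4xx'."""
    if not 100 <= status <= 599:
        raise ValueError(f"not an HTTP status code: {status}")
    return f"{status // 100}xx"


def count_by_class(statuses):
    """Count responses per status class and add a 'total' entry."""
    counts = Counter(status_class(s) for s in statuses)
    counts["total"] = sum(counts.values())
    return counts


# Hypothetical sequence of authorization responses.
counts = count_by_class([200, 200, 403, 404, 500])
print(dict(counts))
```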
This is what is going to be included in 3.3: #860 |
or close and have a new enhancement issue to discuss what to add? (It's nice to see issues get closed.....) |
Are prometheus metrics available in the current master version of apicast? I'm curling the /metrics endpoint where you would normally find prometheus metrics and not seeing anything:
Is there an environment variable that needs to be enabled? |
@gnunn1 Metrics are exposed on port 9421:
$ curl localhost:9421/metrics
# HELP nginx_http_connections Number of HTTP connections
# TYPE nginx_http_connections gauge
nginx_http_connections{state="accepted"} 1
nginx_http_connections{state="active"} 1
nginx_http_connections{state="handled"} 1
nginx_http_connections{state="reading"} 0
nginx_http_connections{state="total"} 1
nginx_http_connections{state="waiting"} 0
nginx_http_connections{state="writing"} 1
# HELP nginx_metric_errors_total Number of nginx-lua-prometheus errors
# TYPE nginx_metric_errors_total counter
nginx_metric_errors_total 0
# HELP openresty_shdict_capacity OpenResty shared dictionary capacity
# TYPE openresty_shdict_capacity gauge
openresty_shdict_capacity{dict="api_keys"} 10485760
openresty_shdict_capacity{dict="batched_reports"} 1048576
openresty_shdict_capacity{dict="batched_reports_locks"} 1048576
openresty_shdict_capacity{dict="cached_auths"} 1048576
openresty_shdict_capacity{dict="configuration"} 10485760
openresty_shdict_capacity{dict="init"} 16384
openresty_shdict_capacity{dict="limiter"} 1048576
openresty_shdict_capacity{dict="locks"} 1048576
openresty_shdict_capacity{dict="prometheus_metrics"} 16777216
# HELP openresty_shdict_free_space OpenResty shared dictionary free space
# TYPE openresty_shdict_free_space gauge
openresty_shdict_free_space{dict="api_keys"} 10412032
openresty_shdict_free_space{dict="batched_reports"} 1032192
openresty_shdict_free_space{dict="batched_reports_locks"} 1032192
openresty_shdict_free_space{dict="cached_auths"} 1032192
openresty_shdict_free_space{dict="configuration"} 10412032
openresty_shdict_free_space{dict="init"} 4096
openresty_shdict_free_space{dict="limiter"} 1032192
openresty_shdict_free_space{dict="locks"} 1032192
openresty_shdict_free_space{dict="prometheus_metrics"} 16662528 |
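For monitoring purposes, the two shdict gauges above can be combined into a usage ratio per dictionary. The following is a minimal Python sketch that parses output in the format shown above; the `shdict_usage` helper and the trimmed `SAMPLE` text are assumptions made for the example, and a real setup would more likely compute this in a Prometheus query.

```python
import re

# Matches samples such as: openresty_shdict_capacity{dict="api_keys"} 10485760
SAMPLE_RE = re.compile(r'^(\w+)\{dict="([^"]+)"\}\s+(\d+(?:\.\d+)?)$')


def shdict_usage(metrics_text):
    """Return {dict_name: fraction_used} from Prometheus text output."""
    capacity, free = {}, {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # ignore metrics without a single dict label
        name, dict_name, value = match.groups()
        if name == "openresty_shdict_capacity":
            capacity[dict_name] = float(value)
        elif name == "openresty_shdict_free_space":
            free[dict_name] = float(value)
    return {
        d: 1.0 - free[d] / capacity[d]
        for d in capacity
        if d in free and capacity[d] > 0
    }


# Two dictionaries copied from the output above.
SAMPLE = """\
# TYPE openresty_shdict_capacity gauge
openresty_shdict_capacity{dict="init"} 16384
openresty_shdict_capacity{dict="locks"} 1048576
# TYPE openresty_shdict_free_space gauge
openresty_shdict_free_space{dict="init"} 4096
openresty_shdict_free_space{dict="locks"} 1032192
"""

usage = shdict_usage(SAMPLE)
for name, used in sorted(usage.items()):
    print(f"{name}: {used:.1%} used")
```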
Thanks @mikz, that works fine. Are 4xx and 5xx response codes supposed to be captured in the prometheus metrics like the 2xx response codes? If I execute a request in postman that generates a 404 response, i.e. requesting a REST entity that doesn't exist or where a mapping rule hasn't been set in 3scale, I don't see the 4xx response status codes being returned with I do have Here's the output of metrics after making a few 404 requests:
|
@gnunn1 , I'm using the version in the master branch and it works for me. I made a request with a valid user_key and another with an invalid one and this is what I get:
Keep in mind that the When a request does not match any mapping rules, APIcast does not need to contact the 3scale backend because mapping rules are stored in the APIcast configuration. APIcast only needs to call backend to validate credentials (user_key, app_key, etc.) and to report metrics. |
@davidor The URL I am using to hit the service is:
This returns a 200 since order 3 is an available item. However if I change 3 to 5 as follows:
The backend service returns a 404 since order 5 doesn't exist. Interestingly, the prometheus metrics increment the 2xx response counter as a result of this, despite postman showing that a 404 is returned. Is this a case where a 404 is considered "successful" since in REST calls it can be a valid response? It doesn't feel intuitive if this is the case, and maybe it deserves its own category? Validating this with curl against apicast:
And then directly against the backend:
If I change the user-key to an invalid entry then the 4xx is incremented in response to 403 forbidden as per your findings. With regards to your explanation about why the Not Matching mapping rules scenario doesn't increment the counter, I'm curious why a bad user-key increments the 4xx counter on a 403 since presumably apicast never calls the backend in this scenario either? |
@gnunn1 the metric But it is a good point to rename the metric, as |
"backend" is an internal term we try to avoid using "externally" (customer visible). (3scale) "Service Management API" is the official term of the API that is used and returns that response. Either a generic "3scale" or "authrep request" or something is needed to clarify this. |
We've added several metrics in different PRs. All of them are linked in this issue. |
@davidor I installed apicast from master and can see the response times; however, if I am reading them correctly, they are global to the gateway rather than service or mapping/metric specific. Are there any plans to make these more granular so we could build more specific dashboards in something like grafana?
|
@gnunn1 Including services, upstreams, metrics, etc. in the Prometheus labels is something we'll evaluate in the future. Prometheus might not be the right tool to store that kind of information. According to the Prometheus guidelines it is not recommended to use labels for dimensions that can have a large number of values, and in some deployments, the number of services, upstreams, and 3scale metrics can be very high. |
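The cardinality concern can be made concrete with some back-of-the-envelope arithmetic: in the worst case, each distinct combination of label values becomes its own time series. The Python sketch below multiplies the number of values per label; the label names and counts are hypothetical and only illustrate the order of magnitude.

```python
from math import prod


def series_count(label_values):
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_values.values())


# Hypothetical deployment: a gateway-wide metric vs. a per-service one.
global_only = {"status_class": 3}
per_service = {"status_class": 3, "service": 500, "metric": 20}

print(series_count(global_only))
print(series_count(per_service))
```

With these assumed numbers, the gateway-wide metric yields 3 series while the per-service variant yields up to 30000, and that is for a single metric name.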
APIcast ships with Prometheus support, but only exposes the nginx_metric_errors_total metric. I would like to propose some basic metrics to be added to the APIcast base policy:
Counters
Request:
Connections:
Nginx Error Log
Free Dictionary Space
Threescale (fetching config):
Histogram
Some of them were already added to apicast-cloud-hosted: https://github.com/3scale/apicast-cloud-hosted/pull/5/files#diff-047b1780b0ffeb4eba7b3d05beb76d5e
What do you think? Any other metrics to add?