Metrics policy #860

Merged
merged 11 commits into from
Aug 30, 2018

Conversation

Contributor

davidor commented Aug 29, 2018

Closes (partially) #745

This is an initial version with a reduced number of metrics. For now, it includes:

  • 3scale backend response status codes.
  • connections (accepted, active, handled, reading, total, waiting, writing).
  • nginx errors.
  • capacity of every shared dictionary.
  • free space of every shared dictionary.

This is an example output:

# HELP backend_response Response status codes from 3scale's backend
# TYPE backend_response counter
backend_response{status="2xx"} 1
# HELP nginx_http_connections Number of HTTP connections
# TYPE nginx_http_connections gauge
nginx_http_connections{state="accepted"} 3
nginx_http_connections{state="active"} 1
nginx_http_connections{state="handled"} 3
nginx_http_connections{state="reading"} 0
nginx_http_connections{state="total"} 3
nginx_http_connections{state="waiting"} 0
nginx_http_connections{state="writing"} 1
# HELP nginx_metric_errors_total Number of nginx-lua-prometheus errors
# TYPE nginx_metric_errors_total counter
nginx_metric_errors_total 0
# HELP openresty_shdict_capacity OpenResty shared dictionary capacity
# TYPE openresty_shdict_capacity gauge
openresty_shdict_capacity{dict="api_keys"} 10485760
openresty_shdict_capacity{dict="batched_reports"} 1048576
openresty_shdict_capacity{dict="batched_reports_locks"} 1048576
openresty_shdict_capacity{dict="cached_auths"} 1048576
openresty_shdict_capacity{dict="configuration"} 10485760
openresty_shdict_capacity{dict="init"} 16384
openresty_shdict_capacity{dict="limiter"} 1048576
openresty_shdict_capacity{dict="locks"} 1048576
openresty_shdict_capacity{dict="prometheus_metrics"} 16777216
# HELP openresty_shdict_free_space OpenResty shared dictionary free space
# TYPE openresty_shdict_free_space gauge
openresty_shdict_free_space{dict="api_keys"} 10412032
openresty_shdict_free_space{dict="batched_reports"} 1032192
openresty_shdict_free_space{dict="batched_reports_locks"} 1032192
openresty_shdict_free_space{dict="cached_auths"} 1032192
openresty_shdict_free_space{dict="configuration"} 10412032
openresty_shdict_free_space{dict="init"} 4096
openresty_shdict_free_space{dict="limiter"} 1032192
openresty_shdict_free_space{dict="locks"} 1032192
openresty_shdict_free_space{dict="prometheus_metrics"} 16662528
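The capacity and free-space gauges are most useful when read together. A minimal sketch of that arithmetic, using the `api_keys` values from the example output; the `used_ratio` helper is hypothetical and not part of this PR:

```lua
-- Hypothetical helper: fraction of a shared dictionary currently in use,
-- derived from the capacity and free-space gauges.
local function used_ratio(capacity, free_space)
  return (capacity - free_space) / capacity
end

-- Values for dict="api_keys" taken from the example output.
local capacity, free_space = 10485760, 10412032
local used_bytes = capacity - free_space

print(used_bytes)  -- 73728
print(string.format('%.2f%%', used_ratio(capacity, free_space) * 100))  -- 0.70%
```

If the free-space gauge is backed by `ngx.shared.DICT:free_space()`, it only counts completely free pages, so the ratio is an approximation rather than an exact fill level.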

davidor requested a review from a team as a code owner on August 29, 2018 13:13
end

local function filter_level()
local filter_level = resty_env.value(log_level_env) or log_level_default

shadowing upvalue 'filter_level' on line 42

end

local function filter_level()
local filter_level = resty_env.value(log_level_env) or log_level_default

shadowing upvalue 'filter_level' on line 43
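The warning is raised because the local variable has the same name as the enclosing function. A minimal sketch of one way to silence it is to rename the local. Here `os.getenv` stands in for `resty_env.value` so the snippet runs outside OpenResty, and the name `level` is an assumption, not necessarily what the PR settled on:

```lua
local log_level_env = 'METRICS_LOG_LEVEL'
local log_level_default = 'error'

-- Stand-in for resty_env.value (assumption): read from the process environment.
local function env_value(name)
  return os.getenv(name)
end

local function filter_level()
  -- Named `level` rather than `filter_level`, so it no longer shadows the function.
  local level = env_value(log_level_env) or log_level_default
  return level
end

print(filter_level())
```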

)

function _M.inc(status)
if status >= 200 and status < 300 then
Contributor

Wouldn't it be better to use math?

string.format('%dxx', (404/100))

And handle nil and 0 separately.

Contributor Author

👍
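
A minimal sketch of the math-based approach suggested above. The helper name `status_class` is hypothetical. Note that `404/100` is `4.04`, which Lua 5.3 and later reject as a `%d` argument, so this version floors the value first:

```lua
-- Hypothetical helper: map an HTTP status code to its class label ("2xx", "4xx", ...).
-- Returns nil for nil and 0 so the caller can handle those cases separately.
local function status_class(status)
  if not status or status == 0 then
    return nil
  end
  -- math.floor keeps the argument to %d integral on every Lua version.
  return string.format('%dxx', math.floor(status / 100))
end

print(status_class(200))  -- 2xx
print(status_class(404))  -- 4xx
print(status_class(503))  -- 5xx
```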


local new = _M.new

local log_map = {
Contributor

Shouldn't this be log_list? A map would imply a hash table.

'debug',
}

local log_level_env = 'METRICS_LOG_LEVEL'
Contributor

Maybe we should prefix it with APICAST_?

Contributor Author

Maybe it's better to name it NGINX_METRICS_LOG_LEVEL as per your comment below about the name of the policy.

local log_level_default = 'error'
local max_logs_default = 100

local function find_i(t, value)
Contributor

Now that I think about it, it might be good to hash this as value => i so we have constant-time lookup.

Contributor Author

Yes, that'd be better for when we want the index, but there's also https://github.com/3scale/apicast/blob/f371e5e4ec18c5046175eb2abc07d7b1c6928e7e/gateway/src/apicast/policy/metrics/metrics.lua#L88 below, so looks like we need both (value=>i, i=>value)

Contributor

Yep, that's fine, no? We just need to process it once.

Contributor Author

Right now it only iterates through all the elements (and there are only 8 of them) in init().
In metrics() it's fetching an element from the list given an index, so this is fine as it is now. I changed the name of the var as you suggested above.
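
For reference, the value => i table discussed in this thread could be built once at load time, as sketched below. The contents of the list are an assumption (the eight nginx log levels in severity order), since the diff only shows its last entry, and the name `log_level_index` is hypothetical:

```lua
-- Assumed list of levels, ordered as nginx orders them; only 'debug' is
-- visible in the diff, so the other entries are an assumption.
local log_levels = { 'emerg', 'alert', 'crit', 'error', 'warn', 'notice', 'info', 'debug' }

-- Build the value => index table once, so later lookups are constant time.
local log_level_index = {}
for i, name in ipairs(log_levels) do
  log_level_index[name] = i
end

print(log_level_index['error'])  -- 4
print(log_levels[4])             -- error
```

With only eight entries the linear scan in `find_i` costs very little, which is consistent with the decision to leave it as it is.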

@@ -0,0 +1,113 @@
local _M = require('apicast.policy').new('Metrics')
Contributor

We might want to call this Nginx Metrics? We probably need some input from @3scale/product.

Several policies will be exposing metrics and this one is really focused on internal nginx stuff like shared dictionaries, connections etc.

Contributor Author

I agree. I think we can go ahead and change it.
This is a policy included by default that does not have a JSON manifest. That means that the name will not be visible anywhere in the UI.

function _M:metrics()
local logs = get_logs(self.max_logs)

for i = 1, #logs, 3 do
Contributor

Each of those 3 different metrics could be its own function to reduce complexity.

Contributor Author

I'd rather address refactorings in a future PR.

Contributor

👍
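
As a rough illustration of what such a refactoring might look like for the log-related metric: the stride of 3 in the loop suggests that `get_logs` returns a flat list of (level, time, message) triples, which is the layout `ngx.errlog.get_logs` from lua-resty-core uses. That this PR relies on that function is an assumption, and both the stub data and the `count_by_level` helper below are hypothetical:

```lua
-- Stub for the flat list (assumption): level, timestamp, message, repeated.
local logs = {
  4, 1535629980.123, 'first error',
  5, 1535629981.456, 'a warning',
  4, 1535629982.789, 'second error',
}

-- Count captured entries per numeric level, walking the list three slots at a time.
local function count_by_level(flat_logs)
  local counts = {}
  for i = 1, #flat_logs, 3 do
    -- flat_logs[i + 1] is the time and flat_logs[i + 2] the message.
    local level = flat_logs[i]
    counts[level] = (counts[level] or 0) + 1
  end
  return counts
end

local counts = count_by_level(logs)
print(counts[4], counts[5])
```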

Contributor Author

davidor commented Aug 30, 2018

@mikz I addressed all the comments except one #860 (comment) which I'd rather address separately.
I've also added an integration test for the backend response metrics: c70096d

davidor changed the title from "[WIP] Metrics policy" to "Metrics policy" on Aug 30, 2018

function _M.inc(status)
if not status or status == 0 then
ngx.log(ngx.WARN, 'Invalid status received')
Contributor

Isn't it better to actually measure this? It would be an interesting metric, no?

Contributor Author

Good point. I think the http client might return 0 in some error cases, for example. Would be interesting to monitor those as well.

Contributor Author

Done
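
A minimal sketch of counting invalid statuses instead of only logging them. The plain Lua table stands in for the Prometheus counter, and the `invalid_status` label is hypothetical; the label the PR actually uses is not shown in this thread:

```lua
-- In-memory stand-in for the Prometheus counter (assumption), keyed by label value.
local counts = {}

local function label_for(status)
  if not status or status == 0 then
    return 'invalid_status'  -- hypothetical label for nil and 0
  end
  return string.format('%dxx', math.floor(status / 100))
end

local function inc(status)
  local label = label_for(status)
  counts[label] = (counts[label] or 0) + 1
end

inc(200)
inc(0)
inc(nil)
inc(502)
print(counts['2xx'], counts['invalid_status'], counts['5xx'])
```

Keeping invalid responses under their own label makes it possible to alert on them separately from the regular status classes.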

Contributor

mikz left a comment

I think it is fine 👍

I'd really consider also measuring the invalid responses from the 3scale backend. Those are even more interesting than 2xx and 4xx because you actually want alerts when they happen.

And it would be good to have a list of metrics and an example in the PR description.

Contributor Author

davidor commented Aug 30, 2018

I added the list of metrics and an example in the PR description.

davidor merged commit c27e7f5 into master on Aug 30, 2018
davidor deleted the metrics-policy branch on August 30, 2018 14:13
davidor mentioned this pull request on Aug 31, 2018