Add per_host option to metrics handler #6279
Cardinality is a Caddy problem, because some users serve thousands of domains via On-Demand TLS. That said, I'll let @hairyhenderson address this, I'm not in touch with metrics stuff.
Cardinality limitation is a Prometheus limitation, not a Caddy one. It is a consideration that you would have to deal with in Caddy or in any other system that uses Prometheus. I think Caddy should not make it inherently impossible to do this, but have the option there with a sane default – much like having
I'm OK in general with this feature, as long as it defaults to @hussam-almarzoq do you think you could issue a companion PR for caddyserver/website that we could merge soon after this? I have a few review comments - will post those inline.
modules/caddyhttp/metrics.go
@@ -33,7 +35,7 @@ var httpMetrics = struct {
 func initHTTPMetrics() {
 	const ns, sub = "caddy", "http"

-	basicLabels := []string{"server", "handler"}
+	basicLabels := []string{"server", "handler", "host"}
rather than always setting host even when per_host is false, can you pass through the perHost bool and set it conditionally?
-	basicLabels := []string{"server", "handler", "host"}
+	basicLabels := []string{"server", "handler"}
+	if perHost {
+		basicLabels = append(basicLabels, "host")
+	}
(same for httpLabels below on ln62)
modules/caddyhttp/metrics.go
@@ -144,7 +152,7 @@ func (h *metricsInstrumentedHandler) ServeHTTP(w http.ResponseWriter, r *http.Re
 	// probably falling through with an empty handler.
 	if statusLabels["code"] == "" {
 		// we still sanitize it, even though it's likely to be 0. A 200 is
-		// returned on fallthrough so we want to reflect that.
+		// returned on fallthrough, so we want to reflect that.
This change is unrelated to this PR - can you revert it? (I don't disagree with it, but it's also entirely unnecessary as the previous sentence was also grammatically correct)
-		// returned on fallthrough, so we want to reflect that.
+		// returned on fallthrough so we want to reflect that.
modules/caddyhttp/metrics.go
 	server := serverNameFromContext(r.Context())
-	labels := prometheus.Labels{"server": server, "handler": h.handler}
+	labels := prometheus.Labels{"server": server, "handler": h.handler, "host": host}
should conditionally set host:
-	labels := prometheus.Labels{"server": server, "handler": h.handler, "host": host}
+	labels := prometheus.Labels{"server": server, "handler": h.handler}
+	if h.perHost {
+		labels["host"] = host
+	}
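The two review suggestions (conditional label names at registration, conditional label values per request) have to agree with each other: in the Prometheus Go client, a vector's With method panics when the supplied label set does not match the names the vector was declared with. A minimal stdlib-only sketch of that invariant follows. The helpers labelNames, labelValues, and consistent are hypothetical names for illustration and are not part of this PR, and a plain map[string]string stands in for prometheus.Labels.

```go
package main

import "fmt"

// labelNames returns the label names a metric vector would be declared with.
// The "host" label is included only when perHost is enabled.
// (Hypothetical helper, not taken from the PR.)
func labelNames(perHost bool) []string {
	names := []string{"server", "handler"}
	if perHost {
		names = append(names, "host")
	}
	return names
}

// labelValues builds the per-request label set, mirroring the suggested
// change in ServeHTTP: "host" is set only when perHost is enabled.
func labelValues(perHost bool, server, handler, host string) map[string]string {
	labels := map[string]string{"server": server, "handler": handler}
	if perHost {
		labels["host"] = host
	}
	return labels
}

// consistent reports whether values has exactly the keys listed in names.
func consistent(names []string, values map[string]string) bool {
	if len(names) != len(values) {
		return false
	}
	for _, n := range names {
		if _, ok := values[n]; !ok {
			return false
		}
	}
	return true
}

func main() {
	for _, perHost := range []bool{false, true} {
		names := labelNames(perHost)
		values := labelValues(perHost, "srv0", "static_response", "example.com")
		fmt.Println(perHost, names, consistent(names, values))
	}
}
```

Driving both helpers from the same perHost flag is what keeps the declared names and the per-request values in step.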
modules/caddyhttp/metrics_test.go
Can you please also add some tests for the case where per_host is set?
In chatting with @francislavoie we realized that only adding the Until that's done, I think this change is blocked.
I still disagree with cardinality not being an issue. If a user has this enabled and is serving a wildcard domain where the left-most label is mapped to a username in the app (e.g. This means the host label's cardinality is infinite. This means more and more RAM will be consumed until Caddy gets OOM-killed. That's not ok. It could be argued we just need to document "don't enable this if the number of hosts you have is unbounded (e.g. wildcard domains or On-Demand TLS)" but I don't think that's a good idea, some users will not properly read the disclaimer and will enable it and have a server that's vulnerable to being OOM-killed by an attacker. Also yeah, reloading config is still a problem. Caddy needs to support graceful reloads, so if the config changes we need to update the metrics config. AFAIK this isn't possible right now because of global state, and if we changed it to be non-global state then all metrics will be reset on reload (so all counts/averages/etc are lost on reload). I don't know how we'll solve that.
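To make the memory concern concrete, here is a rough stdlib-only simulation, not Caddy or Prometheus code. It assumes a counter that keeps one entry per distinct (server, handler, host) label combination, which is how a labelled counter vector behaves in principle. The seriesKey type, countSeries function, and the user%d.example.com host names are hypothetical.

```go
package main

import "fmt"

// seriesKey identifies one time series by its label values.
// (Hypothetical stand-in for a labelled counter's internal key.)
type seriesKey struct {
	server, handler, host string
}

// countSeries simulates a request counter labelled by host and returns
// how many distinct series exist after observing the given Host values.
func countSeries(hosts []string) int {
	counters := make(map[seriesKey]int)
	for _, h := range hosts {
		counters[seriesKey{"srv0", "reverse_proxy", h}]++
	}
	return len(counters)
}

func main() {
	// A wildcard site where every request carries a different subdomain.
	var hosts []string
	for i := 0; i < 10000; i++ {
		hosts = append(hosts, fmt.Sprintf("user%d.example.com", i))
	}
	fmt.Println("distinct series:", countSeries(hosts))

	// Repeated requests to the same host reuse the existing series.
	fmt.Println("distinct series:", countSeries([]string{"a.example.com", "a.example.com"}))
}
```

Each previously unseen host adds an entry that is never released, so under this assumption a client that can choose the Host value controls how many entries are retained.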
4ec4272 to 56685f7
I think we can avoid the reload issue by having a default value for the label when it is disabled. This way the config can be changed without changing the label count.
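A minimal sketch of what that proposal could look like, assuming the host label is always declared and only its value changes with the option. The labelsWithDefault helper and the hostPlaceholder value are hypothetical and not taken from the PR; a plain map stands in for prometheus.Labels.

```go
package main

import "fmt"

// hostPlaceholder is the value used for the "host" label when per_host is
// disabled. The empty string is an assumption made for this sketch.
const hostPlaceholder = ""

// labelsWithDefault always returns the same three label names, so the label
// count does not depend on the per_host setting. Only the value of "host"
// changes. (Hypothetical helper.)
func labelsWithDefault(perHost bool, server, handler, host string) map[string]string {
	value := hostPlaceholder
	if perHost {
		value = host
	}
	return map[string]string{"server": server, "handler": handler, "host": value}
}

func main() {
	fmt.Println(labelsWithDefault(false, "srv0", "file_server", "example.com"))
	fmt.Println(labelsWithDefault(true, "srv0", "file_server", "example.com"))
}
```

The label set has three entries in both modes, which is the property the proposal relies on. The cost is that every series carries a host label even when the option is off.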
@hairyhenderson any thoughts?
@hussam-almarzoq I'm not excited about that option, because it's now adding a new label to everyone's metrics whether they want it or not. The key here is:
This is necessary for other metrics work as well, not just for this PR. To respond to what @francislavoie said:
Ultimately I think it's OK for everything to reset on reload. It'll appear as if the process died and was restarted, but that's fine.
This addresses parts of #3784 and #4016. I understand the considerations mentioned by @hairyhenderson regarding the cardinality issue, but I think this is more of a Prometheus issue than a Caddy issue. Unlike #4644, no change in Caddy would solve it.
I think adding a clear caution in the docs explaining the implications of enabling this option, much like enabling server metrics, should be enough for the users to understand the tradeoff.