Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: scope metrics to active config, add optional per-host metrics #6531

Merged
merged 15 commits into from
Oct 2, 2024

Conversation

mohammed90
Copy link
Member

This PR builds atop @hussam-almarzoq's work in PR #6279, addresses the comments, and scopes the metrics per active config. The admin metrics are retained and not reset per config reload because they're instance/process-oriented. The HTTP metrics are reset at config reload, mostly because it's hard to know the diff to add/remove at reload, especially because radical config change may render previous metrics meaningless.

This also adds 2 more metrics:

  • caddy_config_last_reload_success_timestamp_seconds: records the timestamp of the last successful load
  • caddy_config_last_reload_successful: records whether the last config load was successful or not

Supersede #6279
Closes #3784

@mohammed90 mohammed90 added feature ⚙️ New feature or request under review 🧐 Review is pending before merging labels Aug 20, 2024
@mohammed90 mohammed90 added this to the v2.9.0 milestone Aug 20, 2024
@mohammed90 mohammed90 changed the title metrics: add optional per-host metrics metrics: scope metrics to active config, add optional per-host metrics Aug 20, 2024
@mholt
Copy link
Member

mholt commented Aug 23, 2024

I don't fully understand the metrics aspects, but if this works, that's big if true, yeah?

I will defer to others more proficient with the metrics libs, but if it works and doesn't cause significant regressions in performance, LGTM.

@mohammed90
Copy link
Member Author

I don't fully understand the metrics aspects, but if this works, that's big if true, yeah?

There may be a small uptick in memory utilization, but should be negligible according to the discussion on #4644 (comment). I don't expect the code to be any slower than what's currently experienced in #4644.

Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Well I'm approving for the sake of "let's try it" -- not in the sense of "I have read and understand exactly what this is doing" 😅

When someone else with more knowledge approves, then we can merge it in 👍

Copy link
Collaborator

@hairyhenderson hairyhenderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this locally and there are still references to the default registry, especially see https://github.com/caddyserver/caddy/blob/master/modules/metrics/metrics.go#L111 - the effect is that the /metrics endpoint on the admin port exposes the default metrics registered there (Go runtime metrics, for example), but none of the Caddy metrics

@mholt mholt modified the milestones: v2.9.0-beta.1, v2.9.0-beta.2 Oct 2, 2024
@mholt mholt merged commit 41f5dd5 into caddyserver:master Oct 2, 2024
33 checks passed
@mholt mholt removed the under review 🧐 Review is pending before merging label Oct 2, 2024
@mohammed90 mohammed90 deleted the per-host-metrics branch October 2, 2024 14:29
@ueffel
Copy link
Contributor

ueffel commented Nov 22, 2024

I'm checking out the 2.9.0 beta and I noticed that there aren't any of the Go or process metrics exposed (collectors.NewProcessCollector, collectors.NewProcessCollector from github.com/prometheus/client_golang/prometheus/collectors). Is that intended? Should the metrics endpoint omit these now?

@mohammed90
Copy link
Member Author

I'm checking out the 2.9.0 beta and I noticed that there aren't any of the Go or process metrics exposed (collectors.NewProcessCollector, collectors.NewProcessCollector from github.com/prometheus/client_golang/prometheus/collectors). Is that intended? Should the metrics endpoint omit these now?

I don't think that collector was ever used before. Why'd you assume it's included?

@ueffel
Copy link
Contributor

ueffel commented Nov 22, 2024

I saw the metrics before in v2.8.4. Things like go_info, go_goroutines, go_threads, go_gc_duration_seconds or process_cpu_seconds_total.
Maybe before the default registry was used so these 2 were always registered via this init() function: https://github.com/prometheus/client_golang/blob/6e3f4b1091875216850a486b1c2eb0e5ea852f98/prometheus/registry.go#L60-L63

@mohammed90
Copy link
Member Author

I saw the metrics before in v2.8.4. Things like go_info, go_goroutines, go_threads, go_gc_duration_seconds or process_cpu_seconds_total. Maybe before the default registry was used so these 2 were always registered via this init() function: https://github.com/prometheus/client_golang/blob/6e3f4b1091875216850a486b1c2eb0e5ea852f98/prometheus/registry.go#L60-L63

Ah, I'll fix it shortly. Thanks for the catch and the heads up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature ⚙️ New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support per-host Prometheus metrics
5 participants