policy-engine/metrics: don't use a global registry #141
policy-engine/src/config/settings.rs
    /// Common prefix for policy-engine metrics.
    #[default(DEFAULT_METRICS_PREFIX.to_string())]
    pub metrics_prefix: String,
I'd rather not have these two configuration knobs.
We can discuss them separately if you want, but I wouldn't put them in this PR anyway.
> We can discuss them separately if you want, but I wouldn't put them in this PR anyway.
I think a discussion will be helpful here. Why would you prefer not to have them?
For several reasons, most of them related to putting this into a larger context:
- metric names are fixed by the application, not by runtime configuration, and we want them to be stable/reliable across deployments
- metrics gathering is a first-class part of the application, and we want to encourage more people to consume the metrics (without breakages)
- a high number of unused configuration knobs only increases the cognitive load on operators and the maintenance cost for developers
- the status service is an important (i.e. non-conditional) piece of the graph-builder. I'd like the policy-engine not to diverge from GB, except where strictly required
> metrics names are fixed by the application, not by runtime configuration, and we want them to be stable/reliable across deployments
I guess they are hard-coded until they're not? Similarly, the path_prefix for the main service, which is also quite an important service, is configurable. For reliability we could remove the default and force the operator to provide the value.
> the status service is an important (i.e. non-conditional) piece of the graph-builder. I'd like the policy-engine to not diverge from GB, unless where strictly required
I was going to suggest making the equivalent changes to the GB.
> an high number of unused configuration knobs just increase the cognitive load on operators and the maintenance cost on developers
By removing the default we would force it to be used (and known) by the operator.
> metrics gathering is a first-class part of the application, and we want to encourage more people to consume them (without breakages)
I'm not sure what to say about this, though of course I agree that we should try hard not to break consumers.
Now that I've given my view and the directions I envision for the items above, I want to state that my opinion isn't strong enough to force this through. I only added the setting for completeness, after splitting the prefix out of the metrics to avoid repeating the magic string in the tests. I agree to remove the user-facing setting from this PR and possibly revisit it later. ACK?
We started without a path_prefix and added it once we had a use case/consumer for it. My point is that a fully configurable registry has no such use case/consumer (certainly not now, and possibly not in the future); additionally, it isn't an encouraged pattern for metrics.
For context, people are expected to write and share monitoring mixins, like https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/dashboards/apiserver.libsonnet.
> I just added the setting for completeness after splitting the prefix out of the metrics for not repeating the magic string in the tests
Thanks, now I see where this is coming from. Based on that, my preference would be:
- unifying PE towards GB, i.e. a non-conditional status service, with shared logic
- introducing neither a registry_enabled nor a registry_prefix configuration knob
- carrying the registry_prefix somewhere in the application (for code sharing), but making it hardcoded
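The last point could look roughly like the sketch below. It is only an illustration, not the code from this PR: the constant's value and the `metric_name` helper are hypothetical, chosen to show how a hardcoded prefix can live in one place so that neither the application nor the tests repeat the magic string.

```rust
/// Hardcoded prefix shared by application code and tests.
/// The value is a placeholder for illustration, not the project's real prefix.
pub const METRICS_PREFIX: &str = "example_policy_engine";

/// Builds a fully qualified metric name from the fixed prefix.
/// (Hypothetical helper; the name and signature are assumptions.)
pub fn metric_name(suffix: &str) -> String {
    format!("{}_{}", METRICS_PREFIX, suffix)
}
```

Tests can then assert against `metric_name("...")` or `METRICS_PREFIX` directly, while operators have no knob that could change the exported names.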
I see, thanks for elaborating! As far as the metrics are concerned, the code is shared in the latest commit of this PR. I would postpone the full status service for the PE, as this PR is only about refactoring the metrics code.
Ack, SGTM. Ping me back when you drop the WIP label.
@lucab 🔔
cb53f0d to 76192ce
@lucab PTAL at 76192ced4839c58de9404bff354bc8aba7be134c, which shares the metrics module between the two services. Some trickery is involved in how the registry is stored in and retrieved by the web service; it looks bearable to me.
76192ce to 90acec4
f48e9f7 to 0d52e4e
0d52e4e to 752a718
* Move the policy-engine metrics module to commons and arrange both components to use it.
* Remove the global registry instance in favor of a static reference which is shared with the metric services.
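The idea behind the second change can be sketched as follows. This is a simplified, std-only illustration rather than the PR's actual code: `Registry` is a hypothetical stand-in for a real metrics registry type, and `MainService`, `MetricsService`, and `new_static_registry` are names invented for the example. The registry is allocated once, leaked to obtain a `'static` reference, and handed explicitly to each service instead of being reached through a global.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Stand-in for a metrics registry (hypothetical; a real service would
/// likely use a metrics library's registry type instead).
#[derive(Default)]
pub struct Registry {
    counters: Mutex<BTreeMap<String, u64>>,
}

impl Registry {
    /// Increments the named counter, creating it at zero if missing.
    pub fn inc(&self, name: &str) {
        let mut counters = self.counters.lock().unwrap();
        *counters.entry(name.to_string()).or_insert(0) += 1;
    }

    /// Renders all counters as `name value` lines, sorted by name.
    pub fn gather(&self) -> String {
        let counters = self.counters.lock().unwrap();
        counters.iter().map(|(k, v)| format!("{} {}\n", k, v)).collect()
    }
}

/// Allocates a registry once and leaks it to obtain a `'static` reference.
pub fn new_static_registry() -> &'static Registry {
    Box::leak(Box::new(Registry::default()))
}

/// Main service: records metrics into the registry it was given.
pub struct MainService {
    registry: &'static Registry,
}

impl MainService {
    pub fn new(registry: &'static Registry) -> Self {
        Self { registry }
    }

    pub fn handle_request(&self) {
        self.registry.inc("requests_total");
    }
}

/// Metrics service: exposes the same registry it was given.
pub struct MetricsService {
    registry: &'static Registry,
}

impl MetricsService {
    pub fn new(registry: &'static Registry) -> Self {
        Self { registry }
    }

    pub fn serve(&self) -> String {
        self.registry.gather()
    }
}
```

Because the registry is passed in rather than global, a test can build its own instance with `new_static_registry()` and observe only the metrics recorded through it.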
752a718 to a58aea5
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: lucab, steveeJ.