Skip to content

Conversation

@steveej
Copy link
Contributor

@steveej steveej commented Aug 20, 2019

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 20, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 20, 2019
@lucab
Copy link
Contributor

lucab commented Aug 20, 2019

❤️


/// Common prefix for policy-engine metrics.
#[default(DEFAULT_METRICS_PREFIX.to_string())]
pub metrics_prefix: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not have these two configuration knobs.
We can discuss them separately if you want, but I wouldn't put them in this PR anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss them separately if you want, but I wouldn't put them in this PR anyway.

I think a discussion will be helpful here. Why would you prefer not to have them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For several reasons, most of them related to putting this into a larger context:

  • metrics names are fixed by the application, not by runtime configuration, and we want them to be stable/reliable across deployments
  • metrics gathering is a first-class part of the application, and we want to encourage more people to consume them (without breakages)
  • an high number of unused configuration knobs just increase the cognitive load on operators and the maintenance cost on developers
  • the status service is an important (i.e. non-conditional) piece of the graph-builder. I'd like the policy-engine to not diverge from GB, unless where strictly required

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* metrics names are fixed by the application, not by runtime configuration, and we want them to be stable/reliable across deployments

I guess they are hard-coded until they're not? In a similar fashion we also have the path_prefix for the main service in configurable, which is also quite an important service. For reliability we could remove the default and force the operator to provide the value.

* the status service is an important (i.e. non-conditional) piece of the graph-builder. I'd like the policy-engine to not diverge from GB, unless where strictly required

I was going to suggest to make the equivalent changes to the GB.

* an high number of unused configuration knobs just increase the cognitive load on operators and the maintenance cost on developers

By removing the default we would force it to be used (and known) by the operator.

* metrics gathering is a first-class part of the application, and we want to encourage more people to consume them (without breakages)

Not sure what to say about this though I of course agree that we try hard to not break consumers.


Now that I gave my view and envisioned directions on the above items, I want to state that my opinion isn't strong enough to force this through. I just added the setting for completeness after splitting the prefix out of the metrics for not repeating the magic string in the tests. I'll agree to remove the user-facing setting from this PR and possibly revisit this at a later time. ACK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We started without a path_prefix and we added it once we had a usecase/consumer for that. My point here is that a fully-configurable registry has no such usecase/consumer (surely now, and possible in the future), and additionally it isn't an encouraged pattern for metrics.

For context, people are expected to write and share monitoring mixins, like https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/dashboards/apiserver.libsonnet.

I just added the setting for completeness after splitting the prefix out of the metrics for not repeating the magic string in the tests

Thanks, now I see where this is coming from. Based on that, my preference would be:

  • unifying PE towards GB, i.e. a non-conditional status service, with shared logic
  • not introducing neither a registry_enabled nor a registry_prefix configuration knob
  • carrying the registry_prefix somewhere in the application (for code sharing), but making it hardcoded

Copy link
Contributor Author

@steveej steveej Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for elaborating! As far as the metrics is concerned the code is shared in the latest commit of this PR. I would postpone the full status service for the PE as this PR is only about refactoring the metrics code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, SGTM. Ping me back when you drop the WIP label.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucab 🔔

@steveej steveej force-pushed the pr/pe-metrics-non-global-registry branch from cb53f0d to 76192ce Compare August 22, 2019 10:49
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 22, 2019
@steveej
Copy link
Contributor Author

steveej commented Aug 22, 2019

@lucab PTAL at 76192ced4839c58de9404bff354bc8aba7be134c which shares the metrics module among the two services. Some trickery is involved with how the registry is stored and retrieved into the web service; it looks bearable to me.

@steveej steveej force-pushed the pr/pe-metrics-non-global-registry branch from 76192ce to 90acec4 Compare August 22, 2019 12:54
@steveej steveej force-pushed the pr/pe-metrics-non-global-registry branch 2 times, most recently from f48e9f7 to 0d52e4e Compare August 22, 2019 20:17
@steveej steveej changed the title [WIP] policy-engine/metrics: don't use a global registry policy-engine/metrics: don't use a global registry Aug 22, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 22, 2019
@steveej steveej force-pushed the pr/pe-metrics-non-global-registry branch from 0d52e4e to 752a718 Compare August 22, 2019 20:26
* Move the policy-engine metrics module to commons and arranges both
components to use it.
* Remove the global registry instance in favor of a static
reference which is shared with the metric services.
@steveej steveej force-pushed the pr/pe-metrics-non-global-registry branch from 752a718 to a58aea5 Compare August 23, 2019 08:11
@lucab
Copy link
Contributor

lucab commented Aug 23, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 23, 2019
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lucab, steveeJ

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit ebe2274 into openshift:master Aug 23, 2019
@steveej steveej deleted the pr/pe-metrics-non-global-registry branch August 23, 2019 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants