rexagod
commented
Jan 4, 2026
- One-line PR description: Alpha features were implemented a while ago, and after discussing the future of this at length with @chrischdi and @mrueg, I believe the proposed criteria for BETA make sense.
- Issue link: Resource State Metrics #4785
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: rexagod
Alpha features were implemented a while ago, and after discussing the future of this at length with other maintainers and stakeholders, I believe the proposed criteria for BETA make sense. Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
|
@rexagod thanks for the BETA proposal enhancement.
We would like to explicitly call out that, from the Crossplane community's perspective, having the RSM repository hosted under a k8s-sig org would be highly valuable. ( End of January This would:
Regarding resolver: we believe CEL is a strong default and will cover the majority of use cases well (and it has already seen meaningful testing in Crossplane contexts with XRs, Claims and managed resources).
Importantly, we are intentionally holding back further independent efforts in projects like
We understand that moving the repo (https://github.com/rexagod/resource-state-metrics) into the k8s-sig org introduces additional overhead (CI, image publishing, release processes).
To accelerate feedback in the meantime, we currently maintain a fork (https://github.com/haarchri/resource-state-metrics) where we have added CI, image publishing and release automation (already shared with a few Crossplane community folks).
Happy to help where possible and looking forward to contributing once the repo setup enables broader community involvement.
| expressed using `gauge`s.
| - All generated metrics are hardcoded to `gauge`s by design, as metrics
| backends in the ecosystem do not support some OpenMetrics-specified
| metrics' types, such as `Info` and `StateSets`, but more importantly,
Could this be part of an auto-negotiation on the protocol level?
Right, I pushed the same negotiation behavior as KSM to RSM, which respects the selected KEP content above. However, one could argue that we must maintain compatibility with Prometheus' progress on OpenMetrics, and thus at least support Counters as well right now.
The only reason I didn't do this and hard-pinned to Gauges was that it would open us up to supporting all relevant metric types that RMM allows users to construct, which, in the future, will at the very least entail Info and StateSet types.
But I realize this makes sense, and the complexity is worth the benefits the community gets out of RSM, so I'll put this in the TODO before we go stable. PLMK if that makes sense, or if you'd like a different approach.
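As a rough illustration of the negotiation idea discussed above, here is a minimal sketch, not RSM's or KSM's actual code. The function names are hypothetical, `Accept` q-values are ignored for brevity, and the fallback rule (types the Prometheus text format cannot express degrade to `gauge`) is an assumption about how such a scheme might behave.

```python
OPENMETRICS_MEDIA_TYPE = "application/openmetrics-text"

# Types the classic Prometheus text format can express that are relevant here.
TEXT_FORMAT_TYPES = {"gauge", "counter"}


def negotiate_format(accept_header: str) -> str:
    """Return "openmetrics" if the client accepts it, else "text" (simplified)."""
    for part in accept_header.split(","):
        # Drop parameters such as ";version=1.0.0" or ";q=0.5".
        media_type = part.split(";")[0].strip().lower()
        if media_type == OPENMETRICS_MEDIA_TYPE:
            return "openmetrics"
    return "text"


def exposed_type(metric_type: str, fmt: str) -> str:
    """Degrade types the negotiated format cannot express to gauge (assumed rule)."""
    if fmt == "openmetrics" or metric_type in TEXT_FORMAT_TYPES:
        return metric_type
    return "gauge"
```

With this shape, a scraper that only sends `Accept: text/plain` would still see `Info` or `StateSet` data, just typed as gauges.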
| - Add support for `starlark` resolver to help define more complex logic,
| in addition to mainstream DSLs seen across the ecosystem for similar
| use-cases, such as `jq` and [`jsonpath`].
| - Explore areas that could benefit from (gzipped) compression, such as
maybe consider supporting zstd as well, if it makes sense?
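For a sense of what compressing such a payload could look like, here is a small sketch using gzip from the Python standard library. The payload shape and the `compress_payload` helper are made up for illustration and do not reflect RSM's configuration schema. zstd is left out because it needs a library outside the standard library on most Python versions.

```python
import gzip
import json


def compress_payload(payload: bytes, level: int = 6) -> bytes:
    """Gzip-compress a payload; the level trades CPU time for size."""
    return gzip.compress(payload, compresslevel=level)


# Hypothetical, repetitive configuration blob standing in for a real one.
config = json.dumps(
    {"families": [{"name": f"example_metric_{i}", "help": "placeholder"} for i in range(200)]}
).encode()

compressed = compress_payload(config)
restored = gzip.decompress(compressed)
```

Repetitive structured text like this tends to compress well, which is why configuration fields and scrape payloads are natural candidates.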
| the configuration field in the managed resource, or the payload
| endpoint(s).
|
| [`jsonpath`]: https://github.com/openmcp-project/metrics-operator#metric
This supports jsonpath as well: https://github.com/prometheus-community/json_exporter
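To make the resolver idea concrete, here is a deliberately simplified sketch of pulling a value out of a resource by path. It uses a plain dotted path rather than real JSONPath, `jq`, or CEL syntax, and `resolve_path` and the sample resource are hypothetical.

```python
from typing import Any


def resolve_path(resource: Any, path: str) -> Any:
    """Walk a dotted path such as "status.replicas"; integer segments index lists."""
    current = resource
    for segment in path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


# Hypothetical custom resource, trimmed to the fields used below.
resource = {
    "metadata": {"name": "demo"},
    "status": {"replicas": 3, "conditions": [{"type": "Ready", "status": "True"}]},
}

replicas = resolve_path(resource, "status.replicas")
ready = resolve_path(resource, "status.conditions.0.status")
```

A real resolver would add filtering and error handling on top of this, which is where DSLs such as the ones mentioned above come in.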
| @@ -392,31 +395,35 @@ generation.
| - At its core, the controller relies on its managed resource,
| `ResourceMetricsMonitor` to fetch the metric generation configuration. Parts
Will we allow supplying configuration as a regular config file as well? I wonder if there's a use case to run RSM outside of the cluster.
Folks can use similar semantics as seen in `make local` to run this locally. That being said, doing away with RMMs and relying on file blobs will strip away the added feature set of the controller lifecycle (e.g., status updates) that users can benefit from. RSM could mirror that blob to an RMM resource, but I'm not sure if the file-only direction would be worth the added complexity.
|
👋🏼 Thank you for the reviews here, folks! @haarchri Apologies for the delay here; I'm currently working on this. I've also managed to allocate some spare cycles in the upcoming week to dedicate to this, and I'll make sure the migration, along with planned updates, happens in the same time period. I'll also ensure your thoughts above are reflected in the patches.
|
FYI @haarchri I've migrated the repository to https://github.com/kubernetes-sigs/resource-state-metrics.
|
(bump) Also please note that I've implemented everything I had in mind for BETA as well as GA for RSM, PTAL at the TODO section. We should be good to graduate this for BETA this release and use the soak time between that and STABLE graduation to get stakeholders' input, both community and downstream, to build confidence.
| #### Stable
|
| - Consider supporting all relevant metric types that Prometheus' OpenMetrics
| implementation allows. This would entail `Counter`s currently, and `Info` and
Is there a use case for counters in RSM?
I wanted RSM to be as close to the OM spec as possible, and respect the nuances it entails.
E.g., with CRS, a counter is created as a gauge with no peripherals. Not only will a `_total` metric report its type as gauge in the metadata (even though the help text may say otherwise, which increases the overall ambiguity), but the user may or may not create a `_created` scalar to accompany that metric, which helps Prometheus calculate precise rates between resets.
RSM does not allow the user to specify a type. Instead, it looks at the suffix and, if it matches a supported OM type, does everything that the OM spec advises.
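The suffix-driven behavior described above could be sketched roughly as follows. This is an illustration, not RSM's implementation: the suffix table is an assumption (the thread does not list which suffixes RSM recognizes), and the helper names are hypothetical. The family naming follows the OpenMetrics convention that a counter's `_total` sample and an info metric's `_info` sample belong to a family named without the suffix.

```python
from typing import Tuple

# Assumed suffix-to-type table; anything else falls back to gauge.
SUFFIX_TYPES = (("_total", "counter"), ("_info", "info"))


def split_family(sample_name: str) -> Tuple[str, str]:
    """Return (family name, inferred type) for a sample name."""
    for suffix, metric_type in SUFFIX_TYPES:
        if sample_name.endswith(suffix):
            return sample_name[: -len(suffix)], metric_type
    return sample_name, "gauge"


def type_line(sample_name: str) -> str:
    """Render the OpenMetrics TYPE metadata line for a sample name."""
    family, metric_type = split_family(sample_name)
    return f"# TYPE {family} {metric_type}"
```

Inferring the type from the name keeps the metadata and the sample suffix from disagreeing, which is the ambiguity called out for CRS above.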
| addition to mainstream DSLs seen across the ecosystem for similar use-cases,
| such as `jq` and `jsonpath` ([`prometheus-community/json_exporter`],
| [`openmcp-project/metrics-operator`], etc.).
| - Explore areas that could benefit from (gzipped and zstd) compression, such as
Since this is a graduation criterion, it should be concrete. Either you think that implementing compression is mandatory for GA and add it as a requirement here, or it is something for the future that should not be listed as a criterion.
Right, I'll put this up for GA.
| - Consider supporting all relevant metric types that Prometheus' OpenMetrics
| implementation allows. This would entail `Counter`s currently, and `Info` and
| `Stateset` in the future. See [expfmt.MetricFamilyToOpenMetrics] for more.
Similar comment to the one I made on https://github.com/kubernetes/enhancements/pull/5766/changes#r2945360746. Supporting OpenMetrics types could be done outside of this KEP's scope.
Generally, we also don't add new functionality as GA criteria, as it is meant for testing and gathering user feedback.
I believe I rooted for OM types support, along with every other planned effort, to be supported as soon as possible, so that the soak time before GA could help us test most of the feature sets that CRS already had a blueprint for and that users would expect from RSM OOTB at GA, while allowing us to move fast owing to the absence of GA-like stability guarantees. These features, shipped at an earlier stage, reinforce what RSM sees as integral to its functionality, while less important efforts have been moved beyond GA.
This can be seen at play in https://github.com/crossplane-contrib/resource-state-metrics, owing to which we resolved bugs and addressed feature requests, to work towards a more feature-complete GA that the community looks forward to. There are golden rules to test out all expected behaviors based on the user feedback so far, which I anticipate to increase (even before GA) owing to the soak time we have saved by shipping these features earlier in the cycle.
PS. We support relevant OM types in RSM as of now, please see kubernetes-sigs/resource-state-metrics@81dfeaa#diff-bd1b0d7b07d02221f7aa583d7fe1f0d1822d6476f452cdb9f071c0d4c81f53faR74-R89.