-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: now possible to only output non-resource related metrics #1823
Conversation
ae1c020
to
de0b4fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that without per resource metrics we ended totally blind. Would it be possible to keep the metrics per resource kind ?
...pport/src/main/java/io/javaoperatorsdk/operator/monitoring/micrometer/MicrometerMetrics.java
Outdated
Show resolved
Hide resolved
...pport/src/main/java/io/javaoperatorsdk/operator/monitoring/micrometer/MicrometerMetrics.java
Outdated
Show resolved
Hide resolved
...pport/src/main/java/io/javaoperatorsdk/operator/monitoring/micrometer/MicrometerMetrics.java
Outdated
Show resolved
Hide resolved
It's still possible to get them but you would need to activate a flag in future versions, i.e. staring from v5 probably. This PR should keep the current behavior as-is. |
Also, we could indeed remove the name/namespace tags only |
+1 |
Turns out that removing meters without delay doesn't work reliably due to the fact that cleaning operations open in different threads and heavily depend on how / when the API server responds to the SDK so I will remove the possibility of immediately removing the meters. |
@scrocquesel made the per-resource collection more granular and added docs, would appreciate a new review 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some few things.
Also want to share my thoughts on per schedule cleaner as I don't know if creating lots of timer has memory/perf issues. I think the objective is to keep stale meters until they have been collected one last time by the scrapper. Thus the configured delay should be a little more than the scrapping delay. We may remove meters by batch after they have spend delay
seconds in a per generation collection. I guess with two generations, we can rotate every delay
seconds where stale meters are recorded and clean them upon next switch.
Ideally, I would love to have some sort of a event souurce and being able to clean all stale meters when the event is raised. The event would be raised by the metric endpoint route handler after returning metrics to the scrapper. I checked quarkus route handler implementation but currenlty it is not possible to know when metrics are collected. If it make sense we may open an issue at quarkus. JOSDK could expose the a signal or method to be wired up in QOSDK.
is a matter of instantiating `MicrometerMetrics` with the desired values and tell `ConfigurationServiceProvider` about | ||
it as shown above. | ||
The micrometer implementation is typically created using one of the provided factory methods which, depending on which | ||
is used, will return either a ready to use instance or a builder allowing users to customized how the implementation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't builder pattern be consistent so we have only one entry point to the configuration ? MicrometerMetrics.builder()
and then be self documented with javadoc. By default, the builder is not configuring per resource and you can switch to the per resource builder if needed.
MicrometerMetrics.newMicrometerMetricsBuilder(new LoggingMeterRegistry())
.collectingMetricsPerResource(perResourceBuilder ->
perResourceBuilder.withCleanUpDelayInSeconds(60))
.build();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having several entry points allows to provide often used configurations without users having to create them with the builder. It also allows to have stable API calls since these factory method implementations can be changed if we change the default behavior while the semantics of the method won't so I'd be in favor of keeping things like they are for now.
...pport/src/main/java/io/javaoperatorsdk/operator/monitoring/micrometer/MicrometerMetrics.java
Outdated
Show resolved
Hide resolved
I have to admit that I'm not too familiar with how the scrapping work and I'm not sure whether our implementation can actually be notified that scrapping occurred.
That's an interesting idea. I'm not sure how feasible that is, though. |
Looking at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Find some discrepancies between doc and impl
@@ -135,40 +167,35 @@ public <T> T timeControllerExecution(ControllerExecution<T> execution) { | |||
}); | |||
final var successType = execution.successTypeName(result); | |||
registry | |||
.counter(execName + ".success", "controller", name, "type", successType) | |||
.counter(execName + SUCCESS_SUFFIX, CONTROLLER, name, TYPE, successType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc says it has resoure metadata. I guess it should reuse the tags list above. Same apply for the failure below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should leave it like this for this release to keep the current behavior. I'll fix the doc instead. I'm also unsure whether adding more metadata is useful for this metric (at least, the GVK can be determined by looking at the controller definition from its name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, the metadata is on the timers which doesn't really make sense either and should probably be removed there as well in the next major release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense there is operator.sdk.reconciliations.failed
if the user want to build alerts on failure for a specific GVK
I don't actually want to make things too configurable or too fine-grained with the default implementation. It shouldn't be too hard for people to use that as a starting point for their own implementation. |
This is needed because the finalizer will trigger a reconciliation that adds a resource-specific metric.
Co-authored-by: Sébastien CROCQUESEL <[email protected]>
We now still collect GVK information when per-resource collection is switched off.
[skip ci]
Co-authored-by: Sébastien CROCQUESEL <[email protected]>
4f1589b
to
4505761
Compare
Kudos, SonarCloud Quality Gate passed! |
...pport/src/main/java/io/javaoperatorsdk/operator/monitoring/micrometer/MicrometerMetrics.java
Outdated
Show resolved
Hide resolved
[skip ci] Co-authored-by: Attila Mészáros <[email protected]>
Fixes #1812.