Improve UTO logs #9718
This change improves the UTO logs. It now logs every periodic reconciliation and every batch reconciliation. It also logs individual topic reconciliations in case of failure. All YAML payloads have been replaced with the topic name.

The KafkaTopic informer resync configuration had a small issue that caused some periodic reconciliations to be skipped. They often triggered every 4 minutes instead of every 2 minutes.

The informer interval acts like a heartbeat, and each handler interval causes a resync at some multiple of that heartbeat. The closer the two values are, the more likely it is that the handler skips one informer interval. Setting both intervals to the same value generates just enough skew that, when the informer checks whether the handler is ready for a resync, it sees that the handler still needs a couple of microseconds and skips to the next informer-level resync.

This is fixed by introducing a small fixed resync period for the informer. The resync operation is all in memory and results in a no-op most of the time, so this causes no harm.

Signed-off-by: Federico Valeri <[email protected]>
I guess it is a good start. The other operators normally log the reconciliation of each resource at the INFO level. That is useful to detect whether, for example, the KafkaTopic resource has the right label, etc. But this shows that something is going on and it is not excessive. So as far as I'm concerned we can start with this and see what more we need in the future.
A couple of minor points, but this LGTM.
topic-operator/src/main/java/io/strimzi/operator/topic/v2/BatchingLoop.java
topic-operator/src/main/java/io/strimzi/operator/topic/v2/BatchingTopicController.java
Signed-off-by: Federico Valeri <[email protected]>
Log example: uto.log. This should close #9465.
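The skipped resyncs described in this PR can be reproduced with a toy model. The sketch below is not the informer implementation used by the operator. It is a simplified simulation under stated assumptions: the informer ticks at a fixed period, the handler fires only on a tick where its own deadline has passed, and a few microseconds of bookkeeping skew push each new deadline just past the tick. The class name `ResyncModel`, the method `handlerResyncTimes`, the 5 µs skew, and the 2 second informer period are all hypothetical values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ResyncModel {

    /**
     * Simplified model of an informer heartbeat driving a handler resync.
     * All times are in microseconds. Returns the tick times at which the
     * handler resync actually fires, up to and including horizonUs.
     */
    static List<Long> handlerResyncTimes(long informerPeriodUs,
                                         long handlerPeriodUs,
                                         long skewUs,
                                         long horizonUs) {
        List<Long> fired = new ArrayList<>();
        // Assumption: the handler deadline is computed slightly after the tick,
        // so it lands skewUs later than an exact multiple of the handler period.
        long nextHandlerResync = handlerPeriodUs + skewUs;
        for (long now = informerPeriodUs; now <= horizonUs; now += informerPeriodUs) {
            if (now >= nextHandlerResync) {
                fired.add(now);
                nextHandlerResync = now + skewUs + handlerPeriodUs;
            }
            // Otherwise the handler is "not ready yet" and waits for the next tick.
        }
        return fired;
    }

    public static void main(String[] args) {
        long twoMinutesUs = 120_000_000L;
        long horizonUs = 1_200_000_000L; // 20 minutes
        long skewUs = 5L;                // hypothetical bookkeeping delay

        // Informer period equal to the handler period.
        List<Long> same = handlerResyncTimes(twoMinutesUs, twoMinutesUs, skewUs, horizonUs);
        // Small fixed informer period (2 seconds, an illustrative value).
        List<Long> small = handlerResyncTimes(2_000_000L, twoMinutesUs, skewUs, horizonUs);

        System.out.println("equal periods, gap (s): " + (same.get(1) - same.get(0)) / 1_000_000L);
        System.out.println("small informer period, gap (s): " + (small.get(1) - small.get(0)) / 1_000_000L);
    }
}
```

In this model, equal periods make the handler miss every other tick, so the gap between resyncs is 240 seconds. With the short informer period the gap is 122 seconds, that is, the handler period plus at most one informer tick, which is why a small fixed informer period brings the reconciliation back to roughly every 2 minutes.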