[noderesourcetopology] move logging to the logr interface #710
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ffromani. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/cc @PiotrProkop (for discardednodes and least_numa)
/hold The PR is reviewable but I'm still polishing it a bit and validating the changes (slicing the logs with grep, mostly)
Force-pushed de5ee40 to b015657
The test failures are known. It seems better to me to push some code down the stack: k8stopologyawareschedwg/noderesourcetopology-api#35
Force-pushed 655b387 to 23910b8
/hold cancel Barring fixes and review comments, I think this PR has all the content we need for this cleanup.
Force-pushed ea82f02 to 4960b2c
/cc @swatisehgal
Should we squash history to 2 commits?
Considering what the project seems to prefer, I'm in favor of squashing once reviewers are happy.
/hold need to squash once the reviewers are happy |
Force-pushed 4960b2c to 663e763
/hold cancel Squashed as requested.
/hold need to review the integration with the scheduler framework >= 1.29 EDIT: case in point: https://github.com/kubernetes/kubernetes/blob/v1.29.4/pkg/scheduler/framework/runtime/framework.go#L819
Force-pushed df75f99 to ed3ef23
/hold cancel Proper integration with the framework, and the ongoing work to fully support contextual logging (KEP-3077), will need a follow-up PR. changelog:
Now I call this PR done :)
Bump the NRT API package to v0.1.2; there is no API change, but we now have a better replacement for the internal `getID` helper, which we can remove. Signed-off-by: Francesco Romani <[email protected]>
Move to the logr.Logger interface everywhere, instead of the global `klog` instance. This enables named loggers, presets values for simple and automatic consistency, enables pluggable loggers, and comes for free since we already depend on the logr package and klog has native logr integration. In addition, add minimal support to make it easy to replace the logr reference, to help integrators of this code. The default is still (and will stay) klog, for backward compatibility and ecosystem integration. Signed-off-by: Francesco Romani <[email protected]>
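The "pluggable logger with a backward-compatible default" pattern this commit describes can be sketched in stdlib-only Go. The `Logger` interface here is a minimal stand-in for `logr.Logger` (the real code uses the logr package with klog as the default backend), and the names `SetLogger` and `defaultLogger` are illustrative, not the PR's actual API:

```go
package main

import (
	"fmt"
	"os"
)

// Logger is a minimal stand-in for the logr.Logger surface this sketch needs.
type Logger interface {
	Info(msg string, keysAndValues ...any)
	WithName(name string) Logger
}

// stdLogger is a trivial Logger writing to stderr, standing in for klog.
type stdLogger struct{ name string }

func (l stdLogger) Info(msg string, keysAndValues ...any) {
	fmt.Fprintln(os.Stderr, l.name, msg, keysAndValues)
}

func (l stdLogger) WithName(name string) Logger {
	return stdLogger{name: l.name + "/" + name}
}

// defaultLogger mimics the backward-compatible default (klog in the real PR).
var defaultLogger Logger = stdLogger{name: "noderesourcetopology"}

// SetLogger lets integrators swap in their own logger, as the commit suggests.
func SetLogger(lg Logger) { defaultLogger = lg }

func main() {
	lg := defaultLogger.WithName("cache")
	lg.Info("resync done", "pod", "default/foo")
}
```

Named sub-loggers (`WithName`) give the "named logger" consistency the commit mentions, while `SetLogger` keeps the replacement point minimal for integrators.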
Force-pushed ed3ef23 to 045a43d
Rebased on top of the 1.29 rebase
Looks good to me 👍
After more review and conversation, we have a better understanding of what integration with contextual logging should look like. First and foremost, injecting loggers would conflict with the very goals of contextual logging, so let's drop the code we added in kubernetes-sigs#710.

The contextual logger doesn't deduplicate key/value pairs; that is left to (some) backends. To avoid log clutter, trim the extra key/value pairs down to only those we really need to ensure a good troubleshooting experience. Still, let's make sure to add the critical key/value pairs to the relevant entries, at the cost of possible duplication.

The current representation of the assumed resources is neither concise nor very human friendly, and multi-line log entries are harder to process and should be avoided. So let's move to a more concise representation, which turns out to be no less human friendly and is no longer multi-line.

Review the verbosity of log entries. Move down to verbose=2 the logs which are really key to understanding the behavior. We should set a hard limit on log entries to minimize log spam while keeping at least some observability without requiring v=4 or greater. The level v=4 is usually the highest non-spammy level: when debug logs are needed we often set v=4, and higher verbosity levels are used only in desperate times. Thus, promote to v=4 the debug logs we should really see.

Everywhere else in the Kubernetes ecosystem, and most notably in the scheduler, the pod namespace/name pair is called "pod", while we called it "logID". We did so to use the same name across all the flows, the cache resync (which is driven by time, not by an object) being the odd one out. It seems better to be externally consistent (with the ecosystem) rather than internally consistent (all the flows in the same plugin), so we rename "logID" to "pod" in the log entries.

Signed-off-by: Francesco Romani <[email protected]>
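The verbosity convention the commit relies on (v=2 for key behavior, v=4 for debug detail, higher levels for desperate times) can be illustrated with a minimal V-gated logger. This is a stdlib-only sketch of the klog/logr `V()` idiom, not the plugin's actual code:

```go
package main

import "fmt"

// vLogger gates messages by verbosity, mimicking klog/logr V() levels.
type vLogger struct {
	verbosity int
}

// V reports whether messages at the given level would be emitted.
func (l vLogger) V(level int) bool { return level <= l.verbosity }

// Info emits msg only if level is enabled, following the convention the
// commit message describes: v=2 for key behavior, v=4 for debug detail.
func (l vLogger) Info(level int, msg string, keysAndValues ...any) {
	if !l.V(level) {
		return
	}
	fmt.Println(msg, keysAndValues)
}

func main() {
	lg := vLogger{verbosity: 2}
	lg.Info(2, "pod admitted", "pod", "default/foo", "node", "worker-0") // emitted
	lg.Info(4, "assumed resources", "state", "cpu=2,mem=4Gi")            // suppressed at v=2
}
```

Running at the default verbosity, only the v=2 entry is printed; the v=4 debug entry appears only when operators explicitly raise verbosity, which is exactly the spam/observability trade-off the commit describes.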
Before kubernetes-sigs#710 and kubernetes-sigs#725, we logged the container being processed alongside the pod (identified by its namespace/name pair). It was dropped by mistake, not deliberately. This is useful information when troubleshooting, so let's add it back. Signed-off-by: Francesco Romani <[email protected]>
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implement the changes proposed in #709 to review logging and make it consistent, easy to extend and get right, and, as a bonus, easy to replace should the need arise.
Worth pointing out: this PR goes in the direction of KEP 3077 (contextual logging). The code becomes almost completely ready for a switch to the contextual logger, with only a few selected pieces to change in the future.
Which issue(s) this PR fixes:
Fixes #709
Special notes for your reviewer:
The PR is split into many smallish commits to make it easier to review (and bisect).
Aside from the intended logging fixes/improvements, there are no intended changes in behavior.
Ultimately, this PR is not user-facing: users should see no difference except more consistent logs, which are easier to grep and process in general.
Does this PR introduce a user-facing change?