[noderesourcetopology] move logging to the logr interface #710
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ffromani. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/cc @PiotrProkop (for discardednodes and least_numa)
/hold The PR is reviewable but I'm still polishing it a bit and validating the changes (slicing the logs with grep, mostly)
Force-pushed de5ee40 to b015657
The test failures are known. It seems better to me to push some code down the stack: k8stopologyawareschedwg/noderesourcetopology-api#35
Force-pushed 655b387 to 23910b8
/hold cancel Barring fixes and review comments, I think this PR has all the content we need for this cleanup.
Force-pushed ea82f02 to 4960b2c
/cc @swatisehgal
Should we squash history to 2 commits?
Considering what the project seems to prefer, I'm in favor of squashing once reviewers are happy.
/hold need to squash once the reviewers are happy |
Force-pushed 4960b2c to 663e763
/hold cancel Squashed as requested.
/hold need to review the integration with the scheduler framework >= 1.29 EDIT: case in point: https://github.com/kubernetes/kubernetes/blob/v1.29.4/pkg/scheduler/framework/runtime/framework.go#L819
Force-pushed df75f99 to ed3ef23
/hold cancel Proper integration with the framework, and the ongoing work to fully support contextual logging (KEP-3077), will need a follow-up PR. changelog:
Now I call this PR done :)
Bump the NRT API package to v0.1.2; there is no API change, but we now have a better replacement for the internal `getID` helper, which we can remove. Signed-off-by: Francesco Romani <[email protected]>
Move to the logr.Logger interface everywhere, instead of the global `klog` instance. This enables named loggers, presets values for simple and automatic consistency, enables pluggable loggers, and comes for free since we already depend on the logr package and klog has native logr integration. In addition, add minimal support to make it easy to replace the logr reference, to help integrators of this code. The default is still (and will stay) klog, for backward compatibility and ecosystem integration. Signed-off-by: Francesco Romani <[email protected]>
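The "pluggable logger with a backward-compatible default" pattern this commit describes can be sketched in stdlib-only Go. The `Logger` interface here is a minimal stand-in for `logr.Logger` (the real code uses the logr package with klog as the default backend), and the names `SetLogger` and `defaultLogger` are illustrative, not the PR's actual API:

```go
package main

import (
	"fmt"
	"os"
)

// Logger is a minimal stand-in for the logr.Logger surface this sketch needs.
type Logger interface {
	Info(msg string, keysAndValues ...any)
	WithName(name string) Logger
}

// stdLogger is a trivial Logger writing to stderr, standing in for klog.
type stdLogger struct{ name string }

func (l stdLogger) Info(msg string, keysAndValues ...any) {
	fmt.Fprintln(os.Stderr, l.name, msg, keysAndValues)
}

func (l stdLogger) WithName(name string) Logger {
	return stdLogger{name: l.name + "/" + name}
}

// defaultLogger mimics the backward-compatible default (klog in the real PR).
var defaultLogger Logger = stdLogger{name: "noderesourcetopology"}

// SetLogger lets integrators swap in their own logger, as the commit suggests.
func SetLogger(lg Logger) { defaultLogger = lg }

func main() {
	lg := defaultLogger.WithName("cache")
	lg.Info("resync done", "pod", "default/foo")
}
```

Named sub-loggers (`WithName`) give the "named logger" consistency the commit mentions, while `SetLogger` keeps the replacement point minimal for integrators.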
Force-pushed ed3ef23 to 045a43d
Rebased on top of the 1.29 rebase
Looks good to me 👍
After more review and conversation, we have a better understanding of what integration with contextual logging should look like. First and foremost, injecting loggers would conflict with the very goals of contextual logging, so let's drop the code we added in kubernetes-sigs#710.

The contextual logger doesn't deduplicate key/value pairs; that is left to (some) backends. To avoid log clutter, trim the extra key/value pairs down to only those we really need to ensure a good troubleshooting experience. Still, let's make sure to add the critical key/value pairs to the relevant entries, at the cost of possible duplication.

The current representation of the assumed resources is neither concise nor very human friendly, and multi-line log entries are harder to process and should be avoided. So let's move to a more concise representation, which turns out to be no less human friendly and is no longer multi-line.

Review the verbosity of log entries. Move down to verbose=2 the logs which are really key to understanding the behavior. We should set a hard limit on log entries to minimize log spam while keeping at least some observability without requiring v=4 or greater. The level v=4 is usually the highest non-spammy level: when debug logs are needed we often set v=4, and higher verbosity levels are used only in desperate times. Thus, promote to v=4 the debug logs we should really see.

Everywhere else in the Kubernetes ecosystem, and most notably in the scheduler, the pod namespace/name pair is called "pod", while we called it "logID". We did so to use the same name across all the flows, the cache resync (which is driven by time, not by an object) being the odd one out. It seems better to be externally consistent (with the ecosystem) rather than internally consistent (all the flows in the same plugin), so we rename "logID" to "pod" in the log entries.

Signed-off-by: Francesco Romani <[email protected]>
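The verbosity convention the commit relies on (v=2 for key behavior, v=4 for debug detail, higher levels for desperate times) can be illustrated with a minimal V-gated logger. This is a stdlib-only sketch of the klog/logr `V()` idiom, not the plugin's actual code:

```go
package main

import "fmt"

// vLogger gates messages by verbosity, mimicking klog/logr V() levels.
type vLogger struct {
	verbosity int
}

// V reports whether messages at the given level would be emitted.
func (l vLogger) V(level int) bool { return level <= l.verbosity }

// Info emits msg only if level is enabled, following the convention the
// commit message describes: v=2 for key behavior, v=4 for debug detail.
func (l vLogger) Info(level int, msg string, keysAndValues ...any) {
	if !l.V(level) {
		return
	}
	fmt.Println(msg, keysAndValues)
}

func main() {
	lg := vLogger{verbosity: 2}
	lg.Info(2, "pod admitted", "pod", "default/foo", "node", "worker-0") // emitted
	lg.Info(4, "assumed resources", "state", "cpu=2,mem=4Gi")            // suppressed at v=2
}
```

Running at the default verbosity, only the v=2 entry is printed; the v=4 debug entry appears only when operators explicitly raise verbosity, which is exactly the spam/observability trade-off the commit describes.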
Before kubernetes-sigs#710 and kubernetes-sigs#725, we logged the container being processed alongside the pod (identified by its namespace/name pair). It was dropped by mistake, not deliberately. This is useful information when troubleshooting, so let's add it back. Signed-off-by: Francesco Romani <[email protected]>
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implement the changes proposed in #709 to review logging and make it consistent, easy to extend and get right, and, as a bonus, easy to replace should the need arise.
Worth pointing out: this PR goes in the direction of KEP 3077 (contextual logging). The code becomes almost completely ready for a switch to the contextual logger, with only a few selected pieces to change in the future.
Which issue(s) this PR fixes:
Fixes #709
Special notes for your reviewer:
The PR is split into many smallish commits to make it easier to review (and bisect).
Aside from the intended logging fixes/improvements, there are no intended changes in behavior.
Ultimately, this PR is not user-facing: users should see no difference except more consistent logs, which are easier to grep and process in general.
Does this PR introduce a user-facing change?