Skip to content

Conversation

skeeey
Copy link
Member

@skeeey skeeey commented Oct 17, 2025

Summary

  1. using klog FromContext instead of klog
  2. record the reconnecting info on log level 2
  3. for pub/sub, record the event meta on log level 2 and record the whole event on log level 5

Related issue(s)

Fixes #

Summary by CodeRabbit

  • Chores
    • Improved structured, contextual logging across connect, publish, and subscribe paths for clearer operational traces.
    • Added a reconnection signal dispatch to better surface successful reconnect events.
    • Refined latency monitoring and reporting with more consistent contextual logs and threshold-based alerts.

@openshift-ci openshift-ci bot requested review from deads2k and qiujian16 October 17, 2025 08:52
Copy link

coderabbitai bot commented Oct 17, 2025

Walkthrough

Replaces klog calls with contextual loggers (klog.FromContext(ctx)) and updates logging across connect, publish, subscribe, and receiver flows. Adds a call to c.sendReconnectedSignal() after successful reconnection and adjusts latency logging and message fields/verbosity; no exported APIs changed. (45 words)

Changes

Cohort / File(s) Summary
Logging refactor & reconnect signal
pkg/cloudevents/generic/baseclient.go
Replaced direct klog calls with contextual logger := klog.FromContext(ctx) and logger.V(...).Info(...) usages across connect/reconnect, publish, subscribe, and receiver code paths; added c.sendReconnectedSignal() after successful reconnect; changed latency logging approach and updated log fields/message text and verbosity levels; no public API changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested labels

approved, lgtm

Suggested reviewers

  • deads2k
  • qiujian16

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title ":seedling: using context log" is directly and clearly related to the main change in the changeset. The raw summary confirms that the primary modification is replacing klog-based log statements with a contextual logger obtained via klog.FromContext(ctx), which is precisely what the title conveys. The title is concise, uses the appropriate emoji convention from the repository's guidelines (:seedling: for other/misc changes), and is specific enough that a teammate would understand the core change when scanning history.
Description Check ✅ Passed The pull request description follows the required template structure with both "Summary" and "Related issue(s)" sections present. The Summary section is substantively filled with three specific points describing the changes: using klog FromContext instead of direct klog, recording reconnecting information at log level 2, and recording event metadata at log level 2 with full events at level 5. The "Related issue(s)" section contains the template text "Fixes #" without an issue number, which appears intentional as the pr_objectives indicate no related issue is specified. The description is clear and specific rather than vague or generic.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b2a935 and 46800f3.

📒 Files selected for processing (1)
  • pkg/cloudevents/generic/baseclient.go (9 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
PR: open-cluster-management-io/sdk-go#144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/baseclient.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: verify
  • GitHub Check: integration
  • GitHub Check: unit
🔇 Additional comments (8)
pkg/cloudevents/generic/baseclient.go (8)

51-52: LGTM: Contextual logger initialization.

Correctly obtains the logger from context for structured logging throughout the connect flow.


63-63: LGTM: Reconnection logging.

Appropriately logs reconnection attempts and success at V(2) using the contextual logger.

Also applies to: 75-75


113-127: LGTM: Throttling latency logging.

The latency measurement is correct (captures now at line 114 before the wait, calculates latency at line 120 after). The structured logging with context fields at V(3) is appropriate for debugging throttling delays.


138-139: LGTM: Tiered event logging.

Excellent use of log levels: V(2) logs event metadata, V(5) logs the full event. The deferred function func() any { return evt.String() } correctly avoids expensive string generation unless V(5) is enabled.


151-154: LGTM: Subscribe initialization and guard.

Correctly initializes the contextual logger and guards against duplicate subscription starts.


169-170: LGTM: Received event logging.

Mirrors the sending pattern with tiered logging at V(2) for metadata and V(5) for the full event, maintaining consistency across the codebase.


193-193: LGTM: Receiver lifecycle logging.

Appropriately logs receiver state transitions (restart/stop) at V(2) for operational visibility.

Also applies to: 198-198


79-79: The original review comment is incorrect.

The baseClient is a private struct that can only be instantiated through the public constructors NewCloudEventSourceClient and NewCloudEventAgentClient, both of which explicitly initialize reconnectedChan with make(chan struct{}) before calling connect(ctx). Since sendReconnectedSignal() is only invoked from within the connect() method after the channel has already been initialized, there is no risk of sending on a nil channel.

Likely an incorrect or invalid review comment.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
pkg/cloudevents/generic/baseclient.go (3)

51-51: Tag logs with client identity once.

Use WithValues to attach clientID to all logs from this context (apply similarly in publish/subscribe).

- logger := klog.FromContext(ctx)
+ logger := klog.FromContext(ctx).WithValues("clientID", c.clientID)

Apply analogous changes:

- logger := klog.FromContext(ctx)
+ logger := klog.FromContext(ctx).WithValues("clientID", c.clientID)

at Line 113 and Line 150.


153-153: LGTM: duplicate subscription guard message at V(2).

Optionally include clientID for quicker correlation.

- logger.V(2).Info("the subscription has already started")
+ logger.V(2).Info("the subscription has already started")

(Will be auto-included if you adopt WithValues as suggested earlier.)


192-198: LGTM: receiver restart/stop logs at V(2).

Consider mirroring the non-blocking signal pattern for sendReceiverSignal to avoid holding RLock while sending.

-func (c *baseClient) sendReceiverSignal(signal int) {
-    c.RLock()
-    defer c.RUnlock()
-    if c.receiverChan != nil {
-        c.receiverChan <- signal
-    }
-}
+func (c *baseClient) sendReceiverSignal(signal int) {
+    c.RLock()
+    ch := c.receiverChan
+    c.RUnlock()
+    if ch == nil {
+        return
+    }
+    select {
+    case ch <- signal:
+    default:
+    }
+}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b82d34b and 6b2a935.

📒 Files selected for processing (1)
  • pkg/cloudevents/generic/baseclient.go (8 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-16T02:22:20.929Z
Learnt from: skeeey
PR: open-cluster-management-io/sdk-go#144
File: pkg/cloudevents/generic/options/grpc/protocol/protocol.go:200-213
Timestamp: 2025-09-16T02:22:20.929Z
Learning: In the GRPC CloudEvents protocol implementation, when startEventsReceiver encounters a stream error, it sends the error to reconnectErrorChan. The consumer of this channel handles the error by calling Close() on the protocol, which triggers close(p.closeChan), causing OpenInbound to unblock and call cancel() to properly terminate both the events receiver and heartbeat watcher goroutines.

Applied to files:

  • pkg/cloudevents/generic/baseclient.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration
  • GitHub Check: verify
  • GitHub Check: unit
🔇 Additional comments (6)
pkg/cloudevents/generic/baseclient.go (6)

63-76: Contextual reconnect logs look good.

Verbosity and messages make sense; structured logger is used correctly.


113-113: LGTM: contextual logger in publish.


137-139: Event logging OK; confirm no sensitive payload at V(5).

Metadata at V(2) and full event at V(5) meets the stated objectives. Verify that evt.String() cannot leak secrets/PII in your deployment.


150-150: LGTM: contextual logger in subscribe.


168-170: LGTM: received-event logs with metadata (V2) and full event (V5).


75-80: Review comment is based on an incorrect premise and should be disregarded.

The review expresses concern that reconnectedChan may be nil and panic, but verification shows:

  • reconnectedChan is always initialized in both sourceclient.go:50 and agentclient.go:50 via make(chan struct{})
  • The channel is never explicitly closed anywhere in the codebase
  • Sending to a nil channel would never occur—there is no nil panic risk

While the code does hold an RLock during the channel send (which could theoretically be optimized for better concurrency on an unbuffered channel), this is not a panic risk and is a separate performance concern from the nil-related issue claimed in the review. The current implementation is safe and functional.

The suggested diff is unnecessary.

Likely an incorrect or invalid review comment.

Signed-off-by: Wei Liu <[email protected]>
@skeeey
Copy link
Member Author

skeeey commented Oct 17, 2025

/assign @qiujian16

@qiujian16
Copy link
Member

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Oct 20, 2025
Copy link

openshift-ci bot commented Oct 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qiujian16, skeeey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit a3f7625 into open-cluster-management-io:main Oct 20, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants