Skip to content

[client] Use ctx.Err() instead of gRPC codes.Canceled to detect shutdown#6019

Open
pappz wants to merge 1 commit intomainfrom
fix/grpc-canceled-retry
Open

[client] Use ctx.Err() instead of gRPC codes.Canceled to detect shutdown#6019
pappz wants to merge 1 commit intomainfrom
fix/grpc-canceled-retry

Conversation

@pappz
Copy link
Copy Markdown
Collaborator

@pappz pappz commented Apr 28, 2026

Detecting shutdown by inspecting the gRPC status code conflates a local context cancellation with a server- or proxy-sent codes.Canceled. When the latter occurs (e.g. an intermediary proxy resets the stream), the retry loop silently terminates and the client never reconnects.

Switch to ctx.Err() in the signal Receive loop and management Sync/Job handlers, and stop matching gRPC Canceled/DeadlineExceeded in the flow client's isContextDone helper. With this change, a server-sent Canceled is treated as a transient error and the backoff retry loop continues.

Describe your changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved context cancellation handling across client components when retrying failed operations.
    • Refined error detection to prioritize local context state over remote gRPC status codes.
    • Enhanced shutdown behavior when operations are canceled or exceed deadlines.
  • Refactor

    • Simplified error handling logic in network stream management.

Detecting shutdown by inspecting the gRPC status code conflates a local
context cancellation with a server- or proxy-sent codes.Canceled. When
the latter occurs (e.g. an intermediary proxy resets the stream), the
retry loop silently terminates and the client never reconnects.

Switch to ctx.Err() in the signal Receive loop and management Sync/Job
handlers, and stop matching gRPC Canceled/DeadlineExceeded in the flow
client's isContextDone helper. With this change, a server-sent Canceled
is treated as a transient error and the backoff retry loop continues.
@sonarqubecloud
Copy link
Copy Markdown

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Walkthrough

The PR refactors error-handling and retry logic in three gRPC client files to prioritize context cancellation (ctx.Err() != nil) over gRPC status code interpretation. The isContextDone helper in flow/client/client.go now exclusively checks the local context state rather than also examining gRPC status codes. Related changes in management and signal stream clients align error handling to follow this same pattern.

Changes

Cohort / File(s) Summary
Retry helper refactor
flow/client/client.go
Modified isContextDone to check only context cancellation/deadline; removed gRPC status code checks and associated imports for grpc/codes and grpc/status.
Stream error handling
shared/management/client/grpc.go, shared/signal/client/grpc.go
Refactored job, sync, and signal stream error handlers to prioritize context cancellation checks (ctx.Err() != nil) before interpreting gRPC status codes; removed special handling of codes.Canceled and consolidated retry-triggering error paths.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • lixmal
  • mlsmaycon

Poem

A rabbit hops through context streams so clear, 🐰
No status codes to cloud the path from here,
When deadlines loom or cancels ring the bell,
The context speaks—no gRPC spell.
Retries halt where truth takes root,
In ctx.Err's decisive boot! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: replacing gRPC status code inspection with ctx.Err() for shutdown detection.
Description check ✅ Passed The description provides clear context about the bug, the fix applied, and the impact. It follows the template with checklist items marked appropriately.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/grpc-canceled-retry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flow/client/client.go`:
- Around line 302-309: The current isContextDone(err error) helper incorrectly
infers local shutdown from the returned error; change its signature to
isContextDone(ctx context.Context, err error) and make it return true if
ctx.Err() != nil OR errors.Is(err, context.Canceled) || errors.Is(err,
context.DeadlineExceeded) so local cancellations represented via ctx are
detected reliably; update all callers (e.g., the retry loop that currently calls
isContextDone(err)) to pass the current context variable into isContextDone(ctx,
err).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 763ff032-5c73-49e2-9538-60e29e648c04

📥 Commits

Reviewing files that changed from the base of the PR and between e5474e1 and 48390f5.

📒 Files selected for processing (3)
  • flow/client/client.go
  • shared/management/client/grpc.go
  • shared/signal/client/grpc.go

Comment thread flow/client/client.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant