Skip to content

[client] Add tri-state connection status to guard for smarter ICE retry#5828

Merged
pappz merged 4 commits intorefactor/force-relayfrom
refactor/ice-retry-finetune
Apr 14, 2026
Merged

[client] Add tri-state connection status to guard for smarter ICE retry#5828
pappz merged 4 commits intorefactor/force-relayfrom
refactor/ice-retry-finetune

Conversation

@pappz
Copy link
Copy Markdown
Collaborator

@pappz pappz commented Apr 8, 2026

Describe your changes

Refactor: optimize ICE retry strategy

  • Refactored isConnectedOnAllWay to return a ConnStatus enum:

    • Connected
    • Disconnected
    • PartiallyConnected
  • Updated connection handling logic:

    • When relay is up but ICE is not (PartiallyConnected):

      • Limit ICE offers to 3 retries with exponential backoff
      • After retries are exhausted, fall back to hourly retry attempts
      • Reduces unnecessary signaling traffic
    • When fully disconnected:

      • Continue aggressive retry behavior
  • Improved resiliency:

    • External events (relay/ICE disconnect, signal/relay reconnect) reset retry state
    • Ensures ICE gets a fresh retry cycle after state changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of partial peer connectivity when only some connection paths are available.
    • Enhanced connection retry mechanism for faster and more reliable peer-to-peer connection establishment in challenging network conditions.

Refactor isConnectedOnAllWay to return a ConnStatus enum (Connected,
Disconnected, PartiallyConnected) instead of a boolean. When relay is
up but ICE is not (PartiallyConnected), limit ICE offers to 3 retries
with exponential backoff then fall back to hourly attempts, reducing
unnecessary signaling traffic. Fully disconnected peers continue to
retry aggressively. External events (relay/ICE disconnect, signal/relay
reconnect) reset retry state to give ICE a fresh chance.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 8, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c0201d04-e72b-4b1c-a064-8adf261d11ff

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor/ice-retry-finetune

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/peer/conn.go`:
- Around line 718-757: The isConnectedOnAllWay method can nil-deref
conn.workerICE when handshaker.RemoteICESupported() is true but workerICE is
nil; update isConnectedOnAllWay to check conn.workerICE != nil before calling
conn.workerICE.InProgress(), e.g. only evaluate conn.workerICE.InProgress() when
conn.handshaker.RemoteICESupported() && conn.workerICE != nil, and keep the
existing logic using conn.statusICE.Get() to determine iceConnected; ensure
references are to isConnectedOnAllWay, conn.handshaker.RemoteICESupported(),
conn.workerICE, conn.workerICE.InProgress(), and conn.statusICE.Get().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e50e9e1c-a1fb-45a1-b6ee-6dfee7d5bce4

📥 Commits

Reviewing files that changed from the base of the PR and between 2734a33 and 0fa1925.

📒 Files selected for processing (3)
  • client/internal/peer/conn.go
  • client/internal/peer/guard/guard.go
  • client/internal/peer/guard/ice_retry_state.go

Comment thread client/internal/peer/conn.go
@pappz pappz marked this pull request as ready for review April 8, 2026 17:01
pappz added 2 commits April 10, 2026 21:34
@sonarqubecloud
Copy link
Copy Markdown

@pappz pappz merged commit d2c18fd into refactor/force-relay Apr 14, 2026
42 of 44 checks passed
@pappz pappz deleted the refactor/ice-retry-finetune branch April 14, 2026 16:19
@coderabbitai coderabbitai Bot mentioned this pull request Apr 14, 2026
7 tasks
pappz added a commit that referenced this pull request Apr 21, 2026
* [client] Suppress ICE signaling and periodic offers in force-relay mode

When NB_FORCE_RELAY is enabled, skip WorkerICE creation entirely,
suppress ICE credentials in offer/answer messages, disable the
periodic ICE candidate monitor, and fix isConnectedOnAllWay to
only check relay status so the guard stops sending unnecessary offers.

* [client] Dynamically suppress ICE based on remote peer's offer credentials

Track whether the remote peer includes ICE credentials in its
offers/answers. When remote stops sending ICE credentials, skip
ICE listener dispatch, suppress ICE credentials in responses, and
exclude ICE from the guard connectivity check. When remote resumes
sending ICE credentials, re-enable all ICE behavior.

* [client] Fix nil SessionID panic and force ICE teardown on relay-only transition

Fix nil pointer dereference in signalOfferAnswer when SessionID is nil
(relay-only offers). Close stale ICE agent immediately when remote peer
stops sending ICE credentials to avoid traffic black-hole during the
ICE disconnect timeout.

* [client] Add relay-only fallback check when ICE is unavailable

Ensure the relay connection is supported with the peer when ICE is disabled to prevent connectivity issues.

* [client] Add tri-state connection status to guard for smarter ICE retry (#5828)

* [client] Add tri-state connection status to guard for smarter ICE retry

Refactor isConnectedOnAllWay to return a ConnStatus enum (Connected,
Disconnected, PartiallyConnected) instead of a boolean. When relay is
up but ICE is not (PartiallyConnected), limit ICE offers to 3 retries
with exponential backoff then fall back to hourly attempts, reducing
unnecessary signaling traffic. Fully disconnected peers continue to
retry aggressively. External events (relay/ICE disconnect, signal/relay
reconnect) reset retry state to give ICE a fresh chance.

* [client] Clarify guard ICE retry state and trace log trigger

Split iceRetryState.attempt into shouldRetry (pure predicate) and
enterHourlyMode (explicit state transition) so the caller in
reconnectLoopWithRetry reads top-to-bottom. Restore the original
trace-log behavior in isConnectedOnAllWay so it only logs on full
disconnection, not on the new PartiallyConnected state.

* [client] Extract pure evalConnStatus and add unit tests

Split isConnectedOnAllWay into a thin method that snapshots state and
a pure evalConnStatus helper that takes a connStatusInputs struct, so
the tri-state decision logic can be exercised without constructing
full Worker or Handshaker objects. Add table-driven tests covering
force-relay, ICE-unavailable and fully-available code paths, plus
unit tests for iceRetryState budget/hourly transitions and reset.

* [client] Improve grammar in logs and refactor ICE credential checks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants