[client] Supress ICE signaling#5820

Merged
pappz merged 9 commits into main from refactor/force-relay
Apr 21, 2026

pappz (Collaborator) commented Apr 7, 2026

Describe your changes

Suppress ICE signaling and periodic offers in force-relay mode

  • When NB_FORCE_RELAY is enabled, skip WorkerICE creation entirely
  • Suppress ICE credentials in offer/answer messages
  • Disable the periodic ICE candidate monitor
  • Fix isConnectedOnAllWay to check only relay status, so the guard stops sending unnecessary offers
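
For readers unfamiliar with the pattern, the gating described above could look roughly like the sketch below. It is illustrative only and not the merged code: `forceRelayEnabled`, `iceWorker`, `peerConn`, `open`, and `shutdown` are hypothetical names, and the assumption that the value "true" enables the mode is a guess at how the variable is parsed.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envForceRelay is the variable named in the PR description.
const envForceRelay = "NB_FORCE_RELAY"

// forceRelayEnabled is a hypothetical stand-in for the client's IsForceRelayed helper.
// Assumption: the value "true" (case-insensitive) enables the mode.
func forceRelayEnabled() bool {
	return strings.EqualFold(os.Getenv(envForceRelay), "true")
}

// iceWorker is a hypothetical placeholder for the real ICE worker type.
type iceWorker struct{}

// Close would release ICE resources in a real implementation.
func (w *iceWorker) Close() {}

// peerConn holds an optional ICE worker; a nil worker means relay-only.
type peerConn struct {
	workerICE *iceWorker
}

// open creates the ICE worker only when force-relay is off.
func open(forceRelay bool) *peerConn {
	c := &peerConn{}
	if !forceRelay {
		c.workerICE = &iceWorker{}
	}
	return c
}

// shutdown guards the optional worker so relay-only connections do not
// dereference a nil pointer.
func (c *peerConn) shutdown() {
	if c.workerICE != nil {
		c.workerICE.Close()
	}
}

func main() {
	c := open(forceRelayEnabled())
	defer c.shutdown()
	fmt.Println("ICE worker created:", c.workerICE != nil)
}
```

Because the worker is simply absent in force-relay mode, every later use of it needs a nil check, which is the kind of guard the review comments in this PR ask for.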

Optimize ICE retry strategy

Once a relayed connection is established, ICE signaling stops after a limited number of retries.

  • When relay is up but ICE is not (PartiallyConnected):

    • Limit ICE offers to 3 retries with exponential backoff
    • After retries are exhausted, fall back to hourly retry attempts
    • Reduces unnecessary signaling traffic
  • When fully disconnected:

    • Continue aggressive retry behavior
  • Improved resiliency:

    • External events (relay/ICE disconnect, signal/relay reconnect) reset retry state
    • Ensures ICE gets a fresh retry cycle after state changes
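
The retry policy above can be sketched as a small state holder. This is a hypothetical reconstruction, not the merged `iceRetryState`: the method names `shouldRetry` and `enterHourlyMode` are borrowed from a later commit message in this PR, while `nextDelay`, `reset`, the doubling backoff, and the one-second base interval are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// maxICERetries is the retry budget stated in the PR description.
const maxICERetries = 3

// iceRetryState is a hypothetical sketch of the retry bookkeeping.
type iceRetryState struct {
	retries int  // backoff-paced offers already sent
	hourly  bool // true once the budget is spent and the slow fallback is active
}

// shouldRetry is a pure predicate: is another backoff-paced offer within budget?
func (s *iceRetryState) shouldRetry() bool {
	return !s.hourly && s.retries < maxICERetries
}

// nextDelay consumes one retry and returns the wait before the next offer.
// Assumption: the delay doubles on each attempt; the real intervals may differ.
func (s *iceRetryState) nextDelay(base time.Duration) time.Duration {
	d := base << uint(s.retries)
	s.retries++
	return d
}

// enterHourlyMode is the explicit transition to the hourly fallback.
func (s *iceRetryState) enterHourlyMode() { s.hourly = true }

// reset gives ICE a fresh retry cycle after an external event
// (relay/ICE disconnect, signal/relay reconnect).
func (s *iceRetryState) reset() { *s = iceRetryState{} }

func main() {
	s := &iceRetryState{}
	for s.shouldRetry() {
		fmt.Println("send ICE offer after", s.nextDelay(time.Second))
	}
	s.enterHourlyMode()
	fmt.Println("budget exhausted, next attempt in", time.Hour)

	s.reset() // e.g. the relay connection was re-established
	fmt.Println("budget restored:", s.shouldRetry())
}
```

Keeping the predicate free of side effects means the reconnect loop can ask "may I retry?" without accidentally spending the budget.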

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • Refactor

    • Conditional startup/monitoring now adapts between relay and ICE modes (including forced-relay), and connection readiness reports Connected / PartiallyConnected / Disconnected with tailored reconnect backoff.
    • Improved detection and tracking of whether a remote peer supports ICE.
  • Bug Fixes

    • Safer handling to avoid nil ICE workers and related errors.
    • Preserved session ID/credential behavior when session data is absent.
  • Tests

    • Added unit tests for connection-status evaluation and ICE retry behavior.

pappz added 3 commits April 7, 2026 14:08
When NB_FORCE_RELAY is enabled, skip WorkerICE creation entirely,
suppress ICE credentials in offer/answer messages, disable the
periodic ICE candidate monitor, and fix isConnectedOnAllWay to
only check relay status so the guard stops sending unnecessary offers.
…tials

Track whether the remote peer includes ICE credentials in its
offers/answers. When remote stops sending ICE credentials, skip
ICE listener dispatch, suppress ICE credentials in responses, and
exclude ICE from the guard connectivity check. When remote resumes
sending ICE credentials, re-enable all ICE behavior.
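
A rough sketch of the tracking described in this commit message follows, under stated assumptions. The `offerAnswer` fields, `newHandshaker`, `onRemoteMessage`, and `buildOffer` are hypothetical names. The sketch also assumes the flag starts as true so that the very first offer still carries credentials; the merged code may initialise it differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// offerAnswer is a hypothetical, trimmed-down signaling message.
type offerAnswer struct {
	ICEUfrag string
	ICEPwd   string
}

// hasICECreds reports whether the message carries ICE credentials.
func (m offerAnswer) hasICECreds() bool {
	return m.ICEUfrag != "" && m.ICEPwd != ""
}

// handshaker tracks whether the remote peer currently advertises ICE.
type handshaker struct {
	localICE           bool        // whether a local ICE worker exists
	remoteICESupported atomic.Bool // updated on every remote offer/answer
}

// newHandshaker assumes the remote supports ICE until proven otherwise.
func newHandshaker(localICE bool) *handshaker {
	h := &handshaker{localICE: localICE}
	h.remoteICESupported.Store(true)
	return h
}

// onRemoteMessage records the remote's ICE support and reports whether the
// ICE listener should be dispatched for this message.
func (h *handshaker) onRemoteMessage(m offerAnswer) bool {
	h.remoteICESupported.Store(m.hasICECreds())
	return h.localICE && h.remoteICESupported.Load()
}

// buildOffer includes credentials only when both sides can use ICE.
func (h *handshaker) buildOffer(ufrag, pwd string) offerAnswer {
	if !h.localICE || !h.remoteICESupported.Load() {
		return offerAnswer{}
	}
	return offerAnswer{ICEUfrag: ufrag, ICEPwd: pwd}
}

func main() {
	h := newHandshaker(true)
	fmt.Println("dispatch ICE:", h.onRemoteMessage(offerAnswer{})) // remote sent no creds
	fmt.Println("offer has creds:", h.buildOffer("u", "p").hasICECreds())

	h.onRemoteMessage(offerAnswer{ICEUfrag: "ru", ICEPwd: "rp"}) // remote resumed ICE
	fmt.Println("offer has creds:", h.buildOffer("u", "p").hasICECreds())
}
```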
… transition

Fix nil pointer dereference in signalOfferAnswer when SessionID is nil
(relay-only offers). Close stale ICE agent immediately when remote peer
stops sending ICE credentials to avoid traffic black-hole during the
ICE disconnect timeout.
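
The nil-pointer fix in this commit amounts to converting the session ID only when one is present. A minimal sketch, assuming a string-based `sessionID` type and a helper named `sessionBytes` (both hypothetical, not taken from the merged code):

```go
package main

import "fmt"

// sessionID is a hypothetical stand-in for the ICE session identifier type.
type sessionID string

// sessionBytes returns nil for a missing session ID (relay-only offers)
// instead of dereferencing a nil pointer.
func sessionBytes(id *sessionID) []byte {
	if id == nil {
		return nil
	}
	return []byte(*id)
}

func main() {
	id := sessionID("abc123")
	fmt.Printf("with session: %q\n", sessionBytes(&id))
	fmt.Println("relay-only offer yields nil bytes:", sessionBytes(nil) == nil)
}
```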
coderabbitai Bot (Contributor) commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

SR watcher and connection startup now respect a "force relayed" flag: ICE monitor and ICE worker creation can be disabled, Handshaker tracks remote ICE support and omits ICE credentials when appropriate, and Guard uses a tri-state ConnStatus with limited ICE retry policy and hourly fallback.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Engine / SR Watcher | client/internal/engine.go, client/internal/peer/guard/sr_watcher.go | Engine.Start forwards force-relay to SRWatcher.Start(disableICEMonitor bool); SRWatcher conditionally skips creating/starting the ICE monitor when disabled. |
| Connection init & lifecycle | client/internal/peer/conn.go | Conn.Open checks IsForceRelayed() to skip creating the ICE worker and to avoid ICE listener registration; nil checks guard conn.workerICE; connection readiness now uses tri-state guard.ConnStatus. |
| Force-relay helper | client/internal/peer/env.go | Renamed isForceRelayed() → IsForceRelayed() (identifier visibility changed; behavior unchanged). |
| Handshaker ICE handling | client/internal/peer/handshaker.go | Added remoteICESupported atomic flag; detect ICE presence in remote offers/answers; only include ICE creds/session in outgoing messages when local ICE exists and remote supports ICE; expose RemoteICESupported(); close local ICE worker when remote stops advertising ICE. |
| Guard / Reconnect & ICE retry | client/internal/peer/guard/guard.go, client/internal/peer/guard/ice_retry_state.go | Replaced boolean connectivity callback with ConnStatus enum; reconnect loop handles PartiallyConnected using new iceRetryState (retry budget, can switch to hourly ticker); ticker helper renamed to newReconnectTicker. |
| Conn status helpers & tests | client/internal/peer/conn_status.go, client/internal/peer/conn_status_eval_test.go | Added connStatusInputs and table-driven tests validating evalConnStatus across force-relay, ICE-unavailable, and fully-available scenarios. |
| Guard tests | client/internal/peer/guard/ice_retry_state_test.go | Added unit tests for iceRetryState (retry budget behavior, hourly mode, reset/idempotence). |
| Signaler | client/internal/peer/signaler.go | signalOfferAnswer only converts SessionID to bytes when non-nil; otherwise passes nil session bytes. |

Sequence Diagram(s)

sequenceDiagram
    participant Engine
    participant Conn
    participant SRWatcher
    participant Handshaker
    participant RemotePeer
    participant Guard

    rect rgba(100,150,200,0.5)
    Note over Engine,RemotePeer: Force-relayed path (ICE disabled)
    Engine->>Conn: Open(forceRelay=true)
    Conn->>Conn: Skip ICE worker creation
    Conn->>Handshaker: Register relay-only listeners
    Engine->>SRWatcher: Start(disableICEMonitor=true)
    SRWatcher->>SRWatcher: Skip ICE monitor goroutine
    Handshaker->>RemotePeer: Send Offer (no ICE creds)
    RemotePeer->>Handshaker: Send Answer (no ICE creds)
    Handshaker->>Handshaker: remoteICESupported = false
    Guard->>Guard: Eval ConnStatus (relay-based)
    end

    rect rgba(150,200,100,0.5)
    Note over Engine,RemotePeer: Normal path (ICE enabled)
    Engine->>Conn: Open(forceRelay=false)
    Conn->>Conn: Create ICE worker
    Conn->>Handshaker: Register ICE listener
    Engine->>SRWatcher: Start(disableICEMonitor=false)
    SRWatcher->>SRWatcher: Start ICE monitor goroutine
    Handshaker->>RemotePeer: Send Offer (with ICE creds)
    RemotePeer->>Handshaker: Send Answer (with ICE creds)
    Handshaker->>Handshaker: remoteICESupported = true
    Guard->>Guard: Eval ConnStatus (ICE+relay -> Connected/PartiallyConnected)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • lixmal
  • crn4

Poem

🐇 I hop where signals change their tune,
Skip ICE in moonlight, keep relay in June,
I tuck my creds when the other won't share,
Retry a few hops, then wait hourly with care—
A rabbit applauds this nimble network affair.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.56%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title '[client] Supress ICE signaling' is directly related to the main changes in the PR, which focus on suppressing ICE signaling when force relay is enabled, but contains a typo ('Supress' should be 'Suppress') and is incomplete; it omits the critical condition 'in case of force relay' mentioned in the PR's full objectives. |
| Description check | ✅ Passed | The PR description clearly outlines changes, includes a detailed explanation section, marks documentation as not needed with rationale, and completes the CLA confirmation. |


coderabbitai Bot (Contributor) left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/internal/peer/conn.go (1)

735-749: ⚠️ Potential issue | 🟠 Major

Require at least one transport when remote ICE is disabled.

Once RemoteICESupported() is false, this path skips the ICE gate. If relay is also unsupported for the peer, the function now falls through to true, so the guard stops re-offering even though there is no working transport.

💡 Suggested fix
 func (conn *Conn) isConnectedOnAllWay() (connected bool) {
@@
-	// For non-forced platforms: check ICE connection status only if remote peer supports ICE
-	if conn.handshaker.RemoteICESupported() {
+	relaySupported := conn.workerRelay.IsRelayConnectionSupportedWithPeer()
+
+	// For non-forced platforms: check ICE connection status only if remote peer supports ICE.
+	// If the remote disabled ICE, relay has to be available instead.
+	if conn.handshaker.RemoteICESupported() {
 		if conn.statusICE.Get() == worker.StatusDisconnected && !conn.workerICE.InProgress() {
 			return false
 		}
+	} else if !relaySupported {
+		return false
 	}
 
 	// If relay is supported with peer, it must also be connected
-	if conn.workerRelay.IsRelayConnectionSupportedWithPeer() {
+	if relaySupported {
 		if conn.statusRelay.Get() == worker.StatusDisconnected {
 			return false
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/peer/conn.go` around lines 735 - 749, The current
transport-readiness check incorrectly returns true when RemoteICESupported() is
false and the peer also doesn't support relay; update the logic in the readiness
function that contains conn.handshaker.RemoteICESupported(), conn.statusICE, and
conn.workerRelay.IsRelayConnectionSupportedWithPeer() so that when
RemoteICESupported() is false you still require at least one working transport:
if relay is not supported with the peer
(workerRelay.IsRelayConnectionSupportedWithPeer() == false) then return false;
otherwise continue to validate the relay status as currently done (i.e., require
statusRelay != StatusDisconnected).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97653c97-31db-42e6-973f-df416c625882

📥 Commits

Reviewing files that changed from the base of the PR and between 0588d2d and 2734a33.

📒 Files selected for processing (6)
  • client/internal/engine.go
  • client/internal/peer/conn.go
  • client/internal/peer/env.go
  • client/internal/peer/guard/sr_watcher.go
  • client/internal/peer/handshaker.go
  • client/internal/peer/signaler.go

@pappz pappz marked this pull request as ready for review April 7, 2026 15:46
lixmal previously approved these changes Apr 10, 2026
Ensure the relay connection is supported with the peer when ICE is disabled to prevent connectivity issues.
lixmal previously approved these changes Apr 10, 2026
…ry (#5828)

* [client] Add tri-state connection status to guard for smarter ICE retry

Refactor isConnectedOnAllWay to return a ConnStatus enum (Connected,
Disconnected, PartiallyConnected) instead of a boolean. When relay is
up but ICE is not (PartiallyConnected), limit ICE offers to 3 retries
with exponential backoff then fall back to hourly attempts, reducing
unnecessary signaling traffic. Fully disconnected peers continue to
retry aggressively. External events (relay/ICE disconnect, signal/relay
reconnect) reset retry state to give ICE a fresh chance.
lixmal previously approved these changes Apr 16, 2026
pappz added 2 commits April 20, 2026 11:31
Split iceRetryState.attempt into shouldRetry (pure predicate) and
enterHourlyMode (explicit state transition) so the caller in
reconnectLoopWithRetry reads top-to-bottom. Restore the original
trace-log behavior in isConnectedOnAllWay so it only logs on full
disconnection, not on the new PartiallyConnected state.
Split isConnectedOnAllWay into a thin method that snapshots state and
a pure evalConnStatus helper that takes a connStatusInputs struct, so
the tri-state decision logic can be exercised without constructing
full Worker or Handshaker objects. Add table-driven tests covering
force-relay, ICE-unavailable and fully-available code paths, plus
unit tests for iceRetryState budget/hourly transitions and reset.
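
To make the split concrete, here is a hedged sketch of what a pure tri-state evaluator over a plain inputs struct might look like. The field names follow the call site quoted later in this review; the `ConnStatus` constant names and the exact decision order are assumptions reconstructed from the PR description, not the merged logic.

```go
package main

import "fmt"

// ConnStatus is a hypothetical mirror of the guard's tri-state enum.
type ConnStatus int

const (
	ConnStatusDisconnected ConnStatus = iota
	ConnStatusPartiallyConnected
	ConnStatusConnected
)

func (s ConnStatus) String() string {
	switch s {
	case ConnStatusConnected:
		return "Connected"
	case ConnStatusPartiallyConnected:
		return "PartiallyConnected"
	default:
		return "Disconnected"
	}
}

// connStatusInputs is a plain snapshot, so no Worker or Handshaker is needed.
type connStatusInputs struct {
	forceRelay          bool
	peerUsesRelay       bool
	relayConnected      bool
	remoteSupportsICE   bool
	iceWorkerCreated    bool
	iceStatusConnecting bool // true for any ICE state other than disconnected
	iceInProgress       bool
}

// evalConnStatus is a sketch of the decision logic under the stated assumptions.
func evalConnStatus(in connStatusInputs) ConnStatus {
	relayUp := in.peerUsesRelay && in.relayConnected
	iceAvailable := !in.forceRelay && in.iceWorkerCreated && in.remoteSupportsICE

	// Relay-only paths: relay is the single transport that can count.
	if !iceAvailable {
		if relayUp {
			return ConnStatusConnected
		}
		return ConnStatusDisconnected
	}
	// If relay is supported with the peer, it must also be connected.
	if in.peerUsesRelay && !in.relayConnected {
		return ConnStatusDisconnected
	}
	iceUp := in.iceStatusConnecting || in.iceInProgress
	switch {
	case iceUp:
		return ConnStatusConnected
	case relayUp:
		return ConnStatusPartiallyConnected // relay up, ICE down
	default:
		return ConnStatusDisconnected
	}
}

func main() {
	cases := []struct {
		name string
		in   connStatusInputs
	}{
		{"force-relay, relay up", connStatusInputs{forceRelay: true, peerUsesRelay: true, relayConnected: true}},
		{"ICE down, relay up", connStatusInputs{peerUsesRelay: true, relayConnected: true, remoteSupportsICE: true, iceWorkerCreated: true}},
		{"nothing up", connStatusInputs{peerUsesRelay: true, remoteSupportsICE: true, iceWorkerCreated: true}},
	}
	for _, tc := range cases {
		fmt.Printf("%-24s -> %s\n", tc.name, evalConnStatus(tc.in))
	}
}
```

Since the evaluator takes only booleans, each row of a table-driven test is a single struct literal, which is the testing benefit the commit message describes.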
coderabbitai Bot (Contributor) left a comment

🧹 Nitpick comments (2)
client/internal/peer/conn.go (2)

620-629: Minor: defensive nil-check for workerICE in onWGDisconnected.

In force-relay mode conn.workerICE is nil. The ICEP2P/ICETurn branch is today unreachable in that mode (priority can only be None or Relay when there is no ICE worker), so this isn't an active bug — but the rest of the file (lines 253, 298) now guards workerICE explicitly, so it would be consistent (and future-proof against a refactor that introduces an ICE priority without a worker) to guard here too.

🛡️ Defensive guard
 	case conntype.ICEP2P, conntype.ICETurn:
-		conn.workerICE.Close()
+		if conn.workerICE != nil {
+			conn.workerICE.Close()
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/peer/conn.go` around lines 620 - 629, The ICE branch in the
switch on conn.currentConnPriority calls conn.workerICE.Close() without a
nil-check; make it consistent with other uses by guarding before calling Close
(e.g., if conn.workerICE != nil { conn.workerICE.Close() } else {
conn.Log.Debugf(...) }) so that when priority is ICEP2P or ICETurn we only call
Close on an existing workerICE; keep the Relay branch behavior
(workerRelay.Close() and conn.handleRelayDisconnectedLocked()) unchanged.

738-746: Nit: rename iceStatusConnecting — it covers Connected too.

iceStatusConnecting is populated as conn.statusICE.Get() != worker.StatusDisconnected, so it is true in both connecting and connected states. The name reads as "only connecting", which is misleading for readers of evalConnStatus (where it's OR'd with iceInProgress to form iceUp). Consider iceStatusActive or iceNotDisconnected to match the semantics. No functional change.

♻️ Rename suggestion
-	return evalConnStatus(connStatusInputs{
-		forceRelay:          IsForceRelayed(),
-		peerUsesRelay:       conn.workerRelay.IsRelayConnectionSupportedWithPeer(),
-		relayConnected:      conn.statusRelay.Get() == worker.StatusConnected,
-		remoteSupportsICE:   conn.handshaker.RemoteICESupported(),
-		iceWorkerCreated:    iceWorkerCreated,
-		iceStatusConnecting: conn.statusICE.Get() != worker.StatusDisconnected,
-		iceInProgress:       iceInProgress,
-	})
+	return evalConnStatus(connStatusInputs{
+		forceRelay:        IsForceRelayed(),
+		peerUsesRelay:     conn.workerRelay.IsRelayConnectionSupportedWithPeer(),
+		relayConnected:    conn.statusRelay.Get() == worker.StatusConnected,
+		remoteSupportsICE: conn.handshaker.RemoteICESupported(),
+		iceWorkerCreated:  iceWorkerCreated,
+		iceStatusActive:   conn.statusICE.Get() != worker.StatusDisconnected,
+		iceInProgress:     iceInProgress,
+	})

(Also update connStatusInputs, evalConnStatus, and the test file accordingly.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/peer/conn.go` around lines 738 - 746, Rename the misleading
field name iceStatusConnecting in connStatusInputs and its usage sites
(including the call site in conn.go and the evalConnStatus function) to
something that reflects it is true for both connecting and connected states,
e.g., iceStatusActive or iceNotDisconnected; update the struct definition of
connStatusInputs, the parameter name in evalConnStatus, all references in
conn.go (the call that constructs connStatusInputs), and any tests that assert
on this field so they use the new name—no behavior changes, only a consistent
rename across connStatusInputs, evalConnStatus, and tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5a4bb3d8-1696-49dd-8e63-ce610260d2ba

📥 Commits

Reviewing files that changed from the base of the PR and between d2c18fd and 3a52a3a.

📒 Files selected for processing (6)
  • client/internal/peer/conn.go
  • client/internal/peer/conn_status.go
  • client/internal/peer/conn_status_eval_test.go
  • client/internal/peer/guard/guard.go
  • client/internal/peer/guard/ice_retry_state.go
  • client/internal/peer/guard/ice_retry_state_test.go
✅ Files skipped from review due to trivial changes (1)
  • client/internal/peer/conn_status.go

@pappz pappz requested a review from lixmal April 20, 2026 11:22
lixmal previously approved these changes Apr 20, 2026
mlsmaycon (Collaborator) left a comment

pending title and description update

@pappz pappz changed the title from "[client] Supress ICE signaling in case of force relay setting" to "[client] Supress ICE signaling" on Apr 21, 2026
pappz (Collaborator, Author) commented Apr 21, 2026

pending title and description update

Done

mlsmaycon previously approved these changes Apr 21, 2026
@pappz pappz dismissed stale reviews from mlsmaycon and lixmal via 8b76426 April 21, 2026 13:01

@pappz pappz requested a review from mlsmaycon April 21, 2026 13:17
@pappz pappz requested a review from lixmal April 21, 2026 13:17
@pappz pappz merged commit 5a89e66 into main Apr 21, 2026
44 of 45 checks passed
@pappz pappz deleted the refactor/force-relay branch April 21, 2026 13:52