
[client, management] Phase 1+2+3 of #5989: ConnectionMode foundation, two-timer lifecycle, iceBackoff (stack 1/4) #6081

Open
MichaelUray wants to merge 44 commits into netbirdio:main from
MichaelUray:pr/a-p2p-dynamic-foundation

Conversation


@MichaelUray MichaelUray commented May 5, 2026

[client+management] Phase 1+2+3 of #5989: ConnectionMode foundation, two-timer lifecycle, iceBackoff

Branch: pr/a-p2p-dynamic-foundation → base main
Compare: https://github.com/netbirdio/netbird/compare/main...MichaelUray:netbird:pr/a-p2p-dynamic-foundation?expand=1


Summary

This PR lays the foundation for the work proposed in issue #5989 ("Consolidate connection-mode flags; add p2p-dynamic and p2p-dynamic-lazy modes"). It introduces a single first-class ConnectionMode enum on the wire and across Settings, replacing the asymmetric LazyConnectionEnabled / IsForceRelayed precedence with that enum. It also adds the per-peer two-timer lifecycle (P2pTimeoutSeconds for ICE-inactivity detach, RelayTimeoutSeconds for full close) and lands the per-peer iceBackoffState so failed pair checks don't loop forever (P2pRetryMaxSeconds controls the cap).

It supersedes the older PR #6047 (which only landed Phase 1 and is no longer mergeable).

Why

NetBird currently has two flags fighting for the same job:

  • LazyConnectionEnabled (account-wide bool) — eager vs. lazy initial connect.
  • IsForceRelayed (per-peer hint) — skip ICE entirely.

A user who wants "all my peers should start lazy, but on demand try P2P, and if P2P never comes up, fall back to relay" can't express that with two booleans. The proposed ConnectionMode is one of p2p, p2p-lazy, p2p-dynamic, p2p-dynamic-lazy, relay-forced. Each mode unambiguously selects one combination of (eager-vs-lazy, skip-ICE-yes-no, ICE-fallback-to-relay-yes-no).

The p2p-dynamic variants additionally need two timers (one for ICE-only inactivity, one for full conn close) and an exponential ICE-failure backoff so a chronically-broken NAT path can't pin a peer to relay-with-stalled-retries forever.
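A minimal sketch of how one combination could be derived per mode. This is illustrative only: the Mode type, constant names, and the exact (lazy, skip-ICE, relay-fallback) mapping are assumptions; the PR's real enum lives in its shared/connectionmode package.

```go
package main

import "fmt"

// Mode is a hypothetical stand-in for the proposed ConnectionMode enum.
type Mode int

const (
	ModeP2P Mode = iota
	ModeP2PLazy
	ModeP2PDynamic
	ModeP2PDynamicLazy
	ModeRelayForced
)

// Decompose shows one plausible mapping of each mode onto the three
// booleans named in the issue: eager-vs-lazy initial connect, skip ICE
// entirely, and fall back to relay when ICE never comes up.
func (m Mode) Decompose() (lazy, skipICE, relayFallback bool) {
	switch m {
	case ModeP2P:
		return false, false, false
	case ModeP2PLazy:
		return true, false, false
	case ModeP2PDynamic:
		return false, false, true
	case ModeP2PDynamicLazy:
		return true, false, true
	case ModeRelayForced:
		return false, true, true
	}
	return
}

func main() {
	lazy, skip, fb := ModeP2PDynamicLazy.Decompose()
	fmt.Println(lazy, skip, fb) // true false true
}
```

The point is that each enum value picks exactly one row of this matrix, which two independent booleans cannot express.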

What's in this PR

Phase 1 — Wire + Settings + CLI plumbing

  • New proto field PeerConfig.ConnectionMode (enum) + P2pTimeoutSeconds, RelayTimeoutSeconds.
  • New OpenAPI AccountSettings.connection_mode + the two timeouts; mgmt PUT handler accepts and audit-logs them.
  • Client: --connection-mode, --p2p-timeout, --relay-timeout CLI flags + env-var bridge with deprecation warnings for the old flags.
  • connectionmode package with Mode type, proto bridge, ResolveModeFromEnv.

Phase 2 — Lifecycle mechanism

  • Conn.AttachICE() / DetachICE() — register/remove the ICE listener on the handshaker on demand instead of at Open().
  • client/peer/conn.Open() defers ICE-listener registration in p2p-dynamic; the ICE worker is created but the dispatch is held back until the activity listener fires.
  • Two-timer client/lazyconn/inactivity (per-peer ICE-inactivity vs. relay-timeout, separate channels).
  • conn_mgr.ActivatePeer/DeactivatePeer per-mode plumbing for ICE attach/detach.
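The attach/detach mechanism above can be sketched as a guarded, idempotent pair of methods. This is a simplified illustration under assumptions: the struct, field names, and comments about the handshaker calls are invented for the sketch; only the AttachICE/DetachICE names come from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// conn sketches the on-demand ICE listener lifecycle.
type conn struct {
	mu          sync.Mutex
	iceAttached bool
}

// AttachICE registers the ICE listener on demand; safe to call repeatedly.
func (c *conn) AttachICE() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.iceAttached {
		return // idempotent: already attached
	}
	c.iceAttached = true
	// real code would call handshaker.AddICEListener(...) and emit a fresh offer
}

// DetachICE removes the listener; the relay tunnel is left untouched.
func (c *conn) DetachICE() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.iceAttached {
		return // idempotent: nothing to do
	}
	c.iceAttached = false
	// real code would call handshaker.RemoveICEListener() and workerICE.Close()
}

func main() {
	c := &conn{}
	c.AttachICE()
	c.AttachICE() // no-op on second call
	fmt.Println(c.iceAttached) // true
	c.DetachICE()
	fmt.Println(c.iceAttached) // false
}
```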

Phase 3 — Backoff

  • client/peer.iceBackoffState truncated exponential schedule; markFailure, markSuccess, Reset API.
  • Conn.AttachICE no-ops while suspended; pion ICE state changes drive markFailure/markSuccess.
  • conn_mgr resolves P2pRetryMaxSeconds from server-pushed PeerConfig and propagates updates to active conns.
  • netbird status -d shows the current backoff state (Failures, NextRetry, Suspended) per peer; line is suppressed when nextRetry has passed.

Tests

  • go test ./client/internal/peer ./client/internal/peer/guard ./client/internal/lazyconn/... — pass.
  • go build ./client/... ./management/... — pass on linux/amd64, linux/arm64, windows/amd64.
  • Hardware-validated end-to-end on a 4-peer testbed (Windows 11 daemon + 3 OpenWrt routers) running the full p2p-dynamic lifecycle (cold-start → P2P → idle-detach to Relay → idle-close to "Idle" → wake → P2P recovery). The follow-up PRs (B, C, D) build on this foundation and the full lifecycle test goes through them.

Test plan

  • Unit tests for iceBackoffState (initial, exponential, reset, suspended-expired, max-cap, disabled-when-zero, grace-after-reset, no-grace-without-reset).
  • mgmt/grpc tests cover toPeerConfig mode resolution paths.
  • All proto regenerated cleanly (bash client/proto/generate.sh, bash shared/management/proto/generate.sh).
  • Maintainer review of the proto schema for backwards compatibility on legacy clients (no behavioural change for unset / unknown enum values; existing LazyConnectionEnabled still honoured by older daemons).

Use case

Real-world deployment: a fleet of OpenWrt routers + a small number of always-on Linux/Windows boxes. For roaming/laptop peers we want strict relay (relay-forced) for predictable latency; for the routers we want p2p-dynamic so the daemon attempts P2P on demand but doesn't hold an ICE pair when the link is quiet. With the current two-flag approach this configuration is impossible from the dashboard.

Linked work

What's not here (follow-up PRs)

  • PR-B: Phase 3.5 + 3.7d-h network-change handling, signal-trigger re-attach, ICE-grace-window for stale NAT mappings, GUI mode tab.
  • PR-C: Phase 3.7i peer-status visibility (per-peer remote meta, conn-state pusher with adaptive heartbeat, Peers tab in client UI).
  • PR-D: Activity-trigger fast-path (relay-state activity → re-attach ICE), Guard activity-driven retry-budget reset, hardening (legacy-fallback, Codex review fixes), Windows ICE filter case-insensitive (subset, branched separately as PR-Q2), keepWgPeer routed-subnet fix.

The PRs are stacked: B's base = A's branch; C's base = B; D's base = C. Reviewing A first lets the foundation land standalone.

Maintainers are welcome to push directly to this branch.

Summary by CodeRabbit

  • New Features
    • Configurable peer connection modes (relay, p2p, p2p-lazy, p2p-dynamic), account/client timeout and retry settings, new CLI flags, per-peer ICE attach/detach and exponential backoff; status now reflects ICE backoff state.
  • Documentation
    • OpenAPI and HTTP API responses updated to expose new experimental connection-mode and timeout fields.
  • Tests
    • Extensive unit tests for mode resolution, inactivity managers, ICE lifecycle/backoff, and related flows.
  • Chores
    • Audit events added for connection-mode and timeout setting changes.

Documentation

  • Documentation is not needed

These changes are internal lifecycle / behavioural improvements; no user-visible API or CLI flag added that warrants new public docs. Existing flags/Settings already documented at netbirdio/docs cover the surface area.

MichaelUray and others added 30 commits May 5, 2026 22:31
…nfig

Additive change for issue netbirdio#5989 Phase 1. New fields use new tag numbers
(11, 12, 13); existing fields (including LazyConnectionEnabled tag 6) are
unchanged so old clients ignore the additions and old servers send
UNSPECIFIED, which the new client maps back via the legacy boolean.

Note: the regenerated pb.go files now report protoc v5.29.3 in their
header (this branch was generated with locally-installed protoc 29.3
instead of upstream's v7.34.1). Functionally identical; header diff is
the only delta beyond the actual schema additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defines Mode enum (relay-forced, p2p, p2p-lazy, p2p-dynamic plus the
client-only sentinels Unspecified and FollowServer), ParseString for
CLI/env input, ToProto/FromProto for wire translation, and the two
backwards-compat helpers ResolveLegacyLazyBool / ToLazyConnectionEnabled
that bridge the old Settings.LazyConnectionEnabled boolean.

Phase 1 of issue netbirdio#5989. Pure addition -- no existing callers touched
in this commit; the engine/conn_mgr migration follows in subsequent
commits in the same PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on warns

NB_CONNECTION_MODE wins over the legacy pair (NB_FORCE_RELAY,
NB_ENABLE_EXPERIMENTAL_LAZY_CONN); when the legacy pair is set
together, NB_FORCE_RELAY wins (most-restrictive, mirrors the
group-conflict rule from issue netbirdio#5990).

Each legacy var emits a one-shot deprecation warning when it actually
contributes to the resolved mode. NB_LAZY_CONN_INACTIVITY_THRESHOLD
becomes an alias for the future relay_timeout setting and warns once.
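The precedence described above can be sketched as a pure function over the three env values. Hypothetical helper: the PR's real ResolveModeFromEnv lives in the connectionmode package and also emits the one-shot deprecation warnings, which are omitted here.

```go
package main

import "fmt"

// resolveModeFromEnv sketches the documented precedence:
// NB_CONNECTION_MODE > NB_FORCE_RELAY > NB_ENABLE_EXPERIMENTAL_LAZY_CONN.
func resolveModeFromEnv(connMode, forceRelay, lazyConn string) string {
	if connMode != "" {
		return connMode // NB_CONNECTION_MODE wins outright
	}
	if forceRelay == "true" {
		return "relay-forced" // legacy pair set together: most restrictive wins
	}
	if lazyConn == "true" {
		return "p2p-lazy" // legacy NB_ENABLE_EXPERIMENTAL_LAZY_CONN
	}
	return "follow-server" // nothing set locally
}

func main() {
	// legacy pair set together: NB_FORCE_RELAY wins
	fmt.Println(resolveModeFromEnv("", "true", "true")) // relay-forced
	// explicit mode beats both legacy vars
	fmt.Println(resolveModeFromEnv("p2p-dynamic", "true", "true")) // p2p-dynamic
}
```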

IsForceRelayed() is kept for callers that have not yet been migrated
(conn.go, statusrecorder); they will be updated in the engine/conn
refactor commits later in this PR.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new CLI flags map onto the new connection-mode plumbing:
- --connection-mode <relay-forced|p2p|p2p-lazy|p2p-dynamic|follow-server>
- --relay-timeout <seconds>
- --p2p-timeout <seconds>

Plumbed through three sites in cmd/up.go (SetConfigRequest, ConfigInput,
LoginRequest), persisted in profilemanager.Config, and added as new
fields on the daemon.proto IPC messages. Empty / not-changed flags fall
back to the server-pushed value (which itself falls back to the legacy
lazy_connection_enabled boolean for old servers).

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EngineConfig gains ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds.
ConnMgr now stores the resolved Mode plus the raw inputs (env, config)
so it can re-resolve when the server pushes a new PeerConfig.

UpdatedRemoteFeatureFlag is renamed to UpdatedRemotePeerConfig and
takes the full PeerConfig pointer; a thin shim with the old name
delegates to it for callers that haven't been updated yet.

connect.go copies the three new fields from profilemanager.Config into
the EngineConfig builder, with a tolerant parser that logs and falls
through to Unspecified on invalid input.

Phase 1 of issue netbirdio#5989. peer/conn.go forwarding follows in C4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ConnConfig gains a Mode field forwarded from the engine. Open() now
checks Mode == ModeRelayForced instead of calling the global env-reader
IsForceRelayed(). The local 'forceRelay' variable name is renamed to
'skipICE' to make the new branching intent explicit.

The PeerStateUpdate block at the end of Open() also reads from
conn.config.Mode now, so the StatusRecorder sees the per-peer mode
rather than the global env var.

A single remaining caller of IsForceRelayed() (srWatcher.Start in
engine.go) is left for a follow-up; that path uses a process-wide flag
not per-peer state, so it can be migrated in Phase 2 once srWatcher
itself learns about ConnectionMode.

Phase 1 of issue netbirdio#5989. Engine forwarding (C5) follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
createPeerConn now reads ConnMgr.Mode() and copies it into
peer.ConnConfig, so the per-peer Open() loop in conn.go can take the
ModeRelayForced skip-ICE branch without reading the global env var.

This is the last wiring commit for the client side of Phase 1; the
server-side mgmt changes (Settings + OpenAPI + handler + audit +
NetworkMap-build) follow in Section D.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three fields are nullable to distinguish 'use built-in default'
(NULL) from explicit values (incl. 0 = never tear down). Copy() now
deep-clones the new pointer fields via two small helpers.
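The deep-clone pattern for nullable fields looks roughly like this. Illustrative sketch: the PR uses two small concrete helpers rather than this generic one.

```go
package main

import "fmt"

// clonePtr deep-copies a nullable scalar field: NULL (nil) stays nil,
// values are copied so the clone cannot alias the original pointer.
func clonePtr[T any](p *T) *T {
	if p == nil {
		return nil
	}
	v := *p
	return &v
}

func main() {
	mode := "p2p-dynamic"
	cp := clonePtr(&mode)
	mode = "relay-forced" // mutating the original must not affect the copy
	fmt.Println(*cp)      // p2p-dynamic
	fmt.Println(clonePtr[uint32](nil) == nil) // true
}
```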

GORM AutoMigrate creates the new columns at first start; existing
accounts have NULL in all three columns and resolve via the legacy
LazyConnectionEnabled boolean.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tings

Three new optional, nullable fields with descriptions of the NULL =
built-in-default semantics and the Phase-1-vs-Phase-2 status of
p2p-dynamic. Regenerated types.gen.go via the existing oapi-codegen
tooling.

The generated AccountSettingsConnectionMode enum has the canonical
values relay-forced / p2p / p2p-lazy / p2p-dynamic, plus a Valid()
helper for handler-side validation.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PUT /api/accounts/{id} now accepts connection_mode (validated against
the four-value enum via the generated
AccountSettingsConnectionMode.Valid()), p2p_timeout_seconds and
relay_timeout_seconds. NULL in the JSON body keeps the existing value
untouched (= "no client-side override on this round-trip"); explicit
NULL-clear via API uses a distinct PATCH-style call which is
out-of-scope for Phase 1.

Response payload mirrors the input fields back as nullable so the
dashboard can distinguish "use default" from "explicit value".

Phase 1 of issue netbirdio#5989. Audit-event emission follows in D5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AccountConnectionModeChanged (121), AccountRelayTimeoutChanged (122),
AccountP2pTimeoutChanged (123) -- emitted from account.go when settings
change. Per-peer / per-group event codes are reserved for Phase 3
(issue netbirdio#5990).

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
handleConnectionModeSettings is invoked from the same diff-detection
block as handleLazyConnectionSettings; emits one StoreEvent per
changed field (ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds)
with old/new values in the meta payload.

Four small ptr-equality / deref helpers are added for nullable string
and uint32 fields. They are package-private and named after the
existing convention used elsewhere in the package.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…elds

Two changes in one commit because they're inseparable:

1. Move client/internal/peer/connectionmode/ to shared/connectionmode/.
   The package now needs to be importable from BOTH client/ and
   management/ (which is impossible while it lives under client/internal/
   per Go's internal-package rule). All imports updated; tests pass on
   both sides.

2. Extend management/internals/shared/grpc/conversion.go::toPeerConfig
   to populate the three new PeerConfig fields (ConnectionMode,
   P2PTimeoutSeconds, RelayTimeoutSeconds) using the connectionmode
   helpers. The legacy LazyConnectionEnabled boolean is now derived
   from the resolved Mode via ToLazyConnectionEnabled() rather than
   copied verbatim from Settings -- this is the central backwards-compat
   contract: old clients see only the boolean, new clients prefer the
   explicit enum and ignore the bool.

Resolution rules (Phase 1, account-wide only):
- Settings.ConnectionMode != nil and parses -> wins
- Otherwise -> ResolveLegacyLazyBool(LazyConnectionEnabled)
- timeouts: Settings.RelayTimeoutSeconds / P2pTimeoutSeconds when non-NULL,
  else 0 (= server has no preference; client uses built-in default)
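The resolution rules above amount to a small fallback chain. A sketch under assumptions: function and parser names here are illustrative stand-ins, not the PR's exact API.

```go
package main

import "fmt"

// parseMode is a stand-in for the connectionmode parser.
func parseMode(s string) (string, bool) {
	switch s {
	case "relay-forced", "p2p", "p2p-lazy", "p2p-dynamic":
		return s, true
	}
	return "", false
}

// resolveAccountMode sketches the Phase-1 rules: an explicit, parseable
// Settings.ConnectionMode wins; NULL or garbage falls through tolerantly
// to the legacy LazyConnectionEnabled boolean.
func resolveAccountMode(settingsMode *string, legacyLazy bool) string {
	if settingsMode != nil {
		if m, ok := parseMode(*settingsMode); ok {
			return m
		}
	}
	if legacyLazy {
		return "p2p-lazy"
	}
	return "p2p"
}

func main() {
	garbage := "warp-speed"
	fmt.Println(resolveAccountMode(&garbage, true)) // p2p-lazy: tolerant fallback
	fmt.Println(resolveAccountMode(nil, false))     // p2p: built-in default
}
```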

Per-peer / per-group resolution comes in Phase 3 (netbirdio#5990).

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nine sub-cases cover the Phase-1 resolution matrix from spec section 3:
- no settings -> default (P2P + lazy=false)
- legacy bool only -> mapped via ResolveLegacyLazyBool
- explicit ConnectionMode -> wins over the legacy bool
- timeouts propagate
- garbage ConnectionMode value -> tolerant fallback to legacy bool

Particular attention to the structural compat gap: relay-forced cannot
be expressed via the legacy boolean, so the wire field for old clients
is sent as false. Documented in the spec, asserted here.

Existing TestAccount_GetPeerNetworkMap remains green: existing test
peers have ConnectionMode=NULL in Settings, falls through to the legacy
ResolveLegacyLazyBool(false) -> ModeP2P -> wire bool false.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of netbirdio#5989 needs to dynamically detach the ICE listener at runtime
(when p2p-dynamic mode hits its ICE-inactivity threshold). Adds the
RemoveICEListener method and extends the Add/RemoveICEListener pair to
hold h.mu; Listen() now reads h.iceListener through readICEListener
under the same mutex, fixing a latent race that didn't matter while
AddICEListener was the only writer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of netbirdio#5989 needs the daemon to fire ICE-teardown and relay-
teardown at different idle thresholds. Adds NewManagerWithTwoTimers
plus iceInactiveChan / relayInactiveChan; the periodic check loop
fires each channel when its threshold elapses. Threshold == 0 disables
that channel (peer.lazy: iceTimeout=0; peer.dynamic: both>0; eager
modes don't register peers at all).

NewManager (Phase-1 entry point) becomes a thin wrapper that
delegates to newManager with iceTimeout=0, preserving p2p-lazy
semantics exactly. The Phase-1 InactivePeersChan now aliases the
new RelayInactiveChan so existing callers (engine.go p2p-lazy path)
continue to work unchanged.

Five sub-tests cover the timeout matrix:
- TwoTimers_OnlyICEFires (peer idle between thresholds)
- TwoTimers_BothFire (peer idle past both)
- TwoTimers_ICEDisabled (iceTimeout=0)
- TwoTimers_RelayDisabled (relayTimeout=0)
- TwoTimers_BothDisabled (both 0 = manager inert)

Plus Phase1_LazyEquivalence proves NewManager-call-site behaviour is
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-tier timeouts for Phase-2 p2p-dynamic. The deprecated
InactivityThreshold pointer field becomes a backwards-compat alias
that maps onto RelayInactivityThreshold when set, so Phase-1 callers
that still pass it (engine.go, p2p-lazy tests) keep working without
edits.

resolvedTimeouts() helper centralises the alias logic. NewManager
picks the appropriate inactivity-manager constructor based on the
resolved timeouts; legacy callers continue through the Phase-1
single-timer code path.

InactivityManager() getter exposes the inner manager so the
conn_mgr can subscribe to ICEInactiveChan / RelayInactiveChan
in the p2p-dynamic dispatch loop.

Phase 2 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AttachICE registers the ICE-offer listener on the handshaker after the
activity-detector observes traffic; emits a fresh offer so the remote
side learns we are now ICE-capable. DetachICE removes the listener and
calls workerICE.Close() so the ICE worker tears down without affecting
the relay tunnel.

Both methods are idempotent and use Handshaker.readICEListener() under
its mutex, avoiding the race with Handshaker.Listen() that the prior
direct field read had. Used in Phase 2 by the inactivity manager to
flip ICE on/off independently of the relay tunnel.

Refs netbirdio#5989 (Phase 2 / connection-mode-phase2).
Phase 2 of netbirdio#5989. ModeP2PDynamic now takes a third branch in Open():
workerICE is constructed eagerly (so the activity-trigger does not
have to wait for ICE setup) but the AddICEListener call is skipped.
The handshaker still has workerICE wired in via NewHandshaker, so
buildOfferAnswer can include local ICE credentials when the listener
is later attached by Conn.AttachICE() on activity-trigger.

Behavior matrix in Open():
- ModeRelayForced: no workerICE, no ICE listener.
- ModeP2P / ModeP2PLazy: workerICE created, listener registered eagerly
  (Phase-1 lazy-tunnel defer remains at conn_mgr level).
- ModeP2PDynamic: workerICE created, listener deferred to AttachICE.

Refs netbirdio#5989 (Phase 2).
Phase 2 of netbirdio#5989. Fixes the lazy/eager mismatch where a remote peer's
GO_IDLE was silently dropped whenever the local manager was not in
lazy mode, leaving the eager local end to immediately reconnect and
defeating the remote's lazy/dynamic intent.

DeactivatePeer now dispatches by locally-resolved connection mode:
  p2p-lazy   -> existing lazy-manager teardown (whole tunnel)
  p2p-dynamic-> DetachICEForPeer (ICE only, relay tunnel stays up)
  p2p / relay-forced / unspecified -> silent no-op (eager modes
    deliberately ignore GO_IDLE because they are always-on)

DetachICEForPeer is also the entry point used by the inactivity
manager when its iceTimeout fires (engine wiring follows in E1).

Lookup miss is treated as a no-op: a peer can be removed concurrently
with a GO_IDLE or inactivity-timer event.

Refs netbirdio#5989 (Phase 2).
Phase 2 of netbirdio#5989. ConnMgr.Start now also brings up the lazy/dynamic
manager when mode==ModeP2PDynamic. A new runDynamicInactivityLoop
goroutine reads from inactivity.Manager.ICEInactiveChan() and
RelayInactiveChan() and dispatches DetachICEForPeer or full
Conn.Close per peer.

initLazyManager populates Config.RelayInactivityThreshold from
e.relayTimeoutSecs and Config.ICEInactivityThreshold from
e.p2pTimeoutSecs (only in dynamic mode; in lazy mode iceTimeout
stays 0 so the ICE channel never fires, preserving Phase-1
semantics exactly).

UpdatedRemotePeerConfig generalized to handle the dynamic mode
the same way it handled lazy: enable the manager on entry, tear
down on exit, restart on lazy<->dynamic transitions so the new
timeouts take effect.

Refs netbirdio#5989 (Phase 2).
Phase-2 follow-up. The dynamic-mode iceTimeout was always 0 in the
field because resolveConnectionMode only resolved relay-timeout, not
p2p-timeout, and the manager-init only consulted e.p2pTimeoutSecs
which never got updated from server pushes.

Resolution chain for P2pTimeoutSeconds is now symmetric to relay:
  cfgP2pTimeout (client config) > serverPC.GetP2PTimeoutSeconds()
(no env var; reserved for Phase 3).

UpdatedRemotePeerConfig now also stores the resolved p2p-timeout and
includes it in the early-return short-circuit so changes to the
server-pushed value trigger a re-init.

Refs netbirdio#5989 (Phase 2).
Phase 2 of netbirdio#5989. Closes the activity-attach loop: ActivatePeer is
called by the engine when a remote peer signals it is active. In
p2p-lazy this opens the relay tunnel; in p2p-dynamic the relay was
already eager-opened, so we additionally call Conn.AttachICE() to
register the deferred ICE listener and emit a fresh offer. Without
this, p2p-dynamic peers would never upgrade from relay to ICE.

Refs netbirdio#5989 (Phase 2).
…5989)

Phase 3 introduces per-peer ICE-failure backoff. Server-pushed cap
on the truncated-exponential schedule lives in this new field 14.

Wire-format sentinels:
  0   -> "not set", daemon falls back to built-in 15 min default
  max -> user explicitly disabled backoff (Phase-2 behavior)
  N   -> N seconds as the MaxInterval of cenkalti/backoff

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. Account-wide cap on the ICE-failure backoff
schedule. Nullable: NULL = use daemon default (15 min), 0 = treated
as "user-explicit disable" by the conversion layer (signaled via
uint32-max sentinel on the wire). Plus deep-copy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. New field documented in the public API. Defaults
to 900 seconds (15 min). 0 disables backoff. Regenerated types.gen.go
via oapi-codegen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. The PUT handler now persists the new account-wide
ICE-backoff cap, and the GET response surfaces it back. NULL stays
NULL (= "use daemon default"). Mirrors the Phase-1+2 plumbing of
relay_timeout_seconds and p2p_timeout_seconds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. New activity-code 124 (AccountP2pRetryMaxChanged)
plus the StoreEvent call inside handleConnectionModeSettings, mirroring
the Phase-1 Relay-timeout and Phase-2 P2P-timeout patterns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…apping

Phase 3 of netbirdio#5989. The conversion layer translates between DB-NULL
(= "use daemon default") and Settings-zero (= "user-explicit disable")
using a uint32-max sentinel on the wire. Test matrix covers all
three states: NULL, 0, and a normal positive value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. Per-peer ICE-failure backoff using cenkalti/backoff
v4 (already imported by Guard). InitialInterval 1m, Multiplier 2.0,
RandomizationFactor 0.1 -- produces ~1m, ~2m, ~4m, ~8m, ~15m capped
sequence with default cap.

Public API: newIceBackoff, markFailure (returns delay for logging),
markSuccess, Reset (alias), IsSuspended, Snapshot (read-only view for
status output), SetMaxBackoff (live update).

maxBackoff=0 disables the backoff entirely (Phase-2 behavior).

Test matrix covers initial state, exponential growth, cap override,
disabled mode, success-reset, hard-reset, suspended-expiry, and live
SetMaxBackoff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MichaelUray and others added 10 commits May 5, 2026 22:31
Phase 3 of netbirdio#5989. New ConnConfig.P2pRetryMaxSeconds field plus an
iceBackoff field on Conn. Open() initializes the backoff state with
the resolved cap; if cap is 0, uses DefaultP2PRetryMax (15 min).

The wire-format sentinel uint32-max for "user-explicit disable" is
translated to cap=0 by the resolver in engine.go (next commit), so
this code only sees positive caps OR 0-meaning-default.

Refs netbirdio#5989 (Phase 3).
Phase 3 of netbirdio#5989. The Phase-2 AttachICE primitive now consults the
iceBackoffState before doing anything. When suspended (= recent ICE
failure within the backoff window), AttachICE silently returns nil
and stays on relay. When the backoff expires, AttachICE proceeds
with the normal listener-attach + offer-send.

This is the actual bug fix for the reset loop where each
ICEConnDisconnected event re-enabled the 3-quick-retries budget. The
backoff is consulted inside AttachICE itself, so any path that triggers
AttachICE (activity, signal-reconnect, candidate-change) goes
through the gate.

Refs netbirdio#5989 (Phase 3).
Phase 3 of netbirdio#5989. ConnectionStateFailed -> onICEFailed (markFailure +
DetachICE), ConnectionStateConnected/Completed -> onICEConnected
(markSuccess). Closes the loop: actual ICE outcomes now drive the
backoff state machine.
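The two transitions can be sketched as a tiny dispatcher over the terminal ICE states. Illustrative: backoffState here is a stand-in for the PR's iceBackoffState, and the real hooks also call DetachICE on failure.

```go
package main

import "fmt"

// backoffState is a minimal stand-in for the PR's iceBackoffState.
type backoffState struct{ failures int }

func (b *backoffState) markFailure() { b.failures++ }
func (b *backoffState) markSuccess() { b.failures = 0 }

// onICEStateChange sketches how terminal pion states drive the backoff:
// Failed records a failure; Connected/Completed reset the schedule.
func onICEStateChange(state string, b *backoffState) {
	switch state {
	case "Failed":
		b.markFailure()
	case "Connected", "Completed":
		b.markSuccess()
	}
}

func main() {
	b := &backoffState{}
	onICEStateChange("Failed", b)
	onICEStateChange("Failed", b)
	fmt.Println(b.failures) // 2
	onICEStateChange("Connected", b)
	fmt.Println(b.failures) // 0
}
```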

This is what makes the AttachICE-gate from the previous commit
actually engage. Without these hooks, the backoff state would never
transition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. Extends resolveConnectionMode to a 4-tuple return
(mode, relay-timeout, p2p-timeout, p2p-retry-max). Resolution chain
analogous to relay/p2p timeouts: client-config wins over server-push.

ConnMgr struct stores both the resolved value (p2pRetryMaxSecs) and
the raw client-config input (cfgP2pRetryMax) so subsequent server
pushes can be re-resolved correctly.

UpdatedRemotePeerConfig early-return shortcut now also checks
p2p-retry-max for changes.

Refs netbirdio#5989 (Phase 3).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 3 of netbirdio#5989. New propagateP2pRetryMaxToConns() iterates all
active peers and calls Conn.SetIceBackoffMax with the freshly resolved
value. Translates the uint32-max sentinel ("user-explicit disable")
to time.Duration(0) for the daemon-side semantics. NULL on server
(wire 0) maps to peer.DefaultP2PRetryMax (15 min built-in default).

Conn.SetIceBackoffMax updates the iceBackoffState live or, if the
Conn is not yet opened, stashes the value in config for Open() to
pick up.

IceBackoffSnapshot exposes a read-only view for the status output
(implemented in Section E).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. Local override hierarchy mirrors p2p-timeout:
config.json profile field > CLI flag > server-pushed. New
entry points for the account-wide ICE-failure backoff cap:
- CLI: netbird up --p2p-retry-max=600
- Profile (config.json): "p2p_retry_max_seconds": 600

EngineConfig.P2pRetryMaxSeconds wired through connect.go and
profilemanager (ConfigInput + Config + updateConfig handler).
daemon.proto gains p2p_retry_max_seconds = 43 on LoginRequest
and SetConfigRequest; daemon.pb.go regenerated.

Removes the D1 placeholder in conn_mgr.go: cfgP2pRetryMax
now reads from EngineConfig.P2pRetryMaxSeconds.

Env var NB_P2P_RETRY_MAX_SECONDS not added (matches Phase-2
pattern: p2p-timeout has no env-var path either).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. Each new Conn now receives the resolved cap via
ConnConfig. Conn.Open() then initializes its iceBackoffState. The
sentinel-translation (uint32-max -> 0) happens inside Conn so the
engine-level code stays simple.

Refs netbirdio#5989 (Phase 3).
Phase 3 (E1) of netbirdio#5989. Full pipeline implemented:
- daemon.proto: add iceBackoffFailures/NextRetry/Suspended fields to PeerState (field IDs 20-22)
- client/internal/peer/status.go: add IceBackoff* fields to State struct + UpdatePeerIceBackoff() method
- client/internal/peer/conn.go: push snapshot to statusRecorder after onICEFailed / onICEConnected
- client/status/status.go: wire fields through ToProtoFullStatus(), PeerStateDetailOutput, parsePeers()
- parsePeers() appends "ICE backoff: suspended for ..." line only when suspended
- client/status/status_test.go: update JSON/YAML expectations for new fields

When not suspended, the line is omitted (no noise on healthy peers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989 follow-up. The PeerState.IceBackoffSuspended flag is
only refreshed on ICE state-change events (markFailure / markSuccess),
so it stays true even after the suspension window has elapsed. The
status detail-output now also wall-clock-checks IceBackoffNextRetry
before printing the line, avoiding lines like "suspended for -1m45s"
between the expiry and the next event-driven snapshot push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…RetryMaxSeconds proto changes

Picked-up commits in this branch added new fields to daemon.proto and
management.proto (ConnectionMode enum, P2pTimeoutSeconds,
RelayTimeoutSeconds, P2pRetryMaxSeconds, ICE-backoff fields). The
generated .pb.go files in the original Phase-3.7i branch were left
out-of-date during the cherry-pick; regenerated here so the package
builds.

coderabbitai Bot commented May 5, 2026

📝 Walkthrough


Adds a connection-mode system with env/CLI/server precedence, new proto/API fields, dual inactivity timers and per-peer ICE backoff, mode-driven conn lifecycle (Attach/Detach ICE), client/server wiring, and extensive tests across components.

Changes

Connection Mode, Timeouts, and ICE Backoff

| Layer | File(s) | Summary |
| --- | --- | --- |
| Enum & Parsing | shared/connectionmode/mode.go, shared/connectionmode/mode_test.go | New Mode type, parsing, proto mappings, legacy converters, and unit tests. |
| Proto / API Schema | shared/management/proto/management.proto, client/proto/daemon.proto, shared/management/http/api/openapi.yml, shared/management/http/api/types.gen.go | Added connection_mode, p2p_timeout_seconds, relay_timeout_seconds, p2p_retry_max_seconds to PeerConfig/LoginRequest/SetConfigRequest and OpenAPI/types. |
| Server Settings & Persistence | management/server/types/settings.go, management/server/types/settings_test.go | Settings gains nullable pointer fields for ConnectionMode and timeouts; deep-copy helpers and tests added. |
| Server Audit & Activity Codes | management/server/account.go, management/server/activity/codes.go | Account manager emits audit events for changes to connection-mode and timeout/retry settings; new activity codes and map entries added. |
| gRPC Conversion & Server Tests | management/internals/shared/grpc/conversion.go, management/internals/shared/grpc/conversion_test.go | toPeerConfig resolves ConnectionMode (settings override the legacy lazy flag), computes timeouts and the P2P retry sentinel, and maps them into the wire PeerConfig; tests added. |
| HTTP Handlers & Tests | management/server/http/handlers/accounts/accounts_handler.go, management/server/http/handlers/accounts/accounts_handler_test.go | API input validation for ConnectionMode; response includes the new fields; PUT settings test for p2p_retry_max_seconds added. |
| CLI Flags & Request Wiring | client/cmd/root.go, client/cmd/up.go | New CLI flags --connection-mode, --relay-timeout, --p2p-timeout, --p2p-retry-max; when set, flags populate LoginRequest, SetConfigRequest, and the local ConfigInput. |
| Profile & Engine Wiring | client/internal/profilemanager/config.go, client/internal/connect.go, client/internal/engine.go | Profile Config/ConfigInput accept the new fields; EngineConfig parses the connection mode and carries relay/p2p/retry values; the engine populates peer.ConnConfig.Mode and P2pRetryMaxSeconds. |
| ConnMgr Resolution & Orchestration | client/internal/conn_mgr.go, client/internal/conn_mgr_test.go | ConnMgr stores env/cfg/server inputs, resolves mode and timeouts (env > client > server), exposes Mode/RelayTimeout/P2pRetryMax accessors, starts or restarts the lazy/dynamic inactivity manager on changes, and propagates the P2P retry max to peers; tests validate precedence and actions. |
| LazyConn Inactivity Manager | client/internal/lazyconn/inactivity/manager.go, client/internal/lazyconn/inactivity/manager_test.go, client/internal/lazyconn/manager/manager.go, client/internal/lazyconn/env.go | Introduces a two-timer inactivity manager (ICE vs Relay) with separate channels and a Phase-1 compatibility alias; Manager selects single- or two-timer mode; tests for the combinations added; deprecation docs for the legacy env vars. |
| Peer ICE Backoff & Lifecycle | client/internal/peer/ice_backoff.go, client/internal/peer/ice_backoff_test.go, client/internal/peer/conn.go, client/internal/peer/conn_test.go, client/internal/peer/worker_ice.go | Per-peer exponential ICE backoff state (markFailure/markSuccess/SetMaxBackoff/Snapshot); ConnConfig adds Mode and P2pRetryMaxSeconds; Open initializes/updates the backoff; mode-driven skip/defer ICE logic; AttachICE/DetachICE added; the worker updates the backoff on success/failure; tests added. |
| Handshaker Thread-Safety | client/internal/peer/handshaker.go, client/internal/peer/handshaker_test.go | Guard the ICE listener with a mutex; add RemoveICEListener and readICEListener; Listen uses safe reads; tests added. |
| Status & CLI Output | client/internal/peer/status.go, client/status/status.go, client/status/status_test.go, client/server/setconfig_test.go | Peer state includes IceBackoffFailures/NextRetry/Suspended; Status.UpdatePeerIceBackoff updates the state; ToProto and the human-readable output include the new fields; tests and SetConfig expectations updated. |
| Misc Wiring & Tests | client/internal/connect.go, client/internal/conn_mgr_test.go, client/internal/lazyconn/*, client/internal/peer/env.go, client/internal/peer/env_test.go | Env parsing for NB_CONNECTION_MODE and deprecation warnings for the NB_LAZY_CONN_* timeout vars; many tests validate env resolution and manager behaviors. |

Sequence Diagram

sequenceDiagram
    participant Admin as Admin
    participant API as HTTP API
    participant Server as Management Server
    participant GRPC as gRPC Conversion
    participant Client as NetBird Client
    participant ConnMgr as ConnMgr
    participant Peer as Peer

    Admin->>API: PUT /accounts/{id} with connection_mode/timeouts
    API->>Server: validate & persist settings
    Server->>Server: emit audit events
    Server->>GRPC: toPeerConfig maps settings -> PeerConfig
    GRPC->>Client: push PeerConfig (mode, timeouts, retry_max)

    Client->>Client: ResolveModeFromEnv() (NB_CONNECTION_MODE, legacy flags)
    Client->>ConnMgr: init with env + CLI values
    ConnMgr->>ConnMgr: resolveConnectionMode(env > client > server)
    ConnMgr->>ConnMgr: start/restart lazy manager if mode/timeouts changed
    ConnMgr->>Peer: propagateP2pRetryMaxToConns -> update peer backoff max

    ConnMgr->>Peer: ActivatePeer (P2P dynamic) -> Peer.AttachICE()
    Peer->>Peer: ICE events -> onICEConnected()/onICEFailed() update backoff
    Peer->>ConnMgr: status -> Status.UpdatePeerIceBackoff()
Loading
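The env > client > server precedence in the diagram can be sketched as a simple first-non-empty resolution. Function and default are illustrative assumptions, not the real ConnMgr API.

```go
package main

import "fmt"

// resolveConnectionMode sketches the precedence described above: an env
// override wins over the client CLI value, which wins over the
// server-pushed setting. An empty string means "not set at that layer".
func resolveConnectionMode(env, cli, server string) string {
	for _, v := range []string{env, cli, server} {
		if v != "" {
			return v
		}
	}
	return "p2p" // assumed default when no layer sets a mode
}

func main() {
	// server says relay-forced, but the client flag wins
	fmt.Println(resolveConnectionMode("", "p2p-lazy", "relay-forced")) // p2p-lazy
	// env var trumps everything
	fmt.Println(resolveConnectionMode("p2p-dynamic", "p2p-lazy", "")) // p2p-dynamic
}
```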

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • pappz
  • lixmal

🐰 "I hopped through code where modes decide,
relay or P2P, with timers side by side.
Backoff counts and channels sing,
peers attach, detach — a lively spring.
A jittered retry, gentle and bright,
the network hums through day and night."


@MichaelUray MichaelUray changed the title Phase 1+2+3 of #5989: ConnectionMode foundation, two-timer lifecycle, iceBackoff [1/4 stacked] Phase 1+2+3 of #5989: ConnectionMode foundation, two-timer lifecycle, iceBackoff May 5, 2026
@MichaelUray
Contributor Author

Stack overview (PRs of issue #5989):

Maintainer note: each PR's branch is rebased on the previous PR's branch in the fork. Once this lands, the next PR can be rebased on main (its merge base will collapse to just its unique commits).

Plus: separate standalone — #6080 Windows ICE filter case-insensitive (no dependencies, applicable to current main directly).

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
management/server/types/settings.go (1)

99-126: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clone JWTAllowGroups in Copy().

Copy() still shares JWTAllowGroups with the source instance, so mutating the returned settings can mutate the original. This PR already tightens copy semantics for the new nullable fields, so leaving this slice aliased will keep producing surprising state bleed.

Suggested fix

```diff
 	settings := &Settings{
 		PeerLoginExpirationEnabled: s.PeerLoginExpirationEnabled,
 		PeerLoginExpiration:        s.PeerLoginExpiration,
 		JWTGroupsEnabled:           s.JWTGroupsEnabled,
 		JWTGroupsClaimName:         s.JWTGroupsClaimName,
 		GroupsPropagationEnabled:   s.GroupsPropagationEnabled,
-		JWTAllowGroups:             s.JWTAllowGroups,
+		JWTAllowGroups:             slices.Clone(s.JWTAllowGroups),
 		RegularUsersViewBlocked:    s.RegularUsersViewBlocked,
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/types/settings.go` around lines 99 - 126, Settings.Copy
currently returns a Settings that shares the JWTAllowGroups slice with the
source, causing mutations to bleed; update the Copy method (function
Settings.Copy) to clone JWTAllowGroups (e.g., use slices.Clone or an explicit
copy) when constructing the new Settings so the returned Settings has its own
independent slice just like PeerExposeGroups is cloned.
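The aliasing bug described here is easy to reproduce in isolation. The sketch below uses a trimmed-down Settings with only the affected field; the real struct has many more.

```go
package main

import (
	"fmt"
	"slices"
)

// Settings is a minimal stand-in for the server settings type,
// keeping only the slice field at issue.
type Settings struct {
	JWTAllowGroups []string
}

// Copy clones the slice so the returned Settings has its own backing
// array; without slices.Clone, mutating the copy mutates the original.
func (s *Settings) Copy() *Settings {
	return &Settings{JWTAllowGroups: slices.Clone(s.JWTAllowGroups)}
}

func main() {
	orig := &Settings{JWTAllowGroups: []string{"admins"}}
	cp := orig.Copy()
	cp.JWTAllowGroups[0] = "mutated"
	fmt.Println(orig.JWTAllowGroups[0]) // still "admins"
}
```

slices.Clone (Go 1.21+) allocates a new backing array of exactly the needed length, which is why the review suggests it over a manual copy loop.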
management/server/account.go (1)

332-338: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Push the new connection settings to live peers.

These fields are now saved and audited, but they never set updateAccountPeers. A change to ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds, or P2pRetryMaxSeconds therefore skips IncrementNetworkSerial and the later UpdateAccountPeers() call, so connected peers keep stale connection behavior until some unrelated update or reconnect.

Suggested fix

```diff
 		if oldSettings.RoutingPeerDNSResolutionEnabled != newSettings.RoutingPeerDNSResolutionEnabled ||
 			oldSettings.LazyConnectionEnabled != newSettings.LazyConnectionEnabled ||
+			!equalStringPtr(oldSettings.ConnectionMode, newSettings.ConnectionMode) ||
+			!equalUint32Ptr(oldSettings.RelayTimeoutSeconds, newSettings.RelayTimeoutSeconds) ||
+			!equalUint32Ptr(oldSettings.P2pTimeoutSeconds, newSettings.P2pTimeoutSeconds) ||
+			!equalUint32Ptr(oldSettings.P2pRetryMaxSeconds, newSettings.P2pRetryMaxSeconds) ||
 			oldSettings.DNSDomain != newSettings.DNSDomain ||
 			oldSettings.AutoUpdateVersion != newSettings.AutoUpdateVersion ||
 			oldSettings.AutoUpdateAlways != newSettings.AutoUpdateAlways {
 			updateAccountPeers = true
 		}
```

Also applies to: 372-375

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/account.go` around lines 332 - 338, The code only sets
updateAccountPeers when a subset of settings change; include comparisons for
ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds, and P2pRetryMaxSeconds
(the newly saved/audited connection settings) in the same conditional that sets
updateAccountPeers so that IncrementNetworkSerial() and UpdateAccountPeers() run
when any of those values change; make the same addition in the other identical
comparison block (the one around where other settings are checked) so both code
paths trigger updateAccountPeers when any connection-related field differs
between oldSettings and newSettings.
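The pointer-comparison helper referenced in the suggested fix could look like the sketch below; the name equalUint32Ptr matches the review text, but the actual helper in the codebase may differ. The key property is that nil ("unset") and a pointer to 0 ("explicitly zero") compare as different.

```go
package main

import "fmt"

// equalUint32Ptr treats two nil pointers as equal and a nil pointer as
// different from any concrete value, so "unset" and "explicit 0" remain
// distinguishable in the updateAccountPeers comparison.
func equalUint32Ptr(a, b *uint32) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

func main() {
	zero := uint32(0)
	fmt.Println(equalUint32Ptr(nil, nil))   // true: both unset
	fmt.Println(equalUint32Ptr(&zero, nil)) // false: explicit 0 vs unset
}
```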
client/cmd/up.go (1)

438-453: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject contradictory connection-mode overrides before building these payloads.

These builders can currently emit both LazyConnectionEnabled and ConnectionMode, so a command like --enable-lazy-connection=true --connection-mode=follow-server sends mutually inconsistent intent. Also, connectionMode is forwarded as an unchecked string here, so typos can be persisted and later fall through as “unspecified” instead of failing fast.

Please validate connectionMode before serialization and make the legacy lazy flag mutually exclusive with --connection-mode across all three payloads.

Also applies to: 572-583, 695-710

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/cmd/up.go` around lines 438 - 453, The code currently allows setting
both LazyConnectionEnabled and ConnectionMode together and forwards
connectionMode as an unchecked string; update the builders that set
req.LazyConnectionEnabled and req.ConnectionMode (the branches using
cmd.Flag(enableLazyConnectionFlag).Changed and
cmd.Flag(connectionModeFlag).Changed and the related payload constructors used
in the other two locations) to first validate connectionMode against the allowed
enum/values and return an error if invalid, and also reject the request early
when both flags are present (if cmd.Flag(enableLazyConnectionFlag).Changed &&
cmd.Flag(connectionModeFlag).Changed) with a clear error message; apply the same
mutually-exclusive check and validation to all three payload-building sites so
the legacy lazy flag cannot be used alongside --connection-mode and typos in
connectionMode fail fast.
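A minimal shape for that fail-fast validation, kept framework-free so it is self-contained; the function name and flag semantics here are illustrative, not the actual client/cmd/up.go code. In a Cobra-based CLI the mutual-exclusion half could likely also be expressed with cmd.MarkFlagsMutuallyExclusive.

```go
package main

import (
	"errors"
	"fmt"
)

// validateModeFlags sketches the check suggested above: reject the
// legacy lazy flag together with --connection-mode, and reject unknown
// mode strings before they are serialized into any request payload.
func validateModeFlags(lazyChanged, modeChanged bool, mode string) error {
	if lazyChanged && modeChanged {
		return errors.New("--enable-lazy-connection and --connection-mode are mutually exclusive")
	}
	valid := map[string]bool{
		"p2p": true, "p2p-lazy": true, "p2p-dynamic": true,
		"p2p-dynamic-lazy": true, "relay-forced": true,
	}
	if modeChanged && !valid[mode] {
		return fmt.Errorf("invalid --connection-mode %q", mode)
	}
	return nil
}

func main() {
	// both flags set: contradictory intent is rejected up front
	fmt.Println(validateModeFlags(true, true, "p2p"))
	// typo fails fast instead of persisting as "unspecified"
	fmt.Println(validateModeFlags(false, true, "follow-serverr"))
}
```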
🧹 Nitpick comments (7)
client/internal/peer/ice_backoff_test.go (1)

113-122: ⚡ Quick win

Add a transition test for disabling backoff while already suspended.

Current zero-cap coverage starts from a fresh state. Please add a case that does markFailure()SetMaxBackoff(0) and asserts immediate unsuspension plus cleared retry timestamp.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/ice_backoff_test.go` around lines 113 - 122, Add a test
that verifies disabling backoff while already suspended immediately clears
suspension: create s := newIceBackoff(somePositiveMax) then call s.markFailure()
and assert s.IsSuspended() is true, then call s.SetMaxBackoff(0) and assert
s.IsSuspended() is false and the internal retry timestamp (e.g., s.retryAt or
equivalent time field) is cleared/zero (use its IsZero check). Name the test
something like TestIceBackoff_DisableWhileSuspended and use newIceBackoff,
markFailure, SetMaxBackoff, IsSuspended and the retry timestamp field to locate
the logic.
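To show the transition end to end, here is a toy stand-in for the unexported iceBackoffState; the field and method names mirror the review comment, not the PR's actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// backoffState is an illustrative stand-in for iceBackoffState with just
// enough state to demonstrate markFailure -> SetMaxBackoff(0).
type backoffState struct {
	max       time.Duration
	suspended bool
	nextRetry time.Time
}

func (b *backoffState) markFailure() {
	if b.max > 0 {
		b.suspended = true
		b.nextRetry = time.Now().Add(b.max)
	}
}

// SetMaxBackoff(0) disables backoff; it must also clear the suspension
// window, otherwise IsSuspended/snapshots would keep a stale NextRetry.
func (b *backoffState) SetMaxBackoff(d time.Duration) {
	b.max = d
	if d == 0 {
		b.suspended = false
		b.nextRetry = time.Time{}
	}
}

func main() {
	b := &backoffState{max: time.Minute}
	b.markFailure()
	fmt.Println(b.suspended) // true: suspended after a failure
	b.SetMaxBackoff(0)
	fmt.Println(b.suspended, b.nextRetry.IsZero()) // false true
}
```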
client/internal/lazyconn/inactivity/manager_test.go (1)

141-144: ⚡ Quick win

Restore newTicker after each test override.

These tests mutate a package-global and never put it back. That makes the suite order-dependent and brittle if later tests in this package rely on the real ticker or start running in parallel.

Suggested fix

```diff
 	fakeTick := make(chan time.Time, 1)
+	oldTicker := newTicker
 	newTicker = func(d time.Duration) Ticker {
 		return &fakeTickerMock{CChan: fakeTick}
 	}
+	t.Cleanup(func() {
+		newTicker = oldTicker
+	})
```

Also applies to: 181-184, 224-227, 264-267, 303-306, 341-344

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/lazyconn/inactivity/manager_test.go` around lines 141 - 144,
Tests in manager_test.go overwrite the package-global newTicker (e.g., setting
newTicker = func(...) Ticker { return &fakeTickerMock{CChan: fakeTick} }) but
never restore it, making the suite order-dependent; fix by capturing the
original function before each override (old := newTicker) and restoring it after
the test (use defer newTicker = old or t.Cleanup(func(){ newTicker = old }))
wherever newTicker is reassigned (instances around the fakeTickerMock overrides)
so the global is returned to its original value after each test.
management/internals/shared/grpc/conversion_test.go (1)

160-169: 💤 Low value

Reference the named sentinel constant in the assertion.

Line 166 asserts against the literal ^uint32(0); this test is in the same grpc package and can read the unexported p2pRetryMaxDisabledSentinel directly, which keeps the test honest if the sentinel value ever changes (e.g. moves to a different magic number) and makes the intent slightly clearer.

♻️ Proposed fix

```diff
-	if pc.P2PRetryMaxSeconds != ^uint32(0) {
-		t.Errorf("explicit 0 should map to uint32-max sentinel on the wire, got %d", pc.P2PRetryMaxSeconds)
+	if pc.P2PRetryMaxSeconds != p2pRetryMaxDisabledSentinel {
+		t.Errorf("explicit 0 should map to disabled-sentinel on the wire, got %d", pc.P2PRetryMaxSeconds)
 	}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/conversion_test.go` around lines 160 - 169,
Update TestToPeerConfig_P2pRetryMax_ExplicitDisable to assert against the
package's sentinel constant instead of the literal magic number: replace the
literal ^uint32(0) in the assertion with p2pRetryMaxDisabledSentinel so the test
uses the named sentinel from the grpc package (keep the rest of the test and the
call to toPeerConfigForTest unchanged).
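The wire convention behind this test can be sketched as follows. The constant name p2pRetryMaxDisabledSentinel is taken from the review text; the decode helper and default value are assumptions for illustration.

```go
package main

import "fmt"

// p2pRetryMaxDisabledSentinel mirrors the convention discussed above:
// uint32-max on the wire means "user explicitly disabled retries",
// while 0 means "unset, apply the default".
const p2pRetryMaxDisabledSentinel = ^uint32(0)

// effectiveRetryMax decodes the wire value into an effective cap plus a
// disabled flag, keeping "explicit 0" and "not set" distinguishable.
func effectiveRetryMax(wire, def uint32) (seconds uint32, disabled bool) {
	switch wire {
	case p2pRetryMaxDisabledSentinel:
		return 0, true // explicit disable
	case 0:
		return def, false // unset: fall back to the default
	default:
		return wire, false
	}
}

func main() {
	fmt.Println(effectiveRetryMax(p2pRetryMaxDisabledSentinel, 300)) // 0 true
	fmt.Println(effectiveRetryMax(0, 300))                           // 300 false
}
```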
client/internal/peer/env_test.go (1)

33-34: 💤 Low value

Use the package constants instead of literal env var names.

Lines 33–34 hardcode "NB_ENABLE_EXPERIMENTAL_LAZY_CONN" and "NB_LAZY_CONN_INACTIVITY_THRESHOLD" while the test lives in the same peer package and could reference envEnableLazyConn / envInactivityThreshold directly (you already do that for the exported EnvKeyNBConnectionMode / EnvKeyNBForceRelay two lines above). Keeping a single source of truth avoids silent test drift if a constant is ever renamed.

♻️ Proposed fix

```diff
-			t.Setenv("NB_ENABLE_EXPERIMENTAL_LAZY_CONN", c.envEnableLazy)
-			t.Setenv("NB_LAZY_CONN_INACTIVITY_THRESHOLD", c.envInactivity)
+			t.Setenv(envEnableLazyConn, c.envEnableLazy)
+			t.Setenv(envInactivityThreshold, c.envInactivity)
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/env_test.go` around lines 33 - 34, The test currently
hardcodes the environment variable names "NB_ENABLE_EXPERIMENTAL_LAZY_CONN" and
"NB_LAZY_CONN_INACTIVITY_THRESHOLD"; change t.Setenv calls to use the package
constants envEnableLazyConn and envInactivityThreshold (the same constants used
elsewhere in this package) so the test references a single source of truth and
won't drift if names change.
client/internal/peer/conn.go (1)

1025-1054: 💤 Low value

AttachICE swallows SendOffer failures silently.

When SendOffer returns an error at line 1050, AttachICE only logs a warning and returns nil. The local listener is now attached but the remote peer has not been notified that we're ICE-capable, so until something else triggers a fresh offer (e.g. the guard goroutine), the dynamic-mode upgrade window is wasted. Worth at least propagating the error so callers in conn_mgr can decide whether to retry.

♻️ Proposed fix
 	if err := conn.handshaker.SendOffer(); err != nil {
-		conn.Log.Warnf("AttachICE: SendOffer failed: %v", err)
+		conn.Log.Warnf("AttachICE: SendOffer failed: %v", err)
+		return fmt.Errorf("AttachICE: send offer: %w", err)
 	}
 	return nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/conn.go` around lines 1025 - 1054, AttachICE currently
adds the ICE listener (Conn.handshaker.AddICEListener with
Conn.workerICE.OnNewOffer) and then only logs SendOffer errors, leaving the
local listener attached while remote peer may never be notified; change
AttachICE so that if Conn.handshaker.SendOffer() returns an error it returns
that error to the caller instead of swallowing it, and ensure you undo the
listener attachment on failure (remove the listener or use a defer that removes
it when SendOffer fails) so state stays consistent; update callers in conn_mgr
to handle the returned error (retry/backoff) as needed.
client/internal/peer/conn_test.go (2)

362-384: 💤 Low value

Consider exposing a test helper instead of reaching into iceBackoff internals.

Lines 370-372 lock and directly mutate iceBackoff.mu and iceBackoff.nextRetry from the test. While same-package access is allowed in Go, this tightly couples the test to the unexported layout of iceBackoffState — any rename or restructuring of those fields silently breaks the test.

A small helper such as (b *iceBackoffState) forceExpiredForTest() or an exported Reset(nextRetry time.Time) test seam would decouple the test from internal state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/conn_test.go` around lines 362 - 384, The test mutates
iceBackoff internals (locking iceBackoff.mu and setting iceBackoff.nextRetry)
which couples it to iceBackoffState layout; add a small test seam on
iceBackoffState (e.g., func (b *iceBackoffState) forceExpiredForTest() or func
(b *iceBackoffState) ResetNextRetry(t time.Time)) that performs the mutexed
update of nextRetry, then change TestConn_AttachICE_AfterBackoffExpiry to call
that helper after c.iceBackoff.markFailure() instead of touching mu/nextRetry
directly; this keeps Conn.AttachICE and newIceBackoff/markFailure unchanged but
removes direct field access from the test.

344-360: 💤 Low value

TestConn_AttachICE_NoOpWhenSuspended is implicitly order-dependent.

The test asserts err == nil with a nil workerICE only because AttachICE currently checks the backoff gate before the workerICE nil check. If that ordering ever changes, the test would either start returning a "nil workerICE" error (masking the suspend path) or panic, with no obvious test failure message pointing to the ordering assumption.

Consider adding an explicit comment documenting this dependency, or restructuring so the suspended path is exercised via a dedicated stub workerICE that records whether it was reached.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/conn_test.go` around lines 344 - 360, The test
TestConn_AttachICE_NoOpWhenSuspended relies on AttachICE checking the backoff
gate before validating a nil workerICE — make this explicit and robust by
injecting a stub/fake workerICE into the Conn that will fail the test if its
attach path is invoked; set c.workerICE to that stub before calling
c.AttachICE(), keep c.iceBackoff.markFailure(), assert err == nil, and verify
the stub was not called and handshaker.readICEListener() remains nil. This
replaces the implicit nil-workerICE dependency with an explicit assertion that
the suspended path short-circuits AttachICE (referencing
TestConn_AttachICE_NoOpWhenSuspended, Conn.AttachICE, c.workerICE, and
handshaker.readICEListener()).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@client/internal/conn_mgr.go`:
- Around line 245-291: In UpdatedRemotePeerConfig: when the connection mode
remains managed (isManaged true) but any of the timeout/retry values change
(compare newRelay/newP2P/newP2pRetry against the previously-stored
e.relayTimeoutSecs/e.p2pTimeoutSecs/e.p2pRetryMaxSecs) you must restart the
running lazy manager so it picks up the new timers; call e.closeManager(ctx),
then re-init with e.initLazyManager(ctx), call e.startModeSideEffects(), and
re-add peers via e.addPeersToLazyConnManager() (also ensure
e.statusRecorder.UpdateLazyConnection(true) is set appropriately). Use the
existing symbols (UpdatedRemotePeerConfig, isManaged, e.lazyConnMgr,
closeManager, initLazyManager, startModeSideEffects, addPeersToLazyConnManager,
propagateP2pRetryMaxToConns) to locate and implement this branch.

In `@client/internal/engine.go`:
- Around line 1585-1586: The Mode and P2pRetryMaxSeconds values are only
snapshotted into peer.Conn at creation, so runtime changes via
ConnMgr.UpdatedRemotePeerConfig() won't affect existing conns; modify the code
so connections are reconfigured when the ConnMgr changes by adding an update
path: introduce or use methods on peer.Conn like SetMode(mode string) and
SetP2pRetryMax(seconds int) (or a single UpdateConfig) and, in the
ConnMgr.UpdatedRemotePeerConfig() handler inside engine (the code that currently
constructs conns with Mode: e.connMgr.Mode(), P2pRetryMaxSeconds:
e.connMgr.P2pRetryMax()), iterate existing e.peers and call those methods to
apply the new mode and retry value (and trigger any Open/AttachICE/DetachICE
transitions as needed) so live switches between p2p-lazy and p2p-dynamic take
effect without recreating connections.

In `@client/internal/lazyconn/manager/manager.go`:
- Around line 102-112: The kernel-bind path currently leaves m.inactivityManager
nil which causes panics when methods like Start, activateSinglePeer,
addActivePeer, DeactivatePeer and removePeer unconditionally dereference it; fix
this by ensuring m.inactivityManager is always a valid implementation: add a
lightweight no-op inactivity manager (e.g., inactivity.NewNoopManager returning
an object that implements the same interface as
inactivity.NewManager/NewManagerWithTwoTimers) and instantiate it in the else
branch where kernel mode is detected (replace the current log-only path), so all
subsequent calls to m.inactivityManager are safe without changing callers;
alternatively, if you prefer refusal, return an error from the constructor
instead of creating the manager (but pick one approach and apply it
consistently).

In `@client/internal/peer/conn.go`:
- Around line 96-101: The uint32 sentinel ^uint32(0) for "user-explicit disable"
is being lost; fix by preserving that sentinel in the config and handling it
explicitly: modify propagateP2pRetryMaxToConns() to pass the sentinel value (not
time.Duration(0)) into SetIceBackoffMax when the wire-value is the sentinel,
change SetIceBackoffMax(conn.config.P2pRetryMaxSeconds) so it stores the
sentinel (^uint32(0)) into conn.config.P2pRetryMaxSeconds instead of 0 when
disable was requested, and update conn.Open() to treat
conn.config.P2pRetryMaxSeconds == ^uint32(0) as "disabled (zero cap)" while
treating the genuine 0 (or other unset marker if you prefer) as "not set → apply
DefaultP2PRetryMax"; ensure all logic paths that previously wrote 0 for disable
now preserve/write the sentinel so the distinction remains.

In `@client/internal/peer/ice_backoff.go`:
- Around line 97-103: When resetting or disabling backoff (e.g., in markSuccess)
you must also clear the suspended retry timestamp so IsSuspended() and snapshots
do not retain a stale NextRetry; update markSuccess (and the corresponding
reset/disable handler in the 115-123 block) to set the internal nextRetry field
to the zero time (reset the suspension window) whenever you set s.suspended =
false and bo.Reset(), ensuring snapshots no longer report a stale NextRetry
after SetMaxBackoff(0) or a success.

In `@management/server/account.go`:
- Around line 459-489: The audit currently uses derefUint32Ptr (and
derefStringPtr for timeouts) which collapses nil and 0 to 0; update
handleConnectionModeSettings so timeout fields emit the raw pointer values
instead of dereferenced zeroed values (e.g., use oldSettings.RelayTimeoutSeconds
and newSettings.RelayTimeoutSeconds,
oldSettings.P2pTimeoutSeconds/newSettings.P2pTimeoutSeconds,
oldSettings.P2pRetryMaxSeconds/newSettings.P2pRetryMaxSeconds) in the StoreEvent
payload so nil stays nil and 0 stays 0; keep the equality checks
(equalUint32Ptr) but change the map entries for the three timeout events to pass
the pointer (or a small helper that returns either nil or uint32) instead of
derefUint32Ptr.

In `@management/server/http/handlers/accounts/accounts_handler.go`:
- Around line 223-240: The handler currently treats JSON null as "absent" so
ConnectionMode, P2pTimeoutSeconds, P2pRetryMaxSeconds and RelayTimeoutSeconds
can never be cleared once set; update the request DTO used by the accounts
update path (the req.Settings type referenced in accounts_handler.go) to
implement tri-state semantics (absent vs null vs value) — e.g. replace plain *T
fields with a custom Nullable[T] struct (or use *json.RawMessage / presence
flag) that records whether the field was provided and whether it was explicitly
null, change the handler logic in the update function to inspect that presence
flag and set
returnSettings.<ConnectionMode|P2pTimeoutSeconds|P2pRetryMaxSeconds|RelayTimeoutSeconds>
to nil when the client sent explicit null, set to the concrete value when
provided, and leave unchanged when absent, and update JSON unmarshalling/tests
accordingly.
- Around line 218-221: The handler currently returns a generic fmt.Errorf for an
invalid req.Settings.ConnectionMode which yields a 500; change this to return a
gRPC InvalidArgument error consistent with other validation failures in this
handler (e.g. use status.Errorf(codes.InvalidArgument, "invalid connection_mode
%q", modeStr) or the project’s util helper for invalid arguments) when
req.Settings.ConnectionMode.Valid() is false, referencing
req.Settings.ConnectionMode and modeStr so the error message includes the bad
value.
- Around line 229-240: The code casts int64 timeout fields from req.Settings
(P2pTimeoutSeconds, P2pRetryMaxSeconds, RelayTimeoutSeconds) directly to uint32
which can wrap on negative or too-large values; validate each non-nil
*req.Settings.* value is within 0..math.MaxUint32 before casting, and if out of
range return a validation error (HTTP 400) instead of assigning to
returnSettings.*; after the check perform v := uint32(*req.Settings.<Field>) and
set returnSettings.<Field> = &v.
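The bounds check described here can be sketched as a small helper; the function name and error wording are illustrative, not the handler's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// toUint32Checked rejects negative or too-large int64 values with a
// validation error instead of letting a direct uint32 cast silently wrap
// (e.g. -1 would otherwise become 4294967295).
func toUint32Checked(field string, v int64) (uint32, error) {
	if v < 0 || v > math.MaxUint32 {
		return 0, fmt.Errorf("%s must be between 0 and %d, got %d", field, int64(math.MaxUint32), v)
	}
	return uint32(v), nil
}

func main() {
	fmt.Println(toUint32Checked("p2p_timeout_seconds", 120))
	fmt.Println(toUint32Checked("p2p_timeout_seconds", -1))
}
```

In the handler this would run for each non-nil request field, mapping a failed check to an HTTP 400 rather than assigning the wrapped value to the settings.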

In `@shared/connectionmode/mode.go`:
- Around line 19-127: Add a new Mode constant for "p2p-dynamic-lazy" and wire it
through the conversions: define ModeP2PDynamicLazy alongside other Mode
constants, return "p2p-dynamic-lazy" from Mode.String(), accept
"p2p-dynamic-lazy" in ParseString(), map
mgmProto.ConnectionMode_CONNECTION_MODE_P2P_DYNAMIC_LAZY <-> ModeP2PDynamicLazy
in FromProto() and ToProto(), and make ModeP2PDynamicLazy return true from
ToLazyConnectionEnabled() (and ensure ResolveLegacyLazyBool behavior remains
correct). Update only the symbols ModeP2PDynamicLazy, String(), ParseString(),
FromProto(), ToProto(), ToLazyConnectionEnabled() to include this new mode and
keep ModeFollowServer behavior unchanged.

In `@shared/management/http/api/openapi.yml`:
- Line 365: The OpenAPI enum for connection_mode is missing the new value;
update the enum array under the connection_mode schema to include
"p2p-dynamic-lazy" alongside the existing values (relay-forced, p2p, p2p-lazy,
p2p-dynamic) so REST clients can set that mode; ensure you add the exact string
"p2p-dynamic-lazy" to the enum in the openapi YAML and run any spec
validation/generation steps you normally use.
- Around line 370-373: Update the stale documentation text in the OpenAPI
description that mentions "Phase 1 of issue `#5989`" and that "p2p-dynamic is
reserved" so it no longer misleads consumers; locate the block around the
p2p_timeout_seconds schema entry in openapi.yml (the paragraph starting with
"Phase 1 of issue `#5989`: relay-forced, p2p, and p2p-lazy are functional...") and
replace or remove the phase-gating sentence to reflect the current status
(either remove the "p2p-dynamic is reserved" clause or update it to state that
p2p-dynamic is now functional), keeping the rest of the p2p_timeout_seconds
description consistent.

In `@shared/management/http/api/types.gen.go`:
- Around line 41-63: The generated enum AccountSettingsConnectionMode is missing
the new value "p2p-dynamic-lazy": add a new constant named
AccountSettingsConnectionModeP2pDynamicLazy with value "p2p-dynamic-lazy" to the
AccountSettingsConnectionMode block and include it in the
AccountSettingsConnectionMode.Valid() switch so the method returns true for that
value; update the OpenAPI source to include p2p-dynamic-lazy and regenerate
shared/management/http/api/types.gen.go so the generated code and Valid() check
are in sync.

---

Outside diff comments:
In `@client/cmd/up.go`:
- Around line 438-453: The code currently allows setting both
LazyConnectionEnabled and ConnectionMode together and forwards connectionMode as
an unchecked string; update the builders that set req.LazyConnectionEnabled and
req.ConnectionMode (the branches using
cmd.Flag(enableLazyConnectionFlag).Changed and
cmd.Flag(connectionModeFlag).Changed and the related payload constructors used
in the other two locations) to first validate connectionMode against the allowed
enum/values and return an error if invalid, and also reject the request early
when both flags are present (if cmd.Flag(enableLazyConnectionFlag).Changed &&
cmd.Flag(connectionModeFlag).Changed) with a clear error message; apply the same
mutually-exclusive check and validation to all three payload-building sites so
the legacy lazy flag cannot be used alongside --connection-mode and typos in
connectionMode fail fast.

In `@management/server/account.go`:
- Around line 332-338: The code only sets updateAccountPeers when a subset of
settings change; include comparisons for ConnectionMode, RelayTimeoutSeconds,
P2pTimeoutSeconds, and P2pRetryMaxSeconds (the newly saved/audited connection
settings) in the same conditional that sets updateAccountPeers so that
IncrementNetworkSerial() and UpdateAccountPeers() run when any of those values
change; make the same addition in the other identical comparison block (the one
around where other settings are checked) so both code paths trigger
updateAccountPeers when any connection-related field differs between oldSettings
and newSettings.

In `@management/server/types/settings.go`:
- Around line 99-126: Settings.Copy currently returns a Settings that shares the
JWTAllowGroups slice with the source, causing mutations to bleed; update the
Copy method (function Settings.Copy) to clone JWTAllowGroups (e.g., use
slices.Clone or an explicit copy) when constructing the new Settings so the
returned Settings has its own independent slice just like PeerExposeGroups is
cloned.
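The fix is a one-liner with `slices.Clone`. A minimal sketch, with `Settings` trimmed to the two slice fields (the real `Copy` also copies many scalars):

```go
package main

import (
	"fmt"
	"slices"
)

// Settings is reduced to the fields that matter for the aliasing bug.
type Settings struct {
	JWTAllowGroups   []string
	PeerExposeGroups []string
}

// Copy clones both slices so mutations on the copy cannot bleed back
// into the source (slices.Clone requires Go 1.21+).
func (s *Settings) Copy() *Settings {
	return &Settings{
		JWTAllowGroups:   slices.Clone(s.JWTAllowGroups), // the fix
		PeerExposeGroups: slices.Clone(s.PeerExposeGroups),
	}
}

func main() {
	src := &Settings{JWTAllowGroups: []string{"admins"}}
	cp := src.Copy()
	cp.JWTAllowGroups[0] = "mutated"
	fmt.Println(src.JWTAllowGroups[0]) // admins: source unaffected
}
```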

---

Nitpick comments:
In `@client/internal/lazyconn/inactivity/manager_test.go`:
- Around line 141-144: Tests in manager_test.go overwrite the package-global
newTicker (e.g., setting newTicker = func(...) Ticker { return
&fakeTickerMock{CChan: fakeTick} }) but never restore it, making the suite
order-dependent; fix by capturing the original function before each override
(old := newTicker) and restoring it after the test (use defer newTicker = old or
t.Cleanup(func(){ newTicker = old })) wherever newTicker is reassigned
(instances around the fakeTickerMock overrides) so the global is returned to its
original value after each test.

In `@client/internal/peer/conn_test.go`:
- Around line 362-384: The test mutates iceBackoff internals (locking
iceBackoff.mu and setting iceBackoff.nextRetry) which couples it to
iceBackoffState layout; add a small test seam on iceBackoffState (e.g., func (b
*iceBackoffState) forceExpiredForTest() or func (b *iceBackoffState)
ResetNextRetry(t time.Time)) that performs the mutexed update of nextRetry, then
change TestConn_AttachICE_AfterBackoffExpiry to call that helper after
c.iceBackoff.markFailure() instead of touching mu/nextRetry directly; this keeps
Conn.AttachICE and newIceBackoff/markFailure unchanged but removes direct field
access from the test.
- Around line 344-360: The test TestConn_AttachICE_NoOpWhenSuspended relies on
AttachICE checking the backoff gate before validating a nil workerICE — make
this explicit and robust by injecting a stub/fake workerICE into the Conn that
will fail the test if its attach path is invoked; set c.workerICE to that stub
before calling c.AttachICE(), keep c.iceBackoff.markFailure(), assert err ==
nil, and verify the stub was not called and handshaker.readICEListener() remains
nil. This replaces the implicit nil-workerICE dependency with an explicit
assertion that the suspended path short-circuits AttachICE (referencing
TestConn_AttachICE_NoOpWhenSuspended, Conn.AttachICE, c.workerICE, and
handshaker.readICEListener()).

In `@client/internal/peer/conn.go`:
- Around line 1025-1054: AttachICE currently adds the ICE listener
(Conn.handshaker.AddICEListener with Conn.workerICE.OnNewOffer) and then only
logs SendOffer errors, leaving the local listener attached while remote peer may
never be notified; change AttachICE so that if Conn.handshaker.SendOffer()
returns an error it returns that error to the caller instead of swallowing it,
and ensure you undo the listener attachment on failure (remove the listener or
use a defer that removes it when SendOffer fails) so state stays consistent;
update callers in conn_mgr to handle the returned error (retry/backoff) as
needed.
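The rollback-on-error shape looks like this. The handshaker type, `RemoveICEListener`, and the error are stand-ins invented for the sketch; only the attach / fail / undo / return pattern is the point:

```go
package main

import (
	"errors"
	"fmt"
)

type handshaker struct{ listener func() }

func (h *handshaker) AddICEListener(f func()) { h.listener = f }
func (h *handshaker) RemoveICEListener()      { h.listener = nil }
func (h *handshaker) SendOffer() error        { return errors.New("offer failed") }

type Conn struct{ handshaker *handshaker }

// AttachICE sketches the fix: attach the listener, and on SendOffer
// failure undo the attachment and surface the error to the caller
// instead of logging and leaving half-attached state behind.
func (c *Conn) AttachICE(onNewOffer func()) error {
	c.handshaker.AddICEListener(onNewOffer)
	if err := c.handshaker.SendOffer(); err != nil {
		c.handshaker.RemoveICEListener() // keep state consistent on failure
		return fmt.Errorf("send offer: %w", err)
	}
	return nil
}

func main() {
	c := &Conn{handshaker: &handshaker{}}
	err := c.AttachICE(func() {})
	fmt.Println(err != nil, c.handshaker.listener == nil) // true true
}
```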

In `@client/internal/peer/env_test.go`:
- Around line 33-34: The test currently hardcodes the environment variable names
"NB_ENABLE_EXPERIMENTAL_LAZY_CONN" and "NB_LAZY_CONN_INACTIVITY_THRESHOLD";
change t.Setenv calls to use the package constants envEnableLazyConn and
envInactivityThreshold (the same constants used elsewhere in this package) so
the test references a single source of truth and won't drift if names change.

In `@client/internal/peer/ice_backoff_test.go`:
- Around line 113-122: Add a test that verifies disabling backoff while already
suspended immediately clears suspension: create s :=
newIceBackoff(somePositiveMax) then call s.markFailure() and assert
s.IsSuspended() is true, then call s.SetMaxBackoff(0) and assert s.IsSuspended()
is false and the internal retry timestamp (e.g., s.retryAt or equivalent time
field) is cleared/zero (use its IsZero check). Name the test something like
TestIceBackoff_DisableWhileSuspended and use newIceBackoff, markFailure,
SetMaxBackoff, IsSuspended and the retry timestamp field to locate the logic.

In `@management/internals/shared/grpc/conversion_test.go`:
- Around line 160-169: Update TestToPeerConfig_P2pRetryMax_ExplicitDisable to
assert against the package's sentinel constant instead of the literal magic
number: replace the literal ^uint32(0) in the assertion with
p2pRetryMaxDisabledSentinel so the test uses the named sentinel from the grpc
package (keep the rest of the test and the call to toPeerConfigForTest
unchanged).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 34dd9c60-fb5b-443e-ae99-ca94bcb9fc13

📥 Commits

Reviewing files that changed from the base of the PR and between b19b746 and 75cbbe6.

⛔ Files ignored due to path filters (4)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • client/proto/daemon_grpc.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/proxy_service.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (37)
  • client/cmd/root.go
  • client/cmd/up.go
  • client/internal/conn_mgr.go
  • client/internal/conn_mgr_test.go
  • client/internal/connect.go
  • client/internal/engine.go
  • client/internal/lazyconn/env.go
  • client/internal/lazyconn/inactivity/manager.go
  • client/internal/lazyconn/inactivity/manager_test.go
  • client/internal/lazyconn/manager/manager.go
  • client/internal/peer/conn.go
  • client/internal/peer/conn_test.go
  • client/internal/peer/env.go
  • client/internal/peer/env_test.go
  • client/internal/peer/handshaker.go
  • client/internal/peer/handshaker_test.go
  • client/internal/peer/ice_backoff.go
  • client/internal/peer/ice_backoff_test.go
  • client/internal/peer/status.go
  • client/internal/peer/worker_ice.go
  • client/internal/profilemanager/config.go
  • client/proto/daemon.proto
  • client/status/status.go
  • client/status/status_test.go
  • management/internals/shared/grpc/conversion.go
  • management/internals/shared/grpc/conversion_test.go
  • management/server/account.go
  • management/server/activity/codes.go
  • management/server/http/handlers/accounts/accounts_handler.go
  • management/server/http/handlers/accounts/accounts_handler_test.go
  • management/server/types/settings.go
  • management/server/types/settings_test.go
  • shared/connectionmode/mode.go
  • shared/connectionmode/mode_test.go
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
  • shared/management/proto/management.proto

Comment on lines +245 to +291
func (e *ConnMgr) UpdatedRemotePeerConfig(ctx context.Context, pc *mgmProto.PeerConfig) error {
newMode, newRelay, newP2P, newP2pRetry := resolveConnectionMode(
e.envMode, e.envRelayTimeout, e.cfgMode, e.cfgRelayTimeout,
e.cfgP2pTimeout, e.cfgP2pRetryMax, pc,
)

if newMode == e.mode && newRelay == e.relayTimeoutSecs &&
newP2P == e.p2pTimeoutSecs && newP2pRetry == e.p2pRetryMaxSecs {
return nil
}
prev := e.mode
e.mode = newMode
e.relayTimeoutSecs = newRelay
e.p2pTimeoutSecs = newP2P
e.p2pRetryMaxSecs = newP2pRetry
e.propagateP2pRetryMaxToConns()

wasManaged := modeUsesLazyMgr(prev)
isManaged := modeUsesLazyMgr(newMode)
modeChanged := prev != newMode

if modeChanged && wasManaged && !isManaged {
log.Infof("lazy/dynamic connection manager disabled by management push (mode=%s)", newMode)
e.closeManager(ctx)
e.statusRecorder.UpdateLazyConnection(false)
return nil
}

if modeChanged && wasManaged && isManaged {
// Switching between lazy and dynamic at runtime: tear down the
// existing manager so initLazyManager picks up the new timeouts.
log.Infof("lazy/dynamic mode change %s -> %s, restarting manager", prev, newMode)
e.closeManager(ctx)
e.statusRecorder.UpdateLazyConnection(false)
}

if isManaged && e.lazyConnMgr == nil {
if e.rosenpassEnabled {
log.Infof("rosenpass connection manager is enabled, lazy connection manager will not be started")
log.Warnf("rosenpass enabled, ignoring lazy/dynamic mode push")
return nil
}

log.Warnf("lazy connection manager is enabled by management feature flag")
log.Infof("lazy/dynamic connection manager enabled by management push (mode=%s)", newMode)
e.initLazyManager(ctx)
e.statusRecorder.UpdateLazyConnection(true)
e.startModeSideEffects()
return e.addPeersToLazyConnManager()
} else {
if e.lazyConnMgr == nil {
return nil
}
log.Infof("lazy connection manager is disabled by management feature flag")
e.closeManager(ctx)
e.statusRecorder.UpdateLazyConnection(false)
return nil
}
return nil

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restart the lazy/dynamic manager when only the timeouts change.

newRelay/newP2P update the stored fields here, but the live inactivity manager keeps the old timer values because it is only built in initLazyManager(). Right now a management push that changes relay_timeout_seconds or p2p_timeout_seconds inside the same mode updates bookkeeping only; the active lazy/dynamic lifecycle keeps running with the previous thresholds until a reconnect.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_mgr.go` around lines 245 - 291, In
UpdatedRemotePeerConfig: when the connection mode remains managed (isManaged
true) but any of the timeout/retry values change (compare
newRelay/newP2P/newP2pRetry against the previously-stored
e.relayTimeoutSecs/e.p2pTimeoutSecs/e.p2pRetryMaxSecs) you must restart the
running lazy manager so it picks up the new timers; call e.closeManager(ctx),
then re-init with e.initLazyManager(ctx), call e.startModeSideEffects(), and
re-add peers via e.addPeersToLazyConnManager() (also ensure
e.statusRecorder.UpdateLazyConnection(true) is set appropriately). Use the
existing symbols (UpdatedRemotePeerConfig, isManaged, e.lazyConnMgr,
closeManager, initLazyManager, startModeSideEffects, addPeersToLazyConnManager,
propagateP2pRetryMaxToConns) to locate and implement this branch.
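The missing branch can be sketched as a comparison before the fields are overwritten. `ConnMgr` is reduced to stub fields and `restartLazyManager` stands in for the closeManager / initLazyManager / addPeersToLazyConnManager sequence named above; all names outside those are assumptions:

```go
package main

import "fmt"

// ConnMgr keeps only the state this branch compares.
type ConnMgr struct {
	relayTimeoutSecs, p2pTimeoutSecs uint32
	restarted                        bool
}

// restartLazyManager stands in for: closeManager, initLazyManager,
// startModeSideEffects, addPeersToLazyConnManager.
func (e *ConnMgr) restartLazyManager() { e.restarted = true }

// applyTimeouts sketches the fix: when the mode stays managed but the
// timers changed, restart the live inactivity manager so the new
// thresholds take effect before the next reconnect.
func (e *ConnMgr) applyTimeouts(newRelay, newP2P uint32, isManaged, modeChanged bool) {
	timersChanged := newRelay != e.relayTimeoutSecs || newP2P != e.p2pTimeoutSecs
	e.relayTimeoutSecs, e.p2pTimeoutSecs = newRelay, newP2P
	if isManaged && !modeChanged && timersChanged {
		e.restartLazyManager()
	}
}

func main() {
	e := &ConnMgr{relayTimeoutSecs: 600, p2pTimeoutSecs: 120}
	e.applyTimeouts(600, 180, true, false) // p2p timeout changed within the same mode
	fmt.Println(e.restarted)               // true
}
```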

Comment thread client/internal/engine.go
Comment on lines +1585 to +1586
Mode: e.connMgr.Mode(),
P2pRetryMaxSeconds: e.connMgr.P2pRetryMax(),

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

These mode-dependent conn settings are only applied at construction time.

Mode and P2pRetryMaxSeconds are snapshotted when the peer.Conn is created, but ConnMgr.UpdatedRemotePeerConfig() can change the resolved mode later without recreating existing conns. That means a live switch between p2p-lazy and p2p-dynamic can leave already-open peers running with the old Open/AttachICE/DetachICE behavior until each conn is rebuilt or explicitly reconfigured.

🤖 Prompt for AI Agents

In `@client/internal/engine.go` around lines 1585 - 1586, The Mode and
P2pRetryMaxSeconds values are only snapshotted into peer.Conn at creation, so
runtime changes via ConnMgr.UpdatedRemotePeerConfig() won't affect existing
conns; modify the code so connections are reconfigured when the ConnMgr changes
by adding an update path: introduce or use methods on peer.Conn like
SetMode(mode string) and SetP2pRetryMax(seconds int) (or a single UpdateConfig)
and, in the ConnMgr.UpdatedRemotePeerConfig() handler inside engine (the code
that currently constructs conns with Mode: e.connMgr.Mode(), P2pRetryMaxSeconds:
e.connMgr.P2pRetryMax()), iterate existing e.peers and call those methods to
apply the new mode and retry value (and trigger any Open/AttachICE/DetachICE
transitions as needed) so live switches between p2p-lazy and p2p-dynamic take
effect without recreating connections.
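The fan-out over live conns could look like this. `SetMode` and `SetP2pRetryMax` are the hypothetical setters the review proposes, not existing `peer.Conn` methods, and the peer map is a stand-in for the engine's bookkeeping:

```go
package main

import "fmt"

// peerConn stands in for peer.Conn with the proposed runtime setters.
type peerConn struct {
	mode        string
	retryMaxSec uint32
}

func (c *peerConn) SetMode(m string)          { c.mode = m }
func (c *peerConn) SetP2pRetryMax(sec uint32) { c.retryMaxSec = sec }

// reconfigurePeers pushes a newly resolved mode into every live conn
// instead of waiting for each conn to be rebuilt.
func reconfigurePeers(peers map[string]*peerConn, mode string, retryMaxSec uint32) {
	for _, c := range peers {
		c.SetMode(mode)
		c.SetP2pRetryMax(retryMaxSec)
	}
}

func main() {
	peers := map[string]*peerConn{"peerA": {mode: "p2p-lazy"}}
	reconfigurePeers(peers, "p2p-dynamic", 900)
	fmt.Println(peers["peerA"].mode, peers["peerA"].retryMaxSec) // p2p-dynamic 900
}
```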

Comment on lines 102 to 112
if wgIface.IsUserspaceBind() {
m.inactivityManager = inactivity.NewManager(wgIface, config.InactivityThreshold)
iceTO, relayTO := config.resolvedTimeouts()
if iceTO == 0 && relayTO == 0 {
// Phase 1 / single-timer fallback when caller hasn't migrated.
m.inactivityManager = inactivity.NewManager(wgIface, config.InactivityThreshold)
} else {
m.inactivityManager = inactivity.NewManagerWithTwoTimers(wgIface, iceTO, relayTO)
}
} else {
log.Warnf("inactivity manager not supported for kernel mode, wait for remote peer to close the connection")
}

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Kernel-bind fallback is unsafe here.

This leaves m.inactivityManager nil, but the rest of the manager still dereferences it unconditionally (Start, activateSinglePeer, addActivePeer, DeactivatePeer, removePeer). In lazy/dynamic mode on a kernel bind, that turns the “not supported” path into a runtime panic instead of a graceful no-op/fallback.

Please either refuse to construct/start this manager for kernel mode or make every inactivity-manager call path nil-safe.

🤖 Prompt for AI Agents

In `@client/internal/lazyconn/manager/manager.go` around lines 102 - 112, The
kernel-bind path currently leaves m.inactivityManager nil which causes panics
when methods like Start, activateSinglePeer, addActivePeer, DeactivatePeer and
removePeer unconditionally dereference it; fix this by ensuring
m.inactivityManager is always a valid implementation: add a lightweight no-op
inactivity manager (e.g., inactivity.NewNoopManager returning an object that
implements the same interface as inactivity.NewManager/NewManagerWithTwoTimers)
and instantiate it in the else branch where kernel mode is detected (replace the
current log-only path), so all subsequent calls to m.inactivityManager are safe
without changing callers; alternatively, if you prefer refusal, return an error
from the constructor instead of creating the manager (but pick one approach and
apply it consistently).
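The no-op variant is a few lines. The interface's method set here is assumed from the call sites the review lists (the real one may differ), and the constructor name is invented:

```go
package main

import "fmt"

// InactivityManager is the shape both the real manager and the no-op
// must satisfy (method set assumed from Start/addActivePeer/removePeer
// call sites).
type InactivityManager interface {
	Start()
	AddPeer(id string)
	RemovePeer(id string)
}

// noopManager makes every call path safe on kernel bind instead of
// leaving a nil field that panics on first use.
type noopManager struct{}

func (noopManager) Start()            {}
func (noopManager) AddPeer(string)    {}
func (noopManager) RemovePeer(string) {}

func main() {
	var m InactivityManager = noopManager{} // the kernel-mode else branch
	m.Start()
	m.AddPeer("peerA") // graceful no-op, no nil dereference
	fmt.Println("kernel mode: inactivity tracking disabled, no panic")
}
```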

Comment on lines +96 to +101
// P2pRetryMaxSeconds is the cap for the ICE-failure backoff schedule
// in p2p-dynamic mode. 0 = use built-in default (DefaultP2PRetryMax).
// Wire-format sentinel uint32-max (= ^uint32(0)) means "user-explicit
// disable", which the resolver translates to time.Duration(0) at
// engine.go before passing it here. Phase 3 of #5989.
P2pRetryMaxSeconds uint32

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify how engine.go translates the wire sentinel for P2pRetryMaxSeconds, and
# what DefaultP2PRetryMax actually is, to determine whether this is a doc bug
# or a logic bug.

echo "--- engine.go: how is P2PRetryMaxSeconds translated into ConnConfig? ---"
rg -nP --type=go -C5 'P2PRetryMaxSeconds|P2pRetryMaxSeconds' -g 'client/internal/engine*.go'

echo
echo "--- DefaultP2PRetryMax definition ---"
rg -nP --type=go -C2 '\bDefaultP2PRetryMax\b'

echo
echo "--- newIceBackoff: how is a 0 cap handled? ---"
rg -nP --type=go -C5 '\bnewIceBackoff\b'

echo
echo "--- iceBackoffState.SetMaxBackoff: how is a 0 cap handled? ---"
ast-grep --pattern $'func ($_ *iceBackoffState) SetMaxBackoff($_) {
  $$$
}'

echo
echo "--- conn_mgr.go: who calls SetIceBackoffMax and with what input? ---"
rg -nP --type=go -C5 '\bSetIceBackoffMax\b'

Repository: netbirdio/netbird

Length of output: 17107


🏁 Script executed:

rg -nP --type=go -A10 'func \(.*ConnMgr\) P2pRetryMax' client/internal/conn_mgr.go

Repository: netbirdio/netbird

Length of output: 362


🏁 Script executed:

rg -nP --type=go -A15 'func \(conn \*Conn\) SetIceBackoffMax' client/internal/peer/conn.go

Repository: netbirdio/netbird

Length of output: 657


🏁 Script executed:

sed -n '540,560p' client/internal/conn_mgr.go

Repository: netbirdio/netbird

Length of output: 640


🏁 Script executed:

rg -nP --type=go -B20 'user-explicit disable' client/internal/conn_mgr.go | head -40

Repository: netbirdio/netbird

Length of output: 777


Fix sentinel preservation through the backoff cap translation; user-explicit disable is currently reversed to default.

The documented path from wire sentinel to disable is broken. When a user explicitly disables via the management sentinel ^uint32(0), the flow is:

  1. propagateP2pRetryMaxToConns() correctly translates sentinel to time.Duration(0) and calls SetIceBackoffMax(0)
  2. SetIceBackoffMax() stores uint32(0) back into conn.config.P2pRetryMaxSeconds
  3. conn.Open() reads uint32(0) and treats it as "not set", applying DefaultP2PRetryMax instead

The field comment at lines 98–100 is misleading: it states the resolver translates the sentinel to time.Duration(0) but omits that Open() immediately overwrites 0 with the default. Since 0 cannot distinguish "user wants disable" from "not set, use default", the user's explicit disable intent is silently reversed to the default 15-minute backoff.

The same issue affects SetIceBackoffMax() when called before Open(): the sentinel-decoded 0 gets stored as uint32(0) in the config, and the next Open() call overwrites it with the default.

Solution: Preserve the sentinel value ^uint32(0) in the uint32 field so Open() can explicitly handle it—map it to disable (zero cap) separately from the "not set, use default" path (which remains 0 today).

Suggested fix
 // P2pRetryMaxSeconds is the cap for the ICE-failure backoff schedule
 // in p2p-dynamic mode. 0 = use built-in default (DefaultP2PRetryMax).
-// Wire-format sentinel uint32-max (= ^uint32(0)) means "user-explicit
-// disable", which the resolver translates to time.Duration(0) at
-// engine.go before passing it here. Phase 3 of `#5989`.
+// Wire-format sentinel uint32-max (= ^uint32(0)) means "user-explicit
+// disable" and is preserved as ^uint32(0) here so Open() can
+// distinguish it from the "not set" zero value.
 P2pRetryMaxSeconds uint32
-backoffCap := time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
-if backoffCap == 0 {
+var backoffCap time.Duration
+switch conn.config.P2pRetryMaxSeconds {
+case 0:
     backoffCap = DefaultP2PRetryMax
+case ^uint32(0):
+    backoffCap = 0 // user-explicit disable; iceBackoff treats 0 as off
+default:
+    backoffCap = time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
 }

And update propagateP2pRetryMaxToConns() in conn_mgr.go to pass the sentinel directly (currently it converts to 0).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// P2pRetryMaxSeconds is the cap for the ICE-failure backoff schedule
// in p2p-dynamic mode. 0 = use built-in default (DefaultP2PRetryMax).
// Wire-format sentinel uint32-max (= ^uint32(0)) means "user-explicit
// disable", which the resolver translates to time.Duration(0) at
// engine.go before passing it here. Phase 3 of #5989.
P2pRetryMaxSeconds uint32
// P2pRetryMaxSeconds is the cap for the ICE-failure backoff schedule
// in p2p-dynamic mode. 0 = use built-in default (DefaultP2PRetryMax).
// Wire-format sentinel uint32-max (= ^uint32(0)) means "user-explicit
// disable" and is preserved as ^uint32(0) here so Open() can
// distinguish it from the "not set" zero value.
P2pRetryMaxSeconds uint32
🤖 Prompt for AI Agents

In `@client/internal/peer/conn.go` around lines 96 - 101, The uint32 sentinel
^uint32(0) for "user-explicit disable" is being lost; fix by preserving that
sentinel in the config and handling it explicitly: modify
propagateP2pRetryMaxToConns() to pass the sentinel value (not time.Duration(0))
into SetIceBackoffMax when the wire-value is the sentinel, change
SetIceBackoffMax(conn.config.P2pRetryMaxSeconds) so it stores the sentinel
(^uint32(0)) into conn.config.P2pRetryMaxSeconds instead of 0 when disable was
requested, and update conn.Open() to treat conn.config.P2pRetryMaxSeconds ==
^uint32(0) as "disabled (zero cap)" while treating the genuine 0 (or other unset
marker if you prefer) as "not set → apply DefaultP2PRetryMax"; ensure all logic
paths that previously wrote 0 for disable now preserve/write the sentinel so the
distinction remains.

Comment on lines +97 to +103
func (s *iceBackoffState) markSuccess() {
s.mu.Lock()
defer s.mu.Unlock()
s.failures = 0
s.suspended = false
s.bo.Reset()
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear suspension window when backoff is reset/disabled.

Disabling backoff via SetMaxBackoff(0) can still leave IsSuspended() true until the old retry timestamp elapses, and markSuccess() keeps a stale NextRetry in snapshots. This can incorrectly gate ICE attach and misreport status.

💡 Proposed fix
 func (s *iceBackoffState) markSuccess() {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	s.failures = 0
 	s.suspended = false
+	s.nextRetry = time.Time{}
 	s.bo.Reset()
 }

 func (s *iceBackoffState) SetMaxBackoff(d time.Duration) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	if d == s.maxBackoff {
 		return
 	}
 	s.maxBackoff = d
 	s.bo = buildBackoff(d)
+	if d == 0 {
+		s.suspended = false
+		s.nextRetry = time.Time{}
+	}
 }

Also applies to: 115-123

🤖 Prompt for AI Agents

In `@client/internal/peer/ice_backoff.go` around lines 97 - 103, When resetting or
disabling backoff (e.g., in markSuccess) you must also clear the suspended retry
timestamp so IsSuspended() and snapshots do not retain a stale NextRetry; update
markSuccess (and the corresponding reset/disable handler in the 115-123 block)
to set the internal nextRetry field to the zero time (reset the suspension
window) whenever you set s.suspended = false and bo.Reset(), ensuring snapshots
no longer report a stale NextRetry after SetMaxBackoff(0) or a success.

Comment on lines +229 to +240
if req.Settings.P2pTimeoutSeconds != nil {
v := uint32(*req.Settings.P2pTimeoutSeconds)
returnSettings.P2pTimeoutSeconds = &v
}
if req.Settings.P2pRetryMaxSeconds != nil {
v := uint32(*req.Settings.P2pRetryMaxSeconds)
returnSettings.P2pRetryMaxSeconds = &v
}
if req.Settings.RelayTimeoutSeconds != nil {
v := uint32(*req.Settings.RelayTimeoutSeconds)
returnSettings.RelayTimeoutSeconds = &v
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate timeout bounds before casting to uint32.

These values arrive as int64 and are cast directly to uint32. Negative inputs or anything larger than uint32 wrap silently, so malformed requests can persist nonsensical timeouts (-1 becomes 4294967295, 4294967296 becomes 0). Reject out-of-range values before the cast.

Suggested fix
+	const maxUint32 = int64(^uint32(0))
+
 	if req.Settings.P2pTimeoutSeconds != nil {
+		if *req.Settings.P2pTimeoutSeconds < 0 || *req.Settings.P2pTimeoutSeconds > maxUint32 {
+			return nil, status.Errorf(status.InvalidArgument, "p2p_timeout_seconds must be between 0 and %d", maxUint32)
+		}
 		v := uint32(*req.Settings.P2pTimeoutSeconds)
 		returnSettings.P2pTimeoutSeconds = &v
 	}
 	if req.Settings.P2pRetryMaxSeconds != nil {
+		if *req.Settings.P2pRetryMaxSeconds < 0 || *req.Settings.P2pRetryMaxSeconds > maxUint32 {
+			return nil, status.Errorf(status.InvalidArgument, "p2p_retry_max_seconds must be between 0 and %d", maxUint32)
+		}
 		v := uint32(*req.Settings.P2pRetryMaxSeconds)
 		returnSettings.P2pRetryMaxSeconds = &v
 	}
 	if req.Settings.RelayTimeoutSeconds != nil {
+		if *req.Settings.RelayTimeoutSeconds < 0 || *req.Settings.RelayTimeoutSeconds > maxUint32 {
+			return nil, status.Errorf(status.InvalidArgument, "relay_timeout_seconds must be between 0 and %d", maxUint32)
+		}
 		v := uint32(*req.Settings.RelayTimeoutSeconds)
 		returnSettings.RelayTimeoutSeconds = &v
 	}

Suggested change
if req.Settings.P2pTimeoutSeconds != nil {
v := uint32(*req.Settings.P2pTimeoutSeconds)
returnSettings.P2pTimeoutSeconds = &v
}
if req.Settings.P2pRetryMaxSeconds != nil {
v := uint32(*req.Settings.P2pRetryMaxSeconds)
returnSettings.P2pRetryMaxSeconds = &v
}
if req.Settings.RelayTimeoutSeconds != nil {
v := uint32(*req.Settings.RelayTimeoutSeconds)
returnSettings.RelayTimeoutSeconds = &v
}
const maxUint32 = int64(^uint32(0))
if req.Settings.P2pTimeoutSeconds != nil {
if *req.Settings.P2pTimeoutSeconds < 0 || *req.Settings.P2pTimeoutSeconds > maxUint32 {
return nil, status.Errorf(status.InvalidArgument, "p2p_timeout_seconds must be between 0 and %d", maxUint32)
}
v := uint32(*req.Settings.P2pTimeoutSeconds)
returnSettings.P2pTimeoutSeconds = &v
}
if req.Settings.P2pRetryMaxSeconds != nil {
if *req.Settings.P2pRetryMaxSeconds < 0 || *req.Settings.P2pRetryMaxSeconds > maxUint32 {
return nil, status.Errorf(status.InvalidArgument, "p2p_retry_max_seconds must be between 0 and %d", maxUint32)
}
v := uint32(*req.Settings.P2pRetryMaxSeconds)
returnSettings.P2pRetryMaxSeconds = &v
}
if req.Settings.RelayTimeoutSeconds != nil {
if *req.Settings.RelayTimeoutSeconds < 0 || *req.Settings.RelayTimeoutSeconds > maxUint32 {
return nil, status.Errorf(status.InvalidArgument, "relay_timeout_seconds must be between 0 and %d", maxUint32)
}
v := uint32(*req.Settings.RelayTimeoutSeconds)
returnSettings.RelayTimeoutSeconds = &v
}
🤖 Prompt for AI Agents

In `@management/server/http/handlers/accounts/accounts_handler.go` around lines
229 - 240, The code casts int64 timeout fields from req.Settings
(P2pTimeoutSeconds, P2pRetryMaxSeconds, RelayTimeoutSeconds) directly to uint32
which can wrap on negative or too-large values; validate each non-nil
*req.Settings.* value is within 0..math.MaxUint32 before casting, and if out of
range return a validation error (HTTP 400) instead of assigning to
returnSettings.*; after the check perform v := uint32(*req.Settings.<Field>) and
set returnSettings.<Field> = &v.

Comment on lines +19 to +127
const (
ModeUnspecified Mode = iota
ModeRelayForced
ModeP2P
ModeP2PLazy
ModeP2PDynamic
// ModeFollowServer is a client-side sentinel: setting this in the
// client config explicitly clears any local override so the
// server-pushed value (or its legacy fallback) is used. It MUST NOT
// be sent on the wire -- ToProto returns UNSPECIFIED for it.
ModeFollowServer
)

// String returns the canonical lower-kebab-case name of the mode.
func (m Mode) String() string {
switch m {
case ModeRelayForced:
return "relay-forced"
case ModeP2P:
return "p2p"
case ModeP2PLazy:
return "p2p-lazy"
case ModeP2PDynamic:
return "p2p-dynamic"
case ModeFollowServer:
return "follow-server"
default:
return ""
}
}

// ParseString accepts the canonical name (case-insensitive, surrounding
// whitespace tolerated) and returns the corresponding Mode. Empty input
// returns ModeUnspecified with no error. Unknown input returns
// ModeUnspecified with an error.
func ParseString(s string) (Mode, error) {
switch strings.ToLower(strings.TrimSpace(s)) {
case "":
return ModeUnspecified, nil
case "relay-forced":
return ModeRelayForced, nil
case "p2p":
return ModeP2P, nil
case "p2p-lazy":
return ModeP2PLazy, nil
case "p2p-dynamic":
return ModeP2PDynamic, nil
case "follow-server":
return ModeFollowServer, nil
default:
return ModeUnspecified, fmt.Errorf("unknown connection mode %q", s)
}
}

// FromProto translates a proto enum value to the internal Mode.
func FromProto(m mgmProto.ConnectionMode) Mode {
switch m {
case mgmProto.ConnectionMode_CONNECTION_MODE_RELAY_FORCED:
return ModeRelayForced
case mgmProto.ConnectionMode_CONNECTION_MODE_P2P:
return ModeP2P
case mgmProto.ConnectionMode_CONNECTION_MODE_P2P_LAZY:
return ModeP2PLazy
case mgmProto.ConnectionMode_CONNECTION_MODE_P2P_DYNAMIC:
return ModeP2PDynamic
default:
return ModeUnspecified
}
}

// ToProto translates the internal Mode to a proto enum value.
// ModeFollowServer is a client-side concept and intentionally maps to
// UNSPECIFIED so it never appears on the wire.
func (m Mode) ToProto() mgmProto.ConnectionMode {
switch m {
case ModeRelayForced:
return mgmProto.ConnectionMode_CONNECTION_MODE_RELAY_FORCED
case ModeP2P:
return mgmProto.ConnectionMode_CONNECTION_MODE_P2P
case ModeP2PLazy:
return mgmProto.ConnectionMode_CONNECTION_MODE_P2P_LAZY
case ModeP2PDynamic:
return mgmProto.ConnectionMode_CONNECTION_MODE_P2P_DYNAMIC
default:
return mgmProto.ConnectionMode_CONNECTION_MODE_UNSPECIFIED
}
}

// ResolveLegacyLazyBool maps the historical Settings.LazyConnectionEnabled
// boolean to the new Mode. Used when a new client receives an old server's
// PeerConfig (ConnectionMode = UNSPECIFIED) or when the management server
// has no explicit Settings.ConnectionMode set yet.
func ResolveLegacyLazyBool(lazy bool) Mode {
if lazy {
return ModeP2PLazy
}
return ModeP2P
}

// ToLazyConnectionEnabled is the inverse mapping for backwards-compat.
// Used by toPeerConfig() so old clients (which only know the boolean)
// still get a sensible behaviour.
//
// Note: ModeRelayForced cannot be expressed via the legacy boolean and
// falls back to false. This is a structural compat gap documented in the
// release notes; admins must set NB_FORCE_RELAY=true on old clients
// or upgrade them.
func (m Mode) ToLazyConnectionEnabled() bool {
return m == ModeP2PLazy

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

p2p-dynamic-lazy is missing from the canonical mode table.

The supported mode list for #5989 includes p2p-dynamic-lazy, but this package can't represent it anywhere: String(), ParseString(), FromProto(), ToProto(), and the legacy lazy fallback all omit it. Any config carrying that mode will currently fail parsing or get downgraded during conversion.

🤖 Prompt for AI Agents

In `@shared/connectionmode/mode.go` around lines 19 - 127, Add a new Mode constant
for "p2p-dynamic-lazy" and wire it through the conversions: define
ModeP2PDynamicLazy alongside other Mode constants, return "p2p-dynamic-lazy"
from Mode.String(), accept "p2p-dynamic-lazy" in ParseString(), map
mgmProto.ConnectionMode_CONNECTION_MODE_P2P_DYNAMIC_LAZY <-> ModeP2PDynamicLazy
in FromProto() and ToProto(), and make ModeP2PDynamicLazy return true from
ToLazyConnectionEnabled() (and ensure ResolveLegacyLazyBool behavior remains
correct). Update only the symbols ModeP2PDynamicLazy, String(), ParseString(),
FromProto(), ToProto(), ToLazyConnectionEnabled() to include this new mode and
keep ModeFollowServer behavior unchanged.

connection_mode:
x-experimental: true
type: string
enum: [relay-forced, p2p, p2p-lazy, p2p-dynamic]
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

connection_mode enum is missing p2p-dynamic-lazy.

PR scope introduces p2p-dynamic-lazy, but the OpenAPI enum does not expose it, so REST clients cannot configure that mode through the account settings API.

Suggested fix
-          enum: [relay-forced, p2p, p2p-lazy, p2p-dynamic]
+          enum: [relay-forced, p2p, p2p-lazy, p2p-dynamic, p2p-dynamic-lazy]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shared/management/http/api/openapi.yml` at line 365, The OpenAPI enum for
connection_mode is missing the new value; update the enum array under the
connection_mode schema to include "p2p-dynamic-lazy" alongside the existing
values (relay-forced, p2p, p2p-lazy, p2p-dynamic) so REST clients can set that
mode; ensure you add the exact string "p2p-dynamic-lazy" to the enum in the
openapi YAML and run any spec validation/generation steps you normally use.

Comment on lines +370 to +373
Phase 1 of issue #5989: relay-forced, p2p, and p2p-lazy are
functional. p2p-dynamic is reserved (passes through as p2p in
Phase 1; will become functional in Phase 2).
p2p_timeout_seconds:

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale phase-gating text in public API description.

This description says p2p-dynamic is reserved and not functional yet, which is outdated for this PR and can mislead API consumers.

Suggested fix
-            Phase 1 of issue `#5989`: relay-forced, p2p, and p2p-lazy are
-            functional. p2p-dynamic is reserved (passes through as p2p in
-            Phase 1; will become functional in Phase 2).
+            Supported values are relay-forced, p2p, p2p-lazy, p2p-dynamic,
+            and p2p-dynamic-lazy.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shared/management/http/api/openapi.yml` around lines 370 - 373, Update the
stale documentation text in the OpenAPI description that mentions "Phase 1 of
issue `#5989`" and that "p2p-dynamic is reserved" so it no longer misleads
consumers; locate the block around the p2p_timeout_seconds schema entry in
openapi.yml (the paragraph starting with "Phase 1 of issue `#5989`: relay-forced,
p2p, and p2p-lazy are functional...") and replace or remove the phase-gating
sentence to reflect the current status (either remove the "p2p-dynamic is
reserved" clause or update it to state that p2p-dynamic is now functional),
keeping the rest of the p2p_timeout_seconds description consistent.

Comment on lines +41 to +63
// Defines values for AccountSettingsConnectionMode.
const (
AccountSettingsConnectionModeP2p AccountSettingsConnectionMode = "p2p"
AccountSettingsConnectionModeP2pDynamic AccountSettingsConnectionMode = "p2p-dynamic"
AccountSettingsConnectionModeP2pLazy AccountSettingsConnectionMode = "p2p-lazy"
AccountSettingsConnectionModeRelayForced AccountSettingsConnectionMode = "relay-forced"
)

// Valid indicates whether the value is a known member of the AccountSettingsConnectionMode enum.
func (e AccountSettingsConnectionMode) Valid() bool {
switch e {
case AccountSettingsConnectionModeP2p:
return true
case AccountSettingsConnectionModeP2pDynamic:
return true
case AccountSettingsConnectionModeP2pLazy:
return true
case AccountSettingsConnectionModeRelayForced:
return true
default:
return false
}
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add p2p-dynamic-lazy to the generated AccountSettingsConnectionMode enum.

The PR introduces p2p-dynamic-lazy, but this public enum and Valid() omit it. That means OpenAPI-generated clients cannot round-trip the full connection-mode set, and the handler-side req.Settings.ConnectionMode.Valid() check will reject that mode even if the backend supports it. Please add it in the OpenAPI source and regenerate this file.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shared/management/http/api/types.gen.go` around lines 41 - 63, The generated
enum AccountSettingsConnectionMode is missing the new value "p2p-dynamic-lazy":
add a new constant named AccountSettingsConnectionModeP2pDynamicLazy with value
"p2p-dynamic-lazy" to the AccountSettingsConnectionMode block and include it in
the AccountSettingsConnectionMode.Valid() switch so the method returns true for
that value; update the OpenAPI source to include p2p-dynamic-lazy and regenerate
shared/management/http/api/types.gen.go so the generated code and Valid() check
are in sync.
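For illustration only, the regenerated block would plausibly read as follows (the constant name AccountSettingsConnectionModeP2pDynamicLazy is an assumption following the generator's naming scheme; regenerate from the OpenAPI source rather than hand-editing types.gen.go):

```go
package main

import "fmt"

type AccountSettingsConnectionMode string

const (
	AccountSettingsConnectionModeP2p            AccountSettingsConnectionMode = "p2p"
	AccountSettingsConnectionModeP2pDynamic     AccountSettingsConnectionMode = "p2p-dynamic"
	AccountSettingsConnectionModeP2pDynamicLazy AccountSettingsConnectionMode = "p2p-dynamic-lazy"
	AccountSettingsConnectionModeP2pLazy        AccountSettingsConnectionMode = "p2p-lazy"
	AccountSettingsConnectionModeRelayForced    AccountSettingsConnectionMode = "relay-forced"
)

// Valid reports whether the value is a known member of the enum.
func (e AccountSettingsConnectionMode) Valid() bool {
	switch e {
	case AccountSettingsConnectionModeP2p,
		AccountSettingsConnectionModeP2pDynamic,
		AccountSettingsConnectionModeP2pDynamicLazy,
		AccountSettingsConnectionModeP2pLazy,
		AccountSettingsConnectionModeRelayForced:
		return true
	}
	return false
}

func main() {
	fmt.Println(AccountSettingsConnectionModeP2pDynamicLazy.Valid()) // true
	fmt.Println(AccountSettingsConnectionMode("bogus").Valid())      // false
}
```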

@MichaelUray MichaelUray changed the title [1/4 stacked] Phase 1+2+3 of #5989: ConnectionMode foundation, two-timer lifecycle, iceBackoff [client, management] Phase 1+2+3 of #5989: ConnectionMode foundation, two-timer lifecycle, iceBackoff (stack 1/4) May 6, 2026
…uest test

The setconfig_test.go reflection tests (TestSetConfig_AllFieldsSaved and
TestCLIFlags_MappedToSetConfig) now tolerate the four new fields:
ConnectionMode, P2PTimeoutSeconds, RelayTimeoutSeconds, and
P2PRetryMaxSeconds. They are in the proto so daemons can advertise
them via GetConfig, but SetConfig RPC does not apply them at runtime
today — the CLI sets them via 'netbird service install/reconfigure'
writing the active profile file directly. Listed in the test maps so
the structural drift-detector passes; wiring through SetConfig is a
follow-up.
@MichaelUray MichaelUray force-pushed the pr/a-p2p-dynamic-foundation branch from bad32d4 to 9ea349f Compare May 6, 2026 06:18
The pb.go regeneration on dev machines uses local protoc (v5.29.3 here)
which causes the Proto Version Check workflow to flag the diff as a
toolchain mismatch. Restore the upstream-main version strings (v6.33.1
for client/proto, v7.34.1 for shared/management/proto) — generated
content matches; only the comment-line differs.
…rsion pin

- conn_mgr.go: tagged switch on uint32 sentinel (QF1002 staticcheck)
- engine.go: //nolint:staticcheck on the deprecated peer.IsForceRelayed
  call site - the function is intentionally retained for Phase-1
  backwards compat with daemons that haven't migrated to ResolveModeFromEnv yet
- lazyconn/manager.go: //nolint:staticcheck on the deprecated
  inactivity.NewManager fallback - kept on purpose for callers without
  resolved two-timer config (iceTimeout=0 && relayTimeout=0)
- proxy_service.pb.go: pin protoc version header to upstream-main v7.34.1
  (Proto Version Check workflow flagged it; only the comment-line differs)
@sonarqubecloud

sonarqubecloud Bot commented May 6, 2026

Quality Gate failed

Failed conditions
5 New issues
5 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud



@coderabbitai Bot left a comment


♻️ Duplicate comments (2)
client/internal/conn_mgr.go (1)

251-289: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restart the managed lifecycle when only the inactivity timeouts change.

relayTimeoutSecs/p2pTimeoutSecs are applied only in initLazyManager(), but this branch only tears the manager down when the mode changes. A management push that keeps p2p-lazy/p2p-dynamic the same and only changes relay_timeout_seconds or p2p_timeout_seconds updates the stored fields without rebuilding the live manager, so it keeps running with the previous thresholds until reconnect.

♻️ Minimal direction
+	timeoutsChanged := newRelay != e.relayTimeoutSecs || newP2P != e.p2pTimeoutSecs
-	if modeChanged && wasManaged && isManaged {
+	if wasManaged && isManaged && (modeChanged || timeoutsChanged) {
 		// Switching between lazy and dynamic at runtime: tear down the
 		// existing manager so initLazyManager picks up the new timeouts.
 		log.Infof("lazy/dynamic mode change %s -> %s, restarting manager", prev, newMode)
 		e.closeManager(ctx)
 		e.statusRecorder.UpdateLazyConnection(false)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_mgr.go` around lines 251 - 289, The code updates
relayTimeoutSecs and p2pTimeoutSecs but only restarts the lazy/dynamic manager
when the mode changes, so an active manager (lazyConnMgr != nil) will keep old
thresholds; modify the branch handling unchanged mode to detect when timeouts
(relayTimeoutSecs or p2pTimeoutSecs or p2pRetryMaxSecs) changed while
modeUsesLazyMgr(newMode) is true and lazyConnMgr != nil, and when they did: call
e.closeManager(ctx), e.statusRecorder.UpdateLazyConnection(false), then re-init
by calling e.initLazyManager(ctx), e.startModeSideEffects(), and return
e.addPeersToLazyConnManager() so the running manager is recreated with the new
timeouts (use existing helpers closeManager, initLazyManager,
startModeSideEffects, addPeersToLazyConnManager and keep
propagateP2pRetryMaxToConns for immediate propagation).
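The restart condition the comment asks for can be sketched as a small standalone predicate (the function name needsManagerRestart and the plain int parameters are stand-ins, not the engine's real API; the real code would feed it the fields compared in the diff above):

```go
package main

import "fmt"

// needsManagerRestart reports whether the lazy/dynamic manager must be torn
// down and rebuilt: either the mode flipped between managed variants, or the
// manager stays managed but an inactivity threshold changed underneath it.
func needsManagerRestart(wasManaged, isManaged, modeChanged bool,
	oldRelay, newRelay, oldP2P, newP2P int) bool {
	if !wasManaged || !isManaged {
		// Entering or leaving managed mode is handled by separate branches.
		return false
	}
	timeoutsChanged := oldRelay != newRelay || oldP2P != newP2P
	return modeChanged || timeoutsChanged
}

func main() {
	// Same mode, but relay_timeout_seconds pushed from 300 to 600: restart.
	fmt.Println(needsManagerRestart(true, true, false, 300, 600, 60, 60)) // true
	// Nothing changed: keep the running manager.
	fmt.Println(needsManagerRestart(true, true, false, 300, 300, 60, 60)) // false
}
```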
client/internal/engine.go (1)

1585-1586: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Reconfigure live peers when the resolved mode changes.

Mode is still only copied into peer.ConnConfig at construction time. UpdatedRemotePeerConfig() can change ConnMgr.mode later, but already-created peer.Conns will keep their old Open/AttachICE/DetachICE behavior until each peer is rebuilt, so live switches like p2p-lazy -> p2p-dynamic or p2p -> relay-forced won't fully take effect.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/engine.go` around lines 1585 - 1586, The code only sets Mode
into peer.ConnConfig at construction (via e.connMgr.Mode()) so when
UpdatedRemotePeerConfig() changes ConnMgr.mode live peers (peer.Conn) keep old
Open/AttachICE/DetachICE behavior; update UpdatedRemotePeerConfig() to detect a
mode change and reconfigure existing peers by iterating active peer.Conn
instances and updating or rebuilding their ConnConfig (e.g., set the new Mode
and P2pRetryMaxSeconds) or invoking a peer-level method to apply the new
connection policy (e.g., peer.UpdateConnConfig / peer.RebuildConnections /
peer.SetMode) so live switches like p2p-lazy ↔ p2p-dynamic or p2p ↔ relay-forced
take effect immediately.
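One hedged way to sketch the live propagation (Conn, SetMode, and applyModeToLivePeers below are illustrative stand-ins, not the real peer.Conn API, which would also need to re-evaluate Open/AttachICE/DetachICE under the new policy):

```go
package main

import (
	"fmt"
	"sync"
)

type Mode string

// Conn is a stand-in for peer.Conn with a mutable connection policy.
type Conn struct {
	mu   sync.Mutex
	mode Mode
}

// SetMode swaps the connection policy under the lock so concurrent
// readers never observe a torn value.
func (c *Conn) SetMode(m Mode) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.mode = m
}

func (c *Conn) Mode() Mode {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.mode
}

// applyModeToLivePeers pushes a newly resolved mode into every existing
// connection so a management update takes effect without a peer rebuild.
func applyModeToLivePeers(conns map[string]*Conn, newMode Mode) {
	for _, c := range conns {
		c.SetMode(newMode)
	}
}

func main() {
	conns := map[string]*Conn{"peerA": {mode: "p2p-lazy"}}
	applyModeToLivePeers(conns, "p2p-dynamic")
	fmt.Println(conns["peerA"].Mode()) // p2p-dynamic
}
```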
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@client/internal/conn_mgr.go`:
- Around line 251-289: The code updates relayTimeoutSecs and p2pTimeoutSecs but
only restarts the lazy/dynamic manager when the mode changes, so an active
manager (lazyConnMgr != nil) will keep old thresholds; modify the branch
handling unchanged mode to detect when timeouts (relayTimeoutSecs or
p2pTimeoutSecs or p2pRetryMaxSecs) changed while modeUsesLazyMgr(newMode) is
true and lazyConnMgr != nil, and when they did: call e.closeManager(ctx),
e.statusRecorder.UpdateLazyConnection(false), then re-init by calling
e.initLazyManager(ctx), e.startModeSideEffects(), and return
e.addPeersToLazyConnManager() so the running manager is recreated with the new
timeouts (use existing helpers closeManager, initLazyManager,
startModeSideEffects, addPeersToLazyConnManager and keep
propagateP2pRetryMaxToConns for immediate propagation).

In `@client/internal/engine.go`:
- Around line 1585-1586: The code only sets Mode into peer.ConnConfig at
construction (via e.connMgr.Mode()) so when UpdatedRemotePeerConfig() changes
ConnMgr.mode live peers (peer.Conn) keep old Open/AttachICE/DetachICE behavior;
update UpdatedRemotePeerConfig() to detect a mode change and reconfigure
existing peers by iterating active peer.Conn instances and updating or
rebuilding their ConnConfig (e.g., set the new Mode and P2pRetryMaxSeconds) or
invoking a peer-level method to apply the new connection policy (e.g.,
peer.UpdateConnConfig / peer.RebuildConnections / peer.SetMode) so live switches
like p2p-lazy ↔ p2p-dynamic or p2p ↔ relay-forced take effect immediately.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eb05b04e-fb0e-4e46-938d-a6555e15d67d

📥 Commits

Reviewing files that changed from the base of the PR and between 745ef8f and 2ef31c0.

📒 Files selected for processing (3)
  • client/internal/conn_mgr.go
  • client/internal/engine.go
  • client/internal/lazyconn/manager/manager.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • client/internal/lazyconn/manager/manager.go

