[client, management] Phase 1+2+3 of #5989: ConnectionMode foundation, two-timer lifecycle, iceBackoff (stack 1/4) #6081
…nfig Additive change for issue netbirdio#5989 Phase 1. New fields use new tag numbers (11, 12, 13); existing fields (including LazyConnectionEnabled tag 6) are unchanged so old clients ignore the additions and old servers send UNSPECIFIED, which the new client maps back via the legacy boolean. Note: the regenerated pb.go files now report protoc v5.29.3 in their header (this branch was generated with locally-installed protoc 29.3 instead of upstream's v7.34.1). Functionally identical; header diff is the only delta beyond the actual schema additions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defines Mode enum (relay-forced, p2p, p2p-lazy, p2p-dynamic plus the client-only sentinels Unspecified and FollowServer), ParseString for CLI/env input, ToProto/FromProto for wire translation, and the two backwards-compat helpers ResolveLegacyLazyBool / ToLazyConnectionEnabled that bridge the old Settings.LazyConnectionEnabled boolean. Phase 1 of issue netbirdio#5989. Pure addition -- no existing callers touched in this commit; the engine/conn_mgr migration follows in subsequent commits in the same PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
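The enum-plus-bridge shape described above can be sketched as follows; all names and values here are assumptions for illustration, not the actual NetBird `connectionmode` package:

```go
package main

import (
	"fmt"
	"strings"
)

// Mode sketches the connection-mode enum described above.
type Mode int

const (
	ModeUnspecified  Mode = iota // client-only sentinel
	ModeFollowServer             // client-only sentinel: defer to server-pushed value
	ModeRelayForced
	ModeP2P
	ModeP2PLazy
	ModeP2PDynamic
)

// ParseString maps CLI/env input onto a Mode, tolerating case and whitespace.
func ParseString(s string) (Mode, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "relay-forced":
		return ModeRelayForced, nil
	case "p2p":
		return ModeP2P, nil
	case "p2p-lazy":
		return ModeP2PLazy, nil
	case "p2p-dynamic":
		return ModeP2PDynamic, nil
	case "follow-server":
		return ModeFollowServer, nil
	default:
		return ModeUnspecified, fmt.Errorf("unknown connection mode %q", s)
	}
}

// ResolveLegacyLazyBool bridges the old Settings.LazyConnectionEnabled boolean.
func ResolveLegacyLazyBool(lazy bool) Mode {
	if lazy {
		return ModeP2PLazy
	}
	return ModeP2P
}

// ToLazyConnectionEnabled is the reverse bridge for old clients. Only the lazy
// mode maps to true; relay-forced cannot be expressed via the boolean and
// maps to false (the documented compat gap).
func ToLazyConnectionEnabled(m Mode) bool {
	return m == ModeP2PLazy
}

func main() {
	m, _ := ParseString("p2p-lazy")
	fmt.Println(m == ModeP2PLazy, ToLazyConnectionEnabled(m))
}
```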
…on warns NB_CONNECTION_MODE wins over the legacy pair (NB_FORCE_RELAY, NB_ENABLE_EXPERIMENTAL_LAZY_CONN); when the legacy pair is set together, NB_FORCE_RELAY wins (most-restrictive, mirrors the group-conflict rule from issue netbirdio#5990). Each legacy var emits a one-shot deprecation warning when it actually contributes to the resolved mode. NB_LAZY_CONN_INACTIVITY_THRESHOLD becomes an alias for the future relay_timeout setting and warns once. IsForceRelayed() is kept for callers that have not yet been migrated (conn.go, statusrecorder); they will be updated in the engine/conn refactor commits later in this PR. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
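The env-var precedence above can be sketched as a pure function; the function name and plain-string return values are assumptions, not the actual resolver:

```go
package main

import "fmt"

// resolveModeFromEnv sketches the precedence described above:
// NB_CONNECTION_MODE wins outright; otherwise, when both legacy vars are set,
// NB_FORCE_RELAY wins (most-restrictive rule). An empty result means
// "unspecified": fall through to config / server-pushed value.
func resolveModeFromEnv(env map[string]string) string {
	if v := env["NB_CONNECTION_MODE"]; v != "" {
		return v // new var shadows the legacy pair entirely
	}
	if env["NB_FORCE_RELAY"] == "true" {
		return "relay-forced"
	}
	if env["NB_ENABLE_EXPERIMENTAL_LAZY_CONN"] == "true" {
		return "p2p-lazy"
	}
	return ""
}

func main() {
	fmt.Println(resolveModeFromEnv(map[string]string{
		"NB_FORCE_RELAY":                   "true",
		"NB_ENABLE_EXPERIMENTAL_LAZY_CONN": "true",
	}))
}
```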
Three new CLI flags map onto the new connection-mode plumbing:

- --connection-mode <relay-forced|p2p|p2p-lazy|p2p-dynamic|follow-server>
- --relay-timeout <seconds>
- --p2p-timeout <seconds>

Plumbed through three sites in cmd/up.go (SetConfigRequest, ConfigInput, LoginRequest), persisted in profilemanager.Config, and added as new fields on the daemon.proto IPC messages. Empty / not-changed flags fall back to the server-pushed value (which itself falls back to the legacy lazy_connection_enabled boolean for old servers). Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EngineConfig gains ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds. ConnMgr now stores the resolved Mode plus the raw inputs (env, config) so it can re-resolve when the server pushes a new PeerConfig. UpdatedRemoteFeatureFlag is renamed to UpdatedRemotePeerConfig and takes the full PeerConfig pointer; a thin shim with the old name delegates to it for callers that haven't been updated yet. connect.go copies the three new fields from profilemanager.Config into the EngineConfig builder, with a tolerant parser that logs and falls through to Unspecified on invalid input. Phase 1 of issue netbirdio#5989. peer/conn.go forwarding follows in C4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ConnConfig gains a Mode field forwarded from the engine. Open() now checks Mode == ModeRelayForced instead of calling the global env-reader IsForceRelayed(). The local 'forceRelay' variable name is renamed to 'skipICE' to make the new branching intent explicit. The PeerStateUpdate block at the end of Open() also reads from conn.config.Mode now, so the StatusRecorder sees the per-peer mode rather than the global env var. A single remaining caller of IsForceRelayed() (srWatcher.Start in engine.go) is left for a follow-up; that path uses a process-wide flag not per-peer state, so it can be migrated in Phase 2 once srWatcher itself learns about ConnectionMode. Phase 1 of issue netbirdio#5989. Engine forwarding (C5) follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
createPeerConn now reads ConnMgr.Mode() and copies it into peer.ConnConfig, so the per-peer Open() loop in conn.go can take the ModeRelayForced skip-ICE branch without reading the global env var. This is the last wiring commit for the client side of Phase 1; the server-side mgmt changes (Settings + OpenAPI + handler + audit + NetworkMap-build) follow in Section D. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three fields are nullable to distinguish 'use built-in default' (NULL) from explicit values (incl. 0 = never tear down). Copy() now deep-clones the new pointer fields via two small helpers. GORM AutoMigrate creates the new columns at first start; existing accounts have NULL in all three columns and resolve via the legacy LazyConnectionEnabled boolean. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
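The deep-clone helpers mentioned above can be sketched as one generic function; the helper and struct names here are assumptions, not the actual Settings code:

```go
package main

// clonePtr copies the pointee so the clone never aliases the source;
// NULL (nil) stays NULL, preserving the "use built-in default" meaning.
func clonePtr[T any](p *T) *T {
	if p == nil {
		return nil
	}
	v := *p
	return &v
}

type settings struct {
	ConnectionMode      *string
	RelayTimeoutSeconds *uint32
	P2pTimeoutSeconds   *uint32
}

// Copy deep-clones the nullable pointer fields so mutating the result
// cannot bleed state back into the original.
func (s *settings) Copy() *settings {
	return &settings{
		ConnectionMode:      clonePtr(s.ConnectionMode),
		RelayTimeoutSeconds: clonePtr(s.RelayTimeoutSeconds),
		P2pTimeoutSeconds:   clonePtr(s.P2pTimeoutSeconds),
	}
}

func main() {
	mode := "p2p-dynamic"
	cp := (&settings{ConnectionMode: &mode}).Copy()
	*cp.ConnectionMode = "relay-forced" // original stays "p2p-dynamic"
}
```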
…tings Three new optional, nullable fields with descriptions of the NULL = built-in-default semantics and the Phase-1-vs-Phase-2 status of p2p-dynamic. Regenerated types.gen.go via the existing oapi-codegen tooling. The generated AccountSettingsConnectionMode enum has the canonical values relay-forced / p2p / p2p-lazy / p2p-dynamic, plus a Valid() helper for handler-side validation. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PUT /api/accounts/{id} now accepts connection_mode (validated against
the four-value enum via the generated AccountSettingsConnectionMode.Valid()),
p2p_timeout_seconds and relay_timeout_seconds. NULL in the
JSON body keeps the existing value untouched (= "no client-side
override on this round-trip"); explicit NULL-clear via API uses a
distinct PATCH-style call which is out-of-scope for Phase 1.
Response payload mirrors the input fields back as nullable so the
dashboard can distinguish "use default" from "explicit value".
Phase 1 of issue netbirdio#5989. Audit-event emission follows in D5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AccountConnectionModeChanged (121), AccountRelayTimeoutChanged (122), AccountP2pTimeoutChanged (123) -- emitted from account.go when settings change. Per-peer / per-group event codes are reserved for Phase 3 (issue netbirdio#5990). Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
handleConnectionModeSettings is invoked from the same diff-detection block as handleLazyConnectionSettings; emits one StoreEvent per changed field (ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds) with old/new values in the meta payload. Four small ptr-equality / deref helpers are added for nullable string and uint32 fields. They are package-private and named after the existing convention used elsewhere in the package. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…elds Two changes in one commit because they're inseparable:

1. Move client/internal/peer/connectionmode/ to shared/connectionmode/. The package now needs to be importable from BOTH client/ and management/ (which is impossible while it lives under client/internal/ per Go's internal-package rule). All imports updated; tests pass on both sides.

2. Extend management/internals/shared/grpc/conversion.go::toPeerConfig to populate the three new PeerConfig fields (ConnectionMode, P2PTimeoutSeconds, RelayTimeoutSeconds) using the connectionmode helpers. The legacy LazyConnectionEnabled boolean is now derived from the resolved Mode via ToLazyConnectionEnabled() rather than copied verbatim from Settings -- this is the central backwards-compat contract: old clients see only the boolean, new clients prefer the explicit enum and ignore the bool.

Resolution rules (Phase 1, account-wide only):
- Settings.ConnectionMode != nil and parses -> wins
- Otherwise -> ResolveLegacyLazyBool(LazyConnectionEnabled)
- timeouts: Settings.RelayTimeoutSeconds / P2pTimeoutSeconds when non-NULL, else 0 (= server has no preference; client uses built-in default)

Per-peer / per-group resolution comes in Phase 3 (netbirdio#5990). Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
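The account-wide resolution rules above can be sketched as a pure function; the names and plain strings here are assumptions standing in for the real toPeerConfig code:

```go
package main

// resolvePeerConfigMode: an explicit, parseable Settings.ConnectionMode wins;
// an invalid value falls through tolerantly to the legacy boolean.
func resolvePeerConfigMode(connectionMode *string, lazyConnectionEnabled bool) string {
	if connectionMode != nil {
		switch *connectionMode {
		case "relay-forced", "p2p", "p2p-lazy", "p2p-dynamic":
			return *connectionMode
		}
		// unknown value: fall through to the legacy path
	}
	if lazyConnectionEnabled {
		return "p2p-lazy"
	}
	return "p2p"
}

// toLazyBool derives the legacy wire boolean from the resolved mode. Only
// p2p-lazy maps to true here; relay-forced cannot be expressed via the
// boolean and is sent as false (the documented compat gap).
func toLazyBool(mode string) bool {
	return mode == "p2p-lazy"
}

func main() {}
```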
Nine sub-cases cover the Phase-1 resolution matrix from spec section 3:
- no settings -> default (P2P + lazy=false)
- legacy bool only -> mapped via ResolveLegacyLazyBool
- explicit ConnectionMode -> wins over the legacy bool
- timeouts propagate
- garbage ConnectionMode value -> tolerant fallback to legacy bool

Particular attention to the structural compat gap: relay-forced cannot be expressed via the legacy boolean, so the wire field for old clients is sent as false. Documented in the spec, asserted here. Existing TestAccount_GetPeerNetworkMap remains green: existing test peers have ConnectionMode=NULL in Settings, which falls through to the legacy ResolveLegacyLazyBool(false) -> ModeP2P -> wire bool false. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of netbirdio#5989 needs to dynamically detach the ICE listener at runtime (when p2p-dynamic mode hits its ICE-inactivity threshold). Adds the RemoveICEListener method and extends the Add/RemoveICEListener pair to hold h.mu; Listen() now reads h.iceListener through readICEListener under the same mutex, fixing a latent race that didn't matter while AddICEListener was the only writer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of netbirdio#5989 needs the daemon to fire ICE-teardown and relay-teardown at different idle thresholds. Adds NewManagerWithTwoTimers plus iceInactiveChan / relayInactiveChan; the periodic check loop fires each channel when its threshold elapses. Threshold == 0 disables that channel (peer.lazy: iceTimeout=0; peer.dynamic: both>0; eager modes don't register peers at all). NewManager (Phase-1 entry point) becomes a thin wrapper that delegates to newManager with iceTimeout=0, preserving p2p-lazy semantics exactly. The Phase-1 InactivePeersChan now aliases the new RelayInactiveChan so existing callers (engine.go p2p-lazy path) continue to work unchanged.

Five sub-tests cover the timeout matrix:
- TwoTimers_OnlyICEFires (peer idle between thresholds)
- TwoTimers_BothFire (peer idle past both)
- TwoTimers_ICEDisabled (iceTimeout=0)
- TwoTimers_RelayDisabled (relayTimeout=0)
- TwoTimers_BothDisabled (both 0 = manager inert)

Plus Phase1_LazyEquivalence proves NewManager call-site behaviour is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-tier timeouts for Phase-2 p2p-dynamic. The deprecated InactivityThreshold pointer field becomes a backwards-compat alias that maps onto RelayInactivityThreshold when set, so Phase-1 callers that still pass it (engine.go, p2p-lazy tests) keep working without edits. resolvedTimeouts() helper centralises the alias logic. NewManager picks the appropriate inactivity-manager constructor based on the resolved timeouts; legacy callers continue through the Phase-1 single-timer code path. InactivityManager() getter exposes the inner manager so the conn_mgr can subscribe to ICEInactiveChan / RelayInactiveChan in the p2p-dynamic dispatch loop. Phase 2 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AttachICE registers the ICE-offer listener on the handshaker after the activity-detector observes traffic; emits a fresh offer so the remote side learns we are now ICE-capable. DetachICE removes the listener and calls workerICE.Close() so the ICE worker tears down without affecting the relay tunnel. Both methods are idempotent and use Handshaker.readICEListener() under its mutex, avoiding the race with Handshaker.Listen() that the prior direct field read had. Used in Phase 2 by the inactivity manager to flip ICE on/off independently of the relay tunnel. Refs netbirdio#5989 (Phase 2 / connection-mode-phase2).
Phase 2 of netbirdio#5989. ModeP2PDynamic now takes a third branch in Open(): workerICE is constructed eagerly (so the activity-trigger does not have to wait for ICE setup) but the AddICEListener call is skipped. The handshaker still has workerICE wired in via NewHandshaker, so buildOfferAnswer can include local ICE credentials when the listener is later attached by Conn.AttachICE() on activity-trigger.

Behavior matrix in Open():
- ModeRelayForced: no workerICE, no ICE listener.
- ModeP2P / ModeP2PLazy: workerICE created, listener registered eagerly (Phase-1 lazy-tunnel defer remains at conn_mgr level).
- ModeP2PDynamic: workerICE created, listener deferred to AttachICE.

Refs netbirdio#5989 (Phase 2).
Phase 2 of netbirdio#5989. Fixes the lazy/eager mismatch where a remote peer's GO_IDLE was silently dropped whenever the local manager was not in lazy mode, leaving the eager local end to immediately reconnect and defeating the remote's lazy/dynamic intent.

DeactivatePeer now dispatches by locally-resolved connection mode:
- p2p-lazy -> existing lazy-manager teardown (whole tunnel)
- p2p-dynamic -> DetachICEForPeer (ICE only, relay tunnel stays up)
- p2p / relay-forced / unspecified -> silent no-op (eager modes deliberately ignore GO_IDLE because they are always-on)

DetachICEForPeer is also the entry point used by the inactivity manager when its iceTimeout fires (engine wiring follows in E1). Lookup miss is treated as a no-op: a peer can be removed concurrently with a GO_IDLE or inactivity-timer event. Refs netbirdio#5989 (Phase 2).
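The mode dispatch above can be sketched as a switch; the action strings and function name are assumptions standing in for the real teardown calls:

```go
package main

// deactivatePeerAction sketches how a GO_IDLE (or inactivity-timer event)
// is routed by the locally-resolved connection mode.
func deactivatePeerAction(mode string) string {
	switch mode {
	case "p2p-lazy":
		return "close-tunnel" // lazy-manager teardown of the whole tunnel
	case "p2p-dynamic":
		return "detach-ice" // ICE only; the relay tunnel stays up
	default:
		return "no-op" // eager modes (p2p, relay-forced, unspecified) ignore GO_IDLE
	}
}

func main() {}
```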
Phase 2 of netbirdio#5989. ConnMgr.Start now also brings up the lazy/dynamic manager when mode==ModeP2PDynamic. A new runDynamicInactivityLoop goroutine reads from inactivity.Manager.ICEInactiveChan() and RelayInactiveChan() and dispatches DetachICEForPeer or full Conn.Close per peer. initLazyManager populates Config.RelayInactivityThreshold from e.relayTimeoutSecs and Config.ICEInactivityThreshold from e.p2pTimeoutSecs (only in dynamic mode; in lazy mode iceTimeout stays 0 so the ICE channel never fires, preserving Phase-1 semantics exactly). UpdatedRemotePeerConfig generalized to handle the dynamic mode the same way it handled lazy: enable the manager on entry, tear down on exit, restart on lazy<->dynamic transitions so the new timeouts take effect. Refs netbirdio#5989 (Phase 2).
Phase-2 follow-up. The dynamic-mode iceTimeout was always 0 in the field because resolveConnectionMode only resolved relay-timeout, not p2p-timeout, and the manager-init only consulted e.p2pTimeoutSecs which never got updated from server pushes. Resolution chain for P2pTimeoutSeconds is now symmetric to relay: cfgP2pTimeout (client config) > serverPC.GetP2PTimeoutSeconds() (no env var; reserved for Phase 3). UpdatedRemotePeerConfig now also stores the resolved p2p-timeout and includes it in the early-return short-circuit so changes to the server-pushed value trigger a re-init. Refs netbirdio#5989 (Phase 2).
Phase 2 of netbirdio#5989. Closes the activity-attach loop: ActivatePeer is called by the engine when a remote peer signals it is active. In p2p-lazy this opens the relay tunnel; in p2p-dynamic the relay was already eager-opened, so we additionally call Conn.AttachICE() to register the deferred ICE listener and emit a fresh offer. Without this, p2p-dynamic peers would never upgrade from relay to ICE. Refs netbirdio#5989 (Phase 2).
…5989) Phase 3 introduces per-peer ICE-failure backoff. The server-pushed cap on the truncated-exponential schedule lives in this new field 14.

Wire-format sentinels:
- 0 -> "not set"; daemon falls back to the built-in 15 min default
- max -> user explicitly disabled backoff (Phase-2 behavior)
- N -> N seconds as the MaxInterval of cenkalti/backoff

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. Account-wide cap on the ICE-failure backoff schedule. Nullable: NULL = use daemon default (15 min), 0 = treated as "user-explicit disable" by the conversion layer (signaled via uint32-max sentinel on the wire). Plus deep-copy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. New field documented in the public API. Defaults to 900 seconds (15 min). 0 disables backoff. Regenerated types.gen.go via oapi-codegen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. The PUT handler now persists the new account-wide ICE-backoff cap, and the GET response surfaces it back. NULL stays NULL (= "use daemon default"). Mirrors the Phase-1+2 plumbing of relay_timeout_seconds and p2p_timeout_seconds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. New activity-code 124 (AccountP2pRetryMaxChanged) plus the StoreEvent call inside handleConnectionModeSettings, mirroring the Phase-1 Relay-timeout and Phase-2 P2P-timeout patterns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…apping Phase 3 of netbirdio#5989. The conversion layer translates between DB-NULL (= "use daemon default") and Settings-zero (= "user-explicit disable") using a uint32-max sentinel on the wire. Test matrix covers all three states: NULL, 0, and a normal positive value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
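The three-state mapping can be sketched in one direction like this; the constant and function names are assumptions, not the actual conversion-layer code:

```go
package main

// The wire cannot carry NULL, so 0 on the wire stands for DB-NULL
// ("use daemon default") and uint32-max stands for the explicit DB 0
// ("user disabled backoff"). Positive values pass through unchanged.
const disabledSentinel = ^uint32(0)

func settingToWire(dbValue *uint32) uint32 {
	switch {
	case dbValue == nil:
		return 0 // NULL: client falls back to its built-in default
	case *dbValue == 0:
		return disabledSentinel // explicit disable
	default:
		return *dbValue
	}
}

func main() {}
```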
Phase 3 of netbirdio#5989. Per-peer ICE-failure backoff using cenkalti/backoff v4 (already imported by Guard). InitialInterval 1m, Multiplier 2.0, RandomizationFactor 0.1 -- produces ~1m, ~2m, ~4m, ~8m, ~15m capped sequence with default cap. Public API: newIceBackoff, markFailure (returns delay for logging), markSuccess, Reset (alias), IsSuspended, Snapshot (read-only view for status output), SetMaxBackoff (live update). maxBackoff=0 disables the backoff entirely (Phase-2 behavior). Test matrix covers initial state, exponential growth, cap override, disabled mode, success-reset, hard-reset, suspended-expiry, and live SetMaxBackoff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. New ConnConfig.P2pRetryMaxSeconds field plus an iceBackoff field on Conn. Open() initializes the backoff state with the resolved cap; if cap is 0, uses DefaultP2PRetryMax (15 min). The wire-format sentinel uint32-max for "user-explicit disable" is translated to cap=0 by the resolver in engine.go (next commit), so this code only sees positive caps OR 0-meaning-default. Refs netbirdio#5989 (Phase 3).
Phase 3 of netbirdio#5989. The Phase-2 AttachICE primitive now consults the iceBackoffState before doing anything. When suspended (= recent ICE failure within the backoff window), AttachICE silently returns nil and stays on relay. When the backoff expires, AttachICE proceeds with the normal listener-attach + offer-send. This is the actual bug-fix for the reset-loop where each iCEConnDisconnected event re-enabled the 3-quick-retries budget. The backoff is consulted INSIDE AttachICE, so any path that triggers AttachICE (activity, signal-reconnect, candidate-change) goes through the gate. Refs netbirdio#5989 (Phase 3).
Phase 3 of netbirdio#5989. ConnectionStateFailed -> onICEFailed (markFailure + DetachICE), ConnectionStateConnected/Completed -> onICEConnected (markSuccess). Closes the loop: actual ICE outcomes now drive the backoff state machine. This is what makes the AttachICE-gate from the previous commit actually engage. Without these hooks, the backoff state would never transition. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. Extends resolveConnectionMode to a 4-tuple return (mode, relay-timeout, p2p-timeout, p2p-retry-max). Resolution chain analogous to relay/p2p timeouts: client-config wins over server-push. ConnMgr struct stores both the resolved value (p2pRetryMaxSecs) and the raw client-config input (cfgP2pRetryMax) so subsequent server pushes can be re-resolved correctly. UpdatedRemotePeerConfig early-return shortcut now also checks p2p-retry-max for changes. Refs netbirdio#5989 (Phase 3). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 3 of netbirdio#5989. New propagateP2pRetryMaxToConns() iterates all active peers and calls Conn.SetIceBackoffMax with the freshly resolved value. Translates the uint32-max sentinel ("user-explicit disable") to time.Duration(0) for the daemon-side semantics. NULL on server (wire 0) maps to peer.DefaultP2PRetryMax (15 min built-in default). Conn.SetIceBackoffMax updates the iceBackoffState live or, if the Conn is not yet opened, stashes the value in config for Open() to pick up. IceBackoffSnapshot exposes a read-only view for the status output (implemented in Section E). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. Local override hierarchy mirrors p2p-timeout: config.json profile field > CLI flag > server-pushed.

New entry points for the account-wide ICE-failure backoff cap:
- CLI: netbird up --p2p-retry-max=600
- Profile (config.json): "p2p_retry_max_seconds": 600

EngineConfig.P2pRetryMaxSeconds wired through connect.go and profilemanager (ConfigInput + Config + updateConfig handler). daemon.proto gains p2p_retry_max_seconds = 43 on LoginRequest and SetConfigRequest; daemon.pb.go regenerated. Removes the D1 placeholder in conn_mgr.go: cfgP2pRetryMax now reads from EngineConfig.P2pRetryMaxSeconds. Env var NB_P2P_RETRY_MAX_SECONDS not added (matches Phase-2 pattern: p2p-timeout has no env-var path either). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989. Each new Conn now receives the resolved cap via ConnConfig. Conn.Open() then initializes its iceBackoffState. The sentinel-translation (uint32-max -> 0) happens inside Conn so the engine-level code stays simple. Refs netbirdio#5989 (Phase 3).
Phase 3 (E1) of netbirdio#5989. Full pipeline implemented:
- daemon.proto: add iceBackoffFailures/NextRetry/Suspended fields to PeerState (field IDs 20-22)
- client/internal/peer/status.go: add IceBackoff* fields to State struct + UpdatePeerIceBackoff() method
- client/internal/peer/conn.go: push snapshot to statusRecorder after onICEFailed / onICEConnected
- client/status/status.go: wire fields through ToProtoFullStatus(), PeerStateDetailOutput, parsePeers()
- parsePeers() appends "ICE backoff: suspended for ..." line only when suspended
- client/status/status_test.go: update JSON/YAML expectations for new fields

When not suspended, the line is omitted (no noise on healthy peers). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of netbirdio#5989 follow-up. The PeerState.IceBackoffSuspended flag is only refreshed on ICE state-change events (markFailure / markSuccess), so it stays true even after the suspension window has elapsed. The status detail-output now also wall-clock-checks IceBackoffNextRetry before printing the line, avoiding lines like "suspended for -1m45s" between the expiry and the next event-driven snapshot push. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…RetryMaxSeconds proto changes Picked-up commits in this branch added new fields to daemon.proto and management.proto (ConnectionMode enum, P2pTimeoutSeconds, RelayTimeoutSeconds, P2pRetryMaxSeconds, ICE-backoff fields). The generated .pb.go files in the original Phase-3.7i branch were left out-of-date during the cherry-pick; regenerated here so the package builds.
📝 Walkthrough

Adds a connection-mode system with env/CLI/server precedence, new proto/API fields, dual inactivity timers and per-peer ICE backoff, mode-driven conn lifecycle (Attach/Detach ICE), client/server wiring, and extensive tests across components.

Changes: Connection Mode, Timeouts, and ICE Backoff
Sequence Diagram

```mermaid
sequenceDiagram
    participant Admin as Admin
    participant API as HTTP API
    participant Server as Management Server
    participant GRPC as gRPC Conversion
    participant Client as NetBird Client
    participant ConnMgr as ConnMgr
    participant Peer as Peer
    Admin->>API: PUT /accounts/{id} with connection_mode/timeouts
    API->>Server: validate & persist settings
    Server->>Server: emit audit events
    Server->>GRPC: toPeerConfig maps settings -> PeerConfig
    GRPC->>Client: push PeerConfig (mode, timeouts, retry_max)
    Client->>Client: ResolveModeFromEnv() (NB_CONNECTION_MODE, legacy flags)
    Client->>ConnMgr: init with env + CLI values
    ConnMgr->>ConnMgr: resolveConnectionMode(env > client > server)
    ConnMgr->>ConnMgr: start/restart lazy manager if mode/timeouts changed
    ConnMgr->>Peer: propagateP2pRetryMaxToConns -> update peer backoff max
    ConnMgr->>Peer: ActivatePeer (P2P dynamic) -> Peer.AttachICE()
    Peer->>Peer: ICE events -> onICEConnected()/onICEFailed() update backoff
    Peer->>ConnMgr: status -> Status.UpdatePeerIceBackoff()
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
management/server/types/settings.go (1)
99-126: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clone `JWTAllowGroups` in `Copy()`.

`Copy()` still shares `JWTAllowGroups` with the source instance, so mutating the returned settings can mutate the original. This PR already tightens copy semantics for the new nullable fields, so leaving this slice aliased will keep producing surprising state bleed.

Suggested fix

```diff
 settings := &Settings{
 	PeerLoginExpirationEnabled: s.PeerLoginExpirationEnabled,
 	PeerLoginExpiration:        s.PeerLoginExpiration,
 	JWTGroupsEnabled:           s.JWTGroupsEnabled,
 	JWTGroupsClaimName:         s.JWTGroupsClaimName,
 	GroupsPropagationEnabled:   s.GroupsPropagationEnabled,
-	JWTAllowGroups:             s.JWTAllowGroups,
+	JWTAllowGroups:             slices.Clone(s.JWTAllowGroups),
 	RegularUsersViewBlocked:    s.RegularUsersViewBlocked,
```

management/server/account.go (1)
332-338: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Push the new connection settings to live peers.

These fields are now saved and audited, but they never set `updateAccountPeers`. A change to `ConnectionMode`, `RelayTimeoutSeconds`, `P2pTimeoutSeconds`, or `P2pRetryMaxSeconds` therefore skips `IncrementNetworkSerial` and the later `UpdateAccountPeers()` call, so connected peers keep stale connection behavior until some unrelated update or reconnect.

Suggested fix

```diff
 if oldSettings.RoutingPeerDNSResolutionEnabled != newSettings.RoutingPeerDNSResolutionEnabled ||
 	oldSettings.LazyConnectionEnabled != newSettings.LazyConnectionEnabled ||
+	!equalStringPtr(oldSettings.ConnectionMode, newSettings.ConnectionMode) ||
+	!equalUint32Ptr(oldSettings.RelayTimeoutSeconds, newSettings.RelayTimeoutSeconds) ||
+	!equalUint32Ptr(oldSettings.P2pTimeoutSeconds, newSettings.P2pTimeoutSeconds) ||
+	!equalUint32Ptr(oldSettings.P2pRetryMaxSeconds, newSettings.P2pRetryMaxSeconds) ||
 	oldSettings.DNSDomain != newSettings.DNSDomain ||
 	oldSettings.AutoUpdateVersion != newSettings.AutoUpdateVersion ||
 	oldSettings.AutoUpdateAlways != newSettings.AutoUpdateAlways {
 	updateAccountPeers = true
 }
```

Also applies to: 372-375

client/cmd/up.go (1)
438-453: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject contradictory connection-mode overrides before building these payloads.

These builders can currently emit both `LazyConnectionEnabled` and `ConnectionMode`, so a command like `--enable-lazy-connection=true --connection-mode=follow-server` sends mutually inconsistent intent. Also, `connectionMode` is forwarded as an unchecked string here, so typos can be persisted and later fall through as "unspecified" instead of failing fast.

Please validate `connectionMode` before serialization and make the legacy lazy flag mutually exclusive with `--connection-mode` across all three payloads.

Also applies to: 572-583, 695-710
🧹 Nitpick comments (7)
client/internal/peer/ice_backoff_test.go (1)
113-122: ⚡ Quick win

Add a transition test for disabling backoff while already suspended.

Current zero-cap coverage starts from a fresh state. Please add a case that does `markFailure()` → `SetMaxBackoff(0)` and asserts immediate unsuspension plus a cleared retry timestamp.

client/internal/lazyconn/inactivity/manager_test.go (1)
141-144: ⚡ Quick winRestore
newTickerafter each test override.These tests mutate a package-global and never put it back. That makes the suite order-dependent and brittle if later tests in this package rely on the real ticker or start running in parallel.
Suggested fix
fakeTick := make(chan time.Time, 1) + oldTicker := newTicker newTicker = func(d time.Duration) Ticker { return &fakeTickerMock{CChan: fakeTick} } + t.Cleanup(func() { + newTicker = oldTicker + })Also applies to: 181-184, 224-227, 264-267, 303-306, 341-344
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/lazyconn/inactivity/manager_test.go` around lines 141 - 144, Tests in manager_test.go overwrite the package-global newTicker (e.g., setting newTicker = func(...) Ticker { return &fakeTickerMock{CChan: fakeTick} }) but never restore it, making the suite order-dependent; fix by capturing the original function before each override (old := newTicker) and restoring it after the test (use defer newTicker = old or t.Cleanup(func(){ newTicker = old })) wherever newTicker is reassigned (instances around the fakeTickerMock overrides) so the global is returned to its original value after each test.

management/internals/shared/grpc/conversion_test.go (1)
160-169: 💤 Low value: Reference the named sentinel constant in the assertion.
Line 166 asserts against the literal `^uint32(0)`; this test is in the same `grpc` package and can read the unexported `p2pRetryMaxDisabledSentinel` directly, which keeps the test honest if the sentinel value ever changes (e.g. moves to a different magic number) and makes the intent slightly clearer.

♻️ Proposed fix

```diff
-	if pc.P2PRetryMaxSeconds != ^uint32(0) {
-		t.Errorf("explicit 0 should map to uint32-max sentinel on the wire, got %d", pc.P2PRetryMaxSeconds)
+	if pc.P2PRetryMaxSeconds != p2pRetryMaxDisabledSentinel {
+		t.Errorf("explicit 0 should map to disabled-sentinel on the wire, got %d", pc.P2PRetryMaxSeconds)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/conversion_test.go` around lines 160 - 169, Update TestToPeerConfig_P2pRetryMax_ExplicitDisable to assert against the package's sentinel constant instead of the literal magic number: replace the literal ^uint32(0) in the assertion with p2pRetryMaxDisabledSentinel so the test uses the named sentinel from the grpc package (keep the rest of the test and the call to toPeerConfigForTest unchanged).

client/internal/peer/env_test.go (1)
33-34: 💤 Low value: Use the package constants instead of literal env var names.
Lines 33–34 hardcode `"NB_ENABLE_EXPERIMENTAL_LAZY_CONN"` and `"NB_LAZY_CONN_INACTIVITY_THRESHOLD"` while the test lives in the same `peer` package and could reference `envEnableLazyConn`/`envInactivityThreshold` directly (you already do that for the exported `EnvKeyNBConnectionMode`/`EnvKeyNBForceRelay` two lines above). Keeping a single source of truth avoids silent test drift if a constant is ever renamed.

♻️ Proposed fix

```diff
-	t.Setenv("NB_ENABLE_EXPERIMENTAL_LAZY_CONN", c.envEnableLazy)
-	t.Setenv("NB_LAZY_CONN_INACTIVITY_THRESHOLD", c.envInactivity)
+	t.Setenv(envEnableLazyConn, c.envEnableLazy)
+	t.Setenv(envInactivityThreshold, c.envInactivity)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/env_test.go` around lines 33 - 34, The test currently hardcodes the environment variable names "NB_ENABLE_EXPERIMENTAL_LAZY_CONN" and "NB_LAZY_CONN_INACTIVITY_THRESHOLD"; change t.Setenv calls to use the package constants envEnableLazyConn and envInactivityThreshold (the same constants used elsewhere in this package) so the test references a single source of truth and won't drift if names change.

client/internal/peer/conn.go (1)
1025-1054: 💤 Low value: `AttachICE` swallows `SendOffer` failures silently.
When `SendOffer` returns an error at line 1050, `AttachICE` only logs a warning and returns `nil`. The local listener is now attached but the remote peer has not been notified that we're ICE-capable, so until something else triggers a fresh offer (e.g. the guard goroutine), the dynamic-mode upgrade window is wasted. Worth at least propagating the error so callers in `conn_mgr` can decide whether to retry.

♻️ Proposed fix

```diff
 	if err := conn.handshaker.SendOffer(); err != nil {
 		conn.Log.Warnf("AttachICE: SendOffer failed: %v", err)
+		return fmt.Errorf("AttachICE: send offer: %w", err)
 	}
 	return nil
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/conn.go` around lines 1025 - 1054, AttachICE currently adds the ICE listener (Conn.handshaker.AddICEListener with Conn.workerICE.OnNewOffer) and then only logs SendOffer errors, leaving the local listener attached while remote peer may never be notified; change AttachICE so that if Conn.handshaker.SendOffer() returns an error it returns that error to the caller instead of swallowing it, and ensure you undo the listener attachment on failure (remove the listener or use a defer that removes it when SendOffer fails) so state stays consistent; update callers in conn_mgr to handle the returned error (retry/backoff) as needed.

client/internal/peer/conn_test.go (2)
362-384: 💤 Low value: Consider exposing a test helper instead of reaching into `iceBackoff` internals.
Lines 370-372 lock and directly mutate `iceBackoff.mu` and `iceBackoff.nextRetry` from the test. While same-package access is allowed in Go, this tightly couples the test to the unexported layout of `iceBackoffState` — any rename or restructuring of those fields silently breaks the test.
A small helper such as `(b *iceBackoffState) forceExpiredForTest()` or an exported `Reset(nextRetry time.Time)` test seam would decouple the test from internal state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/conn_test.go` around lines 362 - 384, The test mutates iceBackoff internals (locking iceBackoff.mu and setting iceBackoff.nextRetry) which couples it to iceBackoffState layout; add a small test seam on iceBackoffState (e.g., func (b *iceBackoffState) forceExpiredForTest() or func (b *iceBackoffState) ResetNextRetry(t time.Time)) that performs the mutexed update of nextRetry, then change TestConn_AttachICE_AfterBackoffExpiry to call that helper after c.iceBackoff.markFailure() instead of touching mu/nextRetry directly; this keeps Conn.AttachICE and newIceBackoff/markFailure unchanged but removes direct field access from the test.
344-360: 💤 Low value: `TestConn_AttachICE_NoOpWhenSuspended` is implicitly order-dependent.
The test asserts `err == nil` with a nil `workerICE` only because `AttachICE` currently checks the backoff gate before the `workerICE` nil check. If that ordering ever changes, the test would either start returning a "nil workerICE" error (masking the suspend path) or panic, with no obvious test failure message pointing to the ordering assumption.
Consider adding an explicit comment documenting this dependency, or restructuring so the suspended path is exercised via a dedicated stub `workerICE` that records whether it was reached.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/conn_test.go` around lines 344 - 360, The test TestConn_AttachICE_NoOpWhenSuspended relies on AttachICE checking the backoff gate before validating a nil workerICE — make this explicit and robust by injecting a stub/fake workerICE into the Conn that will fail the test if its attach path is invoked; set c.workerICE to that stub before calling c.AttachICE(), keep c.iceBackoff.markFailure(), assert err == nil, and verify the stub was not called and handshaker.readICEListener() remains nil. This replaces the implicit nil-workerICE dependency with an explicit assertion that the suspended path short-circuits AttachICE (referencing TestConn_AttachICE_NoOpWhenSuspended, Conn.AttachICE, c.workerICE, and handshaker.readICEListener()).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@client/internal/conn_mgr.go`:
- Around line 245-291: In UpdatedRemotePeerConfig: when the connection mode
remains managed (isManaged true) but any of the timeout/retry values change
(compare newRelay/newP2P/newP2pRetry against the previously-stored
e.relayTimeoutSecs/e.p2pTimeoutSecs/e.p2pRetryMaxSecs) you must restart the
running lazy manager so it picks up the new timers; call e.closeManager(ctx),
then re-init with e.initLazyManager(ctx), call e.startModeSideEffects(), and
re-add peers via e.addPeersToLazyConnManager() (also ensure
e.statusRecorder.UpdateLazyConnection(true) is set appropriately). Use the
existing symbols (UpdatedRemotePeerConfig, isManaged, e.lazyConnMgr,
closeManager, initLazyManager, startModeSideEffects, addPeersToLazyConnManager,
propagateP2pRetryMaxToConns) to locate and implement this branch.
In `@client/internal/engine.go`:
- Around line 1585-1586: The Mode and P2pRetryMaxSeconds values are only
snapshotted into peer.Conn at creation, so runtime changes via
ConnMgr.UpdatedRemotePeerConfig() won't affect existing conns; modify the code
so connections are reconfigured when the ConnMgr changes by adding an update
path: introduce or use methods on peer.Conn like SetMode(mode string) and
SetP2pRetryMax(seconds int) (or a single UpdateConfig) and, in the
ConnMgr.UpdatedRemotePeerConfig() handler inside engine (the code that currently
constructs conns with Mode: e.connMgr.Mode(), P2pRetryMaxSeconds:
e.connMgr.P2pRetryMax()), iterate existing e.peers and call those methods to
apply the new mode and retry value (and trigger any Open/AttachICE/DetachICE
transitions as needed) so live switches between p2p-lazy and p2p-dynamic take
effect without recreating connections.
In `@client/internal/lazyconn/manager/manager.go`:
- Around line 102-112: The kernel-bind path currently leaves m.inactivityManager
nil which causes panics when methods like Start, activateSinglePeer,
addActivePeer, DeactivatePeer and removePeer unconditionally dereference it; fix
this by ensuring m.inactivityManager is always a valid implementation: add a
lightweight no-op inactivity manager (e.g., inactivity.NewNoopManager returning
an object that implements the same interface as
inactivity.NewManager/NewManagerWithTwoTimers) and instantiate it in the else
branch where kernel mode is detected (replace the current log-only path), so all
subsequent calls to m.inactivityManager are safe without changing callers;
alternatively, if you prefer refusal, return an error from the constructor
instead of creating the manager (but pick one approach and apply it
consistently).
In `@client/internal/peer/conn.go`:
- Around line 96-101: The uint32 sentinel ^uint32(0) for "user-explicit disable"
is being lost; fix by preserving that sentinel in the config and handling it
explicitly: modify propagateP2pRetryMaxToConns() to pass the sentinel value (not
time.Duration(0)) into SetIceBackoffMax when the wire-value is the sentinel,
change SetIceBackoffMax(conn.config.P2pRetryMaxSeconds) so it stores the
sentinel (^uint32(0)) into conn.config.P2pRetryMaxSeconds instead of 0 when
disable was requested, and update conn.Open() to treat
conn.config.P2pRetryMaxSeconds == ^uint32(0) as "disabled (zero cap)" while
treating the genuine 0 (or other unset marker if you prefer) as "not set → apply
DefaultP2PRetryMax"; ensure all logic paths that previously wrote 0 for disable
now preserve/write the sentinel so the distinction remains.
In `@client/internal/peer/ice_backoff.go`:
- Around line 97-103: When resetting or disabling backoff (e.g., in markSuccess)
you must also clear the suspended retry timestamp so IsSuspended() and snapshots
do not retain a stale NextRetry; update markSuccess (and the corresponding
reset/disable handler in the 115-123 block) to set the internal nextRetry field
to the zero time (reset the suspension window) whenever you set s.suspended =
false and bo.Reset(), ensuring snapshots no longer report a stale NextRetry
after SetMaxBackoff(0) or a success.
In `@management/server/account.go`:
- Around line 459-489: The audit currently uses derefUint32Ptr (and
derefStringPtr for timeouts) which collapses nil and 0 to 0; update
handleConnectionModeSettings so timeout fields emit the raw pointer values
instead of dereferenced zeroed values (e.g., use oldSettings.RelayTimeoutSeconds
and newSettings.RelayTimeoutSeconds,
oldSettings.P2pTimeoutSeconds/newSettings.P2pTimeoutSeconds,
oldSettings.P2pRetryMaxSeconds/newSettings.P2pRetryMaxSeconds) in the StoreEvent
payload so nil stays nil and 0 stays 0; keep the equality checks
(equalUint32Ptr) but change the map entries for the three timeout events to pass
the pointer (or a small helper that returns either nil or uint32) instead of
derefUint32Ptr.
In `@management/server/http/handlers/accounts/accounts_handler.go`:
- Around line 223-240: The handler currently treats JSON null as "absent" so
ConnectionMode, P2pTimeoutSeconds, P2pRetryMaxSeconds and RelayTimeoutSeconds
can never be cleared once set; update the request DTO used by the accounts
update path (the req.Settings type referenced in accounts_handler.go) to
implement tri-state semantics (absent vs null vs value) — e.g. replace plain *T
fields with a custom Nullable[T] struct (or use *json.RawMessage / presence
flag) that records whether the field was provided and whether it was explicitly
null, change the handler logic in the update function to inspect that presence
flag and set
returnSettings.<ConnectionMode|P2pTimeoutSeconds|P2pRetryMaxSeconds|RelayTimeoutSeconds>
to nil when the client sent explicit null, set to the concrete value when
provided, and leave unchanged when absent, and update JSON unmarshalling/tests
accordingly.
- Around line 218-221: The handler currently returns a generic fmt.Errorf for an
invalid req.Settings.ConnectionMode which yields a 500; change this to return a
gRPC InvalidArgument error consistent with other validation failures in this
handler (e.g. use status.Errorf(codes.InvalidArgument, "invalid connection_mode
%q", modeStr) or the project’s util helper for invalid arguments) when
req.Settings.ConnectionMode.Valid() is false, referencing
req.Settings.ConnectionMode and modeStr so the error message includes the bad
value.
- Around line 229-240: The code casts int64 timeout fields from req.Settings
(P2pTimeoutSeconds, P2pRetryMaxSeconds, RelayTimeoutSeconds) directly to uint32
which can wrap on negative or too-large values; validate each non-nil
*req.Settings.* value is within 0..math.MaxUint32 before casting, and if out of
range return a validation error (HTTP 400) instead of assigning to
returnSettings.*; after the check perform v := uint32(*req.Settings.<Field>) and
set returnSettings.<Field> = &v.
In `@shared/connectionmode/mode.go`:
- Around line 19-127: Add a new Mode constant for "p2p-dynamic-lazy" and wire it
through the conversions: define ModeP2PDynamicLazy alongside other Mode
constants, return "p2p-dynamic-lazy" from Mode.String(), accept
"p2p-dynamic-lazy" in ParseString(), map
mgmProto.ConnectionMode_CONNECTION_MODE_P2P_DYNAMIC_LAZY <-> ModeP2PDynamicLazy
in FromProto() and ToProto(), and make ModeP2PDynamicLazy return true from
ToLazyConnectionEnabled() (and ensure ResolveLegacyLazyBool behavior remains
correct). Update only the symbols ModeP2PDynamicLazy, String(), ParseString(),
FromProto(), ToProto(), ToLazyConnectionEnabled() to include this new mode and
keep ModeFollowServer behavior unchanged.
In `@shared/management/http/api/openapi.yml`:
- Line 365: The OpenAPI enum for connection_mode is missing the new value;
update the enum array under the connection_mode schema to include
"p2p-dynamic-lazy" alongside the existing values (relay-forced, p2p, p2p-lazy,
p2p-dynamic) so REST clients can set that mode; ensure you add the exact string
"p2p-dynamic-lazy" to the enum in the openapi YAML and run any spec
validation/generation steps you normally use.
- Around line 370-373: Update the stale documentation text in the OpenAPI
description that mentions "Phase 1 of issue `#5989`" and that "p2p-dynamic is
reserved" so it no longer misleads consumers; locate the block around the
p2p_timeout_seconds schema entry in openapi.yml (the paragraph starting with
"Phase 1 of issue `#5989`: relay-forced, p2p, and p2p-lazy are functional...") and
replace or remove the phase-gating sentence to reflect the current status
(either remove the "p2p-dynamic is reserved" clause or update it to state that
p2p-dynamic is now functional), keeping the rest of the p2p_timeout_seconds
description consistent.
In `@shared/management/http/api/types.gen.go`:
- Around line 41-63: The generated enum AccountSettingsConnectionMode is missing
the new value "p2p-dynamic-lazy": add a new constant named
AccountSettingsConnectionModeP2pDynamicLazy with value "p2p-dynamic-lazy" to the
AccountSettingsConnectionMode block and include it in the
AccountSettingsConnectionMode.Valid() switch so the method returns true for that
value; update the OpenAPI source to include p2p-dynamic-lazy and regenerate
shared/management/http/api/types.gen.go so the generated code and Valid() check
are in sync.
---
Outside diff comments:
In `@client/cmd/up.go`:
- Around line 438-453: The code currently allows setting both
LazyConnectionEnabled and ConnectionMode together and forwards connectionMode as
an unchecked string; update the builders that set req.LazyConnectionEnabled and
req.ConnectionMode (the branches using
cmd.Flag(enableLazyConnectionFlag).Changed and
cmd.Flag(connectionModeFlag).Changed and the related payload constructors used
in the other two locations) to first validate connectionMode against the allowed
enum/values and return an error if invalid, and also reject the request early
when both flags are present (if cmd.Flag(enableLazyConnectionFlag).Changed &&
cmd.Flag(connectionModeFlag).Changed) with a clear error message; apply the same
mutually-exclusive check and validation to all three payload-building sites so
the legacy lazy flag cannot be used alongside --connection-mode and typos in
connectionMode fail fast.
In `@management/server/account.go`:
- Around line 332-338: The code only sets updateAccountPeers when a subset of
settings change; include comparisons for ConnectionMode, RelayTimeoutSeconds,
P2pTimeoutSeconds, and P2pRetryMaxSeconds (the newly saved/audited connection
settings) in the same conditional that sets updateAccountPeers so that
IncrementNetworkSerial() and UpdateAccountPeers() run when any of those values
change; make the same addition in the other identical comparison block (the one
around where other settings are checked) so both code paths trigger
updateAccountPeers when any connection-related field differs between oldSettings
and newSettings.
In `@management/server/types/settings.go`:
- Around line 99-126: Settings.Copy currently returns a Settings that shares the
JWTAllowGroups slice with the source, causing mutations to bleed; update the
Copy method (function Settings.Copy) to clone JWTAllowGroups (e.g., use
slices.Clone or an explicit copy) when constructing the new Settings so the
returned Settings has its own independent slice just like PeerExposeGroups is
cloned.
---
Nitpick comments:
In `@client/internal/lazyconn/inactivity/manager_test.go`:
- Around line 141-144: Tests in manager_test.go overwrite the package-global
newTicker (e.g., setting newTicker = func(...) Ticker { return
&fakeTickerMock{CChan: fakeTick} }) but never restore it, making the suite
order-dependent; fix by capturing the original function before each override
(old := newTicker) and restoring it after the test (use defer newTicker = old or
t.Cleanup(func(){ newTicker = old })) wherever newTicker is reassigned
(instances around the fakeTickerMock overrides) so the global is returned to its
original value after each test.
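The save-and-restore pattern for a package-global test seam looks like this. The seam name and behavior here are stand-ins for `newTicker` (in a real test, the restore func would be registered with `t.Cleanup`):

```go
package main

import "fmt"

// Hypothetical package-global seam, standing in for newTicker.
var tick = func() string { return "real" }

// overrideTick swaps the seam and hands back a restore func; registering
// that func with t.Cleanup guarantees the global is put back even if the
// test fails or runs in any order.
func overrideTick(fake func() string) (restore func()) {
	old := tick
	tick = fake
	return func() { tick = old }
}

func demo() (during, after string) {
	restore := overrideTick(func() string { return "fake" })
	during = tick()
	restore() // t.Cleanup would call this automatically at test end
	after = tick()
	return during, after
}

func main() {
	during, after := demo()
	fmt.Println(during, after) // fake real
}
```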
In `@client/internal/peer/conn_test.go`:
- Around line 362-384: The test mutates iceBackoff internals (locking
iceBackoff.mu and setting iceBackoff.nextRetry) which couples it to
iceBackoffState layout; add a small test seam on iceBackoffState (e.g., func (b
*iceBackoffState) forceExpiredForTest() or func (b *iceBackoffState)
ResetNextRetry(t time.Time)) that performs the mutexed update of nextRetry, then
change TestConn_AttachICE_AfterBackoffExpiry to call that helper after
c.iceBackoff.markFailure() instead of touching mu/nextRetry directly; this keeps
Conn.AttachICE and newIceBackoff/markFailure unchanged but removes direct field
access from the test.
- Around line 344-360: The test TestConn_AttachICE_NoOpWhenSuspended relies on
AttachICE checking the backoff gate before validating a nil workerICE — make
this explicit and robust by injecting a stub/fake workerICE into the Conn that
will fail the test if its attach path is invoked; set c.workerICE to that stub
before calling c.AttachICE(), keep c.iceBackoff.markFailure(), assert err ==
nil, and verify the stub was not called and handshaker.readICEListener() remains
nil. This replaces the implicit nil-workerICE dependency with an explicit
assertion that the suspended path short-circuits AttachICE (referencing
TestConn_AttachICE_NoOpWhenSuspended, Conn.AttachICE, c.workerICE, and
handshaker.readICEListener()).
In `@client/internal/peer/conn.go`:
- Around line 1025-1054: AttachICE currently adds the ICE listener
(Conn.handshaker.AddICEListener with Conn.workerICE.OnNewOffer) and then only
logs SendOffer errors, leaving the local listener attached while remote peer may
never be notified; change AttachICE so that if Conn.handshaker.SendOffer()
returns an error it returns that error to the caller instead of swallowing it,
and ensure you undo the listener attachment on failure (remove the listener or
use a defer that removes it when SendOffer fails) so state stays consistent;
update callers in conn_mgr to handle the returned error (retry/backoff) as
needed.
In `@client/internal/peer/env_test.go`:
- Around line 33-34: The test currently hardcodes the environment variable names
"NB_ENABLE_EXPERIMENTAL_LAZY_CONN" and "NB_LAZY_CONN_INACTIVITY_THRESHOLD";
change t.Setenv calls to use the package constants envEnableLazyConn and
envInactivityThreshold (the same constants used elsewhere in this package) so
the test references a single source of truth and won't drift if names change.
In `@client/internal/peer/ice_backoff_test.go`:
- Around line 113-122: Add a test that verifies disabling backoff while already
suspended immediately clears suspension: create s :=
newIceBackoff(somePositiveMax) then call s.markFailure() and assert
s.IsSuspended() is true, then call s.SetMaxBackoff(0) and assert s.IsSuspended()
is false and the internal retry timestamp (e.g., s.retryAt or equivalent time
field) is cleared/zero (use its IsZero check). Name the test something like
TestIceBackoff_DisableWhileSuspended and use newIceBackoff, markFailure,
SetMaxBackoff, IsSuspended and the retry timestamp field to locate the logic.
In `@management/internals/shared/grpc/conversion_test.go`:
- Around line 160-169: Update TestToPeerConfig_P2pRetryMax_ExplicitDisable to
assert against the package's sentinel constant instead of the literal magic
number: replace the literal ^uint32(0) in the assertion with
p2pRetryMaxDisabledSentinel so the test uses the named sentinel from the grpc
package (keep the rest of the test and the call to toPeerConfigForTest
unchanged).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 34dd9c60-fb5b-443e-ae99-ca94bcb9fc13
⛔ Files ignored due to path filters (4)
- `client/proto/daemon.pb.go` is excluded by `!**/*.pb.go`
- `client/proto/daemon_grpc.pb.go` is excluded by `!**/*.pb.go`
- `shared/management/proto/management.pb.go` is excluded by `!**/*.pb.go`
- `shared/management/proto/proxy_service.pb.go` is excluded by `!**/*.pb.go`
📒 Files selected for processing (37)
- client/cmd/root.go
- client/cmd/up.go
- client/internal/conn_mgr.go
- client/internal/conn_mgr_test.go
- client/internal/connect.go
- client/internal/engine.go
- client/internal/lazyconn/env.go
- client/internal/lazyconn/inactivity/manager.go
- client/internal/lazyconn/inactivity/manager_test.go
- client/internal/lazyconn/manager/manager.go
- client/internal/peer/conn.go
- client/internal/peer/conn_test.go
- client/internal/peer/env.go
- client/internal/peer/env_test.go
- client/internal/peer/handshaker.go
- client/internal/peer/handshaker_test.go
- client/internal/peer/ice_backoff.go
- client/internal/peer/ice_backoff_test.go
- client/internal/peer/status.go
- client/internal/peer/worker_ice.go
- client/internal/profilemanager/config.go
- client/proto/daemon.proto
- client/status/status.go
- client/status/status_test.go
- management/internals/shared/grpc/conversion.go
- management/internals/shared/grpc/conversion_test.go
- management/server/account.go
- management/server/activity/codes.go
- management/server/http/handlers/accounts/accounts_handler.go
- management/server/http/handlers/accounts/accounts_handler_test.go
- management/server/types/settings.go
- management/server/types/settings_test.go
- shared/connectionmode/mode.go
- shared/connectionmode/mode_test.go
- shared/management/http/api/openapi.yml
- shared/management/http/api/types.gen.go
- shared/management/proto/management.proto
```go
func (e *ConnMgr) UpdatedRemotePeerConfig(ctx context.Context, pc *mgmProto.PeerConfig) error {
	newMode, newRelay, newP2P, newP2pRetry := resolveConnectionMode(
		e.envMode, e.envRelayTimeout, e.cfgMode, e.cfgRelayTimeout,
		e.cfgP2pTimeout, e.cfgP2pRetryMax, pc,
	)

	if newMode == e.mode && newRelay == e.relayTimeoutSecs &&
		newP2P == e.p2pTimeoutSecs && newP2pRetry == e.p2pRetryMaxSecs {
		return nil
	}
	prev := e.mode
	e.mode = newMode
	e.relayTimeoutSecs = newRelay
	e.p2pTimeoutSecs = newP2P
	e.p2pRetryMaxSecs = newP2pRetry
	e.propagateP2pRetryMaxToConns()

	wasManaged := modeUsesLazyMgr(prev)
	isManaged := modeUsesLazyMgr(newMode)
	modeChanged := prev != newMode

	if modeChanged && wasManaged && !isManaged {
		log.Infof("lazy/dynamic connection manager disabled by management push (mode=%s)", newMode)
		e.closeManager(ctx)
		e.statusRecorder.UpdateLazyConnection(false)
		return nil
	}

	if modeChanged && wasManaged && isManaged {
		// Switching between lazy and dynamic at runtime: tear down the
		// existing manager so initLazyManager picks up the new timeouts.
		log.Infof("lazy/dynamic mode change %s -> %s, restarting manager", prev, newMode)
		e.closeManager(ctx)
		e.statusRecorder.UpdateLazyConnection(false)
	}

	if isManaged && e.lazyConnMgr == nil {
		if e.rosenpassEnabled {
			log.Infof("rosenpass connection manager is enabled, lazy connection manager will not be started")
			log.Warnf("rosenpass enabled, ignoring lazy/dynamic mode push")
			return nil
		}

		log.Warnf("lazy connection manager is enabled by management feature flag")
		log.Infof("lazy/dynamic connection manager enabled by management push (mode=%s)", newMode)
		e.initLazyManager(ctx)
		e.statusRecorder.UpdateLazyConnection(true)
		e.startModeSideEffects()
		return e.addPeersToLazyConnManager()
	} else {
		if e.lazyConnMgr == nil {
			return nil
		}
		log.Infof("lazy connection manager is disabled by management feature flag")
		e.closeManager(ctx)
		e.statusRecorder.UpdateLazyConnection(false)
		return nil
	}
	return nil
```
Restart the lazy/dynamic manager when only the timeouts change.
newRelay/newP2P update the stored fields here, but the live inactivity manager keeps the old timer values because it is only built in initLazyManager(). Right now a management push that changes relay_timeout_seconds or p2p_timeout_seconds inside the same mode updates bookkeeping only; the active lazy/dynamic lifecycle keeps running with the previous thresholds until a reconnect.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/conn_mgr.go` around lines 245 - 291, In
UpdatedRemotePeerConfig: when the connection mode remains managed (isManaged
true) but any of the timeout/retry values change (compare
newRelay/newP2P/newP2pRetry against the previously-stored
e.relayTimeoutSecs/e.p2pTimeoutSecs/e.p2pRetryMaxSecs) you must restart the
running lazy manager so it picks up the new timers; call e.closeManager(ctx),
then re-init with e.initLazyManager(ctx), call e.startModeSideEffects(), and
re-add peers via e.addPeersToLazyConnManager() (also ensure
e.statusRecorder.UpdateLazyConnection(true) is set appropriately). Use the
existing symbols (UpdatedRemotePeerConfig, isManaged, e.lazyConnMgr,
closeManager, initLazyManager, startModeSideEffects, addPeersToLazyConnManager,
propagateP2pRetryMaxToConns) to locate and implement this branch.
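A minimal sketch of that restart branch. The receiver, fields, and method bodies are stand-ins mirroring the names in the prompt, not the real `ConnMgr`:

```go
package main

import "fmt"

type connMgr struct {
	relay, p2p, retry uint32
	running           bool
	restarts          int
}

func (e *connMgr) closeManager()    { e.running = false }
func (e *connMgr) initLazyManager() { e.running = true; e.restarts++ }

// applyTimers recycles the manager when the mode stays managed but any
// timer value changed, so the live lifecycle picks up the new thresholds.
func (e *connMgr) applyTimers(newRelay, newP2P, newRetry uint32, isManaged bool) {
	timersChanged := newRelay != e.relay || newP2P != e.p2p || newRetry != e.retry
	e.relay, e.p2p, e.retry = newRelay, newP2P, newRetry
	if isManaged && e.running && timersChanged {
		e.closeManager()
		e.initLazyManager() // re-init reads the updated timer fields
	}
}

func main() {
	e := &connMgr{relay: 60, p2p: 30, retry: 900, running: true}
	e.applyTimers(120, 30, 900, true) // relay timeout changed within the same mode
	fmt.Println(e.running, e.restarts) // true 1
}
```

In the real code the restart would also re-run the side-effect hooks and re-add peers, as the prompt lists.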
```go
	Mode:               e.connMgr.Mode(),
	P2pRetryMaxSeconds: e.connMgr.P2pRetryMax(),
```
These mode-dependent conn settings are only applied at construction time.
Mode and P2pRetryMaxSeconds are snapshotted when the peer.Conn is created, but ConnMgr.UpdatedRemotePeerConfig() can change the resolved mode later without recreating existing conns. That means a live switch between p2p-lazy and p2p-dynamic can leave already-open peers running with the old Open/AttachICE/DetachICE behavior until each conn is rebuilt or explicitly reconfigured.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/engine.go` around lines 1585 - 1586, The Mode and
P2pRetryMaxSeconds values are only snapshotted into peer.Conn at creation, so
runtime changes via ConnMgr.UpdatedRemotePeerConfig() won't affect existing
conns; modify the code so connections are reconfigured when the ConnMgr changes
by adding an update path: introduce or use methods on peer.Conn like
SetMode(mode string) and SetP2pRetryMax(seconds int) (or a single UpdateConfig)
and, in the ConnMgr.UpdatedRemotePeerConfig() handler inside engine (the code
that currently constructs conns with Mode: e.connMgr.Mode(), P2pRetryMaxSeconds:
e.connMgr.P2pRetryMax()), iterate existing e.peers and call those methods to
apply the new mode and retry value (and trigger any Open/AttachICE/DetachICE
transitions as needed) so live switches between p2p-lazy and p2p-dynamic take
effect without recreating connections.
```go
	if wgIface.IsUserspaceBind() {
		m.inactivityManager = inactivity.NewManager(wgIface, config.InactivityThreshold)
		iceTO, relayTO := config.resolvedTimeouts()
		if iceTO == 0 && relayTO == 0 {
			// Phase 1 / single-timer fallback when caller hasn't migrated.
			m.inactivityManager = inactivity.NewManager(wgIface, config.InactivityThreshold)
		} else {
			m.inactivityManager = inactivity.NewManagerWithTwoTimers(wgIface, iceTO, relayTO)
		}
	} else {
		log.Warnf("inactivity manager not supported for kernel mode, wait for remote peer to close the connection")
	}
```
Kernel-bind fallback is unsafe here.
This leaves m.inactivityManager nil, but the rest of the manager still dereferences it unconditionally (Start, activateSinglePeer, addActivePeer, DeactivatePeer, removePeer). In lazy/dynamic mode on a kernel bind, that turns the “not supported” path into a runtime panic instead of a graceful no-op/fallback.
Please either refuse to construct/start this manager for kernel mode or make every inactivity-manager call path nil-safe.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/lazyconn/manager/manager.go` around lines 102 - 112, The
kernel-bind path currently leaves m.inactivityManager nil which causes panics
when methods like Start, activateSinglePeer, addActivePeer, DeactivatePeer and
removePeer unconditionally dereference it; fix this by ensuring
m.inactivityManager is always a valid implementation: add a lightweight no-op
inactivity manager (e.g., inactivity.NewNoopManager returning an object that
implements the same interface as inactivity.NewManager/NewManagerWithTwoTimers)
and instantiate it in the else branch where kernel mode is detected (replace the
current log-only path), so all subsequent calls to m.inactivityManager are safe
without changing callers; alternatively, if you prefer refusal, return an error
from the constructor instead of creating the manager (but pick one approach and
apply it consistently).
```go
// P2pRetryMaxSeconds is the cap for the ICE-failure backoff schedule
// in p2p-dynamic mode. 0 = use built-in default (DefaultP2PRetryMax).
// Wire-format sentinel uint32-max (= ^uint32(0)) means "user-explicit
// disable", which the resolver translates to time.Duration(0) at
// engine.go before passing it here. Phase 3 of #5989.
P2pRetryMaxSeconds uint32
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Verify how engine.go translates the wire sentinel for P2pRetryMaxSeconds, and
# what DefaultP2PRetryMax actually is, to determine whether this is a doc bug
# or a logic bug.
echo "--- engine.go: how is P2PRetryMaxSeconds translated into ConnConfig? ---"
rg -nP --type=go -C5 'P2PRetryMaxSeconds|P2pRetryMaxSeconds' -g 'client/internal/engine*.go'
echo
echo "--- DefaultP2PRetryMax definition ---"
rg -nP --type=go -C2 '\bDefaultP2PRetryMax\b'
echo
echo "--- newIceBackoff: how is a 0 cap handled? ---"
rg -nP --type=go -C5 '\bnewIceBackoff\b'
echo
echo "--- iceBackoffState.SetMaxBackoff: how is a 0 cap handled? ---"
ast-grep --pattern $'func ($_ *iceBackoffState) SetMaxBackoff($_) {
  $$$
}'
echo
echo "--- conn_mgr.go: who calls SetIceBackoffMax and with what input? ---"
rg -nP --type=go -C5 '\bSetIceBackoffMax\b'
```

Repository: netbirdio/netbird
Length of output: 17107

🏁 Script executed:

```shell
rg -nP --type=go -A10 'func \(.*ConnMgr\) P2pRetryMax' client/internal/conn_mgr.go
```

Repository: netbirdio/netbird
Length of output: 362

🏁 Script executed:

```shell
rg -nP --type=go -A15 'func \(conn \*Conn\) SetIceBackoffMax' client/internal/peer/conn.go
```

Repository: netbirdio/netbird
Length of output: 657

🏁 Script executed:

```shell
sed -n '540,560p' client/internal/conn_mgr.go
```

Repository: netbirdio/netbird
Length of output: 640

🏁 Script executed:

```shell
rg -nP --type=go -B20 'user-explicit disable' client/internal/conn_mgr.go | head -40
```

Repository: netbirdio/netbird
Length of output: 777
Fix sentinel preservation through the backoff cap translation; user-explicit disable is currently reversed to default.
The documented path from wire sentinel to disable is broken. When a user explicitly disables via the management sentinel ^uint32(0), the flow is:
1. propagateP2pRetryMaxToConns() correctly translates the sentinel to time.Duration(0) and calls SetIceBackoffMax(0)
2. SetIceBackoffMax() stores uint32(0) back into conn.config.P2pRetryMaxSeconds
3. conn.Open() reads uint32(0) and treats it as "not set", applying DefaultP2PRetryMax instead
The field comment at lines 98–100 is misleading: it states the resolver translates the sentinel to time.Duration(0) but omits that Open() immediately overwrites 0 with the default. Since 0 cannot distinguish "user wants disable" from "not set, use default", the user's explicit disable intent is silently reversed to 15-minute backoff.
The same issue affects SetIceBackoffMax() when called before Open(): the sentinel-decoded 0 gets stored as uint32(0) in the config, and the next Open() call overwrites it with the default.
Solution: Preserve the sentinel value ^uint32(0) in the uint32 field so Open() can explicitly handle it—map it to disable (zero cap) separately from the "not set, use default" path (which remains 0 today).
Suggested fix

```diff
 // P2pRetryMaxSeconds is the cap for the ICE-failure backoff schedule
 // in p2p-dynamic mode. 0 = use built-in default (DefaultP2PRetryMax).
-// Wire-format sentinel uint32-max (= ^uint32(0)) means "user-explicit
-// disable", which the resolver translates to time.Duration(0) at
-// engine.go before passing it here. Phase 3 of #5989.
+// Wire-format sentinel uint32-max (= ^uint32(0)) means "user-explicit
+// disable" and is preserved as ^uint32(0) here so Open() can
+// distinguish it from the "not set" zero value.
 P2pRetryMaxSeconds uint32
```

```diff
-backoffCap := time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
-if backoffCap == 0 {
+var backoffCap time.Duration
+switch conn.config.P2pRetryMaxSeconds {
+case 0:
 	backoffCap = DefaultP2PRetryMax
+case ^uint32(0):
+	backoffCap = 0 // user-explicit disable; iceBackoff treats 0 as off
+default:
+	backoffCap = time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
+}
```

And update propagateP2pRetryMaxToConns() in conn_mgr.go to pass the sentinel directly (currently it converts to 0).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// P2pRetryMaxSeconds is the cap for the ICE-failure backoff schedule
// in p2p-dynamic mode. 0 = use built-in default (DefaultP2PRetryMax).
// Wire-format sentinel uint32-max (= ^uint32(0)) means "user-explicit
// disable" and is preserved as ^uint32(0) here so Open() can
// distinguish it from the "not set" zero value.
P2pRetryMaxSeconds uint32
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/peer/conn.go` around lines 96 - 101, The uint32 sentinel
^uint32(0) for "user-explicit disable" is being lost; fix by preserving that
sentinel in the config and handling it explicitly: modify
propagateP2pRetryMaxToConns() to pass the sentinel value (not time.Duration(0))
into SetIceBackoffMax when the wire-value is the sentinel, change
SetIceBackoffMax(conn.config.P2pRetryMaxSeconds) so it stores the sentinel
(^uint32(0)) into conn.config.P2pRetryMaxSeconds instead of 0 when disable was
requested, and update conn.Open() to treat conn.config.P2pRetryMaxSeconds ==
^uint32(0) as "disabled (zero cap)" while treating the genuine 0 (or other unset
marker if you prefer) as "not set → apply DefaultP2PRetryMax"; ensure all logic
paths that previously wrote 0 for disable now preserve/write the sentinel so the
distinction remains.
```go
func (s *iceBackoffState) markSuccess() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.failures = 0
	s.suspended = false
	s.bo.Reset()
}
```
Clear suspension window when backoff is reset/disabled.
Disabling backoff via SetMaxBackoff(0) can still leave IsSuspended() true until the old retry timestamp elapses, and markSuccess() keeps a stale NextRetry in snapshots. This can incorrectly gate ICE attach and misreport status.
💡 Proposed fix

```diff
 func (s *iceBackoffState) markSuccess() {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	s.failures = 0
 	s.suspended = false
+	s.nextRetry = time.Time{}
 	s.bo.Reset()
 }
```

```diff
 func (s *iceBackoffState) SetMaxBackoff(d time.Duration) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	if d == s.maxBackoff {
 		return
 	}
 	s.maxBackoff = d
 	s.bo = buildBackoff(d)
+	if d == 0 {
+		s.suspended = false
+		s.nextRetry = time.Time{}
+	}
 }
```

Also applies to: 115-123
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/peer/ice_backoff.go` around lines 97 - 103, When resetting or
disabling backoff (e.g., in markSuccess) you must also clear the suspended retry
timestamp so IsSuspended() and snapshots do not retain a stale NextRetry; update
markSuccess (and the corresponding reset/disable handler in the 115-123 block)
to set the internal nextRetry field to the zero time (reset the suspension
window) whenever you set s.suspended = false and bo.Reset(), ensuring snapshots
no longer report a stale NextRetry after SetMaxBackoff(0) or a success.
```go
if req.Settings.P2pTimeoutSeconds != nil {
	v := uint32(*req.Settings.P2pTimeoutSeconds)
	returnSettings.P2pTimeoutSeconds = &v
}
if req.Settings.P2pRetryMaxSeconds != nil {
	v := uint32(*req.Settings.P2pRetryMaxSeconds)
	returnSettings.P2pRetryMaxSeconds = &v
}
if req.Settings.RelayTimeoutSeconds != nil {
	v := uint32(*req.Settings.RelayTimeoutSeconds)
	returnSettings.RelayTimeoutSeconds = &v
}
```
Validate timeout bounds before casting to uint32.
These values arrive as int64 and are cast directly to uint32. Negative inputs or anything larger than uint32 wrap silently, so malformed requests can persist nonsensical timeouts (-1 becomes 4294967295, 4294967296 becomes 0). Reject out-of-range values before the cast.
Suggested fix

```diff
+const maxUint32 = int64(^uint32(0))
+
 if req.Settings.P2pTimeoutSeconds != nil {
+	if *req.Settings.P2pTimeoutSeconds < 0 || *req.Settings.P2pTimeoutSeconds > maxUint32 {
+		return nil, status.Errorf(status.InvalidArgument, "p2p_timeout_seconds must be between 0 and %d", maxUint32)
+	}
 	v := uint32(*req.Settings.P2pTimeoutSeconds)
 	returnSettings.P2pTimeoutSeconds = &v
 }
 if req.Settings.P2pRetryMaxSeconds != nil {
+	if *req.Settings.P2pRetryMaxSeconds < 0 || *req.Settings.P2pRetryMaxSeconds > maxUint32 {
+		return nil, status.Errorf(status.InvalidArgument, "p2p_retry_max_seconds must be between 0 and %d", maxUint32)
+	}
 	v := uint32(*req.Settings.P2pRetryMaxSeconds)
 	returnSettings.P2pRetryMaxSeconds = &v
 }
 if req.Settings.RelayTimeoutSeconds != nil {
+	if *req.Settings.RelayTimeoutSeconds < 0 || *req.Settings.RelayTimeoutSeconds > maxUint32 {
+		return nil, status.Errorf(status.InvalidArgument, "relay_timeout_seconds must be between 0 and %d", maxUint32)
+	}
 	v := uint32(*req.Settings.RelayTimeoutSeconds)
 	returnSettings.RelayTimeoutSeconds = &v
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
const maxUint32 = int64(^uint32(0))

if req.Settings.P2pTimeoutSeconds != nil {
	if *req.Settings.P2pTimeoutSeconds < 0 || *req.Settings.P2pTimeoutSeconds > maxUint32 {
		return nil, status.Errorf(status.InvalidArgument, "p2p_timeout_seconds must be between 0 and %d", maxUint32)
	}
	v := uint32(*req.Settings.P2pTimeoutSeconds)
	returnSettings.P2pTimeoutSeconds = &v
}
if req.Settings.P2pRetryMaxSeconds != nil {
	if *req.Settings.P2pRetryMaxSeconds < 0 || *req.Settings.P2pRetryMaxSeconds > maxUint32 {
		return nil, status.Errorf(status.InvalidArgument, "p2p_retry_max_seconds must be between 0 and %d", maxUint32)
	}
	v := uint32(*req.Settings.P2pRetryMaxSeconds)
	returnSettings.P2pRetryMaxSeconds = &v
}
if req.Settings.RelayTimeoutSeconds != nil {
	if *req.Settings.RelayTimeoutSeconds < 0 || *req.Settings.RelayTimeoutSeconds > maxUint32 {
		return nil, status.Errorf(status.InvalidArgument, "relay_timeout_seconds must be between 0 and %d", maxUint32)
	}
	v := uint32(*req.Settings.RelayTimeoutSeconds)
	returnSettings.RelayTimeoutSeconds = &v
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@management/server/http/handlers/accounts/accounts_handler.go` around lines
229 - 240, The code casts int64 timeout fields from req.Settings
(P2pTimeoutSeconds, P2pRetryMaxSeconds, RelayTimeoutSeconds) directly to uint32
which can wrap on negative or too-large values; validate each non-nil
*req.Settings.* value is within 0..math.MaxUint32 before casting, and if out of
range return a validation error (HTTP 400) instead of assigning to
returnSettings.*; after the check perform v := uint32(*req.Settings.<Field>) and
set returnSettings.<Field> = &v.
```go
const (
	ModeUnspecified Mode = iota
	ModeRelayForced
	ModeP2P
	ModeP2PLazy
	ModeP2PDynamic
	// ModeFollowServer is a client-side sentinel: setting this in the
	// client config explicitly clears any local override so the
	// server-pushed value (or its legacy fallback) is used. It MUST NOT
	// be sent on the wire -- ToProto returns UNSPECIFIED for it.
	ModeFollowServer
)

// String returns the canonical lower-kebab-case name of the mode.
func (m Mode) String() string {
	switch m {
	case ModeRelayForced:
		return "relay-forced"
	case ModeP2P:
		return "p2p"
	case ModeP2PLazy:
		return "p2p-lazy"
	case ModeP2PDynamic:
		return "p2p-dynamic"
	case ModeFollowServer:
		return "follow-server"
	default:
		return ""
	}
}

// ParseString accepts the canonical name (case-insensitive, surrounding
// whitespace tolerated) and returns the corresponding Mode. Empty input
// returns ModeUnspecified with no error. Unknown input returns
// ModeUnspecified with an error.
func ParseString(s string) (Mode, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "":
		return ModeUnspecified, nil
	case "relay-forced":
		return ModeRelayForced, nil
	case "p2p":
		return ModeP2P, nil
	case "p2p-lazy":
		return ModeP2PLazy, nil
	case "p2p-dynamic":
		return ModeP2PDynamic, nil
	case "follow-server":
		return ModeFollowServer, nil
	default:
		return ModeUnspecified, fmt.Errorf("unknown connection mode %q", s)
	}
}

// FromProto translates a proto enum value to the internal Mode.
func FromProto(m mgmProto.ConnectionMode) Mode {
	switch m {
	case mgmProto.ConnectionMode_CONNECTION_MODE_RELAY_FORCED:
		return ModeRelayForced
	case mgmProto.ConnectionMode_CONNECTION_MODE_P2P:
		return ModeP2P
	case mgmProto.ConnectionMode_CONNECTION_MODE_P2P_LAZY:
		return ModeP2PLazy
	case mgmProto.ConnectionMode_CONNECTION_MODE_P2P_DYNAMIC:
		return ModeP2PDynamic
	default:
		return ModeUnspecified
	}
}

// ToProto translates the internal Mode to a proto enum value.
// ModeFollowServer is a client-side concept and intentionally maps to
// UNSPECIFIED so it never appears on the wire.
func (m Mode) ToProto() mgmProto.ConnectionMode {
	switch m {
	case ModeRelayForced:
		return mgmProto.ConnectionMode_CONNECTION_MODE_RELAY_FORCED
	case ModeP2P:
		return mgmProto.ConnectionMode_CONNECTION_MODE_P2P
	case ModeP2PLazy:
		return mgmProto.ConnectionMode_CONNECTION_MODE_P2P_LAZY
	case ModeP2PDynamic:
		return mgmProto.ConnectionMode_CONNECTION_MODE_P2P_DYNAMIC
	default:
		return mgmProto.ConnectionMode_CONNECTION_MODE_UNSPECIFIED
	}
}

// ResolveLegacyLazyBool maps the historical Settings.LazyConnectionEnabled
// boolean to the new Mode. Used when a new client receives an old server's
// PeerConfig (ConnectionMode = UNSPECIFIED) or when the management server
// has no explicit Settings.ConnectionMode set yet.
func ResolveLegacyLazyBool(lazy bool) Mode {
	if lazy {
		return ModeP2PLazy
	}
	return ModeP2P
}

// ToLazyConnectionEnabled is the inverse mapping for backwards-compat.
// Used by toPeerConfig() so old clients (which only know the boolean)
// still get a sensible behaviour.
//
// Note: ModeRelayForced cannot be expressed via the legacy boolean and
// falls back to false. This is a structural compat gap documented in the
// release notes; admins must set NB_FORCE_RELAY=true on old clients
// or upgrade them.
func (m Mode) ToLazyConnectionEnabled() bool {
	return m == ModeP2PLazy
}
```
p2p-dynamic-lazy is missing from the canonical mode table.
The supported mode list for #5989 includes p2p-dynamic-lazy, but this package can't represent it anywhere: String(), ParseString(), FromProto(), ToProto(), and the legacy lazy fallback all omit it. Any config carrying that mode will currently fail parsing or get downgraded during conversion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@shared/connectionmode/mode.go` around lines 19 - 127, Add a new Mode constant
for "p2p-dynamic-lazy" and wire it through the conversions: define
ModeP2PDynamicLazy alongside other Mode constants, return "p2p-dynamic-lazy"
from Mode.String(), accept "p2p-dynamic-lazy" in ParseString(), map
mgmProto.ConnectionMode_CONNECTION_MODE_P2P_DYNAMIC_LAZY <-> ModeP2PDynamicLazy
in FromProto() and ToProto(), and make ModeP2PDynamicLazy return true from
ToLazyConnectionEnabled() (and ensure ResolveLegacyLazyBool behavior remains
correct). Update only the symbols ModeP2PDynamicLazy, String(), ParseString(),
FromProto(), ToProto(), ToLazyConnectionEnabled() to include this new mode and
keep ModeFollowServer behavior unchanged.
```yaml
connection_mode:
  x-experimental: true
  type: string
  enum: [relay-forced, p2p, p2p-lazy, p2p-dynamic]
```
connection_mode enum is missing p2p-dynamic-lazy.
PR scope introduces p2p-dynamic-lazy, but the OpenAPI enum does not expose it, so REST clients cannot configure that mode through the account settings API.
Suggested fix

```diff
-enum: [relay-forced, p2p, p2p-lazy, p2p-dynamic]
+enum: [relay-forced, p2p, p2p-lazy, p2p-dynamic, p2p-dynamic-lazy]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
enum: [relay-forced, p2p, p2p-lazy, p2p-dynamic, p2p-dynamic-lazy]
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@shared/management/http/api/openapi.yml` at line 365, The OpenAPI enum for
connection_mode is missing the new value; update the enum array under the
connection_mode schema to include "p2p-dynamic-lazy" alongside the existing
values (relay-forced, p2p, p2p-lazy, p2p-dynamic) so REST clients can set that
mode; ensure you add the exact string "p2p-dynamic-lazy" to the enum in the
openapi YAML and run any spec validation/generation steps you normally use.
```yaml
Phase 1 of issue #5989: relay-forced, p2p, and p2p-lazy are
functional. p2p-dynamic is reserved (passes through as p2p in
Phase 1; will become functional in Phase 2).
p2p_timeout_seconds:
```
Stale phase-gating text in public API description.
This description says p2p-dynamic is reserved and not functional yet, which is outdated for this PR and can mislead API consumers.
Suggested fix

```diff
-Phase 1 of issue #5989: relay-forced, p2p, and p2p-lazy are
-functional. p2p-dynamic is reserved (passes through as p2p in
-Phase 1; will become functional in Phase 2).
+Supported values are relay-forced, p2p, p2p-lazy, p2p-dynamic,
+and p2p-dynamic-lazy.
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
Supported values are relay-forced, p2p, p2p-lazy, p2p-dynamic,
and p2p-dynamic-lazy.
p2p_timeout_seconds:
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@shared/management/http/api/openapi.yml` around lines 370 - 373, Update the
stale documentation text in the OpenAPI description that mentions "Phase 1 of
issue `#5989`" and that "p2p-dynamic is reserved" so it no longer misleads
consumers; locate the block around the p2p_timeout_seconds schema entry in
openapi.yml (the paragraph starting with "Phase 1 of issue `#5989`: relay-forced,
p2p, and p2p-lazy are functional...") and replace or remove the phase-gating
sentence to reflect the current status (either remove the "p2p-dynamic is
reserved" clause or update it to state that p2p-dynamic is now functional),
keeping the rest of the p2p_timeout_seconds description consistent.
```go
// Defines values for AccountSettingsConnectionMode.
const (
	AccountSettingsConnectionModeP2p         AccountSettingsConnectionMode = "p2p"
	AccountSettingsConnectionModeP2pDynamic  AccountSettingsConnectionMode = "p2p-dynamic"
	AccountSettingsConnectionModeP2pLazy     AccountSettingsConnectionMode = "p2p-lazy"
	AccountSettingsConnectionModeRelayForced AccountSettingsConnectionMode = "relay-forced"
)

// Valid indicates whether the value is a known member of the AccountSettingsConnectionMode enum.
func (e AccountSettingsConnectionMode) Valid() bool {
	switch e {
	case AccountSettingsConnectionModeP2p:
		return true
	case AccountSettingsConnectionModeP2pDynamic:
		return true
	case AccountSettingsConnectionModeP2pLazy:
		return true
	case AccountSettingsConnectionModeRelayForced:
		return true
	default:
		return false
	}
}
```
Add p2p-dynamic-lazy to the generated AccountSettingsConnectionMode enum.
The PR introduces p2p-dynamic-lazy, but this public enum and Valid() omit it. That means OpenAPI-generated clients cannot round-trip the full connection-mode set, and the handler-side req.Settings.ConnectionMode.Valid() check will reject that mode even if the backend supports it. Please add it in the OpenAPI source and regenerate this file.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@shared/management/http/api/types.gen.go` around lines 41 - 63, The generated
enum AccountSettingsConnectionMode is missing the new value "p2p-dynamic-lazy":
add a new constant named AccountSettingsConnectionModeP2pDynamicLazy with value
"p2p-dynamic-lazy" to the AccountSettingsConnectionMode block and include it in
the AccountSettingsConnectionMode.Valid() switch so the method returns true for
that value; update the OpenAPI source to include p2p-dynamic-lazy and regenerate
shared/management/http/api/types.gen.go so the generated code and Valid() check
are in sync.
…uest test setconfig_test.go reflection-tests (TestSetConfig_AllFieldsSaved + TestCLIFlags_MappedToSetConfig) tolerate the four new fields: ConnectionMode, P2PTimeoutSeconds, RelayTimeoutSeconds, P2PRetryMaxSeconds. They are in the proto so daemons can advertise them via GetConfig, but SetConfig RPC does not apply them at runtime today — the CLI sets them via 'netbird service install/reconfigure' writing the active profile file directly. Listed in the test maps so the structural drift-detector passes; wiring through SetConfig is a follow-up.
bad32d4 to 9ea349f (Compare)
The pb.go regeneration on dev machines uses local protoc (v5.29.3 here) which causes the Proto Version Check workflow to flag the diff as a toolchain mismatch. Restore the upstream-main version strings (v6.33.1 for client/proto, v7.34.1 for shared/management/proto) — generated content matches; only the comment-line differs.
…rsion pin - conn_mgr.go: tagged switch on uint32 sentinel (QF1002 staticcheck) - engine.go: //nolint:staticcheck on the deprecated peer.IsForceRelayed call site - the function is intentionally retained for Phase-1 backwards compat with daemons that haven't migrated to ResolveModeFromEnv yet - lazyconn/manager.go: //nolint:staticcheck on the deprecated inactivity.NewManager fallback - kept on purpose for callers without resolved two-timer config (iceTimeout=0 && relayTimeout=0) - proxy_service.pb.go: pin protoc version header to upstream-main v7.34.1 (Proto Version Check workflow flagged it; only the comment-line differs)
♻️ Duplicate comments (2)
client/internal/conn_mgr.go (1)
251-289: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restart the managed lifecycle when only the inactivity timeouts change.

relayTimeoutSecs/p2pTimeoutSecs are applied only in initLazyManager(), but this branch only tears the manager down when the mode changes. A management push that keeps p2p-lazy/p2p-dynamic the same and only changes relay_timeout_seconds or p2p_timeout_seconds updates the stored fields without rebuilding the live manager, so it keeps running with the previous thresholds until reconnect.

♻️ Minimal direction

```diff
+	timeoutsChanged := newRelay != e.relayTimeoutSecs || newP2P != e.p2pTimeoutSecs
-	if modeChanged && wasManaged && isManaged {
+	if wasManaged && isManaged && (modeChanged || timeoutsChanged) {
 		// Switching between lazy and dynamic at runtime: tear down the
 		// existing manager so initLazyManager picks up the new timeouts.
 		log.Infof("lazy/dynamic mode change %s -> %s, restarting manager", prev, newMode)
 		e.closeManager(ctx)
 		e.statusRecorder.UpdateLazyConnection(false)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/conn_mgr.go` around lines 251 - 289, The code updates relayTimeoutSecs and p2pTimeoutSecs but only restarts the lazy/dynamic manager when the mode changes, so an active manager (lazyConnMgr != nil) will keep old thresholds; modify the branch handling unchanged mode to detect when timeouts (relayTimeoutSecs or p2pTimeoutSecs or p2pRetryMaxSecs) changed while modeUsesLazyMgr(newMode) is true and lazyConnMgr != nil, and when they did: call e.closeManager(ctx), e.statusRecorder.UpdateLazyConnection(false), then re-init by calling e.initLazyManager(ctx), e.startModeSideEffects(), and return e.addPeersToLazyConnManager() so the running manager is recreated with the new timeouts (use existing helpers closeManager, initLazyManager, startModeSideEffects, addPeersToLazyConnManager and keep propagateP2pRetryMaxToConns for immediate propagation).

client/internal/engine.go (1)

1585-1586: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Reconfigure live peers when the resolved mode changes.

Mode is still only copied into peer.ConnConfig at construction time. UpdatedRemotePeerConfig() can change ConnMgr.mode later, but already-created peer.Conns will keep their old Open/AttachICE/DetachICE behavior until each peer is rebuilt, so live switches like p2p-lazy -> p2p-dynamic or p2p -> relay-forced won't fully take effect.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/engine.go` around lines 1585 - 1586, The code only sets Mode into peer.ConnConfig at construction (via e.connMgr.Mode()) so when UpdatedRemotePeerConfig() changes ConnMgr.mode live peers (peer.Conn) keep old Open/AttachICE/DetachICE behavior; update UpdatedRemotePeerConfig() to detect a mode change and reconfigure existing peers by iterating active peer.Conn instances and updating or rebuilding their ConnConfig (e.g., set the new Mode and P2pRetryMaxSeconds) or invoking a peer-level method to apply the new connection policy (e.g., peer.UpdateConnConfig / peer.RebuildConnections / peer.SetMode) so live switches like p2p-lazy ↔ p2p-dynamic or p2p ↔ relay-forced take effect immediately.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@client/internal/conn_mgr.go`:
- Around line 251-289: The code updates relayTimeoutSecs and p2pTimeoutSecs but
only restarts the lazy/dynamic manager when the mode changes, so an active
manager (lazyConnMgr != nil) will keep old thresholds; modify the branch
handling unchanged mode to detect when timeouts (relayTimeoutSecs or
p2pTimeoutSecs or p2pRetryMaxSecs) changed while modeUsesLazyMgr(newMode) is
true and lazyConnMgr != nil, and when they did: call e.closeManager(ctx),
e.statusRecorder.UpdateLazyConnection(false), then re-init by calling
e.initLazyManager(ctx), e.startModeSideEffects(), and return
e.addPeersToLazyConnManager() so the running manager is recreated with the new
timeouts (use existing helpers closeManager, initLazyManager,
startModeSideEffects, addPeersToLazyConnManager and keep
propagateP2pRetryMaxToConns for immediate propagation).
In `@client/internal/engine.go`:
- Around line 1585-1586: The code only sets Mode into peer.ConnConfig at
construction (via e.connMgr.Mode()) so when UpdatedRemotePeerConfig() changes
ConnMgr.mode live peers (peer.Conn) keep old Open/AttachICE/DetachICE behavior;
update UpdatedRemotePeerConfig() to detect a mode change and reconfigure
existing peers by iterating active peer.Conn instances and updating or
rebuilding their ConnConfig (e.g., set the new Mode and P2pRetryMaxSeconds) or
invoking a peer-level method to apply the new connection policy (e.g.,
peer.UpdateConnConfig / peer.RebuildConnections / peer.SetMode) so live switches
like p2p-lazy ↔ p2p-dynamic or p2p ↔ relay-forced take effect immediately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: eb05b04e-fb0e-4e46-938d-a6555e15d67d
📒 Files selected for processing (3)
- client/internal/conn_mgr.go
- client/internal/engine.go
- client/internal/lazyconn/manager/manager.go
🚧 Files skipped from review as they are similar to previous changes (1)
- client/internal/lazyconn/manager/manager.go




[client+management] Phase 1+2+3 of #5989: ConnectionMode foundation, two-timer lifecycle, iceBackoff
Branch: `pr/a-p2p-dynamic-foundation` → base: `main`
Compare: https://github.com/netbirdio/netbird/compare/main...MichaelUray:netbird:pr/a-p2p-dynamic-foundation?expand=1
Summary

This PR is the foundation for the work proposed in issue #5989 ("Consolidate connection-mode flags; add p2p-dynamic and p2p-dynamic-lazy modes"). It introduces a single first-class `ConnectionMode` enum on the wire and across Settings, replaces the asymmetric `LazyConnectionEnabled` / `IsForceRelayed` precedence with that enum, adds the per-peer two-timer lifecycle (`P2pTimeoutSeconds` for ICE-inactivity detach, `RelayTimeoutSeconds` for full close), and lands the per-peer `iceBackoffState` so failed pair-checks don't loop forever (`P2pRetryMaxSeconds` controls the cap). It supersedes the older PR #6047, which only landed Phase 1 and is no longer mergeable.
Why

NetBird currently has two flags fighting for the same job:

- `LazyConnectionEnabled` (account-wide bool) — eager vs. lazy initial connect.
- `IsForceRelayed` (per-peer hint) — skip ICE entirely.

A user who wants "all my peers should start lazy, but on demand try P2P, and if P2P never comes up, fall back to relay" can't express that with two booleans. The proposed `ConnectionMode` is one of `p2p`, `p2p-lazy`, `p2p-dynamic`, `p2p-dynamic-lazy`, `relay-forced`. Each mode unambiguously selects one combination of (eager-vs-lazy, skip-ICE-yes-no, ICE-fallback-to-relay-yes-no). The `p2p-dynamic` variants additionally need two timers (one for ICE-only inactivity, one for full conn close) and an exponential ICE-failure backoff so a chronically broken NAT path can't pin a peer to relay-with-stalled-retries forever.

What's in this PR

Phase 1 — Wire + Settings + CLI plumbing

- `PeerConfig.ConnectionMode` (enum) + `P2pTimeoutSeconds`, `RelayTimeoutSeconds`.
- `AccountSettings.connection_mode` + the two timeouts; mgmt PUT handler accepts and audit-logs them.
- `--connection-mode`, `--p2p-timeout`, `--relay-timeout` CLI flags + env-var bridge with deprecation warnings for the old flags.
- `connectionmode` package with `Mode` type, proto bridge, `ResolveModeFromEnv`.

Phase 2 — Lifecycle mechanism

- `Conn.AttachICE()` / `DetachICE()` — register/remove the ICE listener on the handshaker on demand instead of at `Open()`.
- `client/peer/conn.Open()` defers ICE-listener registration in p2p-dynamic; the ICE worker is created but the dispatch is held back until the activity listener fires.
- `client/lazyconn/inactivity` (per-peer ICE-inactivity vs. relay-timeout, separate channels).
- `conn_mgr.ActivatePeer` / `DeactivatePeer` per-mode plumbing for ICE attach/detach.

Phase 3 — Backoff

- `client/peer.iceBackoffState` truncated exponential schedule; `markFailure`, `markSuccess`, `Reset` API.
- `Conn.AttachICE` no-ops while suspended; pion ICE state changes drive `markFailure` / `markSuccess`.
- `conn_mgr` resolves `P2pRetryMaxSeconds` from server-pushed `PeerConfig` and propagates updates to active conns.
- `netbird status -d` shows the current backoff state (Failures, NextRetry, Suspended) per peer; the line is suppressed when `nextRetry` has passed.

Tests

- `go test ./client/internal/peer ./client/internal/peer/guard ./client/internal/lazyconn/...` — pass.
- `go build ./client/... ./management/...` — pass on linux/amd64, linux/arm64, windows/amd64.

Test plan

- `iceBackoffState` (initial, exponential, reset, suspended-expired, max-cap, disabled-when-zero, grace-after-reset, no-grace-without-reset).
- `mgmt/grpc` tests cover `toPeerConfig` mode resolution paths.
- Protos regenerated (`bash client/proto/generate.sh`, `bash shared/management/proto/generate.sh`).
- Backwards compatibility (`LazyConnectionEnabled` still honoured by older daemons).

Use case

Real-world deployment: a fleet of OpenWrt routers + a small number of always-on Linux/Windows boxes. For roaming/laptop peers we want strict relay (`relay-forced`) for predictable latency; for the routers we want `p2p-dynamic` so the daemon attempts P2P on demand but doesn't hold an ICE pair when the link is quiet. With the current two-flag approach this configuration is impossible from the dashboard.

Linked work

What's not here (follow-up PRs)

The PRs are stacked: B's base = A's branch; C's base = B; D's base = C. Reviewing A first lets the foundation land standalone.
Maintainers are welcome to push directly to this branch.
Summary by CodeRabbit
Documentation
These changes are internal lifecycle/behavioural improvements; no user-visible API or CLI flags were added that warrant new public docs. The existing flags and Settings already documented at netbirdio/docs cover the surface area.