[client, management] Phase 3.7i of #5989: peer-status visibility — RemotePeerConfig + conn-state pusher + Peers UI (stack 3/4) #6083
Note: Reviews paused. It looks like this branch is under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with review comments due to an influx of new commits. This behavior is configurable.
📝 Walkthrough
Adds connection-mode and timeout controls (env/CLI/profile/server precedence), per-peer ICE backoff and two-timer inactivity, a conn-state pusher with SyncPeerConnections RPC + in-memory store/router, expanded status/proto fields, engine→management wiring for effective config and snapshots, UI/Android exposure, persistence, and tests.

Changes: Connection-mode, per-peer state, and sync (single DAG)
Sequence Diagram(s)
sequenceDiagram
participant Engine as Engine
participant ConnMgr as ConnMgr
participant Status as Status/PeerRecorder
participant Grpc as GrpcClient
participant MGM as Management
participant Store as PeerConnStore
Note over Engine,ConnMgr: resolve mode & timeouts
Engine->>ConnMgr: resolveConnectionMode(env,cfg,server)
ConnMgr-->>Engine: Mode/Timeouts/P2PRetryMax
Note over Status,Engine: peer events & deltas
Status->>Engine: Peer state changes / ICE events
Engine->>Engine: connStatePusher computes delta/full
Engine->>Grpc: SyncPeerConnections(encrypted PeerConnectionMap)
Grpc->>MGM: SyncPeerConnections RPC
MGM->>Store: Persist PeerConnectionMap
sequenceDiagram
participant UI as Desktop UI
participant Daemon as Daemon/Engine
participant Profile as ProfileManager
participant MGM as Management
UI->>Daemon: SetConfigRequest(connection-mode/timeouts)
Daemon->>Profile: applyConnectionModeFlagsToProfile
Daemon->>Daemon: ConnMgr applies local overrides
MGM->>Daemon: UpdatedRemotePeerConfig (server-pushed defaults)
Daemon->>UI: GetConfigResponse(server_pushed_*)
UI->>UI: Render server hints and user overrides
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 18
Note
Due to the large number of review comments, comments of Critical and Major severity were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
client/internal/peer/worker_ice.go (1)
549-563: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Guard `lastKnownState` with `muxAgent` in the state callback.
`IsConnected()` reads `lastKnownState` under `muxAgent`, but this callback writes it without the same lock. That is a real data race, and it can make network-change handling observe stale connection state.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/worker_ice.go` around lines 549-563: the callback updates w.lastKnownState without holding w.muxAgent while IsConnected() reads it under that lock, causing a data race; fix by acquiring the same mutex (w.muxAgent) around all reads/writes of lastKnownState inside the state callback (the branch handling ice.ConnectionStateConnected and the branch handling ice.ConnectionStateFailed/Disconnected/Closed) so the assignments to w.lastKnownState and the subsequent calls that depend on it (e.g., w.logSuccessfulPaths, w.conn.onICEConnected, w.conn.onICEStateDisconnected, and the sessionChanged handling from w.closeAgent) are performed while holding the lock to match IsConnected()'s protection.

client/status/status.go (1)
605-659: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Keep this duplicate `FullStatus -> proto` mapper in sync with the expanded schema.
`daemon.proto` now includes aggregate counters plus the per-peer `server_online` / remote-mode / timeout fields, and `client/internal/peer/status.go` already serializes them. This copy only emits the legacy fields plus ICE backoff, so any path that goes through `status.ToProtoFullStatus()` will silently drop the new Peers-tab metadata.
Suggested fix
 pbFullStatus.NumberOfForwardingRules = int32(fullStatus.NumOfForwardingRules)
 pbFullStatus.LazyConnectionEnabled = fullStatus.LazyConnectionEnabled
+pbFullStatus.ConfiguredPeersTotal = fullStatus.ConfiguredPeersTotal
+pbFullStatus.ServerOnlinePeers = fullStatus.ServerOnlinePeers
+pbFullStatus.P2PConnectedPeers = fullStatus.P2PConnectedPeers
+pbFullStatus.RelayedConnectedPeers = fullStatus.RelayedConnectedPeers
+pbFullStatus.IdleOnlinePeers = fullStatus.IdleOnlinePeers
+pbFullStatus.ServerOfflinePeers = fullStatus.ServerOfflinePeers
 for _, peerState := range fullStatus.Peers {
 	pbPeerState := &proto.PeerState{
 		IP:     peerState.IP,
 		PubKey: peerState.PubKey,
@@
 		SshHostKey:          peerState.SSHHostKey,
 		IceBackoffFailures:  int32(peerState.IceBackoffFailures),
 		IceBackoffNextRetry: timestamppb.New(peerState.IceBackoffNextRetry),
 		IceBackoffSuspended: peerState.IceBackoffSuspended,
+		ServerOnline:               peerState.ServerOnline,
+		Groups:                     peerState.RemoteGroups,
+		EffectiveConnectionMode:    peerState.RemoteEffectiveConnectionMode,
+		ConfiguredConnectionMode:   peerState.RemoteConfiguredConnectionMode,
+		EffectiveRelayTimeoutSecs:  peerState.RemoteEffectiveRelayTimeoutSecs,
+		EffectiveP2PTimeoutSecs:    peerState.RemoteEffectiveP2PTimeoutSecs,
+		EffectiveP2PRetryMaxSecs:   peerState.RemoteEffectiveP2PRetryMaxSecs,
+		ConfiguredRelayTimeoutSecs: peerState.RemoteConfiguredRelayTimeoutSecs,
+		ConfiguredP2PTimeoutSecs:   peerState.RemoteConfiguredP2PTimeoutSecs,
+		ConfiguredP2PRetryMaxSecs:  peerState.RemoteConfiguredP2PRetryMaxSecs,
 	}
+	if !peerState.RemoteLastSeenAtServer.IsZero() {
+		pbPeerState.LastSeenAtServer = timestamppb.New(peerState.RemoteLastSeenAtServer)
+	}
 	pbFullStatus.Peers = append(pbFullStatus.Peers, pbPeerState)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/status/status.go` around lines 605-659: ToProtoFullStatus currently maps only the legacy Peer fields and ICE backoff, so it drops newly added per-peer fields (server_online, remote_mode, timeout) and aggregate counters from the expanded daemon.proto; update ToProtoFullStatus (in client/status/status.go) to copy the same fields that client/internal/peer/status.go emits for each peer and the top-level aggregate counters from peer.FullStatus (e.g., add mapping for PeerState.ServerOnline, PeerState.RemoteMode, PeerState.Timeout and set pbFullStatus aggregate counters like NumPeersOnline/NumRemote/NumTimedOut or whatever aggregate names exist on peer.FullStatus), ensuring pbPeerState includes the new proto fields and pbFullStatus includes the new counters so paths using ToProtoFullStatus no longer drop the new metadata.

management/server/account.go (1)
332-337: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Push connection-mode setting changes through the peer update path.
These new settings are now audited, but they still don't participate in the `updateAccountPeers` condition above. Changing `ConnectionMode`, `RelayTimeoutSeconds`, `P2pTimeoutSeconds`, or `P2pRetryMaxSeconds` will therefore persist the account settings without bumping the network serial or notifying connected peers, so clients can keep running with stale effective config until some unrelated update happens.
Suggested fix
-if oldSettings.RoutingPeerDNSResolutionEnabled != newSettings.RoutingPeerDNSResolutionEnabled ||
-	oldSettings.LazyConnectionEnabled != newSettings.LazyConnectionEnabled ||
-	oldSettings.DNSDomain != newSettings.DNSDomain ||
-	oldSettings.AutoUpdateVersion != newSettings.AutoUpdateVersion ||
-	oldSettings.AutoUpdateAlways != newSettings.AutoUpdateAlways {
+if oldSettings.RoutingPeerDNSResolutionEnabled != newSettings.RoutingPeerDNSResolutionEnabled ||
+	oldSettings.LazyConnectionEnabled != newSettings.LazyConnectionEnabled ||
+	oldSettings.DNSDomain != newSettings.DNSDomain ||
+	oldSettings.AutoUpdateVersion != newSettings.AutoUpdateVersion ||
+	oldSettings.AutoUpdateAlways != newSettings.AutoUpdateAlways ||
+	!equalStringPtr(oldSettings.ConnectionMode, newSettings.ConnectionMode) ||
+	!equalUint32Ptr(oldSettings.RelayTimeoutSeconds, newSettings.RelayTimeoutSeconds) ||
+	!equalUint32Ptr(oldSettings.P2pTimeoutSeconds, newSettings.P2pTimeoutSeconds) ||
+	!equalUint32Ptr(oldSettings.P2pRetryMaxSeconds, newSettings.P2pRetryMaxSeconds) {
 	updateAccountPeers = true
 }
Also applies to: 372-375
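The suggested condition relies on pointer-comparison helpers (`equalStringPtr`, `equalUint32Ptr`) that may not already exist in the package. A minimal generic sketch of such a helper, under the assumption that one would be added (the name `equalPtr` is hypothetical):

```go
package main

import "fmt"

// equalPtr reports whether two optional (pointer) settings carry the
// same value: both nil, or both non-nil with equal pointees.
// Hypothetical helper; the review diff assumes equivalent
// equalStringPtr/equalUint32Ptr functions.
func equalPtr[T comparable](a, b *T) bool {
	if a == nil || b == nil {
		return a == b // equal only when both are nil
	}
	return *a == *b
}

func main() {
	mode := "p2p-lazy"
	other := "relay-forced"
	fmt.Println(equalPtr(&mode, &mode))     // same value
	fmt.Println(equalPtr(&mode, &other))    // different values
	fmt.Println(equalPtr[string](nil, nil)) // both unset
	fmt.Println(equalPtr(&mode, nil))       // set vs unset
}
```

A helper like this keeps the `updateAccountPeers` condition readable while treating "unset" and "set to the default" as distinct states.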
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/account.go` around lines 332-337: the current comparison that sets updateAccountPeers (comparing oldSettings vs newSettings) misses connection-mode related fields, so changes to ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds, and P2pRetryMaxSeconds do not trigger the peer update path; update the conditional that sets updateAccountPeers (the block comparing oldSettings.RoutingPeerDNSResolutionEnabled, LazyConnectionEnabled, DNSDomain, AutoUpdateVersion, AutoUpdateAlways) to also compare oldSettings.ConnectionMode, oldSettings.RelayTimeoutSeconds, oldSettings.P2pTimeoutSeconds, and oldSettings.P2pRetryMaxSeconds against newSettings and set updateAccountPeers = true when any of those differ (and make the same change in the other comparable conditional later in the same file that does the peer-update decision).

client/ui/client_ui.go (1)
1944-1945: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Use the popup-scoped context for the status poll.
The goroutine is meant to be cancelled by the `CancelFunc` returned from `showLoginURL`, but this RPC uses `s.ctx`. If the daemon stalls here, closing the popup will not cancel the in-flight poll.
Suggested fix
-status, err := conn.Status(s.ctx, &proto.StatusRequest{})
+pollCtx, cancel := context.WithTimeout(ctx, failFastTimeout)
+status, err := conn.Status(pollCtx, &proto.StatusRequest{})
+cancel()
 if err != nil {
 	continue
 }
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ui/client_ui.go` around lines 1944-1945: the status RPC is using the long-lived s.ctx instead of the popup-scoped context returned from showLoginURL, so cancelling the popup doesn't cancel the in-flight conn.Status call; change the call to use the popup's context (the ctx returned alongside the CancelFunc from showLoginURL) instead of s.ctx (i.e., replace conn.Status(s.ctx, ...) with conn.Status(popupCtx, ...) or the local ctx variable tied to the CancelFunc), ensuring the goroutine is canceled when the popup is closed.

client/cmd/up.go (1)
438-453: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Disallow mixing `--connection-mode` with the legacy lazy-connection flag.
This builder can now send both `LazyConnectionEnabled` and `ConnectionMode` in the same request. Modes like `p2p-lazy`/`p2p-dynamic` already encode the lazy-connection behavior, so `netbird up --enable-lazy-connection ... --connection-mode ...` produces conflicting instructions and leaves precedence to the daemon. The same ambiguity is repeated in `setupConfig` and `setupLoginRequest` below.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/cmd/up.go` around lines 438 - 453, Detect when both enableLazyConnectionFlag and connectionModeFlag are provided and fail fast: in the request-building code that sets req.LazyConnectionEnabled and req.ConnectionMode (and in the analogous setupConfig and setupLoginRequest functions), check if cmd.Flag(enableLazyConnectionFlag).Changed && cmd.Flag(connectionModeFlag).Changed and return an error (or print a user-facing message and exit) explaining that --enable-lazy-connection cannot be combined with --connection-mode because connection modes encode lazy behavior; do this before mutating req so neither field is set when the flags conflict.
🟡 Minor comments (14)
client/internal/debouncer/debouncer.go-28-53 (1)
28-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Generation guard against `time.AfterFunc` + `Stop` race.
There is a small window where `Trigger` (or `Stop`) cannot actually cancel the previously scheduled callback: once the runtime has started the AfterFunc goroutine but before it acquires `d.mu`, `d.timer.Stop()` returns `false` and the goroutine will still proceed to read `d.fn` and invoke it. Concretely:
- `Trigger(A)` schedules `T1`.
- `T1` fires → goroutine `G1` is spawned but not yet on `d.mu`.
- `Trigger(B)` acquires `mu`, sets `d.fn = B`, `T1.Stop()` returns false, schedules `T2`.
- `G1` acquires `mu`, reads `d.fn = B`, invokes `B()`.
- Later `T2` fires and invokes `B()` again.
`Stop()` has the same issue: a callback already past the timer-fire boundary will still run because `d.fn` is not cleared. For the SyncMeta caller this only causes an extra push, but it does defeat the debounce in racy windows. A generation counter fixes both cases:
♻️ Proposed fix using a generation counter
 type Debouncer struct {
 	delay time.Duration
 	mu    sync.Mutex
 	timer *time.Timer
 	fn    func()
+	gen   uint64
 }

 func (d *Debouncer) Trigger(fn func()) {
 	d.mu.Lock()
 	defer d.mu.Unlock()
 	d.fn = fn
 	if d.timer != nil {
 		d.timer.Stop()
 	}
+	d.gen++
+	myGen := d.gen
 	d.timer = time.AfterFunc(d.delay, func() {
 		d.mu.Lock()
+		if myGen != d.gen {
+			d.mu.Unlock()
+			return
+		}
 		f := d.fn
+		d.fn = nil
 		d.mu.Unlock()
 		if f != nil {
 			f()
 		}
 	})
 }

 func (d *Debouncer) Stop() {
 	d.mu.Lock()
 	defer d.mu.Unlock()
 	if d.timer != nil {
 		d.timer.Stop()
 		d.timer = nil
 	}
+	d.gen++
+	d.fn = nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/debouncer/debouncer.go` around lines 28-53: the Trigger/Stop race allows an AfterFunc goroutine that has already started to read a stale d.fn and invoke it; fix by adding a generation counter on Debouncer (e.g., gen uint64) and use it to validate callbacks: increment d.gen whenever you replace/clear d.fn (in Trigger before scheduling and in Stop to cancel), capture the current generation into a local variable when creating time.AfterFunc, and inside the AfterFunc acquire d.mu and compare the captured generation to d.gen, only calling the captured fn if generations match and the fn is still non-nil; keep existing mutex usage for d.mu, and still call d.timer.Stop() as before but rely on the generation check to prevent the "already fired" goroutine from executing a stale fn.

client/internal/peer/ice_backoff.go-137-155 (1)
137-155: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Reset stale retry metadata on success/reset.
`markSuccess()` and `Reset()` leave `nextRetry` untouched. `Snapshot()` will then keep exposing an old retry timestamp even though `Suspended` is false, which can leak stale data into the new peer-status surfaces.
Suggested fix
 func (s *iceBackoffState) markSuccess() {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	s.failures = 0
 	s.suspended = false
+	s.nextRetry = time.Time{}
 	s.bo.Reset()
 }
@@
 func (s *iceBackoffState) Reset() {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	s.failures = 0
 	s.suspended = false
+	s.nextRetry = time.Time{}
 	s.bo.Reset()
 	s.lastResetAt = time.Now()
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/ice_backoff.go` around lines 137-155: markSuccess() and Reset() currently clear failures/suspended and backoff but leave nextRetry untouched, so Snapshot() can expose stale retry timestamps; update iceBackoffState.markSuccess and iceBackoffState.Reset to also clear/reset the nextRetry field (e.g., set nextRetry to the zero time) after resetting bo and suspended so that Snapshot() no longer returns stale retry metadata; ensure you modify both methods that update lastResetAt (Reset) and the success path (markSuccess) to reset nextRetry consistently.

client/internal/peer/env_test.go-25-25 (1)
25-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Fix the misspelling in this test case name.
`unparseable` trips the `codespell` check here; changing it to `unparsable` should clear the failure.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/env_test.go` at line 25: rename the test case name string "connection_mode unparseable falls through to legacy" to "connection_mode unparsable falls through to legacy" in the test table (the literal present in the slice/array entry) so the misspelling "unparseable" is corrected to "unparsable" and the codespell check will pass.

client/internal/conn_state_pusher.go-104-112 (1)
104-112: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
`OnSnapshotRequest` does not actually coalesce — doc claim is wrong.
The comment says "multiple requests in flight result in a single full snapshot with the latest nonce echoed", but the loop reads one nonce per iteration (line 174) and emits a full snapshot per nonce. With buffer size 4, a burst of refresh requests results in up to four back-to-back full snapshots, each echoing a different (older-first) nonce, instead of one snapshot with the latest. Mgmt's store handles it (newer seq wins), but it's wasteful and the doc misleads anyone reasoning about it.
Either fix the doc or actually coalesce by draining the queue (mirrors the events drain pattern at lines 161–169):
🔧 Drain queued nonces and emit one snapshot
 case nonce := <-p.snapshotReq:
+	// Coalesce any additional refresh requests buffered behind us;
+	// only the latest nonce needs to be echoed back to mgmt.
+	drain := true
+	for drain {
+		select {
+		case n2 := <-p.snapshotReq:
+			if n2 > nonce {
+				nonce = n2
+			}
+		default:
+			drain = false
+		}
+	}
 	if p.source != nil {
 		p.flushFull(p.source.SnapshotAllRemotePeers(), nonce)
 	}
Also applies to: 174-180
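The drain-and-keep-latest pattern proposed above can be exercised in isolation. This sketch uses an illustrative name (`drainLatest`) rather than the pusher's actual API:

```go
package main

import "fmt"

// drainLatest blocks for one nonce, then non-blockingly drains any
// nonces already queued behind it, keeping the largest. A burst of
// refresh requests thus yields a single snapshot echoing the latest nonce.
func drainLatest(ch chan uint64) uint64 {
	nonce := <-ch
	for {
		select {
		case n2 := <-ch:
			if n2 > nonce {
				nonce = n2
			}
		default: // queue empty: done coalescing
			return nonce
		}
	}
}

func main() {
	ch := make(chan uint64, 4) // same buffer size as the review mentions
	for _, n := range []uint64{3, 1, 7, 5} {
		ch <- n
	}
	fmt.Println(drainLatest(ch)) // 7: one snapshot, latest nonce wins
	fmt.Println(len(ch))         // 0: queue fully drained
}
```

The `default` arm is what makes the drain non-blocking: once the buffer is empty, the function returns immediately instead of waiting for more requests.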
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/conn_state_pusher.go` around lines 104-112: the comment claims coalescing but the processing loop currently emits one snapshot per queued nonce; update the processing logic in connStatePusher to actually coalesce by draining p.snapshotReq: when you receive a nonce from p.snapshotReq (the same queue written by OnSnapshotRequest), run a non-blocking loop to read and discard any additional pending nonces, keeping the latest value, then emit a single full snapshot with that latest nonce (mirror the existing events-drain pattern used elsewhere); reference connStatePusher, OnSnapshotRequest, and p.snapshotReq when locating where to implement the drain-and-keep-latest change.

client/internal/conn_state_pusher.go-234-253 (1)
234-253: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
The `flushFull` "dirty-retain" comment overstates what happens after a failed full snapshot.
The comment claims "the next push (or the next snapshot request) will see every peer as dirty". That's not what `computeDeltaFromSource` does — it diffs the latest source snapshot against `p.lastPushed`, and `lastPushed` still contains whatever was successfully pushed before this failed full snapshot. So if a full snapshot fails:
- Peers whose state genuinely changed since the last successful push will be re-included in the next delta tick (good).
- Peers whose state hasn't changed since the last successful delta/full will not be re-pushed — they still match `lastPushed` and `isMaterialChange` returns false.
If the intent is "a failed full snapshot causes the next tick to re-send the entire snapshot to repair mgmt", you'd need to either invalidate `lastPushed` on full-snapshot failure, or set a flag that forces the next tick to call `flushFull` instead of `flushDelta`. Otherwise please tone the comment down to match reality:
📝 More accurate comment
 }); err != nil {
-	// Same dirty-retain semantics as flushDelta. A failed full
-	// snapshot leaves lastPushed unchanged so the next push (or
-	// the next snapshot request) will see every peer as dirty.
+	// Leave lastPushed unchanged so peers that have changed since
+	// the previous successful push remain dirty on the next tick.
+	// NOTE: peers that haven't changed will NOT be retransmitted —
+	// mgmt's view of those peers stays stale until either another
+	// flushFull succeeds or those peers change. Consider invalidating
+	// lastPushed here if full mgmt-side repair is required.
 	return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/conn_state_pusher.go` around lines 234-253: the comment in flushFull overstates behavior: when a full snapshot Push fails, p.lastPushed is left unchanged so computeDeltaFromSource will only re-include peers whose current state differs from p.lastPushed (isMaterialChange), not "every peer"; update the comment in connStatePusher.flushFull to state this exact behavior and note two alternatives if the intent is to force a repair: either invalidate p.lastPushed (so every peer looks dirty) or set a flag to force the next tick to call flushFull instead of flushDelta; reference the functions/fields flushFull, computeDeltaFromSource, p.lastPushed, isMaterialChange, and flushDelta so readers can find the relevant logic.

management/server/peer_connections/store.go-73-101 (1)
73-101: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Full snapshot path drops `prev.InResponseToNonce` when the new snapshot's nonce is 0.
The merge path on lines 83-85 is careful to preserve the latest non-zero `InResponseToNonce` so a refresh poller (GET `?since=N`) doesn't lose track of an in-flight nonce when a delta with `InResponseToNonce=0` arrives. The full-snapshot path on line 73 just clones `m` and replaces, so a follow-up full snapshot from the client (e.g., reconnect / initial-after-restart) with `InResponseToNonce=0` overwrites a previously-stored nonce=N. Any client that was polling `?since=N` will then see GetWithNonceCheck refuse the entry until the next refresh request lands.
Apply the same "keep latest non-zero nonce" rule on the replacement path:
🔧 Suggested fix
 stored := proto.Clone(m).(*mgmProto.PeerConnectionMap)
-if !m.GetFullSnapshot() && prev != nil {
+if m.GetFullSnapshot() && prev != nil {
+	// Preserve a previously-recorded refresh nonce when the new
+	// full snapshot wasn't pushed in response to a refresh request.
+	if stored.GetInResponseToNonce() == 0 && prev.m.GetInResponseToNonce() > 0 {
+		stored.InResponseToNonce = prev.m.GetInResponseToNonce()
+	}
+} else if prev != nil {
 	merged := proto.Clone(prev.m).(*mgmProto.PeerConnectionMap)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer_connections/store.go` around lines 73-101: the full-snapshot path currently overwrites a previously-stored non-zero InResponseToNonce with a new snapshot that has InResponseToNonce==0; after cloning m into stored (stored := proto.Clone(m)...), if prev != nil and stored.GetInResponseToNonce() == 0 and prev.m.GetInResponseToNonce() > 0 then copy the nonce from prev (stored.InResponseToNonce = prev.m.GetInResponseToNonce()); keep this check just before s.maps[peerPubKey] = &memEntry{...} so stored retains the latest non-zero nonce while leaving other fields from m intact.

shared/management/http/api/types.gen.go-1482-1487 (1)
1482-1487: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Update the generated `connection_mode` docs to match current behavior.
These comments still say `p2p-dynamic` is reserved and behaves like `p2p` in "Phase 1", but this PR already exposes dynamic-mode-specific timeout/backoff settings. The published OpenAPI/SDK docs will advertise obsolete semantics unless the source schema description is refreshed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@shared/management/http/api/types.gen.go` around lines 1482-1487: the generated doc comment for the ConnectionMode field is outdated; update the comment on ConnectionMode (*AccountSettingsConnectionMode) to remove the "reserved" / "Phase 1" language and instead state the current behavior: that p2p-dynamic is supported and exposes dynamic-mode-specific timeout and backoff settings (so it no longer just falls back to p2p), while NULL still falls back to lazy_connection_enabled for backwards compatibility; ensure the new description briefly names the supported modes (relay-forced, p2p, p2p-lazy, p2p-dynamic) and mentions where dynamic-specific settings apply so OpenAPI/SDK docs reflect current semantics.

client/cmd/service.go-60-76 (1)
60-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
These install/reconfigure flags won't be populated from `NB_`/`WT_` env vars.
They are registered on subcommands, but `SetFlagsFromEnvVars` only walks `cmd.Root().PersistentFlags()`. Headless installs that rely on environment-based config will silently ignore `NB_CONNECTION_MODE`, `NB_RELAY_TIMEOUT`, `NB_P2P_TIMEOUT`, and `NB_P2P_RETRY_MAX` unless the env loader is updated to visit the active command's flags too.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/cmd/service.go` around lines 60-76: the install/reconfigure flags (connectionModeFlag, relayTimeoutFlag, p2pTimeoutFlag, p2pRetryMaxFlag) are registered only on installCmd/reconfigureCmd so SetFlagsFromEnvVars doesn't pick up NB_/WT_ env vars; either make these flags persistent on the root command or update SetFlagsFromEnvVars to also iterate the active command's local flags (visit installCmd.Flags() and reconfigureCmd.Flags() or the current command's Flags()) so NB_CONNECTION_MODE, NB_RELAY_TIMEOUT, NB_P2P_TIMEOUT and NB_P2P_RETRY_MAX are loaded from environment when running install/reconfigure; update references in SetFlagsFromEnvVars and ensure the env-to-flag mapping logic still uses the same flag names/vars.

management/server/http/handlers/peer_connections/handler_test.go-40-45 (1)
40-45: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Make the fake enforce `accountID`.
This stub ignores the new `accountID` parameter, so these tests would still pass if the handler accidentally enriched entries with peer data from another account. Checking `p.AccountID` here keeps the test double aligned with the production contract.
Suggested fix
-func (a *fakeAM) GetPeerByPubKey(_ context.Context, _, pubKey string) (*nbpeer.Peer, error) {
+func (a *fakeAM) GetPeerByPubKey(_ context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
 	p, ok := a.peersByKey[pubKey]
 	if !ok {
 		return nil, errors.New("not found")
 	}
+	if p.AccountID != accountID {
+		return nil, errors.New("not found")
+	}
 	return p, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/http/handlers/peer_connections/handler_test.go` around lines 40-45: the fake GetPeerByPubKey stub currently ignores the accountID parameter; update its signature to name the account id parameter (e.g., accountID string) and enforce it by checking that the found peer's AccountID matches the passed accountID, returning the "not found" error if there is no peer or the AccountID mismatches; keep the rest of the lookup logic (lookup by pubKey in a.peersByKey and return p,nil when both pubKey exists and p.AccountID==accountID) so the test double mirrors production behavior.

management/server/peer/peer.go-142-147 (1)
142-147: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Treat runtime-config-only metadata as non-empty.
`UpdateMetaIfNew()` exits early when `meta.isEmpty()`. Since `isEmpty()` still ignores `EffectiveConnectionMode` and the three timeout fields, an update that only changes these new runtime values is discarded before `isEqual()` ever runs.
Suggested fix
 func (p PeerSystemMeta) isEmpty() bool {
 	return p.Hostname == "" &&
 		p.GoOS == "" &&
 		p.Kernel == "" &&
 		p.Core == "" &&
 		p.Platform == "" &&
 		p.OS == "" &&
 		p.OSVersion == "" &&
 		p.WtVersion == "" &&
 		p.UIVersion == "" &&
 		p.KernelVersion == "" &&
 		len(p.NetworkAddresses) == 0 &&
 		p.SystemSerialNumber == "" &&
 		p.SystemProductName == "" &&
 		p.SystemManufacturer == "" &&
 		p.Environment.Cloud == "" &&
 		p.Environment.Platform == "" &&
+		p.EffectiveConnectionMode == "" &&
+		p.EffectiveRelayTimeoutSecs == 0 &&
+		p.EffectiveP2PTimeoutSecs == 0 &&
+		p.EffectiveP2PRetryMaxSecs == 0 &&
 		len(p.Files) == 0
 }
Also applies to: 256-272
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer/peer.go` around lines 142-147: UpdateMetaIfNew() currently returns early when meta.isEmpty(), but isEmpty() ignores the new runtime-only fields (EffectiveConnectionMode, EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, EffectiveP2PRetryMaxSecs) so runtime-only updates are dropped; fix by updating the isEmpty() implementation (or its use in UpdateMetaIfNew()) to treat those four fields as making meta non-empty (i.e., consider EffectiveConnectionMode and the three Effective* timeout fields when deciding emptiness) so that UpdateMetaIfNew() proceeds to isEqual() and applies runtime-only metadata changes.

client/ui/peers_tab.go-45-57 (1)
45-57: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Clear the previous snapshot when refresh fails.
Both error paths only update `summary`; `breakdown` and the peer list keep the last successful snapshot, so the tab can show stale peer state underneath an error banner. Clear the old content in the same UI update so the screen can't be read as current data.
Suggested fix
 conn, err := s.getSrvClient(failFastTimeout)
 if err != nil {
-	fyne.Do(func() { summary.SetText("Error: " + err.Error()) })
+	fyne.Do(func() {
+		summary.SetText("Error: " + err.Error())
+		breakdown.SetText("")
+		listVBox.Objects = nil
+		listVBox.Refresh()
+	})
 	return
 }
@@
 st, err := conn.Status(callCtx, &proto.StatusRequest{GetFullPeerStatus: true})
 if err != nil {
-	fyne.Do(func() { summary.SetText("Error: " + err.Error()) })
+	fyne.Do(func() {
+		summary.SetText("Error: " + err.Error())
+		breakdown.SetText("")
+		listVBox.Objects = nil
+		listVBox.Refresh()
+	})
 	return
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ui/peers_tab.go` around lines 45-57: when a refresh fails in the render closure (errors from s.getSrvClient or conn.Status), clear the old snapshot in the same fyne.Do UI update that sets summary so stale information isn't shown: in the error branches update summary.SetText(...) AND also clear the breakdown widget (breakdown.SetText("") or equivalent) and clear/refresh the peers list UI (e.g., peersList.Clear() or peersContainer.Objects = nil followed by Refresh()) so both breakdown and peer list are emptied when an error occurs.

management/server/http/handlers/accounts/accounts_handler.go-218-228 (1)
218-228: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Misleading comment about null-clearing semantics for `ConnectionMode`.

The comment (lines 223–225) says "Clients clear an override by sending JSON null" but then immediately says "leaving the existing value untouched" — these two statements are contradictory. "Clearing an override" and "leaving the value untouched" are opposite outcomes.

The actual runtime behavior depends on whether `UpdateAccountSettings` performs a full replace or a partial merge of `nil` fields. If it's a full replace, null propagates as nil and does clear the override. If it's a merge, null is ignored and the old value persists. The comment obscures this distinction and will mislead future maintainers about the expected contract.

Consider replacing with a comment that accurately describes whichever semantic is actually implemented.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/http/handlers/accounts/accounts_handler.go` around lines 218 - 228, The comment about "Clients clear an override by sending JSON null" is contradictory to "leaving the existing value untouched" — inspect the update flow (the UpdateAccountSettings implementation) to determine whether req.Settings.ConnectionMode being nil means "ignore/merge" or whether a JSON null value is propagated to clear the field; then replace the misleading comment above the block that references req.Settings.ConnectionMode and returnSettings.ConnectionMode with a concise, accurate statement that describes the actual behavior (e.g., "A JSON null for ConnectionMode is treated as X by UpdateAccountSettings, so this branch only runs when a non-null value is provided" or "A JSON null clears the stored value; this branch only handles non-null overrides"), and remove the contradictory language.

client/server/server.go (1)
1518-1538: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

`spP2pRetMax` is never zero when `cm != nil`, contradicting the inline comment.

`ServerPushedP2pRetryMaxSecs()` always returns a non-zero value: the pushed value when available, otherwise `peer.DefaultP2PRetryMax / time.Second`. The other three accessors (`ServerPushedRelayTimeoutSecs`, `ServerPushedP2pTimeoutSecs`, `ServerPushedMode`) return their zero value when the server hasn't pushed anything yet.

The comment on lines 1521–1522 says "All zero/empty when the engine has not received PeerConfig yet", which is false for `spP2pRetMax`. Any UI consumer of `ServerPushedP2PRetryMaxSeconds` cannot distinguish "server explicitly set this" from "this is just the client default", whereas it can make that distinction for the other three fields.

Either the accessor should drop the default fallback (return 0 when unpushed), or the comment and UI handling need to be updated to account for this field always carrying a non-zero value.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/server/server.go` around lines 1518 - 1538, The comment claims all four sp* values are zero/empty before PeerConfig arrives but spP2pRetMax never is because ConnMgr.ServerPushedP2pRetryMaxSecs() currently falls back to peer.DefaultP2PRetryMax; fix this by changing ServerPushedP2pRetryMaxSecs to return 0 when no server value was pushed (matching ServerPushedRelayTimeoutSecs/ServerPushedP2pTimeoutSecs/ServerPushedMode semantics) so spP2pRetMax will be zero when unpushed, and keep callers (e.g., the block that sets spP2pRetMax) unchanged. Ensure the ConnMgr method's signature/contract and any tests reflect the new zero-return behavior.

client/ui/network.go (1)
97-103: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clear the tab being left, not the newly selected tab.

`OnUnselected` resolves the grid through `tabs.Selected()`, so when switching from Overlapping/Exit-node to Peers it can clear the wrong grid. Pick the grid from `item.Text` instead.

Suggested fix

```diff
 tabs.OnUnselected = func(item *container.TabItem) {
 	// Only reset network grids when leaving a network tab; the
 	// peers VBox manages its own state.
 	if item != nil && item.Text != peersText {
-		grid, _ := getGridAndFilterFromTab(tabs, allGrid, overlappingGrid, exitNodeGrid)
+		var grid *fyne.Container
+		switch item.Text {
+		case overlappingNetworksText:
+			grid = overlappingGrid
+		case exitNodeNetworksText:
+			grid = exitNodeGrid
+		default:
+			grid = allGrid
+		}
 		grid.Objects = nil
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ui/network.go` around lines 97 - 103, OnUnselected currently uses tabs.Selected() when resolving which grid to clear and thus can clear the newly selected tab's grid; change the logic so the grid is picked based on the leaving tab's identifier (item.Text) instead. In the tabs.OnUnselected handler (symbol: OnUnselected) call or update getGridAndFilterFromTab to accept the leaving tab text (item.Text) and return the corresponding grid among allGrid, overlappingGrid, exitNodeGrid (and ignore peersText), then set that grid's Objects = nil; ensure peersText is still treated as a no-op so the peers tab never gets cleared.
🧹 Nitpick comments (8)
management/server/store/sql_store.go (1)
1502-1571: ⚡ Quick win

Add a pgx regression test for these nullable settings fields.
This path manually keeps the `SELECT` list, `Scan(...)` order, and assignments in sync. A focused round-trip test for `ConnectionMode`, `RelayTimeoutSeconds`, `P2pTimeoutSeconds`, and `P2pRetryMaxSeconds` would pay off here because a mismatch only breaks the Postgres fast path.

Also applies to: 1631-1646
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/store/sql_store.go` around lines 1502 - 1571, Add a focused pgx regression test that round-trips the nullable settings fields used in accountQuery/Scan to catch fast-path column/order regressions: insert an account row with explicit values and NULLs for settings_connection_mode, settings_relay_timeout_seconds, settings_p2p_timeout_seconds, settings_p2p_retry_max_seconds (and optionally settings_lazy_connection_enabled and related extra settings), then load it via the same code path (the QueryRow + Scan that fills sConnectionMode, sRelayTimeoutSeconds, sP2pTimeoutSeconds, sP2pRetryMaxSeconds, sLazyConnectionEnabled, etc.) and assert the scanned sql.NullString/sql.NullInt64/sql.NullBool values match the inserted ones; add the same test for the other Scan block referenced around the 1631-1646 area to ensure both SELECT/Scan orders remain in sync.

management/internals/shared/grpc/conversion.go (1)
161-161: ⚖️ Poor tradeoff

SonarCloud: 16 parameters on `ToSyncResponse`. Worth folding into an options struct.

Static analysis flags this and the function has accreted enough orthogonal inputs (config trio, peer/credentials, network state, dnsName, checks/cache, settings/extraSettings, peerGroups, dnsFwdPort, groupNamesByPeerID) that a single `ToSyncResponseInput` struct would make call-sites self-documenting and stop further parameter creep — especially since this PR is already adding a `groupNamesByPeerID` arg. The newly-introduced `AppendRemotePeerConfigContext` shows the codebase is already moving toward that pattern; same treatment one level up would be consistent.

Not strictly required for this PR; flagging as a recommended follow-up.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/conversion.go` at line 161, The ToSyncResponse function has grown too many parameters; introduce a ToSyncResponseInput struct to group the orthogonal inputs (e.g., Config, HttpServerConfig, DeviceAuthorizationFlow, Peer, Token turnCredentials, Token relayCredentials, NetworkMap, dnsName, Checks slice, DNSConfigCache, Settings, ExtraSettings, peerGroups, dnsFwdPort, groupNamesByPeerID) and change the signature of ToSyncResponse to accept a single *ToSyncResponseInput; update all call sites to construct and pass this struct (optionally provide a constructor/helper to build it for readability) and adapt any tests, and follow the same pattern used by AppendRemotePeerConfigContext to keep usage consistent.

shared/management/proto/management.proto (1)
235-241: 💤 Low value

Reserve the tag gap 18–49 in `PeerSystemMeta` for consistency with `PeerConfig`.

`PeerConfig` carefully calls out its tag gap with `reserved 9, 10;` plus a comment so reviewers don't accidentally land a future field on a reused tag. `PeerSystemMeta` jumps from `Flags flags = 17;` straight to `effective_connection_mode = 50;` with no `reserved 18 to 49;` and no comment, so a future small addition here is one careless commit away from colliding. Cheap to make defensive now.

♻️ Suggested defensive marker

```diff
 Flags flags = 17;
+
+// Tags 18–49 are intentionally left unused so future small additions
+// can land without re-numbering the new connection-mode/effective-*
+// fields below. Reserved here to make the gap explicit. Phase 3.7i (`#5989`).
+reserved 18 to 49;

 // Phase 3.7i (`#5989`): connection mode/timeouts this peer is actually
 // running with. Mgmt copies into RemotePeerConfig of every other peer.
 string effective_connection_mode = 50;
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@shared/management/proto/management.proto` around lines 235 - 241, Add a defensive reserved field range to PeerSystemMeta to prevent accidental tag reuse: after the existing "Flags flags = 17;" declaration, add a `reserved 18 to 49;` with a short comment mirroring PeerConfig's style so future reviewers know that tags 18–49 are intentionally skipped; update the proto in the same message (PeerSystemMeta) that currently jumps to `effective_connection_mode = 50` to include this reserved declaration.

management/internals/shared/grpc/conversion_test.go (1)
21-180: ⚡ Quick win

Add one test that exercises the new `groupNamesByPeerID` path.

These additions cover connection-mode and retry-max wiring well, but they still don't pin the new remote-peer group-name serialization introduced in this PR. A small `ToSyncResponse` case with one peer in multiple groups would catch regressions in both propagation and ordering.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/conversion_test.go` around lines 21 - 180, Add a new unit test that exercises the groupNamesByPeerID path by creating a minimal Network and two or more Group objects where a single Peer appears in multiple groups, then call ToSyncResponse (or the public wrapper that invokes groupNamesByPeerID) with those groups and peers and assert the returned SyncResponse encodes the peer's group names (both presence and stable ordering) on the wire; locate the test near the other toPeerConfig tests (use the existing network/peer fixtures or toPeerConfigForTest for setup), call ToSyncResponse with one peer in multiple groups, and add assertions that the response contains the expected group-name slice for that peer and that repeated runs preserve ordering.

management/server/peer/peer_test.go (1)
145-152: ⚡ Quick win

Cover the other effective config fields in this regression test.

`PeerSystemMeta.isEqual` now compares more than `EffectiveConnectionMode`. A regression in `EffectiveRelayTimeoutSecs`, `EffectiveP2PTimeoutSecs`, or `EffectiveP2PRetryMaxSecs` would still pass this test, so this is worth making table-driven while the change is fresh.
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer/peer_test.go` around lines 145 - 152, The test TestPeerSystemMeta_isEqual_ChecksEffectiveFields only checks EffectiveConnectionMode but isEqual now compares EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, and EffectiveP2PRetryMaxSecs as well; convert this into a table-driven test that iterates cases where a single effective field differs (use PeerSystemMeta instances and call isEqual) for each of EffectiveConnectionMode, EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, and EffectiveP2PRetryMaxSecs and assert isEqual returns false for each case, and true for an identical base case to ensure all effective fields are covered.

management/server/http/handlers/accounts/accounts_handler_test.go (1)
340-414: ⚡ Quick win

Round out this API test with the other new connection settings.

This only verifies `p2p_retry_max_seconds`. The same request/response mapping was added for `connection_mode`, `relay_timeout_seconds`, and `p2p_timeout_seconds`, so a wiring bug in any of those fields would still merge cleanly.
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/http/handlers/accounts/accounts_handler_test.go` around lines 340 - 414, TestAccountsHandler_PutSettings_P2pRetryMax only asserts P2pRetryMaxSeconds but the new request/response mapping also includes connection_mode, relay_timeout_seconds, and p2p_timeout_seconds; update the test (TestAccountsHandler_PutSettings_P2pRetryMax) to include those fields in the JSON request body sent to handler.updateAccount and add the corresponding fields to expectedSettings (e.g., ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds in api.AccountSettings) with the expected values so the test verifies all new connection settings are round-tripped.

shared/management/http/api/openapi.yml (2)
367-393: ⚡ Quick win

Avoid rollout-phase notes in public API descriptions.

The phase-specific wording (Phase 1/2/3, rollout behavior notes) is likely to become stale in generated API docs. Prefer stable behavior-focused descriptions so the contract remains accurate across releases.
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@shared/management/http/api/openapi.yml` around lines 367 - 393, Remove phase-specific rollout wording from the descriptions for p2p_mode, p2p_timeout_seconds, and p2p_retry_max_seconds and replace with stable, behavior-focused text that describes current runtime effects and defaults; e.g., state what NULL means, the effective modes for each setting (without mentioning "Phase 1/2/3" or future rollout plans), the default values (e.g., built-in default 180 minutes, 900 seconds), and when a value is ignored or treated as 0, so the public API docs stay accurate across releases.
373-405: ⚡ Quick win

Add explicit upper bounds for timeout settings.
These properties currently only enforce `minimum: 0`, so very large values still pass schema validation and can drift from practical backend limits. Add `maximum` constraints aligned with server-side validation for `p2p_timeout_seconds`, `p2p_retry_max_seconds`, and `relay_timeout_seconds`.
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@shared/management/http/api/openapi.yml` around lines 373 - 405, OpenAPI schema properties p2p_timeout_seconds, p2p_retry_max_seconds, and relay_timeout_seconds lack upper bounds; add a maximum field to each to match the server-side validation limits. For each property (p2p_timeout_seconds, p2p_retry_max_seconds, relay_timeout_seconds) add a "maximum" integer (format: int64) value equal to the backend's allowed max, ensure the maximum is >= the existing minimum (0), keep nullable: true and existing descriptions/examples (update example if needed, e.g., p2p_retry_max_seconds), and regenerate/validate the spec so schema validation enforces the same limits as the server.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 15ca1bee-2a94-4ad8-9f78-233908bf91ea
⛔ Files ignored due to path filters (5)
- `client/proto/daemon.pb.go` is excluded by `!**/*.pb.go`
- `client/proto/daemon_grpc.pb.go` is excluded by `!**/*.pb.go`
- `shared/management/proto/management.pb.go` is excluded by `!**/*.pb.go`
- `shared/management/proto/management_grpc.pb.go` is excluded by `!**/*.pb.go`
- `shared/management/proto/proxy_service.pb.go` is excluded by `!**/*.pb.go`
📒 Files selected for processing (81)
- `client/android/client.go`
- `client/android/peer_notifier.go`
- `client/android/preferences.go`
- `client/cmd/root.go`
- `client/cmd/service.go`
- `client/cmd/service_installer.go`
- `client/cmd/up.go`
- `client/internal/conn_mgr.go`
- `client/internal/conn_mgr_test.go`
- `client/internal/conn_state_pusher.go`
- `client/internal/conn_state_pusher_test.go`
- `client/internal/connect.go`
- `client/internal/debouncer/debouncer.go`
- `client/internal/engine.go`
- `client/internal/engine_pusher_adapters.go`
- `client/internal/engine_test.go`
- `client/internal/lazyconn/env.go`
- `client/internal/lazyconn/inactivity/manager.go`
- `client/internal/lazyconn/inactivity/manager_test.go`
- `client/internal/lazyconn/manager/manager.go`
- `client/internal/lazyconn/support.go`
- `client/internal/peer/conn.go`
- `client/internal/peer/conn_test.go`
- `client/internal/peer/env.go`
- `client/internal/peer/env_test.go`
- `client/internal/peer/guard/guard.go`
- `client/internal/peer/handshaker.go`
- `client/internal/peer/handshaker_test.go`
- `client/internal/peer/ice_backoff.go`
- `client/internal/peer/ice_backoff_test.go`
- `client/internal/peer/status.go`
- `client/internal/peer/status_test.go`
- `client/internal/peer/worker_ice.go`
- `client/internal/profilemanager/config.go`
- `client/proto/daemon.proto`
- `client/server/server.go`
- `client/status/status.go`
- `client/status/status_test.go`
- `client/ui/client_ui.go`
- `client/ui/const.go`
- `client/ui/event_handler.go`
- `client/ui/network.go`
- `client/ui/peers_tab.go`
- `management/internals/controllers/network_map/controller/controller.go`
- `management/internals/server/boot.go`
- `management/internals/shared/grpc/conversion.go`
- `management/internals/shared/grpc/conversion_test.go`
- `management/internals/shared/grpc/server.go`
- `management/server/account.go`
- `management/server/account/manager.go`
- `management/server/account/manager_mock.go`
- `management/server/activity/codes.go`
- `management/server/http/handler.go`
- `management/server/http/handlers/accounts/accounts_handler.go`
- `management/server/http/handlers/accounts/accounts_handler_test.go`
- `management/server/http/handlers/peer_connections/handler.go`
- `management/server/http/handlers/peer_connections/handler_test.go`
- `management/server/http/testing/testing_tools/channel/channel.go`
- `management/server/management_proto_test.go`
- `management/server/management_test.go`
- `management/server/mock_server/account_mock.go`
- `management/server/peer.go`
- `management/server/peer/peer.go`
- `management/server/peer/peer_test.go`
- `management/server/peer_connections/snapshot_router.go`
- `management/server/peer_connections/snapshot_router_test.go`
- `management/server/peer_connections/store.go`
- `management/server/peer_connections/store_test.go`
- `management/server/peer_test.go`
- `management/server/store/sql_store.go`
- `management/server/types/settings.go`
- `management/server/types/settings_test.go`
- `shared/connectionmode/mode.go`
- `shared/connectionmode/mode_test.go`
- `shared/management/client/client.go`
- `shared/management/client/client_test.go`
- `shared/management/client/grpc.go`
- `shared/management/client/mock.go`
- `shared/management/http/api/openapi.yml`
- `shared/management/http/api/types.gen.go`
- `shared/management/proto/management.proto`
💤 Files with no reviewable changes (2)
- client/ui/const.go
- client/ui/event_handler.go
```go
func (p *Preferences) SetRelayTimeoutSeconds(secs int64) {
	v := uint32(secs)
	p.configInput.RelayTimeoutSeconds = &v
}

// GetP2pTimeoutSeconds returns the locally configured ICE-worker
// inactivity timeout in seconds (only effective in p2p-dynamic mode),
// or 0 if no override is set.
func (p *Preferences) GetP2pTimeoutSeconds() (int64, error) {
	if p.configInput.P2pTimeoutSeconds != nil {
		return int64(*p.configInput.P2pTimeoutSeconds), nil
	}
	cfg, err := profilemanager.ReadConfig(p.configInput.ConfigPath)
	if err != nil {
		return 0, err
	}
	return int64(cfg.P2pTimeoutSeconds), nil
}

// SetP2pTimeoutSeconds stores a local override for the p2p timeout.
// Pass 0 to clear the override.
func (p *Preferences) SetP2pTimeoutSeconds(secs int64) {
	v := uint32(secs)
	p.configInput.P2pTimeoutSeconds = &v
}

// GetP2pRetryMaxSeconds returns the locally configured cap on the
// per-peer ICE-failure backoff schedule, or 0 if no override is set.
func (p *Preferences) GetP2pRetryMaxSeconds() (int64, error) {
	if p.configInput.P2pRetryMaxSeconds != nil {
		return int64(*p.configInput.P2pRetryMaxSeconds), nil
	}
	cfg, err := profilemanager.ReadConfig(p.configInput.ConfigPath)
	if err != nil {
		return 0, err
	}
	return int64(cfg.P2pRetryMaxSeconds), nil
}

// SetP2pRetryMaxSeconds stores a local override for the backoff cap.
// Pass 0 to clear the override.
func (p *Preferences) SetP2pRetryMaxSeconds(secs int64) {
	v := uint32(secs)
	p.configInput.P2pRetryMaxSeconds = &v
}
```
Guard the int64 → uint32 conversions.
These setters silently wrap invalid inputs; for example, -1 becomes 4294967295. That stores a bogus timeout/backoff override instead of clearing or rejecting it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/android/preferences.go` around lines 349 - 392, The setters
(SetRelayTimeoutSeconds, SetP2pTimeoutSeconds, SetP2pRetryMaxSeconds) currently
cast int64→uint32 and silently wrap negative or too-large values; change each to
guard inputs: treat secs <= 0 as "clear the override" by setting the
corresponding p.configInput.RelayTimeoutSeconds / P2pTimeoutSeconds /
P2pRetryMaxSeconds to nil, if secs > math.MaxUint32 clamp to math.MaxUint32
before converting, otherwise convert safely to uint32 and store the pointer;
implement these checks so negative values do not wrap into large uint32s and
zero clears the override.
```go
// NewManagerWithTwoTimers is the Phase-2 constructor. Pass 0 for either
// timeout to disable that teardown path. Both 0 leaves the manager
// running but inert (no channel ever fires) -- used by p2p / relay-forced
// modes that don't tear down workers.
func NewManagerWithTwoTimers(iface WgInterface, iceTimeout, relayTimeout time.Duration) *Manager {
	if iceTimeout > 0 {
		log.Infof("ICE inactivity timeout: %v", iceTimeout)
	}
	if relayTimeout > 0 {
		log.Infof("relay inactivity timeout: %v", relayTimeout)
	}
	return newManager(iface, iceTimeout, relayTimeout)
}
```
Validate the new two-timer inputs before constructing the manager.
NewManagerWithTwoTimers() bypasses validateInactivityThreshold, so negative or sub-minute values are now accepted even though the old constructor rejected them. That can make peers tear down on the first ticker pass or behave inconsistently across constructors.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/lazyconn/inactivity/manager.go` around lines 83 - 95,
NewManagerWithTwoTimers currently skips input validation, so call
validateInactivityThreshold for both iceTimeout and relayTimeout before calling
newManager (the same validation used by the old constructor) and reject invalid
values; if validation fails, log an error and return nil (or otherwise propagate
the validation error) to avoid constructing a Manager with negative or
sub-minute timeouts. Ensure you reference NewManagerWithTwoTimers,
validateInactivityThreshold, and newManager when making the change so behavior
is consistent across constructors.
```go
// Phase 3: initialize per-peer ICE-failure backoff. The cap comes
// from the resolved P2pRetryMaxSeconds. 0 means "use built-in default".
backoffCap := time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
if backoffCap == 0 {
	backoffCap = DefaultP2PRetryMax
}
if conn.iceBackoff == nil {
	conn.iceBackoff = newIceBackoff(backoffCap)
} else {
	conn.iceBackoff.SetMaxBackoff(backoffCap)
}
```
Preserve the explicit “disable ICE backoff” state across reopens.
SetIceBackoffMax(0) stores config.P2pRetryMaxSeconds = 0, but Open() interprets zero as “use DefaultP2PRetryMax”. After the next close/reopen, a peer that was explicitly configured to disable retry backoff comes back with the default 15-minute cap instead.
Suggested fix
```diff
 // Open()
-backoffCap := time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
-if backoffCap == 0 {
-	backoffCap = DefaultP2PRetryMax
-}
+var backoffCap time.Duration
+switch conn.config.P2pRetryMaxSeconds {
+case ^uint32(0):
+	backoffCap = 0
+case 0:
+	backoffCap = DefaultP2PRetryMax
+default:
+	backoffCap = time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
+}

 // SetIceBackoffMax()
-conn.config.P2pRetryMaxSeconds = uint32(d / time.Second)
+if d == 0 {
+	conn.config.P2pRetryMaxSeconds = ^uint32(0)
+} else {
+	conn.config.P2pRetryMaxSeconds = uint32(d / time.Second)
+}
```
+ }Also applies to: 1166-1172
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/peer/conn.go` around lines 202 - 212, Open() currently treats
config.P2pRetryMaxSeconds == 0 as "use DefaultP2PRetryMax", which loses an
explicit "disabled" setting from SetIceBackoffMax(0); add a persistent flag
(e.g. conn.iceBackoffDisabled bool) that SetIceBackoffMax toggles when passed 0,
and update Open()'s initialization logic (the block using
config.P2pRetryMaxSeconds, backoffCap, newIceBackoff and
iceBackoff.SetMaxBackoff) to check conn.iceBackoffDisabled first and, if true,
initialize or set conn.iceBackoff to a disabled state (backoffCap == 0 /
SetMaxBackoff(0)) instead of substituting DefaultP2PRetryMax so the explicit
"disable" choice survives close/reopen.
```go
func ResolveModeFromEnv() (connectionmode.Mode, uint32) {
	mode := connectionmode.ModeUnspecified

	if raw := os.Getenv(EnvKeyNBConnectionMode); raw != "" {
		parsed, err := connectionmode.ParseString(raw)
		if err != nil {
			log.Warnf("ignoring %s=%q: %v", EnvKeyNBConnectionMode, raw, err)
		} else if parsed != connectionmode.ModeUnspecified {
			mode = parsed
		}
	}

	if mode == connectionmode.ModeUnspecified {
		if strings.EqualFold(os.Getenv(EnvKeyNBForceRelay), "true") {
			warnDeprecated(EnvKeyNBForceRelay, EnvKeyNBConnectionMode+"=relay-forced")
			mode = connectionmode.ModeRelayForced
		} else if isLazyEnvTrue() {
			warnDeprecated(envEnableLazyConn, EnvKeyNBConnectionMode+"=p2p-lazy")
			mode = connectionmode.ModeP2PLazy
		}
	}

	timeoutSecs := uint32(0)
	if raw := os.Getenv(envInactivityThreshold); raw != "" {
		if d, err := time.ParseDuration(raw); err == nil {
			timeoutSecs = uint32(d.Seconds())
			warnDeprecated(envInactivityThreshold, "the relay_timeout setting on the management server")
		} else {
			log.Warnf("ignoring %s=%q: %v", envInactivityThreshold, raw, err)
		}
	}

	return mode, timeoutSecs
}
```
Preserve the js relay-only special case in the new resolver.
IsForceRelayed() still forces relay on runtime.GOOS == "js", but ResolveModeFromEnv() does not. Any caller migrated to the new API can now resolve to a non-relay mode in browsers, which breaks the old compatibility guarantee.
Suggested fix
```diff
 func ResolveModeFromEnv() (connectionmode.Mode, uint32) {
+	if runtime.GOOS == "js" {
+		return connectionmode.ModeRelayForced, 0
+	}
+
 	mode := connectionmode.ModeUnspecified
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
func ResolveModeFromEnv() (connectionmode.Mode, uint32) {
	if runtime.GOOS == "js" {
		return connectionmode.ModeRelayForced, 0
	}
	mode := connectionmode.ModeUnspecified
	if raw := os.Getenv(EnvKeyNBConnectionMode); raw != "" {
		parsed, err := connectionmode.ParseString(raw)
		if err != nil {
			log.Warnf("ignoring %s=%q: %v", EnvKeyNBConnectionMode, raw, err)
		} else if parsed != connectionmode.ModeUnspecified {
			mode = parsed
		}
	}
	if mode == connectionmode.ModeUnspecified {
		if strings.EqualFold(os.Getenv(EnvKeyNBForceRelay), "true") {
			warnDeprecated(EnvKeyNBForceRelay, EnvKeyNBConnectionMode+"=relay-forced")
			mode = connectionmode.ModeRelayForced
		} else if isLazyEnvTrue() {
			warnDeprecated(envEnableLazyConn, EnvKeyNBConnectionMode+"=p2p-lazy")
			mode = connectionmode.ModeP2PLazy
		}
	}
	timeoutSecs := uint32(0)
	if raw := os.Getenv(envInactivityThreshold); raw != "" {
		if d, err := time.ParseDuration(raw); err == nil {
			timeoutSecs = uint32(d.Seconds())
			warnDeprecated(envInactivityThreshold, "the relay_timeout setting on the management server")
		} else {
			log.Warnf("ignoring %s=%q: %v", envInactivityThreshold, raw, err)
		}
	}
	return mode, timeoutSecs
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/peer/env.go` around lines 51 - 83, ResolveModeFromEnv no
longer preserves the browser ("js") special-case that forces relay-only mode;
update ResolveModeFromEnv so it checks runtime.GOOS == "js" (or calls
IsForceRelayed if that encapsulates the check) and if true sets mode =
connectionmode.ModeRelayForced (and emits the same deprecation/warning path if
appropriate) before returning so callers in JS still get relay-only behavior;
locate the logic in ResolveModeFromEnv and apply the check either early (before
env parsing) or after parsing but prior to returning.
```go
peerConnHandler := peer_connections_http.NewHandler(
	peerConnStore,
	&pcAccountManagerAdapter{am: accountManager, nmc: networkMapController},
	peerConnRouter,
)
router.HandleFunc("/peers/{peerId}/connections", peerConnHandler.GetPeerConnections).Methods("GET", "OPTIONS")
router.HandleFunc("/peers/{peerId}/connections/refresh", peerConnHandler.PostRefresh).Methods("POST", "OPTIONS")
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "=== peer_connections handler ==="
sed -n '1,260p' management/server/http/handlers/peer_connections/handler.go
echo
echo "=== peers handler/auth patterns ==="
rg -n -C3 --type=go 'HasPermission|permissions\.|Permission|UserAuth|Admin|Owner|RegularUser|GetUserFromUserAuth|Check.*Permission' \
  management/server/http/handlers/peer_connections \
  management/server/http/handlers/peers
echo
echo "=== constructor signatures ==="
rg -n -C2 --type=go 'func NewHandler|func AddEndpoints' \
  management/server/http/handlers/peer_connections \
  management/server/http/handlers/peers
```

Repository: netbirdio/netbird

Length of output: 24412
Add explicit RBAC checks to the peer-connections routes.
The peer_connections handler does not enforce the same permission validation as the existing peers endpoints:
- The `peers` handler calls `h.permissionsManager.ValidateUserPermissions(r.Context(), accountID, userID, modules.Peers, operations.Read)` before returning peer data.
- The `peer_connections` handler calls only `h.account.GetPeer(r.Context(), userAuth.AccountId, peerID, userAuth.UserId)` and relies implicitly on that method for authorization.
The NewHandler constructor for peer_connections does not receive permissionsManager at all, so explicit permission checks cannot be added without refactoring.
Since these endpoints expose per-peer state and trigger network changes (refresh), they should enforce the same Peers read/write permissions as the existing /peers routes before merge.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@management/server/http/handler.go` around lines 143 - 149, The
peer-connections routes lack explicit RBAC checks; refactor
peer_connections_http.NewHandler to accept the permissionsManager (same type
used by the peers handler), then in the handler methods (GetPeerConnections and
PostRefresh) call permissionsManager.ValidateUserPermissions(ctx, accountID,
userID, modules.Peers, operations.Read) before returning data and
ValidateUserPermissions(..., modules.Peers, operations.Write) (or the
appropriate operation) before performing refresh/network changes; replace the
current reliance on h.account.GetPeer for authorization with these explicit
checks and use the pcAccountManagerAdapter/peerConnStore as before.
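The shape of the fix — validate the `Peers` permission before touching peer data — can be sketched in isolation. The `permissionsManager` type below is a toy stand-in; the real project passes its own permissions-manager type into the handler constructor:

```go
package main

import (
	"errors"
	"fmt"
)

// Toy permissions manager; the module/operation names mirror the review's
// suggestion (modules.Peers, operations.Read/Write) but the types are illustrative.
type permissionsManager struct{ allowed map[string]bool }

func (pm *permissionsManager) ValidateUserPermissions(userID, module, op string) error {
	if !pm.allowed[userID+":"+module+":"+op] {
		return errors.New("permission denied")
	}
	return nil
}

// checkThenServe validates the permission first and only then does the work,
// instead of relying on a data-lookup call for implicit authorization.
func checkThenServe(pm *permissionsManager, userID, op string) (string, error) {
	if err := pm.ValidateUserPermissions(userID, "peers", op); err != nil {
		return "", err
	}
	return "ok", nil
}

func main() {
	pm := &permissionsManager{allowed: map[string]bool{"alice:peers:read": true}}
	out, err := checkThenServe(pm, "alice", "read")
	fmt.Println(out, err)
	_, err = checkThenServe(pm, "alice", "write")
	fmt.Println(err)
}
```

In the real handler, `GetPeerConnections` would gate on the read operation and `PostRefresh` on write before calling into the store.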
```go
if req.Settings.P2pTimeoutSeconds != nil {
	v := uint32(*req.Settings.P2pTimeoutSeconds)
	returnSettings.P2pTimeoutSeconds = &v
}
if req.Settings.P2pRetryMaxSeconds != nil {
	v := uint32(*req.Settings.P2pRetryMaxSeconds)
	returnSettings.P2pRetryMaxSeconds = &v
}
if req.Settings.RelayTimeoutSeconds != nil {
	v := uint32(*req.Settings.RelayTimeoutSeconds)
	returnSettings.RelayTimeoutSeconds = &v
}
```
Missing bounds validation before int64 → uint32 truncation for timeout fields.
All three timeout values come in as *int64 from the API but are stored as uint32. There is no check that the value is non-negative or within [0, math.MaxUint32]. In Go, uint32(int64(-1)) silently produces 4294967295 (~136-year timeout) rather than an error. A client submitting -1 would end up with a de-facto infinite peer timeout with no diagnostic.
🛡️ Proposed fix — add a bounds-check helper

```diff
+func toUint32Seconds(fieldName string, v int64) (uint32, error) {
+	if v < 0 {
+		return 0, fmt.Errorf("%s must be non-negative, got %d", fieldName, v)
+	}
+	if v > math.MaxUint32 {
+		return 0, fmt.Errorf("%s value %d exceeds maximum allowed (%d)", fieldName, v, math.MaxUint32)
+	}
+	return uint32(v), nil
+}
```

Then in `updateAccountRequestSettings`:

```diff
 if req.Settings.P2pTimeoutSeconds != nil {
-	v := uint32(*req.Settings.P2pTimeoutSeconds)
-	returnSettings.P2pTimeoutSeconds = &v
+	v, err := toUint32Seconds("p2p_timeout_seconds", *req.Settings.P2pTimeoutSeconds)
+	if err != nil {
+		return nil, err
+	}
+	returnSettings.P2pTimeoutSeconds = &v
 }
 if req.Settings.P2pRetryMaxSeconds != nil {
-	v := uint32(*req.Settings.P2pRetryMaxSeconds)
-	returnSettings.P2pRetryMaxSeconds = &v
+	v, err := toUint32Seconds("p2p_retry_max_seconds", *req.Settings.P2pRetryMaxSeconds)
+	if err != nil {
+		return nil, err
+	}
+	returnSettings.P2pRetryMaxSeconds = &v
 }
 if req.Settings.RelayTimeoutSeconds != nil {
-	v := uint32(*req.Settings.RelayTimeoutSeconds)
-	returnSettings.RelayTimeoutSeconds = &v
+	v, err := toUint32Seconds("relay_timeout_seconds", *req.Settings.RelayTimeoutSeconds)
+	if err != nil {
+		return nil, err
+	}
+	returnSettings.RelayTimeoutSeconds = &v
 }
```

You'll also need `"math"` in the imports.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@management/server/http/handlers/accounts/accounts_handler.go` around lines
229 - 240, The timeout conversions in updateAccountRequestSettings
(req.Settings.P2pTimeoutSeconds, P2pRetryMaxSeconds, RelayTimeoutSeconds →
returnSettings.*) currently cast *int64 to uint32 without bounds checking; add a
small helper (e.g., validateUint32FromInt64) that accepts *int64, returns
(uint32, error) and ensures the input is non-nil, >= 0 and <= math.MaxUint32,
then use that helper for P2pTimeoutSeconds, P2pRetryMaxSeconds and
RelayTimeoutSeconds and propagate/return an error if validation fails; also add
"math" to imports.
```go
peer, err := h.account.GetPeer(r.Context(), userAuth.AccountId, peerID, userAuth.UserId)
if err != nil {
	http.Error(w, "peer not found", http.StatusNotFound)
	return
```
Don't turn every GetPeer failure into 404.
Both endpoints return "peer not found" for all GetPeer errors. That masks transient/store failures as a false 404 and makes clients behave as if the peer disappeared. Distinguish not-found from everything else and return 5xx for internal failures.
Also applies to: 137-140
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@management/server/http/handlers/peer_connections/handler.go` around lines 85
- 88, The handlers currently treat any h.account.GetPeer error as 404; change
both call sites that invoke h.account.GetPeer in handler.go to distinguish "not
found" from other errors by using the store's sentinel or error type check
(e.g., errors.Is(err, store.ErrNotFound) or a typed NotFound error returned by
GetPeer). If the error is a NotFound return http.StatusNotFound; for any other
error return http.StatusInternalServerError, log the error with context
(including peerID/accountID/userID) and then write a 5xx response. Ensure you
update both occurrences of h.account.GetPeer in this file.
```go
func (s *MemoryStore) Put(peerPubKey string, m *mgmProto.PeerConnectionMap) {
	s.mu.Lock()
	defer s.mu.Unlock()

	prev := s.maps[peerPubKey]
	if prev != nil && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
		return
	}
```
Full snapshots can be silently dropped after a client engine restart (seq counter resets).
The pusher's seq is an in-memory counter that resets to 0 on every engine start (see connStatePusher.flushDelta/flushFull incrementing p.seq from 0). After a peer process/engine restart:
- Client reconnects and pushes its initial full snapshot with `seq=1`.
- Mgmt still holds the old `prev` (TTL hasn't expired), with e.g. `prev.Seq=137`.
- Line 69: `m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq()` → `1 <= 137` → drop.
Mgmt's view of that peer's connections then stays stale until the in-memory TTL expires, even though the client has been pushing fresh data the whole time. Subsequent deltas hit the same wall. This is exactly the case full snapshots are supposed to authoritatively repair.
Bypass the seq check on full snapshots — they're always authoritative replacements:
🔧 Suggested fix

```diff
 prev := s.maps[peerPubKey]
-if prev != nil && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
+// Full snapshots are authoritative replacements (e.g., after a client
+// engine restart the pusher's seq counter resets to 0, so its first
+// snapshot has seq=1 even if prev.Seq is much higher). Only out-of-order
+// *deltas* should be dropped.
+if prev != nil && !m.GetFullSnapshot() && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
 	return
 }
```

Also consider adding a regression test in `store_test.go` covering "full snapshot with lower seq after restart replaces prev".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
func (s *MemoryStore) Put(peerPubKey string, m *mgmProto.PeerConnectionMap) {
	s.mu.Lock()
	defer s.mu.Unlock()

	prev := s.maps[peerPubKey]
	// Full snapshots are authoritative replacements (e.g., after a client
	// engine restart the pusher's seq counter resets to 0, so its first
	// snapshot has seq=1 even if prev.Seq is much higher). Only out-of-order
	// *deltas* should be dropped.
	if prev != nil && !m.GetFullSnapshot() && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
		return
	}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@management/server/peer_connections/store.go` around lines 64 - 71, In
MemoryStore.Put, the current seq gating can silently drop authoritative full
snapshots after a client restart; modify MemoryStore.Put so that if the incoming
mgmProto.PeerConnectionMap is a full snapshot (check m.GetFull() or the
equivalent "full" boolean accessor on PeerConnectionMap) you bypass the seq
comparison and always replace prev (i.e., skip the m.GetSeq() <= prev.m.GetSeq()
return path for full snapshots), then proceed to store the new map and update
TTL as usual; add a regression test in store_test.go asserting that a full
snapshot with a lower seq replaces an existing prev with a higher seq after
restart.
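The restart scenario and the suggested gating are easy to exercise with a stripped-down model of the store (the `connMap`/`memoryStore` types below are illustrative, mirroring the proto's `GetSeq`/`GetFullSnapshot` accessors as plain fields):

```go
package main

import "fmt"

// connMap is a minimal stand-in for mgmProto.PeerConnectionMap.
type connMap struct {
	Seq          uint64
	FullSnapshot bool
}

type memoryStore struct{ maps map[string]*connMap }

// Put applies the suggested gating: out-of-order *deltas* are dropped,
// but full snapshots always replace the cached map.
func (s *memoryStore) Put(peer string, m *connMap) {
	prev := s.maps[peer]
	if prev != nil && !m.FullSnapshot && m.Seq > 0 && m.Seq <= prev.Seq {
		return
	}
	s.maps[peer] = m
}

func main() {
	s := &memoryStore{maps: map[string]*connMap{}}
	s.Put("peerA", &connMap{Seq: 137})
	s.Put("peerA", &connMap{Seq: 5}) // stale delta: dropped
	fmt.Println(s.maps["peerA"].Seq) // 137
	s.Put("peerA", &connMap{Seq: 1, FullSnapshot: true}) // post-restart snapshot: accepted
	fmt.Println(s.maps["peerA"].Seq) // 1
}
```

A regression test in `store_test.go` would assert exactly this pair of behaviors against the real `MemoryStore`.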
```go
func (am *DefaultAccountManager) GetPeerByPubKey(ctx context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
	p, err := am.Store.GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, pubKey)
	if err != nil {
		return nil, err
	}
	if p.AccountID != accountID {
		return nil, fmt.Errorf("peer with pubkey %s not in account %s", pubKey, accountID)
	}
```
Return a typed peer/account error here.
On Lines 1233-1235 this switches to a plain fmt.Errorf, while the other peer lookup paths return status.NewPeerNotPartOfAccountError() for the same account-mismatch case. Keeping the typed status error here preserves consistent handling in the HTTP layer.
Suggested fix

```diff
 func (am *DefaultAccountManager) GetPeerByPubKey(ctx context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
 	p, err := am.Store.GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, pubKey)
 	if err != nil {
 		return nil, err
 	}
 	if p.AccountID != accountID {
-		return nil, fmt.Errorf("peer with pubkey %s not in account %s", pubKey, accountID)
+		return nil, status.NewPeerNotPartOfAccountError()
 	}
 	return p, nil
 }
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
func (am *DefaultAccountManager) GetPeerByPubKey(ctx context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
	p, err := am.Store.GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, pubKey)
	if err != nil {
		return nil, err
	}
	if p.AccountID != accountID {
		return nil, status.NewPeerNotPartOfAccountError()
	}
	return p, nil
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@management/server/peer.go` around lines 1228 - 1235, In GetPeerByPubKey
(DefaultAccountManager) replace the plain fmt.Errorf returned when p.AccountID
!= accountID with the typed status error used elsewhere — call
status.NewPeerNotPartOfAccountError(...) with the same peer pubKey and accountID
(matching other lookup paths) so the HTTP layer receives the consistent
peer/account error type.
Force-pushed d48893f to 907bc1c
Actionable comments posted: 13
♻️ Duplicate comments (9)
client/internal/peer/ice_backoff.go (1)
161-169: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear any active suspension when backoff is disabled.

`SetMaxBackoff(0)` rebuilds the scheduler, but an already-suspended peer keeps honoring the old `nextRetry`. That means a server push disabling ICE backoff can stay ineffective until the stale timer expires.

Suggested fix

```diff
 func (s *iceBackoffState) SetMaxBackoff(d time.Duration) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	if d == s.maxBackoff {
 		return
 	}
 	s.maxBackoff = d
 	s.bo = buildBackoff(d)
+	if d == 0 {
+		s.suspended = false
+		s.nextRetry = time.Time{}
+	}
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/ice_backoff.go` around lines 161 - 169, SetMaxBackoff currently rebuilds the scheduler but does not clear an existing suspension, so calling SetMaxBackoff(0) can leave a peer suspended until the old nextRetry elapses; update SetMaxBackoff (in iceBackoffState) to, under the same lock, when d == 0 clear any active suspension by resetting s.suspended (and s.nextRetry to zero/empty) before rebuilding s.bo and assigning s.maxBackoff so the peer is immediately unsuspended when backoff is disabled.

management/server/peer.go (1)
1228-1235: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return the typed peer/account error here.

This account-mismatch path should use the same typed status error as the other peer lookup APIs; `fmt.Errorf` makes the HTTP layer handle it inconsistently.

Suggested fix

```diff
 if p.AccountID != accountID {
-	return nil, fmt.Errorf("peer with pubkey %s not in account %s", pubKey, accountID)
+	return nil, status.NewPeerNotPartOfAccountError()
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer.go` around lines 1228 - 1235, GetPeerByPubKey currently returns a plain fmt.Errorf when the peer.AccountID doesn't match, causing inconsistent HTTP handling; replace that fmt.Errorf with the same typed status/account-mismatch error used by the other peer lookup APIs so the HTTP layer treats it consistently. Locate GetPeerByPubKey (and compare with the other peer lookup functions that call am.Store.GetPeerByPeerPubKey) and return the same typed error value/function (the project's account/peer status error helper) instead of fmt.Errorf, preserving the error message/context about pubKey and accountID.

management/server/peer_connections/store.go (1)
68-70: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not seq-drop authoritative full snapshots.

After a client restart, the pusher's counter starts over, so its first full snapshot can legitimately have a lower seq than the cached map. This gate drops that repair snapshot and leaves the old state in place until TTL expiry.

Suggested fix

```diff
-if prev != nil && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
+if prev != nil && !m.GetFullSnapshot() && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
 	return
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer_connections/store.go` around lines 68 - 70, The current seq-check in the peer map update (using s.maps, prev, m.GetSeq(), prev.m.GetSeq()) wrongly drops authoritative full snapshots with lower seqs; change the guard so you only return when the incoming map is not an authoritative full snapshot — e.g. replace the condition "if prev != nil && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() { return }" with a check that also requires the map to be non-full/non-authoritative (something like "&& !m.IsFullSnapshot()" or "&& !m.GetMeta().GetAuthoritative()") before returning; ensure you use the actual predicate your Map type exposes to detect an authoritative/full snapshot so full repairs are accepted even if seq is lower.

management/internals/shared/grpc/server.go (2)
487-489: ⚠️ Potential issue | 🟠 Major

Disable the snapshot select case once the router channel is closed.

`continue` on a closed `snapshotCh` leaves this case permanently ready and can spin the goroutine at 100% CPU. Set `snapshotCh = nil` or return when `ok` is false.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/server.go` around lines 487 - 489, The select case reading from snapshotCh currently does "if !ok { continue }" which leaves the case permanently ready and can spin the goroutine; update the handling in the select branch that reads "case nonce, ok := <-snapshotCh:" to either set snapshotCh = nil when ok is false (to disable that select case) or return/exit the goroutine instead, ensuring you reference and modify the snapshotCh variable in the scope of the goroutine where it is used (adjust logic in the surrounding loop/select to handle snapshotCh being nil).

1175-1183: ⚠️ Potential issue | 🟠 Major

Reject connection-map uploads from unregistered peers before storing them.

`parseRequest()` only proves the sender owns the WireGuard key used for encryption. It does not prove that this pubkey belongs to a registered peer, so this path still allows arbitrary keypairs to create entries in `peerConnections`. Resolve the peer/account first and only call `Put` for registered peers.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/server.go` around lines 1175 - 1183, After parsing the request in SyncPeerConnections (using parseRequest) avoid blindly writing to peerConnections; instead resolve the sender to a registered peer/account (e.g. call your existing lookup/resolve method using peerKey or the peer public key) and if that lookup fails return an error, only calling s.peerConnections.Put(peerKey.String(), pcm) when the peer/account is confirmed registered; ensure you reference peerKey and pcm and return a clear permission/unauthorized error for unregistered senders.

client/internal/peer/status.go (2)
422-465: ⚠️ Potential issue | 🟠 Major

Build the listener/router notifications before releasing `d.mux`.

These helpers are explicitly documented to snapshot under the lock, but this path unlocks first and only then calls `notifyConnStateChange()`. The same post-unlock pattern is repeated in the other `updatePeer*Locked` helpers below, including `notifyPeerStateChangeListeners()`, so listener/subscription changes can race with notification construction. Capture the closures/snapshots while locked, then unlock and invoke them.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/status.go` around lines 422 - 465, The code in updatePeerStateLocked builds router/peer notifications after releasing d.mux which allows races; while still holding the lock you must construct/capture any closures and snapshots (call snapshotRouterPeersLocked, notifyConnStateChange(receivedState.PubKey, peerState) and any notifyPeerStateChangeListeners equivalents) into local variables, then release d.mux and invoke those captured closures (and call d.notifier.peerListChanged) outside the lock; update the other updatePeer*Locked helpers similarly so all listener/router notification closures and snapshots are created under the lock before unlocking.
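The capture-then-dispatch pattern both status.go findings ask for can be sketched in isolation. The `recorder` type and its fields below are illustrative, not the real status-recorder API:

```go
package main

import (
	"fmt"
	"sync"
)

type recorder struct {
	mu        sync.Mutex
	state     map[string]string
	listeners []func(peer, state string)
}

// updatePeer snapshots the listener set and arguments while holding the
// mutex, then fires the notifications after unlocking, so callbacks cannot
// deadlock on the mutex or race with concurrent listener registration.
func (r *recorder) updatePeer(peer, state string) {
	r.mu.Lock()
	r.state[peer] = state
	notify := make([]func(), 0, len(r.listeners))
	for _, l := range r.listeners {
		l := l // capture per-iteration value
		notify = append(notify, func() { l(peer, state) })
	}
	r.mu.Unlock()
	for _, fn := range notify {
		fn()
	}
}

func main() {
	var got string
	r := &recorder{state: map[string]string{}}
	r.listeners = append(r.listeners, func(p, s string) { got = p + "=" + s })
	r.updatePeer("peerA", "connected")
	fmt.Println(got) // peerA=connected
}
```

The same shape applies to `UpdatePeerRemoteMeta`: snapshot the notifications in both the online and offline branches while locked, then dispatch after unlock.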
506-540: ⚠️ Potential issue | 🟠 Major

Remote-meta-only changes still never notify subscribers.

This method updates `RemoteLiveOnline`, groups, and the effective/configured mode fields, then returns without firing either the per-peer conn-state listener or the peer-list/router notifications. A management-only liveness or config change will stay stale until some unrelated ICE/relay transition happens. Snapshot the required notifications under the lock and dispatch them after unlock in both branches.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/status.go` around lines 506 - 540, In UpdatePeerRemoteMeta, you currently update Remote* fields for an online peer (d.peers[pubKey]) or an offline entry (d.offlinePeers[i]) and return without notifying subscribers; capture (snapshot) the required notifications — e.g., the per-peer connection-state listener callback and any peer-list/router change events — while still holding d.mux (after mutating the peer record), then release the lock and dispatch those notifications outside the lock in both the online branch (when st exists) and the offline branch (when offlinePeers[i].PubKey matches); ensure you record which peer key and the minimal changed state needed for the notifications so the post-unlock dispatch logic can run safely.

management/server/http/handler.go (1)
143-149: ⚠️ Potential issue | 🟠 Major

Wire the peer-connections handler through the permissions manager.

This route registration still constructs the handler without any RBAC dependency, so these endpoints cannot perform the same explicit `Peers` read/write permission checks as `peers.AddEndpoints(...)`. Since they expose per-peer connection state and trigger refresh actions, that authorization should be explicit here before merge.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/http/handler.go` around lines 143 - 149, Construct the peer-connections handler via the permissions manager so the endpoints enforce Peers RBAC: instead of registering peer_connections_http.NewHandler(...) directly, create the handler with peer_connections_http.NewHandler(peerConnStore, &pcAccountManagerAdapter{am: accountManager, nmc: networkMapController}, peerConnRouter) and then wrap its methods with the permissions middleware (or RequirePermission) to enforce Peers read on GetPeerConnections and Peers write on PostRefresh (similar to how peers.AddEndpoints does it); register the wrapped functions with router.HandleFunc("/peers/{peerId}/connections", wrappedGet).Methods("GET","OPTIONS") and router.HandleFunc("/peers/{peerId}/connections/refresh", wrappedPost).Methods("POST","OPTIONS").

management/internals/shared/grpc/conversion.go (1)
263-267: ⚠️ Potential issue | 🟠 Major

Preserve the retry-max sentinel when populating `ConfiguredP2PRetryMaxSecs`.

`derefUint32OrZero` collapses `nil` and explicit `0` here, but `toPeerConfig()` uses `p2pRetryMaxDisabledSentinel` to distinguish "unset" from "explicitly disabled backoff". That means remote peers lose the canonical wire-format value for this setting. Mirror the same sentinel encoding here and add a regression test.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/conversion.go` around lines 263 - 267, The code currently uses derefUint32OrZero to set cfgP2pRetryMax which collapses nil and explicit 0; instead preserve the p2pRetryMaxDisabledSentinel used by toPeerConfig so consumers can tell "unset" from "explicit 0". Change the assignment in conversion.go that sets cfgP2pRetryMax (the variable that populates ConfiguredP2PRetryMaxSecs) to check c.Cfg.P2pRetryMaxSeconds: if nil set p2pRetryMaxDisabledSentinel, otherwise use the dereferenced value (keep explicit 0); replace derefUint32OrZero for this field or add a small helper (e.g., derefUint32OrSentinel) and add a regression test asserting the sentinel is preserved in the resulting ConfiguredP2PRetryMaxSecs and that explicit 0 remains 0.
🧹 Nitpick comments (9)
management/server/peer/peer_test.go (1)
145-152: ⚡ Quick win

Test name implies all `Effective*` fields are checked, but only `EffectiveConnectionMode` is covered.

The PR adds four new fields to `isEqual`: `EffectiveConnectionMode`, `EffectiveRelayTimeoutSecs`, `EffectiveP2PTimeoutSecs`, and `EffectiveP2PRetryMaxSecs`. This test validates only the first one; the other three would silently pass even if they were accidentally omitted from `isEqual`. Either extend the test to cover all four fields, or rename it to reflect its narrower scope (e.g., `TestPeerSystemMeta_isEqual_ChecksEffectiveConnectionMode`).

A straightforward extension using the same table-driven style as `TestFlags_IsEqual`:

🧪 Proposed extension to cover all `Effective*` fields

```diff
-func TestPeerSystemMeta_isEqual_ChecksEffectiveFields(t *testing.T) {
-	base := PeerSystemMeta{Hostname: "h", EffectiveConnectionMode: "p2p-dynamic"}
-	other := base
-	other.EffectiveConnectionMode = "p2p"
-	if base.isEqual(other) {
-		t.Error("isEqual should return false when EffectiveConnectionMode differs")
-	}
-}
+func TestPeerSystemMeta_isEqual_ChecksEffectiveFields(t *testing.T) {
+	tests := []struct {
+		name   string
+		mutate func(*PeerSystemMeta)
+	}{
+		{
+			name:   "EffectiveConnectionMode",
+			mutate: func(m *PeerSystemMeta) { m.EffectiveConnectionMode = "p2p" },
+		},
+		{
+			name:   "EffectiveRelayTimeoutSecs",
+			mutate: func(m *PeerSystemMeta) { m.EffectiveRelayTimeoutSecs = 99 },
+		},
+		{
+			name:   "EffectiveP2PTimeoutSecs",
+			mutate: func(m *PeerSystemMeta) { m.EffectiveP2PTimeoutSecs = 99 },
+		},
+		{
+			name:   "EffectiveP2PRetryMaxSecs",
+			mutate: func(m *PeerSystemMeta) { m.EffectiveP2PRetryMaxSecs = 99 },
+		},
+	}
+	base := PeerSystemMeta{Hostname: "h", EffectiveConnectionMode: "p2p-dynamic"}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			other := base
+			tt.mutate(&other)
+			require.False(t, base.isEqual(other), "isEqual should return false when %s differs", tt.name)
+		})
+	}
+}
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer/peer_test.go` around lines 145 - 152, The test TestPeerSystemMeta_isEqual_ChecksEffectiveFields currently only asserts a difference for EffectiveConnectionMode; update the test to either (a) rename it to TestPeerSystemMeta_isEqual_ChecksEffectiveConnectionMode if you intend to keep it narrow, or preferably (b) extend it to cover all four Effective* fields by using a table-driven approach that mutates each of EffectiveConnectionMode, EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, and EffectiveP2PRetryMaxSecs on the copied other PeerSystemMeta and asserts isEqual returns false for each; target the PeerSystemMeta.isEqual method and add one subcase per field so omissions in isEqual will fail the test.

client/ui/network.go (2)
125-125: 💤 Low value

Auto-refresh keeps polling networks even while the Peers tab is active.

`startAutoRefresh(10s)` calls `updateNetworksBasedOnDisplayTab` unconditionally, which repaints the (hidden) network grid behind the Peers tab and triggers daemon `ListNetworks` RPCs the user can't see. Combined with the new 30s peers ticker in `peers_tab.go`, this is wasted work. Consider gating the network refresh on `tabs.Selected().Text != peersText` so each tab only refreshes its own data.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ui/network.go` at line 125, The auto-refresh is repainting the network grid even when the Peers tab is active; modify startAutoRefresh (the call site and/or its implementation that invokes updateNetworksBasedOnDisplayTab) to check the currently selected tab before refreshing networks: gate the call to updateNetworksBasedOnDisplayTab (and any network-related refresh work for allGrid/overlappingGrid/exitNodeGrid) so it only runs when tabs.Selected().Text != peersText, leaving the peers_tab.go 30s peers ticker to handle peers updates; ensure you reference tabs and peersText consistently so only the visible tab's data is polled.
60-76: 💤 Low value

Defensive-only nit: `tabs.Selected()` can theoretically be nil.

`getGridAndFilterFromTab` (Line 707) dereferences `tabs.Selected().Text` directly. The button callbacks here only run after the user interacts with a fully-built window, so in practice `Selected()` is non-nil — but the `Refresh` handler at Line 71 already guards against nil, while `selectAllBtn`/`deselectAllBtn` do not. Worth aligning for consistency, though not blocking.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ui/network.go` around lines 60 - 76, The selectAll and deselectAll button handlers call getGridAndFilterFromTab which dereferences tabs.Selected().Text; add the same nil-guard used in refreshBtn: check if tabs.Selected() != nil before calling getGridAndFilterFromTab and otherwise return (or no-op). Update the selectAllBtn and deselectAllBtn callbacks to early-return when tabs.Selected() is nil, then call getGridAndFilterFromTab and proceed to s.selectAllFilteredNetworks/s.deselectAllFilteredNetworks and s.updateNetworksBasedOnDisplayTab.

client/ui/peers_tab.go (2)
219-273: 💤 Low value

Reduce cognitive complexity of `buildPeerDetailText` (Sonar).

Sonar reports complexity 24 vs. 20 limit. The function is essentially a sequence of independent "section writers" (basic, endpoints, full-details). Extracting each section into a small helper drops complexity and keeps the overall layout obvious.

♻️ Sketch

```go
func buildPeerDetailText(p *proto.PeerState, full bool) string {
	var sb strings.Builder
	writeBasicPeerFields(&sb, p)
	writeEndpointFields(&sb, p)
	writeMetadataFields(&sb, p)
	if full {
		sb.WriteString("\n--- Full details ---\n")
		writeFullPeerFields(&sb, p)
	}
	return sb.String()
}
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ui/peers_tab.go` around lines 219 - 273, Sonar flags buildPeerDetailText as too complex; split its linear sections into small helpers to reduce cognitive complexity. Create helper functions (e.g., writeBasicPeerFields(sb *strings.Builder, p *proto.PeerState), writeEndpointFields(sb *strings.Builder, p *proto.PeerState), writeMetadataFields(sb *strings.Builder, p *proto.PeerState), and writeFullPeerFields(sb *strings.Builder, p *proto.PeerState)) that contain the corresponding fmt.Fprintf blocks currently in buildPeerDetailText, then rewrite buildPeerDetailText to instantiate the strings.Builder and call those helpers (and only append the "\n--- Full details ---\n" header before calling writeFullPeerFields when full==true). Ensure all existing uses of p.Get... and orDashStr/peerLatencyStr/humanBytes remain unchanged inside the helpers.
51-52: ⚡ Quick win: Derive the call context from the parent `ctx`.

`context.WithTimeout(context.Background(), …)` ignores the `ctx` passed into `buildPeersTabContent`. If the UI/process ctx is cancelled (shutdown, window close), an in-flight `Status` RPC keeps running until its own 5s timeout instead of being cancelled promptly. Use the parent ctx so cancellation propagates.

```diff
- callCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ callCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
  defer cancel()
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ui/peers_tab.go` around lines 51 - 52, in buildPeersTabContent the temporary call context is created with context.WithTimeout(context.Background(), 5*time.Second), which ignores the parent ctx; change it to derive from the parent (use context.WithTimeout(ctx, 5*time.Second)) so callCtx and cancel propagate cancellation from the UI/process ctx (affecting the Status RPC and any in-flight operations).

management/server/peer_test.go (1)
1159-1159: ⚡ Quick win: Add one assertion path for the new group-enrichment input.

Passing `nil` here means `ToSyncResponse`'s new `groupNamesByPeerID` branch is still untested, so regressions in the peer-groups projection would slip through this fixture. A small non-nil map plus one assertion on the emitted peer metadata would close that gap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer_test.go` at line 1159, The test calls ToSyncResponse with a nil groupNamesByPeerID which leaves the new group-enrichment branch untested; update the call in peer_test.go to pass a small non-nil map (e.g. map[peer.ID][]string with the test peer's ID mapped to one or two group names) into the position currently nil, then add an assertion that the returned response's peer metadata (from response.Peers or the specific peer entry asserted elsewhere in this fixture) contains those group names so the groupNamesByPeerID -> peer-groups projection is exercised.

management/server/peer_connections/store_test.go (1)
38-52: ⚡ Quick win: Cover mutation-after-`Put` as well, not just mutation-after-`Get`.

This test only proves the store clones on return. It would still pass if `Put` kept the caller's protobuf pointer and the caller mutated it later. Add one case that mutates the original `PeerConnectionMap` after `Put` and then verifies the stored copy stayed unchanged.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer_connections/store_test.go` around lines 38 - 52, Add a second assertion in TestMemoryStore_DeepCopyOnReturn to cover mutation-after-Put: after creating the original mgmProto.PeerConnectionMap (the value passed into s.Put in the test that uses newStoreWithClock), mutate fields on that original protobuf (e.g., change Entries[0].RemotePubkey or LastHandshake) immediately after calling s.Put("peerA", ...), then call s.Get("peerA") and assert the returned copy still has the original values ("peerB" etc.). This verifies s.Put clones the input rather than retaining the caller's pointer.

management/server/http/testing/testing_tools/channel/channel.go (1)
138-138: 💤 Low value: Consider naming the trailing `nil` parameters.

Both call sites (lines 138 and 267) pass seven consecutive `nil` arguments after `serviceManager`. While the compiler catches any count/type mismatch, tracking which `nil` maps to which parameter (peerConnections store, snapshot router, …) requires cross-referencing `NewAPIHandler`'s signature. A comment block or a thin wrapper helper that names the optional arguments would ease future maintenance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/http/testing/testing_tools/channel/channel.go` at line 138, The call to NewAPIHandler passes seven consecutive nils after serviceManager which makes it unclear which optional parameters are being set to nil; update the apiHandler construction (the call to http2.NewAPIHandler where apiHandler, err := ...) to either replace the trailing nils with named inline comments indicating which parameter each nil corresponds to (e.g., /* peerConnections */, /* snapshotRouter */, etc.), or create a small helper/wrapper that accepts the required arguments and supplies named nil defaults before calling NewAPIHandler; reference the NewAPIHandler signature to use the exact parameter names and ensure the helper or inline comments map each nil to the correct parameter.

shared/management/client/grpc.go (1)
483-490: ⚡ Quick win: Snapshot callback runs inline on the Sync recv loop.

`cb(req.GetNonce())` is invoked synchronously while the goroutine is reading the management Sync stream. If a future implementation of the handler does anything blocking (RPCs, locks held by other goroutines, channel sends without buffer/select), the entire Sync stream stalls — including subsequent NetworkMap updates. Consider dispatching to a goroutine here, or at least documenting that the handler must be non-blocking and bounded.

♻️ Suggested fix

```diff
 if req := decryptedResp.GetSnapshotRequest(); req != nil {
 	c.snapMu.Lock()
 	cb := c.onSnapshotRequest
 	c.snapMu.Unlock()
 	if cb != nil {
-		cb(req.GetNonce())
+		go cb(req.GetNonce())
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@shared/management/client/grpc.go` around lines 483 - 490, The snapshot callback is invoked inline on the Sync receive loop (see c.onSnapshotRequest usage and cb(req.GetNonce())), which can block the stream; change the invocation to dispatch the callback asynchronously (e.g., spawn a goroutine) after acquiring and releasing snapMu so the recv loop isn’t blocked, and ensure the goroutine is safe (recover panics) or document that onSnapshotRequest must be non-blocking if you choose not to spawn a goroutine.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@client/android/preferences.go`:
- Around line 325-331: SetConnectionMode currently stores a non-nil pointer to
an empty string so calling SetConnectionMode("") doesn’t clear the override;
change SetConnectionMode to set p.configInput.ConnectionMode = nil when mode ==
"" and otherwise set it to a pointer to the provided string (e.g., allocate a
new string variable or use &m) so GetConnectionMode’s nil-check will correctly
fall through to the server-pushed value; update the logic in the
SetConnectionMode method that manipulates p.configInput.ConnectionMode
accordingly.
In `@client/internal/conn_state_pusher.go`:
- Around line 107-112: OnSnapshotRequest currently sends nonce into
p.snapshotReq but if the channel buffer is full it drops the new nonce, causing
older nonces to be used; change the logic in connStatePusher.OnSnapshotRequest
to coalesce by non-blocking enqueue of the newest nonce: when attempting to
send, if the channel is full drain/consume all pending values from p.snapshotReq
and then send the newest nonce so the buffer holds only the latest request;
apply the same drain-and-replace pattern to the analogous handler at the other
occurrence (lines referenced in the comment) so pending snapshot requests always
resolve to the most recent nonce.
- Around line 261-275: computeDeltaFromSource currently only iterates current
SnapshotAllRemotePeers and never emits deletions, so keys present in
connStatePusher.lastPushed but missing from SnapshotAllRemotePeers are never
reconciled; update computeDeltaFromSource in connStatePusher to also iterate
lastPushed and detect pubkeys absent from the current snapshot and emit explicit
removal events (e.g., a PeerStateChangeEvent marked as deletion or a distinct
event type) for those pubkeys; use existing symbols SnapshotAllRemotePeers,
lastPushed, PeerStateChangeEvent and isMaterialChange to decide when to emit
updates vs deletions, ensuring the store receives explicit removal entries
instead of relying on TTL or full snapshots.
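The drain-and-replace coalescing described in the first `conn_state_pusher.go` item above can be sketched with a 1-buffered channel. This is a minimal sketch assuming a single producer (the Sync recv loop); `enqueueLatest` is a hypothetical name, not the actual pusher API.

```go
package main

import "fmt"

// enqueueLatest keeps only the newest nonce in a 1-buffered channel:
// when the buffer is full, it drops the stale pending value and retries,
// so the consumer always sees the most recent snapshot request.
func enqueueLatest(ch chan uint64, nonce uint64) {
	for {
		select {
		case ch <- nonce:
			return
		default:
			select {
			case <-ch: // discard the stale pending nonce
			default:
			}
		}
	}
}

func main() {
	ch := make(chan uint64, 1)
	enqueueLatest(ch, 1)
	enqueueLatest(ch, 2)
	enqueueLatest(ch, 3)
	fmt.Println(<-ch) // → 3
}
```

With multiple concurrent producers the drain/send pair is no longer atomic, so for that case a mutex-guarded "latest" slot is the safer shape.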
In `@client/internal/debouncer/debouncer.go`:
- Around line 28-53: The Debouncer can run the wrong/duplicate fn when an old
timer callback races with a re-arm; fix Trigger and Stop by using a
timer-identity guard and clearing d.fn when cancelling/executing: in Trigger
(method Debouncer.Trigger) after creating the timer store it as a local variable
(e.g. myTimer := d.timer) and capture that in the time.AfterFunc closure; inside
the closure lock d.mu and check if d.timer == myTimer — if not, return;
otherwise read f := d.fn, set d.fn = nil and set d.timer = nil before unlocking
and then call f(); in Stop (method Debouncer.Stop) under d.mu, stop the timer if
present and set d.timer = nil and d.fn = nil so an
already-fired-but-not-yet-executed callback cannot observe a retained fn.
In `@client/internal/engine.go`:
- Around line 354-357: The Stop() path currently sets e.connStatePusher = nil
without unregistering callbacks installed by Start(), which can leave closures
referencing the pusher and cause panics; update Stop() (and/or implement an
Unregister method on the connStatePusher type) to first unregister or remove all
registered conn-state callbacks (the ones installed in Start()) and only then
stop and nil out e.connStatePusher, ensuring any closures cannot dereference a
nil e.connStatePusher (refer to e.connStatePusher, Start(), and Stop() to locate
the affected logic).
In `@client/internal/peer/guard/guard.go`:
- Around line 145-147: The direct call to g.onNetworkChange() in the guard code
can panic and crash the reconnect flow; locate the invocation (inside the guard
type/method where g.onNetworkChange is executed) and wrap the callback
invocation in a panic-safe boundary by deferring a recover to swallow and log
the panic (or handle it) so it doesn't propagate — e.g., invoke onNetworkChange
inside a small closure that defers a recover, captures the panic, and logs it
via the guard's logger or an error path; ensure you leave the original call site
(g.onNetworkChange) intact but protected by this recover wrapper.
In `@client/internal/peer/handshaker.go`:
- Around line 122-129: readICEListener() only snapshots h.iceListener and
releases h.mu before Listen() invokes it, allowing RemoveICEListener() to race
and let a detached listener still be called; fix by adding a removal-generation
check: add a generation field (e.g., iceGen int64) on Handshaker, increment it
inside RemoveICEListener() under h.mu, have readICEListener() return both the
listener and the current generation (or capture the generation in the returned
closure), and in Listen() re-check the current generation under h.mu (or compare
atomically) right before invoking the callback and skip invocation if the
generation changed; update the symbols readICEListener, RemoveICEListener,
Listen, and any code touching h.iceListener to use this generation check so a
detached listener cannot be called after removal.
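The generation check described for the handshaker can be sketched as below. The types and method names mirror the review's wording (`iceGen`, `RemoveICEListener`) but are illustrative; the real `Handshaker` has a different surface.

```go
package main

import (
	"fmt"
	"sync"
)

// Handshaker sketch: removal bumps a generation counter, and the
// dispatch path re-checks the generation right before invoking the
// listener, so a detached listener is never called after removal.
type Handshaker struct {
	mu          sync.Mutex
	iceListener func(string)
	iceGen      int64
}

func (h *Handshaker) SetICEListener(l func(string)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.iceListener = l
	h.iceGen++
}

func (h *Handshaker) RemoveICEListener() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.iceListener = nil
	h.iceGen++
}

// notify snapshots listener and generation, then re-checks the
// generation before the call, mirroring the suggested Listen() change.
func (h *Handshaker) notify(msg string) {
	h.mu.Lock()
	l, gen := h.iceListener, h.iceGen
	h.mu.Unlock()
	if l == nil {
		return
	}
	// ... time passes; RemoveICEListener may run here ...
	h.mu.Lock()
	stale := h.iceGen != gen
	h.mu.Unlock()
	if stale {
		return
	}
	l(msg)
}

func main() {
	h := &Handshaker{}
	var got []string
	h.SetICEListener(func(s string) { got = append(got, s) })
	h.notify("offer")
	h.RemoveICEListener()
	h.notify("answer") // listener already detached: skipped
	fmt.Println(got)   // → [offer]
}
```

There is still a benign window where removal lands between the final check and the call; eliminating that entirely requires invoking the listener under the lock, which trades the race for a deadlock risk.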
In `@client/internal/peer/worker_ice.go`:
- Around line 218-231: IsConnected reads w.lastKnownState under w.muxAgent but
onConnectionStateChange updates lastKnownState without taking that lock, causing
a data race; fix by acquiring the same mutex (w.muxAgent.Lock() / Unlock() or
defer Unlock()) around all writes to w.lastKnownState inside
onConnectionStateChange so readers via IsConnected and any other accessors are
synchronized with the callback updates on WorkerICE.
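The fix is the standard "same lock on both sides" rule; a minimal sketch, assuming a mutex field named like the review's `muxAgent` (the real `WorkerICE` fields differ):

```go
package main

import (
	"fmt"
	"sync"
)

// WorkerICE sketch: the reader (IsConnected) and the callback writer
// (onConnectionStateChange) take the same mutex, so the race detector
// stays quiet and readers never observe a torn update.
type WorkerICE struct {
	muxAgent       sync.Mutex
	lastKnownState string
}

func (w *WorkerICE) onConnectionStateChange(state string) {
	w.muxAgent.Lock()
	defer w.muxAgent.Unlock()
	w.lastKnownState = state
}

func (w *WorkerICE) IsConnected() bool {
	w.muxAgent.Lock()
	defer w.muxAgent.Unlock()
	return w.lastKnownState == "connected"
}

func main() {
	w := &WorkerICE{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ { // concurrent callbacks, race-free
		wg.Add(1)
		go func() { defer wg.Done(); w.onConnectionStateChange("connected") }()
	}
	wg.Wait()
	fmt.Println(w.IsConnected()) // → true
}
```

Running this under `go test -race` (or `go run -race`) is the cheap way to confirm the original unsynchronized write actually trips the detector.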
In `@client/server/setconfig_test.go`:
- Around line 204-214: The test is advertising connection-mode and timeout
fields as allowed no-ops in SetConfigRequest which misrepresents the RPC
surface; update the test to stop permitting these fields until the daemon
actually implements them by removing "ConnectionMode", "P2PTimeoutSeconds",
"RelayTimeoutSeconds", and "P2PRetryMaxSeconds" from the allowed-fields table in
setconfig_test.go (and the analogous entries at the later block around lines
279-289), or alternatively implement wiring in the SetConfigRequest handler to
persist/apply those fields and update the test to assert persistence; reference
the SetConfigRequest handling logic and the named fields above when making the
change.
In `@client/ui/peers_tab.go`:
- Line 83: The showFull.OnChanged handler currently calls render() directly on
the Fyne UI goroutine which blocks while render() does a synchronous conn.Status
RPC; change the OnChanged handler so it does not perform RPCs on the UI thread:
either (A) run render() asynchronously off the UI goroutine (spawn a goroutine
or use the same background worker the ticker uses) so conn.Status is executed
off the UI thread, or (B) better, cache the last FullStatus snapshot and update
only the rows/presentation when showFull changes (have showFull.OnChanged
rebuild rows from the cached FullStatus rather than calling conn.Status). Update
references to showFull, render(), conn.Status, and the FullStatus cache/usage
accordingly.
In `@management/server/http/handlers/peer_connections/handler_test.go`:
- Around line 40-46: The fake GetPeerByPubKey currently ignores the account
parameter (signature GetPeerByPubKey(_ context.Context, _, pubKey string)) and
returns any peer by pubkey, allowing cross-account leaks; update the
implementation in fakeAM.GetPeerByPubKey to enforce account scoping by using the
second parameter (accountID) — either change fakeAM.peersByKey to be keyed by
accountID -> pubKey (e.g., map[string]map[string]*nbpeer.Peer) or, if peersByKey
stores Peer objects with an AccountID field, check that p.AccountID == accountID
before returning; if there is no match return nil, errors.New("not found").
In `@management/server/http/handlers/peer_connections/handler.go`:
- Around line 161-189: The buildResponse function performs an N+1 lookup by
calling h.account.GetPeerByPubKey for each entry to set apiEntry.RemoteFQDN;
change this to fetch all peers for the account once (or otherwise build a
request-scoped map/cache) before the loop and then look up remote FQDNs from
that map when populating apiEntry.RemoteFQDN. Locate buildResponse and replace
per-entry GetPeerByPubKey calls with a single batch call (or cache population)
using account APIs, index the results by pubkey, and fall back to the existing
best-effort behavior if a pubkey is missing.
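The batch-then-index shape that replaces the N+1 lookup can be sketched as follows. `listAccountPeers` is a hypothetical batch API standing in for whatever account call the handler uses; the point is one fetch, then map lookups inside the response loop.

```go
package main

import "fmt"

// Peer is a toy stand-in for nbpeer.Peer.
type Peer struct {
	PubKey string
	FQDN   string
}

// buildFQDNIndex fetches the account's peers once and indexes them by
// pubkey, replacing a per-entry GetPeerByPubKey call per response row.
func buildFQDNIndex(listAccountPeers func() []*Peer) map[string]string {
	idx := make(map[string]string)
	for _, p := range listAccountPeers() {
		idx[p.PubKey] = p.FQDN
	}
	return idx
}

func main() {
	calls := 0
	list := func() []*Peer {
		calls++
		return []*Peer{{"k1", "a.example"}, {"k2", "b.example"}}
	}
	idx := buildFQDNIndex(list)
	for _, key := range []string{"k1", "k2", "missing"} {
		fqdn, ok := idx[key]
		if !ok {
			fqdn = "" // best-effort fallback, as in the handler today
		}
		fmt.Println(key, fqdn)
	}
	fmt.Println("batch calls:", calls) // → batch calls: 1
}
```

The missing-pubkey branch preserves the existing best-effort behavior: an absent peer yields an empty FQDN instead of failing the whole response.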
In `@management/server/peer_connections/store.go`:
- Around line 73-100: The code currently accepts a delta push as authoritative
when prev == nil; change the logic in the block around stored/prev/m so that if
!m.GetFullSnapshot() && prev == nil you do NOT store the delta into s.maps
(i.e., reject/ignore the update and return early or skip assigning
s.maps[peerPubKey]). Specifically, in the function handling peer map updates,
detect m.GetFullSnapshot() == false and prev == nil and abort the merge flow
that sets stored and s.maps[peerPubKey] (keep existing memEntry untouched) until
a full snapshot arrives; keep references to stored, prev, merged, m, and
memEntry to locate the code. Ensure updatedAt is not changed for ignored deltas.
---
Duplicate comments:
In `@client/internal/peer/ice_backoff.go`:
- Around line 161-169: SetMaxBackoff currently rebuilds the scheduler but does
not clear an existing suspension, so calling SetMaxBackoff(0) can leave a peer
suspended until the old nextRetry elapses; update SetMaxBackoff (in
iceBackoffState) to, under the same lock, when d == 0 clear any active
suspension by resetting s.suspended (and s.nextRetry to zero/empty) before
rebuilding s.bo and assigning s.maxBackoff so the peer is immediately
unsuspended when backoff is disabled.
In `@client/internal/peer/status.go`:
- Around line 422-465: The code in updatePeerStateLocked builds router/peer
notifications after releasing d.mux which allows races; while still holding the
lock you must construct/capture any closures and snapshots (call
snapshotRouterPeersLocked, notifyConnStateChange(receivedState.PubKey,
peerState) and any notifyPeerStateChangeListeners equivalents) into local
variables, then release d.mux and invoke those captured closures (and call
d.notifier.peerListChanged) outside the lock; update the other updatePeer*Locked
helpers similarly so all listener/router notification closures and snapshots are
created under the lock before unlocking.
- Around line 506-540: In UpdatePeerRemoteMeta, you currently update Remote*
fields for an online peer (d.peers[pubKey]) or an offline entry
(d.offlinePeers[i]) and return without notifying subscribers; capture (snapshot)
the required notifications — e.g., the per-peer connection-state listener
callback and any peer-list/router change events — while still holding d.mux
(after mutating the peer record), then release the lock and dispatch those
notifications outside the lock in both the online branch (when st exists) and
the offline branch (when offlinePeers[i].PubKey matches); ensure you record
which peer key and the minimal changed state needed for the notifications so the
post-unlock dispatch logic can run safely.
In `@management/internals/shared/grpc/conversion.go`:
- Around line 263-267: The code currently uses derefUint32OrZero to set
cfgP2pRetryMax which collapses nil and explicit 0; instead preserve the
p2pRetryMaxDisabledSentinel used by toPeerConfig so consumers can tell "unset"
from "explicit 0". Change the assignment in conversion.go that sets
cfgP2pRetryMax (the variable that populates ConfiguredP2PRetryMaxSecs) to check
c.Cfg.P2pRetryMaxSeconds: if nil set p2pRetryMaxDisabledSentinel, otherwise use
the dereferenced value (keep explicit 0); replace derefUint32OrZero for this
field or add a small helper (e.g., derefUint32OrSentinel) and add a regression
test asserting the sentinel is preserved in the resulting
ConfiguredP2PRetryMaxSecs and that explicit 0 remains 0.
In `@management/internals/shared/grpc/server.go`:
- Around line 487-489: The select case reading from snapshotCh currently does
"if !ok { continue }" which leaves the case permanently ready and can spin the
goroutine; update the handling in the select branch that reads "case nonce, ok
:= <-snapshotCh:" to either set snapshotCh = nil when ok is false (to disable
that select case) or return/exit the goroutine instead, ensuring you reference
and modify the snapshotCh variable in the scope of the goroutine where it is
used (adjust logic in the surrounding loop/select to handle snapshotCh being
nil).
- Around line 1175-1183: After parsing the request in SyncPeerConnections (using
parseRequest) avoid blindly writing to peerConnections; instead resolve the
sender to a registered peer/account (e.g. call your existing lookup/resolve
method using peerKey or the peer public key) and if that lookup fails return an
error, only calling s.peerConnections.Put(peerKey.String(), pcm) when the
peer/account is confirmed registered; ensure you reference peerKey and pcm and
return a clear permission/unauthorized error for unregistered senders.
In `@management/server/http/handler.go`:
- Around line 143-149: Construct the peer-connections handler via the
permissions manager so the endpoints enforce Peers RBAC: instead of registering
peer_connections_http.NewHandler(...) directly, create the handler with
peer_connections_http.NewHandler(peerConnStore, &pcAccountManagerAdapter{am:
accountManager, nmc: networkMapController}, peerConnRouter) and then wrap its
methods with the permissions middleware (or RequirePermission) to enforce Peers
read on GetPeerConnections and Peers write on PostRefresh (similar to how
peers.AddEndpoints does it); register the wrapped functions with
router.HandleFunc("/peers/{peerId}/connections",
wrappedGet).Methods("GET","OPTIONS") and
router.HandleFunc("/peers/{peerId}/connections/refresh",
wrappedPost).Methods("POST","OPTIONS").
In `@management/server/peer_connections/store.go`:
- Around line 68-70: The current seq-check in the peer map update (using s.maps,
prev, m.GetSeq(), prev.m.GetSeq()) wrongly drops authoritative full snapshots
with lower seqs; change the guard so you only return when the incoming map is
not an authoritative full snapshot — e.g. replace the condition "if prev != nil
&& m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() { return }" with a check that
also requires the map to be non-full/ non-authoritative (something like "&&
!m.IsFullSnapshot()" or "&& !m.GetMeta().GetAuthoritative()") before returning;
ensure you use the actual predicate your Map type exposes to detect an
authoritative/full snapshot so full repairs are accepted even if seq is lower.
In `@management/server/peer.go`:
- Around line 1228-1235: GetPeerByPubKey currently returns a plain fmt.Errorf
when the peer.AccountID doesn't match, causing inconsistent HTTP handling;
replace that fmt.Errorf with the same typed status/account-mismatch error used
by the other peer lookup APIs so the HTTP layer treats it consistently. Locate
GetPeerByPubKey (and compare with the other peer lookup functions that call
am.Store.GetPeerByPeerPubKey) and return the same typed error value/function
(the project’s account/peer status error helper) instead of fmt.Errorf,
preserving the error message/context about pubKey and accountID.
---
Nitpick comments:
In `@client/ui/network.go`:
- Line 125: The auto-refresh is repainting the network grid even when the Peers
tab is active; modify startAutoRefresh (the call site and/or its implementation
that invokes updateNetworksBasedOnDisplayTab) to check the currently selected
tab before refreshing networks: gate the call to updateNetworksBasedOnDisplayTab
(and any network-related refresh work for allGrid/overlappingGrid/exitNodeGrid)
so it only runs when tabs.Selected().Text != peersText, leaving the peers_tab.go
30s peers ticker to handle peers updates; ensure you reference tabs and
peersText consistently so only the visible tab's data is polled.
- Around line 60-76: The selectAll and deselectAll button handlers call
getGridAndFilterFromTab which dereferences tabs.Selected().Text; add the same
nil-guard used in refreshBtn: check if tabs.Selected() != nil before calling
getGridAndFilterFromTab and otherwise return (or no-op). Update the selectAllBtn
and deselectAllBtn callbacks to early-return when tabs.Selected() is nil, then
call getGridAndFilterFromTab and proceed to
s.selectAllFilteredNetworks/s.deselectAllFilteredNetworks and
s.updateNetworksBasedOnDisplayTab.
In `@client/ui/peers_tab.go`:
- Around line 219-273: Sonar flags buildPeerDetailText as too complex; split its
linear sections into small helpers to reduce cognitive complexity. Create helper
functions (e.g., writeBasicPeerFields(sb *strings.Builder, p *proto.PeerState),
writeEndpointFields(sb *strings.Builder, p *proto.PeerState),
writeMetadataFields(sb *strings.Builder, p *proto.PeerState), and
writeFullPeerFields(sb *strings.Builder, p *proto.PeerState)) that contain the
corresponding fmt.Fprintf blocks currently in buildPeerDetailText, then rewrite
buildPeerDetailText to instantiate the strings.Builder and call those helpers
(and only append the "\n--- Full details ---\n" header before calling
writeFullPeerFields when full==true). Ensure all existing uses of p.Get... and
orDashStr/peerLatencyStr/humanBytes remain unchanged inside the helpers.
- Around line 51-52: In buildPeersTabContent, the temporary call context is
created with context.WithTimeout(context.Background(), 5*time.Second) which
ignores the parent ctx; change it to derive from the parent (use
context.WithTimeout(ctx, 5*time.Second)) so callCtx and cancel propagate
cancellation from the UI/process ctx (affecting the Status RPC and any in-flight
operations).
In `@management/server/http/testing/testing_tools/channel/channel.go`:
- Line 138: The call to NewAPIHandler passes seven consecutive nils after
serviceManager which makes it unclear which optional parameters are being set to
nil; update the apiHandler construction (the call to http2.NewAPIHandler where
apiHandler, err := ...) to either replace the trailing nils with named inline
comments indicating which parameter each nil corresponds to (e.g., /*
peerConnections */, /* snapshotRouter */, etc.), or create a small
helper/wrapper that accepts the required arguments and supplies named nil
defaults before calling NewAPIHandler; reference the NewAPIHandler signature to
use the exact parameter names and ensure the helper or inline comments map each
nil to the correct parameter.
In `@management/server/peer_connections/store_test.go`:
- Around line 38-52: Add a second assertion in TestMemoryStore_DeepCopyOnReturn
to cover mutation-after-Put: after creating the original
mgmProto.PeerConnectionMap (the value passed into s.Put in the test that uses
newStoreWithClock), mutate fields on that original protobuf (e.g., change
Entries[0].RemotePubkey or LastHandshake) immediately after calling
s.Put("peerA", ...), then call s.Get("peerA") and assert the returned copy still
has the original values ("peerB" etc.). This verifies s.Put clones the input
rather than retaining the caller's pointer.
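The mutation-after-`Put` check generalizes to any store that must own its data on both sides. A toy sketch (a map-of-slices store instead of the real protobuf-backed one) showing why cloning on `Put` and on `Get` are separate guarantees:

```go
package main

import "fmt"

// Store is a toy stand-in for the peer-connections store: it must own
// its data, so both Put and Get deep-copy the value.
type Store struct {
	maps map[string][]string
}

func NewStore() *Store { return &Store{maps: map[string][]string{}} }

func (s *Store) Put(key string, entries []string) {
	cp := make([]string, len(entries))
	copy(cp, entries) // clone on Put: the caller keeps its own slice
	s.maps[key] = cp
}

func (s *Store) Get(key string) []string {
	stored := s.maps[key]
	cp := make([]string, len(stored))
	copy(cp, stored) // clone on Get: the caller cannot mutate our copy
	return cp
}

func main() {
	s := NewStore()
	orig := []string{"peerB"}
	s.Put("peerA", orig)
	orig[0] = "mutated-after-put" // must not leak into the store
	got := s.Get("peerA")
	got[0] = "mutated-after-get" // must not leak either
	fmt.Println(s.Get("peerA")[0]) // → peerB
}
```

A test asserting only the `Get` path would keep passing if `Put` dropped its `copy` and retained `entries` directly, which is exactly the gap the comment describes; in the real store the clone would be `proto.Clone` on the `PeerConnectionMap`.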
In `@management/server/peer_test.go`:
- Line 1159: The test calls ToSyncResponse with a nil groupNamesByPeerID which
leaves the new group-enrichment branch untested; update the call in peer_test.go
to pass a small non-nil map (e.g. map[peer.ID][]string with the test peer's ID
mapped to one or two group names) into the position currently nil, then add an
assertion that the returned response's peer metadata (from response.Peers or the
specific peer entry asserted elsewhere in this fixture) contains those group
names so the groupNamesByPeerID -> peer-groups projection is exercised.
In `@management/server/peer/peer_test.go`:
- Around line 145-152: The test TestPeerSystemMeta_isEqual_ChecksEffectiveFields
currently only asserts a difference for EffectiveConnectionMode; update the test
to either (a) rename it to
TestPeerSystemMeta_isEqual_ChecksEffectiveConnectionMode if you intend to keep
it narrow, or preferably (b) extend it to cover all four Effective* fields by
using a table-driven approach that mutates each of EffectiveConnectionMode,
EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, and EffectiveP2PRetryMaxSecs
on the copied other PeerSystemMeta and asserts isEqual returns false for each;
target the PeerSystemMeta.isEqual method and add one subcase per field so
omissions in isEqual will fail the test.
In `@shared/management/client/grpc.go`:
- Around line 483-490: The snapshot callback is invoked inline on the Sync
receive loop (see c.onSnapshotRequest usage and cb(req.GetNonce())), which can
block the stream; change the invocation to dispatch the callback asynchronously
(e.g., spawn a goroutine) after acquiring and releasing snapMu so the recv loop
isn’t blocked, and ensure the goroutine is safe (recover panics) or document
that onSnapshotRequest must be non-blocking if you choose not to spawn a
goroutine.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7a14726c-0e01-4303-b6cf-2bc41685f606
⛔ Files ignored due to path filters (3)
- `client/proto/daemon.pb.go` is excluded by `!**/*.pb.go`
- `shared/management/proto/management.pb.go` is excluded by `!**/*.pb.go`
- `shared/management/proto/management_grpc.pb.go` is excluded by `!**/*.pb.go`
📒 Files selected for processing (60)
- client/android/client.go
- client/android/peer_notifier.go
- client/android/preferences.go
- client/cmd/service.go
- client/cmd/service_installer.go
- client/internal/conn_mgr.go
- client/internal/conn_mgr_test.go
- client/internal/conn_state_pusher.go
- client/internal/conn_state_pusher_test.go
- client/internal/debouncer/debouncer.go
- client/internal/engine.go
- client/internal/engine_pusher_adapters.go
- client/internal/engine_test.go
- client/internal/lazyconn/inactivity/manager.go
- client/internal/lazyconn/inactivity/manager_test.go
- client/internal/lazyconn/support.go
- client/internal/peer/conn.go
- client/internal/peer/env_test.go
- client/internal/peer/guard/guard.go
- client/internal/peer/handshaker.go
- client/internal/peer/ice_backoff.go
- client/internal/peer/ice_backoff_test.go
- client/internal/peer/status.go
- client/internal/peer/status_test.go
- client/internal/peer/worker_ice.go
- client/proto/daemon.proto
- client/server/server.go
- client/server/setconfig_test.go
- client/ui/client_ui.go
- client/ui/const.go
- client/ui/event_handler.go
- client/ui/network.go
- client/ui/peers_tab.go
- management/internals/controllers/network_map/controller/controller.go
- management/internals/server/boot.go
- management/internals/shared/grpc/conversion.go
- management/internals/shared/grpc/server.go
- management/server/account/manager.go
- management/server/account/manager_mock.go
- management/server/http/handler.go
- management/server/http/handlers/peer_connections/handler.go
- management/server/http/handlers/peer_connections/handler_test.go
- management/server/http/testing/testing_tools/channel/channel.go
- management/server/management_proto_test.go
- management/server/management_test.go
- management/server/mock_server/account_mock.go
- management/server/peer.go
- management/server/peer/peer.go
- management/server/peer/peer_test.go
- management/server/peer_connections/snapshot_router.go
- management/server/peer_connections/snapshot_router_test.go
- management/server/peer_connections/store.go
- management/server/peer_connections/store_test.go
- management/server/peer_test.go
- management/server/store/sql_store.go
- shared/management/client/client.go
- shared/management/client/client_test.go
- shared/management/client/grpc.go
- shared/management/client/mock.go
- shared/management/proto/management.proto
💤 Files with no reviewable changes (2)
- client/ui/const.go
- client/ui/event_handler.go
🚧 Files skipped from review as they are similar to previous changes (4)
- client/cmd/service_installer.go
- client/android/client.go
- management/server/peer/peer.go
- client/internal/conn_state_pusher_test.go
```go
// SetConnectionMode stores a local override for the connection mode.
// Pass an empty string to clear the override (revert to following the
// server-pushed value).
func (p *Preferences) SetConnectionMode(mode string) {
	m := mode
	p.configInput.ConnectionMode = &m
}
```
SetConnectionMode("") does not actually clear the override.
The doc says "Pass an empty string to clear the override (revert to following the server-pushed value)", but the implementation stores &"" (non-nil pointer to an empty string). GetConnectionMode checks p.configInput.ConnectionMode != nil first, so after SetConnectionMode("") the getter returns "" and never falls through to read cfg.ConnectionMode — the local override is set-to-empty rather than cleared, which is observably different from "follow server".
🛡️ Proposed fix

```diff
 func (p *Preferences) SetConnectionMode(mode string) {
-	m := mode
-	p.configInput.ConnectionMode = &m
+	if mode == "" {
+		p.configInput.ConnectionMode = nil
+		return
+	}
+	m := mode
+	p.configInput.ConnectionMode = &m
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// SetConnectionMode stores a local override for the connection mode.
// Pass an empty string to clear the override (revert to following the
// server-pushed value).
func (p *Preferences) SetConnectionMode(mode string) {
	if mode == "" {
		p.configInput.ConnectionMode = nil
		return
	}
	m := mode
	p.configInput.ConnectionMode = &m
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/android/preferences.go` around lines 325 - 331, SetConnectionMode
currently stores a non-nil pointer to an empty string so calling
SetConnectionMode("") doesn’t clear the override; change SetConnectionMode to
set p.configInput.ConnectionMode = nil when mode == "" and otherwise set it to a
pointer to the provided string (e.g., allocate a new string variable or use &m)
so GetConnectionMode’s nil-check will correctly fall through to the
server-pushed value; update the logic in the SetConnectionMode method that
manipulates p.configInput.ConnectionMode accordingly.
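The fix hinges on pointer semantics: a nil pointer means "follow the server-pushed value", while a non-nil pointer (even one pointing at "") is a local override. A minimal runnable sketch of that contract, using a hypothetical `prefs` type rather than the real Android `Preferences`:

```go
package main

import "fmt"

// prefs mimics the override-pointer pattern: nil means "follow server",
// non-nil means "local override". Hypothetical type, not the real
// Android Preferences struct.
type prefs struct {
	override *string // local override; nil = no override
	server   string  // server-pushed default
}

// SetConnectionMode clears the override on empty input, otherwise stores it.
func (p *prefs) SetConnectionMode(mode string) {
	if mode == "" {
		p.override = nil // clear: fall through to the server-pushed value
		return
	}
	m := mode
	p.override = &m
}

// GetConnectionMode returns the override if set, else the server value.
func (p *prefs) GetConnectionMode() string {
	if p.override != nil {
		return *p.override
	}
	return p.server
}

func main() {
	p := &prefs{server: "relay-only"}
	p.SetConnectionMode("p2p")
	fmt.Println(p.GetConnectionMode()) // p2p
	p.SetConnectionMode("")
	fmt.Println(p.GetConnectionMode()) // relay-only
}
```

With the buggy version (always storing `&m`), the second call would print an empty string instead of falling back to the server value.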
```go
func (p *connStatePusher) OnSnapshotRequest(nonce uint64) {
	select {
	case p.snapshotReq <- nonce:
	default:
	}
}
```
Use the latest refresh nonce when coalescing snapshot requests.
This currently dequeues the oldest nonce and drops newer ones once the buffer fills, so a client waiting on since=<latest> can time out even though a full snapshot was pushed for an older nonce. If this path is meant to coalesce, it needs to drain pending requests and respond with the newest nonce.
Suggested fix

```diff
 case nonce := <-p.snapshotReq:
+	for {
+		select {
+		case newer := <-p.snapshotReq:
+			nonce = newer
+		default:
+			goto flushSnapshot
+		}
+	}
+flushSnapshot:
 	if p.source != nil {
 		p.flushFull(p.source.SnapshotAllRemotePeers(), nonce)
 	}
```

Also applies to: 174-177
🤖 Prompt for AI Agents
In `@client/internal/conn_state_pusher.go` around lines 107 - 112,
OnSnapshotRequest currently sends nonce into p.snapshotReq but if the channel
buffer is full it drops the new nonce, causing older nonces to be used; change
the logic in connStatePusher.OnSnapshotRequest to coalesce by non-blocking
enqueue of the newest nonce: when attempting to send, if the channel is full
drain/consume all pending values from p.snapshotReq and then send the newest
nonce so the buffer holds only the latest request; apply the same
drain-and-replace pattern to the analogous handler at the other occurrence
(lines referenced in the comment) so pending snapshot requests always resolve to
the most recent nonce.
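The drain-and-keep-newest pattern described above can be isolated into a small helper. `drainLatest` is an illustrative name, not taken from the NetBird codebase; the real pusher would apply the same loop to its `snapshotReq` channel:

```go
package main

import "fmt"

// drainLatest returns the newest value available on ch, starting from a
// value already received. It never blocks: it keeps reading until the
// buffer is empty, discarding older nonces along the way.
func drainLatest(ch chan uint64, first uint64) uint64 {
	latest := first
	for {
		select {
		case v := <-ch:
			latest = v
		default:
			return latest
		}
	}
}

func main() {
	ch := make(chan uint64, 4)
	ch <- 2
	ch <- 3
	ch <- 9 // newest refresh nonce
	got := drainLatest(ch, 1)
	fmt.Println(got) // 9
}
```

Responding with the newest nonce is safe because a full snapshot pushed for nonce N also satisfies every request older than N.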
```go
func (p *connStatePusher) computeDeltaFromSource() []PeerStateChangeEvent {
	if p.source == nil {
		return nil
	}
	all := p.source.SnapshotAllRemotePeers()
	p.mu.Lock()
	defer p.mu.Unlock()
	delta := make([]PeerStateChangeEvent, 0, len(all))
	for _, ev := range all {
		prev, had := p.lastPushed[ev.Pubkey]
		if !had || isMaterialChange(prev, ev) {
			delta = append(delta, ev)
		}
	}
	return delta
}
```
Removed peers never get reconciled out of the pushed state.
computeDeltaFromSource only emits currently present peers, so a peer that disappeared from SnapshotAllRemotePeers() never produces any delta. Because the store's delta path only upserts entries, management keeps showing that stale peer until a manual full snapshot or TTL expiry.
This needs a removal path: either detect missing keys and force a full snapshot, or extend the delta format/store merge so deletions are explicit.
🤖 Prompt for AI Agents
In `@client/internal/conn_state_pusher.go` around lines 261 - 275,
computeDeltaFromSource currently only iterates current SnapshotAllRemotePeers
and never emits deletions, so keys present in connStatePusher.lastPushed but
missing from SnapshotAllRemotePeers are never reconciled; update
computeDeltaFromSource in connStatePusher to also iterate lastPushed and detect
pubkeys absent from the current snapshot and emit explicit removal events (e.g.,
a PeerStateChangeEvent marked as deletion or a distinct event type) for those
pubkeys; use existing symbols SnapshotAllRemotePeers, lastPushed,
PeerStateChangeEvent and isMaterialChange to decide when to emit updates vs
deletions, ensuring the store receives explicit removal entries instead of
relying on TTL or full snapshots.
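The reconciliation idea above amounts to a set difference: anything in the last-pushed map that is absent from the current snapshot becomes an explicit removal event. A sketch under assumed, simplified types (`PeerEvent` and `deltaWithRemovals` are illustrative, not the real `connStatePusher` implementation):

```go
package main

import "fmt"

// PeerEvent is a simplified stand-in for the pusher's per-peer state event.
type PeerEvent struct {
	Pubkey  string
	Removed bool
}

// deltaWithRemovals compares the current snapshot against the last pushed
// state and emits explicit removal events for peers that disappeared,
// alongside the usual changed/new entries.
func deltaWithRemovals(lastPushed map[string]PeerEvent, current []PeerEvent) []PeerEvent {
	seen := make(map[string]bool, len(current))
	delta := make([]PeerEvent, 0, len(current))
	for _, ev := range current {
		seen[ev.Pubkey] = true
		prev, had := lastPushed[ev.Pubkey]
		if !had || prev != ev { // stand-in for isMaterialChange
			delta = append(delta, ev)
		}
	}
	for key := range lastPushed {
		if !seen[key] {
			delta = append(delta, PeerEvent{Pubkey: key, Removed: true})
		}
	}
	return delta
}

func main() {
	last := map[string]PeerEvent{"a": {Pubkey: "a"}, "b": {Pubkey: "b"}}
	cur := []PeerEvent{{Pubkey: "a"}} // "b" has disappeared
	for _, ev := range deltaWithRemovals(last, cur) {
		fmt.Printf("%s removed=%v\n", ev.Pubkey, ev.Removed)
	}
}
```

The store's merge path would then have to honor the removal flag by deleting the entry instead of upserting it.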
```go
func (d *Debouncer) Trigger(fn func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.fn = fn
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.delay, func() {
		d.mu.Lock()
		f := d.fn
		d.mu.Unlock()
		if f != nil {
			f()
		}
	})
}

// Stop cancels any pending fn. Safe to call multiple times.
func (d *Debouncer) Stop() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
		d.timer = nil
	}
}
```
Potential double-execution: in-flight AfterFunc callback can run the latest fn after a fast Trigger re-arm.
Sequence:

1. `Trigger(fnA)` schedules timer `t1` and stores `d.fn = fnA`.
2. `t1` fires; the callback goroutine starts but is preempted before acquiring `d.mu`.
3. `Trigger(fnB)` acquires `d.mu`, sets `d.fn = fnB`, calls `t1.Stop()` (returns `false`, already fired), schedules `t2`, and releases the lock.
4. The `t1` callback finally acquires `d.mu`, reads `f = fnB`, and calls `fnB()`.
5. `t2` later fires and runs `fnB()` again.

Result: `fnB` runs twice (and `fnA` is silently dropped; the drop is documented behaviour, but the duplicate run of `fnB` is not). For SyncMeta this is just a redundant RPC, but it can mask convergence bugs and breaks the documented "last-write-wins" contract. Similarly, `Stop()` cannot prevent an already-fired callback from executing because it does not nil out `d.fn`.
♻️ Proposed fix using a timer-identity guard

```diff
 func (d *Debouncer) Trigger(fn func()) {
 	d.mu.Lock()
 	defer d.mu.Unlock()
 	d.fn = fn
 	if d.timer != nil {
 		d.timer.Stop()
 	}
-	d.timer = time.AfterFunc(d.delay, func() {
-		d.mu.Lock()
-		f := d.fn
-		d.mu.Unlock()
-		if f != nil {
-			f()
-		}
-	})
+	var t *time.Timer
+	t = time.AfterFunc(d.delay, func() {
+		d.mu.Lock()
+		// Stale callback from a timer that was Stop()'d/replaced after firing.
+		if d.timer != t {
+			d.mu.Unlock()
+			return
+		}
+		f := d.fn
+		d.fn = nil
+		d.timer = nil
+		d.mu.Unlock()
+		if f != nil {
+			f()
+		}
+	})
+	d.timer = t
 }

 func (d *Debouncer) Stop() {
 	d.mu.Lock()
 	defer d.mu.Unlock()
 	if d.timer != nil {
 		d.timer.Stop()
 		d.timer = nil
 	}
+	d.fn = nil
 }
```

🤖 Prompt for AI Agents
In `@client/internal/debouncer/debouncer.go` around lines 28 - 53, The Debouncer
can run the wrong/duplicate fn when an old timer callback races with a re-arm;
fix Trigger and Stop by using a timer-identity guard and clearing d.fn when
cancelling/executing: in Trigger (method Debouncer.Trigger) after creating the
timer store it as a local variable (e.g. myTimer := d.timer) and capture that in
the time.AfterFunc closure; inside the closure lock d.mu and check if d.timer ==
myTimer — if not, return; otherwise read f := d.fn, set d.fn = nil and set
d.timer = nil before unlocking and then call f(); in Stop (method
Debouncer.Stop) under d.mu, stop the timer if present and set d.timer = nil and
d.fn = nil so an already-fired-but-not-yet-executed callback cannot observe a
retained fn.
```go
if e.connStatePusher != nil {
	e.connStatePusher.Stop()
	e.connStatePusher = nil
}
```
Clear the registered conn-state callbacks before nil-ing the pusher.
Start() installs callbacks that dereference e.connStatePusher, but Stop() tears the pusher down by setting the field to nil without unregistering those callbacks. A late status update or snapshot request during teardown can still hit those closures and panic on a nil receiver.
Suggested fix

```diff
 if e.connStatePusher != nil {
+	e.statusRecorder.SetConnStateListener(nil)
+	e.mgmClient.SetSnapshotRequestHandler(nil)
 	e.connStatePusher.Stop()
 	e.connStatePusher = nil
 }
```
🤖 Prompt for AI Agents
In `@client/internal/engine.go` around lines 354 - 357, The Stop() path currently
sets e.connStatePusher = nil without unregistering callbacks installed by
Start(), which can leave closures referencing the pusher and cause panics;
update Stop() (and/or implement an Unregister method on the connStatePusher
type) to first unregister or remove all registered conn-state callbacks (the
ones installed in Start()) and only then stop and nil out e.connStatePusher,
ensuring any closures cannot dereference a nil e.connStatePusher (refer to
e.connStatePusher, Start(), and Stop() to locate the affected logic).
```go
// Phase 3.7i Connection-Mode fields. Currently in the proto so
// daemons can advertise them via GetConfig, but SetConfig does
// NOT apply them at runtime — they're only persisted via
// `netbird service install/reconfigure --connection-mode/...`
// (writes the active profile file directly; daemon picks up on
// next start). Wiring them through SetConfig is a follow-up
// task. Listed here so the structural test passes.
"ConnectionMode":      true,
"P2PTimeoutSeconds":   true,
"RelayTimeoutSeconds": true,
"P2PRetryMaxSeconds":  true,
```
SetConfigRequest now advertises fields that this suite explicitly allows the daemon to ignore.
These comments/documentation lock in a silent no-op for connection_mode and the timeout fields: the RPC accepts them, the structural test passes, and callers still get no runtime effect. That is a misleading public contract for SetConfig. Either wire these fields through SetConfig now and assert persistence here, or keep them out of the request surface until the handler supports them.
Also applies to: 279-289
🤖 Prompt for AI Agents
In `@client/server/setconfig_test.go` around lines 204 - 214, The test is
advertising connection-mode and timeout fields as allowed no-ops in
SetConfigRequest which misrepresents the RPC surface; update the test to stop
permitting these fields until the daemon actually implements them by removing
"ConnectionMode", "P2PTimeoutSeconds", "RelayTimeoutSeconds", and
"P2PRetryMaxSeconds" from the allowed-fields table in setconfig_test.go (and the
analogous entries at the later block around lines 279-289), or alternatively
implement wiring in the SetConfigRequest handler to persist/apply those fields
and update the test to assert persistence; reference the SetConfigRequest
handling logic and the named fields above when making the change.
```go
	})
}

showFull.OnChanged = func(_ bool) { render() }
```
showFull toggle blocks the UI thread on a gRPC call.
OnChanged is invoked on the Fyne UI goroutine. render() performs a synchronous conn.Status RPC with a 5-second timeout (Lines 51‑57) before returning, so a slow/unhealthy daemon can freeze the entire window for up to 5 s every time the user toggles the checkbox. The periodic ticker correctly invokes render() from a background goroutine; the OnChanged path should follow the same pattern (or, since toggling showFull only changes presentation, just re-render rows from the last-known status snapshot without an RPC round-trip).
🛠 Minimal fix: run render() off the UI thread

```diff
-showFull.OnChanged = func(_ bool) { render() }
+showFull.OnChanged = func(_ bool) { go render() }
```

A nicer follow-up is to cache the last FullStatus and let the toggle just rebuild rows locally, avoiding the daemon round-trip entirely.
🤖 Prompt for AI Agents
In `@client/ui/peers_tab.go` at line 83, The showFull.OnChanged handler currently
calls render() directly on the Fyne UI goroutine which blocks while render()
does a synchronous conn.Status RPC; change the OnChanged handler so it does not
perform RPCs on the UI thread: either (A) run render() asynchronously off the UI
goroutine (spawn a goroutine or use the same background worker the ticker uses)
so conn.Status is executed off the UI thread, or (B) better, cache the last
FullStatus snapshot and update only the rows/presentation when showFull changes
(have showFull.OnChanged rebuild rows from the cached FullStatus rather than
calling conn.Status). Update references to showFull, render(), conn.Status, and
the FullStatus cache/usage accordingly.
```go
func (a *fakeAM) GetPeerByPubKey(_ context.Context, _, pubKey string) (*nbpeer.Peer, error) {
	p, ok := a.peersByKey[pubKey]
	if !ok {
		return nil, errors.New("not found")
	}
	return p, nil
}
```
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Scope the fake pubkey lookup by accountID, or the RBAC test can’t catch cross-account enrichment leaks.
Right now this helper will return any peer keyed by pubkey, regardless of the caller's account. That makes the 200-path test pass even if the handler accidentally enriches PeerConnectionMap entries with metadata from another account. Given that RBAC on these routes is a stated review focus, the fake should enforce account membership too.
🤖 Prompt for AI Agents
In `@management/server/http/handlers/peer_connections/handler_test.go` around
lines 40 - 46, The fake GetPeerByPubKey currently ignores the account parameter
(signature GetPeerByPubKey(_ context.Context, _, pubKey string)) and returns any
peer by pubkey, allowing cross-account leaks; update the implementation in
fakeAM.GetPeerByPubKey to enforce account scoping by using the second parameter
(accountID) — either change fakeAM.peersByKey to be keyed by accountID -> pubKey
(e.g., map[string]map[string]*nbpeer.Peer) or, if peersByKey stores Peer objects
with an AccountID field, check that p.AccountID == accountID before returning;
if there is no match return nil, errors.New("not found").
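The account-scoping check the prompt describes is a one-line guard on the lookup. A hypothetical sketch mirroring the suggestion (simplified `Peer`/`fakeAM` types, not the real test fake):

```go
package main

import (
	"errors"
	"fmt"
)

// Peer is a minimal stand-in carrying the account it belongs to.
type Peer struct {
	AccountID string
	PubKey    string
}

// fakeAM scopes pubkey lookups by account, so a cross-account enrichment
// bug in the handler surfaces as "not found" in tests instead of passing.
type fakeAM struct {
	peersByKey map[string]*Peer
}

func (a *fakeAM) GetPeerByPubKey(accountID, pubKey string) (*Peer, error) {
	p, ok := a.peersByKey[pubKey]
	if !ok || p.AccountID != accountID { // enforce account membership
		return nil, errors.New("not found")
	}
	return p, nil
}

func main() {
	am := &fakeAM{peersByKey: map[string]*Peer{
		"k1": {AccountID: "acct-a", PubKey: "k1"},
	}}
	if _, err := am.GetPeerByPubKey("acct-b", "k1"); err != nil {
		fmt.Println("cross-account lookup rejected")
	}
}
```

With this guard in the fake, a handler that enriches entries using the wrong account would fail the 200-path assertion rather than silently leaking metadata.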
```go
func (h *Handler) buildResponse(ctx context.Context, accountID, dnsDomain, pubkey string, m *mgmProto.PeerConnectionMap) apiResponse {
	resp := apiResponse{
		PeerPubkey:   pubkey,
		Seq:          m.GetSeq(),
		FullSnapshot: m.GetFullSnapshot(),
		InResponseTo: m.GetInResponseToNonce(),
		Entries:      make([]apiEntry, 0, len(m.GetEntries())),
	}
	for _, e := range m.GetEntries() {
		entry := apiEntry{
			RemotePubkey: e.GetRemotePubkey(),
			ConnType:     connTypeToStr(e.GetConnType()),
			LatencyMs:    e.GetLatencyMs(),
			Endpoint:     e.GetEndpoint(),
			RelayServer:  e.GetRelayServer(),
			RxBytes:      e.GetRxBytes(),
			TxBytes:      e.GetTxBytes(),
		}
		if hs := e.GetLastHandshake(); hs != nil && hs.IsValid() {
			entry.LastHandshake = hs.AsTime().Format(time.RFC3339)
		}
		// Enrich remote_fqdn via account-peer lookup (best-effort).
		if rPeer, err := h.account.GetPeerByPubKey(ctx, accountID, e.GetRemotePubkey()); err == nil && rPeer != nil {
			entry.RemoteFQDN = rPeer.FQDN(dnsDomain)
		}
		resp.Entries = append(resp.Entries, entry)
	}
	return resp
}
```
N+1 peer lookup in buildResponse.
For every entry in the connection map, h.account.GetPeerByPubKey is called once to enrich RemoteFQDN. A peer connected to N other peers triggers N lookups per HTTP request, and this endpoint is intended to be polled (the since= long-poll path can issue one such request every 200 ms while waiting). On accounts with many peers and/or many simultaneous viewers this becomes a noticeable backend hot path.
Consider one of:
- Batch-fetching all peers for the account once and indexing them by pubkey before the loop.
- Adding a small in-memory pubkey→FQDN cache scoped to the request.
- Pushing the FQDN resolution into the store layer so the snapshot already carries enriched names.
🤖 Prompt for AI Agents
In `@management/server/http/handlers/peer_connections/handler.go` around lines 161
- 189, The buildResponse function performs an N+1 lookup by calling
h.account.GetPeerByPubKey for each entry to set apiEntry.RemoteFQDN; change this
to fetch all peers for the account once (or otherwise build a request-scoped
map/cache) before the loop and then look up remote FQDNs from that map when
populating apiEntry.RemoteFQDN. Locate buildResponse and replace per-entry
GetPeerByPubKey calls with a single batch call (or cache population) using
account APIs, index the results by pubkey, and fall back to the existing
best-effort behavior if a pubkey is missing.
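The batch-and-index option reduces the per-request cost to one store query plus O(1) map reads per entry. A sketch with illustrative types (`Peer` and `indexByPubKey` are assumptions, not the real management-server API):

```go
package main

import "fmt"

// Peer is a minimal stand-in for the management server's peer model.
type Peer struct {
	PubKey string
	FQDN   string
}

// indexByPubKey builds a one-shot lookup map so the response loop does a
// map read per entry instead of a store query per entry.
func indexByPubKey(peers []Peer) map[string]string {
	idx := make(map[string]string, len(peers))
	for _, p := range peers {
		idx[p.PubKey] = p.FQDN
	}
	return idx
}

func main() {
	// In the handler this slice would come from a single
	// "fetch all peers for account" call before the entry loop.
	peers := []Peer{
		{PubKey: "k1", FQDN: "host1.netbird.cloud"},
		{PubKey: "k2", FQDN: "host2.netbird.cloud"},
	}
	idx := indexByPubKey(peers)
	fmt.Println(idx["k2"]) // host2.netbird.cloud
}
```

Missing pubkeys simply yield the zero value, which preserves the existing best-effort behavior of leaving `RemoteFQDN` empty.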
```go
stored := proto.Clone(m).(*mgmProto.PeerConnectionMap)
if !m.GetFullSnapshot() && prev != nil {
	merged := proto.Clone(prev.m).(*mgmProto.PeerConnectionMap)
	merged.Seq = m.GetSeq()
	merged.FullSnapshot = false
	// Keep the latest non-zero refresh-nonce. A snapshot pushed in
	// response to nonce N must remain reachable via GET ?since=N
	// even when a regular delta with InResponseToNonce=0 arrives
	// shortly after; otherwise the refresh polling client gives up
	// and falls back to the next sync interval (~60 s gap).
	if m.GetInResponseToNonce() > merged.GetInResponseToNonce() {
		merged.InResponseToNonce = m.GetInResponseToNonce()
	}
	byKey := make(map[string]int, len(merged.Entries))
	for i, e := range merged.Entries {
		byKey[e.GetRemotePubkey()] = i
	}
	for _, ne := range stored.Entries {
		if idx, ok := byKey[ne.GetRemotePubkey()]; ok {
			merged.Entries[idx] = ne
		} else {
			merged.Entries = append(merged.Entries, ne)
			byKey[ne.GetRemotePubkey()] = len(merged.Entries) - 1
		}
	}
	stored = merged
}
s.maps[peerPubKey] = &memEntry{m: stored, updatedAt: s.clock.Now()}
```
Do not treat a delta as authoritative when there is no cached baseline.
If the cache was evicted or the server restarted, prev is nil and a delta-only push gets stored as the whole map. From that point, reads return only the recently changed peers until some later full snapshot repairs the cache.
At minimum, reject delta updates when prev == nil and wait for a full snapshot before serving data for that peer.
Suggested fix

```diff
 stored := proto.Clone(m).(*mgmProto.PeerConnectionMap)
+if prev == nil && !m.GetFullSnapshot() {
+	return
+}
 if !m.GetFullSnapshot() && prev != nil {
 	merged := proto.Clone(prev.m).(*mgmProto.PeerConnectionMap)
```
🤖 Prompt for AI Agents
In `@management/server/peer_connections/store.go` around lines 73 - 100, The code
currently accepts a delta push as authoritative when prev == nil; change the
logic in the block around stored/prev/m so that if !m.GetFullSnapshot() && prev
== nil you do NOT store the delta into s.maps (i.e., reject/ignore the update
and return early or skip assigning s.maps[peerPubKey]). Specifically, in the
function handling peer map updates, detect m.GetFullSnapshot() == false and prev
== nil and abort the merge flow that sets stored and s.maps[peerPubKey] (keep
existing memEntry untouched) until a full snapshot arrives; keep references to
stored, prev, merged, m, and memEntry to locate the code. Ensure updatedAt is
not changed for ignored deltas.
Branch force-pushed from 907bc1c to 3e758d1.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
management/server/peer/peer.go (1)
199-217: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

`isEmpty()` not updated for the new `Effective*` fields — inconsistent with `isEqual`.

`isEqual` (Line 192‑195) now compares the four Phase 3.7i `Effective*` fields, but `isEmpty` does not include them. Because `UpdateMetaIfNew` early-returns when `isEmpty()` is true (Line 257‑259), a hypothetical update that sets only the `Effective*` fields would be silently dropped. In practice peers always populate `Hostname`/`GoOS`/etc., so this is unlikely to matter today, but the asymmetry is a footgun for future call sites that might construct a meta with only runtime fields.

Suggested fix

```diff
 func (p PeerSystemMeta) isEmpty() bool {
 	return p.Hostname == "" &&
 		...
 		p.Environment.Cloud == "" &&
 		p.Environment.Platform == "" &&
-		len(p.Files) == 0
+		len(p.Files) == 0 &&
+		p.EffectiveConnectionMode == "" &&
+		p.EffectiveRelayTimeoutSecs == 0 &&
+		p.EffectiveP2PTimeoutSecs == 0 &&
+		p.EffectiveP2PRetryMaxSecs == 0
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer/peer.go` around lines 199 - 217, The isEmpty method on PeerSystemMeta is missing the four Effective* runtime fields that isEqual now compares, causing updates that set only those Effective* fields to be treated as empty; update PeerSystemMeta.isEmpty to include the same Effective* fields checked by isEqual (i.e., add the four Effective... fields from the struct into the emptiness check) so that UpdateMetaIfNew's early-return behavior matches isEqual's comparison logic.
♻️ Duplicate comments (20)
client/internal/lazyconn/inactivity/manager.go (1)
87-95: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

`NewManagerWithTwoTimers` still bypasses threshold validation.

The Phase-1 path runs `validateInactivityThreshold` (rejecting sub-minute values), but `NewManagerWithTwoTimers` accepts any non-zero `iceTimeout`/`relayTimeout` directly, so a misconfigured caller that passes e.g. `30*time.Second` (or a negative duration) will produce a manager that tears down on the first tick or behaves nonsensically. Consider routing both timeouts through the same minimum-bound check (treating `0` as "disabled" specially) so behavior is consistent across constructors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/lazyconn/inactivity/manager.go` around lines 87 - 95, NewManagerWithTwoTimers currently accepts any non-zero durations and bypasses validateInactivityThreshold; update it to validate both iceTimeout and relayTimeout the same way as the Phase-1 path: for each timeout, if timeout != 0 call validateInactivityThreshold(timeout) (treat 0 as "disabled" and skip validation), and if validation fails handle it consistently (e.g. log the error and return nil to reject the constructor) before calling newManager; reference NewManagerWithTwoTimers, validateInactivityThreshold, and newManager when making the change.client/internal/peer/worker_ice.go (1)
545-563: ⚠️ Potential issue | 🟠 Major

Synchronize `lastKnownState` writes with `muxAgent`.

`IsConnected()` reads `lastKnownState` under `w.muxAgent`, but this callback still writes it without the same lock on Lines 550 and 561-562. That leaves a real data race and lets callers make network-change decisions from unsynchronized state.

Suggested fix

```diff
 case ice.ConnectionStateConnected:
-	w.lastKnownState = ice.ConnectionStateConnected
+	w.muxAgent.Lock()
+	w.lastKnownState = ice.ConnectionStateConnected
+	w.muxAgent.Unlock()
 	w.logSuccessfulPaths(agent)
 	w.conn.onICEConnected()
 	return
 case ice.ConnectionStateFailed, ice.ConnectionStateDisconnected, ice.ConnectionStateClosed:
 	sessionChanged := w.closeAgent(agent, dialerCancel)
-	if w.lastKnownState == ice.ConnectionStateConnected {
-		w.lastKnownState = ice.ConnectionStateDisconnected
+	w.muxAgent.Lock()
+	wasConnected := w.lastKnownState == ice.ConnectionStateConnected
+	if wasConnected {
+		w.lastKnownState = ice.ConnectionStateDisconnected
+	}
+	w.muxAgent.Unlock()
+	if wasConnected {
 		w.conn.onICEStateDisconnected(sessionChanged)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/worker_ice.go` around lines 545 - 563, The onConnectionStateChange callback in WorkerICE writes w.lastKnownState without holding w.muxAgent, creating a race with IsConnected which reads under that mutex; update the callback in the onConnectionStateChange method (the Connected and the Failed/Disconnected/Closed branches where lastKnownState is set) to acquire and release w.muxAgent around any writes to w.lastKnownState (same lock used by IsConnected), keeping other logic (logSuccessfulPaths, closeAgent, w.conn.onICEStateDisconnected/onICEConnected) unchanged and only surrounding the assignments to lastKnownState with the mutex to ensure synchronized access.management/server/peer.go (1)
1228-1237: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return a typed peer/account error instead of `fmt.Errorf`.

Other peer-lookup paths in this file (e.g. Line 320, 396, 422, 449) return `status.NewPeerNotPartOfAccountError()` for the same account-mismatch case. The plain `fmt.Errorf` here will be classified as `Internal` by the HTTP layer instead of the proper `NotFound`/account-scoped error, which is inconsistent and will produce 5xx responses for what is really a not-found/forbidden situation — particularly noticeable since the new `peer_connections` HTTP handler relies on this method.

Suggested fix

```diff
 func (am *DefaultAccountManager) GetPeerByPubKey(ctx context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
 	p, err := am.Store.GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, pubKey)
 	if err != nil {
 		return nil, err
 	}
 	if p.AccountID != accountID {
-		return nil, fmt.Errorf("peer with pubkey %s not in account %s", pubKey, accountID)
+		return nil, status.NewPeerNotPartOfAccountError()
 	}
 	return p, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer.go` around lines 1228 - 1237, Replace the generic fmt.Errorf in GetPeerByPubKey with the typed account/peer error used elsewhere so the HTTP layer classifies it correctly; specifically, in DefaultAccountManager.GetPeerByPubKey, when p.AccountID != accountID return status.NewPeerNotPartOfAccountError(accountID, pubKey) (matching other callers like the ones at lines ~320/396/422/449) instead of fmt.Errorf, ensuring the function returns the standardized error type.client/ui/peers_tab.go (1)
83-83: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

`showFull` toggle still performs a synchronous gRPC on the UI goroutine.

Same concern as a previous review: `OnChanged` runs on Fyne's UI goroutine, and `render()` performs a `conn.Status` RPC with a 5 s timeout (Lines 51‑57). A slow daemon will freeze the window for up to 5 s on every toggle. The same applies to the footer Refresh button in `network.go` (Line 70‑76), since `peersBundle.Refresh` is `render` itself; the periodic ticker already runs `render()` off-thread.

Minimal fix

```diff
-showFull.OnChanged = func(_ bool) { render() }
+showFull.OnChanged = func(_ bool) { go render() }
```

A nicer follow-up is to cache the last `FullStatus` and let the toggle just rebuild rows locally without a daemon round-trip.

🤖 Prompt for AI Agents
FullStatusand let the toggle just rebuild rows locally without a daemon round-trip.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ui/peers_tab.go` at line 83, The showFull.OnChanged handler currently calls render() on Fyne's UI goroutine which blocks because render() does a conn.Status RPC with timeout; change the handler to avoid any synchronous RPC on the UI thread by either (a) spawning a goroutine to perform the RPC/fetch and then schedule UI updates back on the UI goroutine (e.g., using fyne.CurrentApp().Driver().RunOnMain or equivalent) or (b) implement a cached FullStatus in peersBundle and have showFull.OnChanged only rebuild the rows from that cached data so no daemon round-trip occurs; also update peersBundle.Refresh and the Refresh button in network.go to follow the same pattern (do network fetches off-thread and only update UI on the main goroutine).client/internal/peer/ice_backoff.go (1)
161-169: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

`SetMaxBackoff(0)` still does not clear an in-flight suspension. When the management server pushes a value of 0 to disable backoff, `maxBackoff`/`bo` are swapped but `suspended`/`nextRetry` from a prior `markFailure()` remain. `IsSuspended()` will then keep honoring the stale timer until it naturally expires, so the live "disable backoff" push has no immediate effect on an already-suspended peer.

Suggested fix

```diff
 func (s *iceBackoffState) SetMaxBackoff(d time.Duration) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	if d == s.maxBackoff {
 		return
 	}
 	s.maxBackoff = d
 	s.bo = buildBackoff(d)
+	if d == 0 {
+		s.suspended = false
+		s.nextRetry = time.Time{}
+	}
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/ice_backoff.go` around lines 161-169: `SetMaxBackoff` on `iceBackoffState` swaps `maxBackoff`/`bo` but doesn't clear an already-set suspension, so calling `SetMaxBackoff(0)` doesn't immediately unsuspend peers; change `SetMaxBackoff` to, under the same `mu.Lock()`, detect when `d == 0` and clear the suspended state and any pending retry by setting `s.suspended = false` and resetting `s.nextRetry` to the zero time (or otherwise cancelling any in-flight timer), so `IsSuspended()` returns false immediately after disabling backoff; ensure this logic lives alongside the existing `maxBackoff`/`bo` assignment and uses `iceBackoffState`'s `mu` for thread safety.

client/internal/peer/handshaker.go (1)
116-153: ⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoff

`RemoveICEListener` can still dispatch one stale ICE callback. `readICEListener()` (Lines 125-129) only snapshots the function pointer under `h.mu` and releases the mutex before `Listen()` invokes it (Lines 148-153 / 173-178). If `RemoveICEListener()` runs in the gap between snapshot and call, the offer/answer is still delivered to the just-detached listener. This means `DetachICE`/ICE-backoff is not a hard stop — at least one extra ICE callback can fire after detachment.

Either hold the mutex across dispatch (acceptable since the listener should be lightweight, e.g. a non-blocking enqueue), or add a generation counter that the dispatcher re-checks under lock immediately before calling.
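A minimal sketch of the generation-counter option follows. The type and method names (`listenerGuard`, `Dispatch`) are illustrative stand-ins, not the client's actual handshaker types: every attach/detach bumps a generation, and the dispatcher re-checks it under the lock right before invoking the callback, so a listener detached in the snapshot-to-call gap never fires.

```go
package main

import (
	"fmt"
	"sync"
)

// listenerGuard sketches the generation-counter approach described above.
type listenerGuard struct {
	mu       sync.Mutex
	gen      uint64
	listener func(msg string)
}

// Set installs a listener and bumps the generation.
func (g *listenerGuard) Set(fn func(msg string)) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.gen++
	g.listener = fn
}

// Remove detaches the listener; the bumped generation invalidates any
// snapshot a concurrent dispatcher took before this call.
func (g *listenerGuard) Remove() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.gen++
	g.listener = nil
}

// Dispatch snapshots (listener, gen), may do slow work unlocked, then
// re-checks the generation under the lock and calls the listener while
// still holding it, so a detached listener can never fire.
func (g *listenerGuard) Dispatch(msg string) bool {
	g.mu.Lock()
	fn, gen := g.listener, g.gen
	g.mu.Unlock()
	if fn == nil {
		return false
	}
	// ...potentially slow pre-processing happens here, unlocked...
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.gen != gen {
		return false // listener changed or removed in the gap: drop the event
	}
	fn(msg)
	return true
}

func main() {
	g := &listenerGuard{}
	g.Set(func(msg string) { fmt.Println("delivered:", msg) })
	fmt.Println(g.Dispatch("offer")) // true
	g.Remove()
	fmt.Println(g.Dispatch("answer")) // false
}
```

The trade-off is that the final callback invocation happens under the mutex, which is acceptable only if listeners stay lightweight (e.g. a non-blocking channel send).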
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/handshaker.go` around lines 116-153: The snapshot-and-call race lets `RemoveICEListener` detach but still have one last callback delivered; fix by ensuring the mutex protects the actual dispatch: in `Listen`, acquire `h.mu`, read `h.iceListener` into a local, and invoke that listener while still holding `h.mu` (or alternatively implement a generation counter: add an `h.iceGen` incremented in `RemoveICEListener`, have `readICEListener` return both listener and gen, then re-lock and verify gen unchanged before calling). Update `RemoveICEListener`, `readICEListener`, and `Listen` to use the chosen approach so no ICE callback can be invoked after detachment.

client/android/preferences.go (2)
325-331: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear the connection-mode override instead of storing `&""`. `SetConnectionMode("")` still leaves `p.configInput.ConnectionMode` non-nil, so `GetConnectionMode()` never falls back to the server-pushed value despite the method comment promising that behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/android/preferences.go` around lines 325-331: `SetConnectionMode` currently assigns a pointer to an empty string, which keeps `p.configInput.ConnectionMode` non-nil; change it so that when `mode == ""` it clears the override by setting `p.configInput.ConnectionMode = nil`, otherwise store a pointer to the provided value (use the existing `Preferences.SetConnectionMode` and `p.configInput.ConnectionMode` symbols to locate the fix and ensure `GetConnectionMode` will then fall back to the server-pushed value).
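The clear-vs-store distinction can be sketched as follows. Only the field name `ConnectionMode` is taken from the review; the struct and setter here are illustrative, not the Android bindings' actual code:

```go
package main

import "fmt"

// configInput mirrors the shape of the override-holding struct from the
// review: a nil pointer means "no local override".
type configInput struct {
	ConnectionMode *string
}

// setConnectionMode treats "" as "clear the local override" so a getter can
// fall back to the server-pushed value, instead of storing &"".
func (c *configInput) setConnectionMode(mode string) {
	if mode == "" {
		c.ConnectionMode = nil // clear: server-pushed value wins again
		return
	}
	c.ConnectionMode = &mode
}

func main() {
	var c configInput
	c.setConnectionMode("p2p-lazy")
	fmt.Println(c.ConnectionMode != nil) // true: override set
	c.setConnectionMode("")
	fmt.Println(c.ConnectionMode == nil) // true: override cleared
}
```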
347-352: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't cast signed timeout values straight to `uint32`. These setters still treat `0` as an explicit override instead of "clear", and negative inputs wrap into huge timeout values. Guard `secs <= 0` by clearing the pointer, and clamp oversized values before converting.

Also applies to: 368-373, 388-393
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/android/preferences.go` around lines 347-352: `SetRelayTimeoutSeconds` currently casts signed `secs` directly to `uint32`, which lets negative values wrap and treats 0 as an explicit override; change it to treat `secs <= 0` as "clear" by setting `p.configInput.RelayTimeoutSeconds = nil`, otherwise clamp `secs` to a safe maximum (e.g. `uint32` max or your configured max constant) before converting and storing a pointer to the `uint32` value; apply the same guard-and-clamp pattern to the other timeout setters in this file so none perform direct signed-to-`uint32` casts or treat 0 as an override.

client/internal/peer/status.go (2)
453-465: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Build the notification snapshots while `d.mux` is still held. These `updatePeer*Locked` paths still unlock first and then call `notifyConnStateChange()`/`notifyPeerStateChangeListeners()`, even though both helpers snapshot listener/subscriber state and are documented to run under `d.mux`. That reintroduces the race with concurrent listener or subscription changes.

Also applies to: 664-671, 717-724, 769-772, 820-823

506-544: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remote-meta-only changes still never trigger an immediate refresh. `UpdatePeerRemoteMeta()` updates `RemoteLiveOnline`, connection-mode fields, timeout fields, and groups, then returns from both branches without notifying peer-state subscribers or refreshing the peer list. A management-only liveness/mode flip will stay stale until some unrelated transport update or the next poll.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/status.go` around lines 506-544: `UpdatePeerRemoteMeta` currently mutates `Remote*` fields on `d.peers` and `d.offlinePeers` and returns without signaling any change; update the method to invoke the same immediate refresh/notification path used when transport/state changes occur so remote-only changes propagate instantly. After you set the `Remote*` fields in both the online branch (where you modify `st` and assign `d.peers[pubKey]`) and the offline branch (where you modify `d.offlinePeers[i]`), call the existing peer-state notification/refresh helper the codebase uses for transport updates, passing the `pubKey` (or the updated peer entry) so subscribers are notified and the peer list is refreshed immediately. Ensure the call happens before returning nil in both branches.

client/internal/peer/conn.go (1)
202-212: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the explicit "disable ICE backoff" state across reopen. `SetIceBackoffMax(0)` persists `0` into `conn.config.P2pRetryMaxSeconds`, but `Open()` interprets `0` as `DefaultP2PRetryMax`. After the next close/open, a peer that explicitly disabled backoff comes back with the default cap instead.

Also applies to: 1166-1172

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/conn.go` around lines 202-212: `Open()` currently treats `conn.config.P2pRetryMaxSeconds == 0` as `DefaultP2PRetryMax`, which overrides an explicit `SetIceBackoffMax(0)` disable; change the logic so a stored 0 is preserved as "disabled" instead of being replaced by `DefaultP2PRetryMax`: compute `backoffCap := time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second`, remove the branch that substitutes `DefaultP2PRetryMax` when `backoffCap == 0`, then pass that `backoffCap` unchanged into `newIceBackoff(...)` or `iceBackoff.SetMaxBackoff(...)`; apply the same change to the other occurrence around the `SetMaxBackoff`/`newIceBackoff` calls at lines 1166-1172 so an explicit 0 remains disabled across reopen.

management/internals/shared/grpc/conversion.go (1)
263-289: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the disabled sentinel in `ConfiguredP2PRetryMaxSecs`. `toPeerConfig()` already encodes an explicit `0` retry cap as `p2pRetryMaxDisabledSentinel`, but Line 267 collapses `nil` and explicit zero with `derefUint32OrZero`. That makes `ConfiguredP2PRetryMaxSecs` lose the "user explicitly disabled backoff" marker and diverge from the wire format used elsewhere.

Suggested fix

```diff
 if c.Cfg != nil {
 	cfgConnMode = derefStringOrEmpty(c.Cfg.ConnectionMode)
 	cfgRelayTO = derefUint32OrZero(c.Cfg.RelayTimeoutSeconds)
 	cfgP2pTO = derefUint32OrZero(c.Cfg.P2pTimeoutSeconds)
-	cfgP2pRetryMax = derefUint32OrZero(c.Cfg.P2pRetryMaxSeconds)
+	if c.Cfg.P2pRetryMaxSeconds != nil {
+		if *c.Cfg.P2pRetryMaxSeconds == 0 {
+			cfgP2pRetryMax = p2pRetryMaxDisabledSentinel
+		} else {
+			cfgP2pRetryMax = *c.Cfg.P2pRetryMaxSeconds
+		}
+	}
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/conversion.go` around lines 263-289: The code in `toPeerConfig` collapses a nil vs explicit-zero P2P retry cap by using `derefUint32OrZero` for `cfgP2pRetryMax`, losing the `p2pRetryMaxDisabledSentinel` marker; update the assignment so `ConfiguredP2PRetryMaxSecs` preserves the sentinel: when `c.Cfg` is nil keep `cfgP2pRetryMax` as zero as before, but when `c.Cfg.P2pRetryMaxSeconds` is non-nil and explicitly zero set it to `p2pRetryMaxDisabledSentinel`, and otherwise dereference the value directly (i.e., check `c.Cfg.P2pRetryMaxSeconds` for nil vs zero before assigning) so `toPeerConfig` retains the explicit-disabled wire marker.

client/internal/conn_mgr.go (1)
286-324: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restart the lazy manager when only the inactivity thresholds change. `initLazyManager()` snapshots `relayTimeoutSecs` and `p2pTimeoutSecs` into `manager.Config`, but this path only restarts on `modeChanged`. A management push that keeps `p2p-lazy`/`p2p-dynamic` and only changes the relay/ICE inactivity timeouts will keep enforcing the old thresholds until the daemon restarts.

Suggested fix

```diff
-	prev := e.mode
+	prev := e.mode
+	prevRelay := e.relayTimeoutSecs
+	prevP2P := e.p2pTimeoutSecs
 	e.mode = newMode
 	e.relayTimeoutSecs = newRelay
 	e.p2pTimeoutSecs = newP2P
 	e.p2pRetryMaxSecs = newP2pRetry
 	e.propagateP2pRetryMaxToConns()
 	wasManaged := modeUsesLazyMgr(prev)
 	isManaged := modeUsesLazyMgr(newMode)
 	modeChanged := prev != newMode
+	timeoutsChanged := prevRelay != newRelay || prevP2P != newP2P
 	if modeChanged && wasManaged && !isManaged {
 		log.Infof("lazy/dynamic connection manager disabled by management push (mode=%s)", newMode)
 		e.closeManager(ctx)
 		e.statusRecorder.UpdateLazyConnection(false)
 		return nil
 	}
-	if modeChanged && wasManaged && isManaged {
+	if (modeChanged || timeoutsChanged) && wasManaged && isManaged {
 		// Switching between lazy and dynamic at runtime: tear down the
-		// existing manager so initLazyManager picks up the new timeouts.
-		log.Infof("lazy/dynamic mode change %s -> %s, restarting manager", prev, newMode)
+		// existing manager so initLazyManager picks up the new mode/timeouts.
+		log.Infof("lazy/dynamic config change (mode %s -> %s, relay %ds -> %ds, ice %ds -> %ds), restarting manager",
+			prev, newMode, prevRelay, newRelay, prevP2P, newP2P)
 		e.closeManager(ctx)
 		e.statusRecorder.UpdateLazyConnection(false)
 	}
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/conn_mgr.go` around lines 286-324: The code only restarts the lazy/dynamic manager when the mode changes, so updates that only modify `e.relayTimeoutSecs`, `e.p2pTimeoutSecs`, or `e.p2pRetryMaxSecs` leave the running manager using stale thresholds; change the logic to detect when any of those timeout fields differ from their previous values while `modeUsesLazyMgr(newMode)` is true and `e.lazyConnMgr != nil`, and in that case call `e.closeManager(ctx)`, update status via `e.statusRecorder.UpdateLazyConnection(false)`, then reinitialize via `e.initLazyManager(ctx)` and ensure `e.propagateP2pRetryMaxToConns()` is called so the new values are picked up by the manager and active connections.

management/server/http/handler.go (1)
142-149: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Peer-connections REST routes still missing explicit RBAC. `/peers/{peerId}/connections` (read) and `/peers/{peerId}/connections/refresh` (write — triggers a snapshot dispatch over gRPC) are wired here, but `peer_connections_http.NewHandler` is not given a `permissionsManager`, so unlike the sibling `peers.AddEndpoints` at Line 140 the handler can't call `ValidateUserPermissions(..., modules.Peers, operations.Read|Write)`. The current authorization story relies entirely on `accountManager.GetPeer` failing for unauthorized callers, which is a side effect rather than a deliberate RBAC contract — and the refresh endpoint mutates network state.

Thread `permissionsManager` into `peer_connections_http.NewHandler` and gate `GetPeerConnections` on `Peers.Read` and `PostRefresh` on `Peers.Write` before merging.
🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/http/handler.go` around lines 142-149: The peer-connections routes lack explicit RBAC: thread the existing `permissionsManager` into `peer_connections_http.NewHandler` (instead of omitting it) and update the handler so `GetPeerConnections` calls `ValidateUserPermissions(..., modules.Peers, operations.Read)` and `PostRefresh` calls `ValidateUserPermissions(..., modules.Peers, operations.Write)` before proceeding; adjust the `pcAccountManagerAdapter`/handler constructor usage to accept and store the `permissionsManager` and enforce these checks (use the same `ValidateUserPermissions` pattern used by `peers.AddEndpoints`).

management/internals/shared/grpc/server.go (2)
487-498: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Closed `snapshotCh` causes the select to spin. `SnapshotRouter.Register` is documented to close the previous channel on fast reconnect (per the Phase 3.7i design). When that happens, `nonce, ok := <-snapshotCh` immediately returns with `ok == false` and `continue` falls back into the same `select`, which sees the closed channel as permanently ready — pegging this goroutine until `srv.Context()` is cancelled. Set `snapshotCh = nil` (or `return`) on close so the case is disabled.

🛡️ Suggested fix

```diff
 case nonce, ok := <-snapshotCh:
 	if !ok {
-		continue
+		snapshotCh = nil
+		continue
 	}
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/server.go` around lines 487-498: The select case reading from `snapshotCh` spins when the channel is closed because the receive returns `ok == false` and `continue` leaves the case enabled; update the handling in the `case nonce, ok := <-snapshotCh:` branch so that when `ok == false` you disable the case by setting `snapshotCh = nil` (or return) instead of `continue`, thereby preventing the select from repeatedly firing on the closed channel; keep the rest of the logic (building the snapshot message and calling `s.sendUpdate`) unchanged and reference `snapshotCh`, `s.sendUpdate`, and the select block around `srv.Context()`.
1175-1185: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

`SyncPeerConnections` accepts uploads from any keypair. Unlike `SyncMeta` (Line 1146) and `Sync` (Line 238), this handler stops at envelope decryption — it never resolves the `peerKey` against the account/peer store. Any client that owns a WireGuard keypair can encrypt a `PeerConnectionMap` against the management server's pubkey and have it written into the TTL store under their own pubkey, growing the map unboundedly and seeding state that the dashboard will surface.

Mirror the `Sync` flow's `GetAccountIDForPeerKey` (or `GetPeerByPeerPubKey`) lookup before calling `s.peerConnections.Put`:

🔒️ Suggested fix

```diff
 func (s *Server) SyncPeerConnections(ctx context.Context, req *proto.EncryptedMessage) (*proto.Empty, error) {
 	pcm := &proto.PeerConnectionMap{}
 	peerKey, err := s.parseRequest(ctx, req, pcm)
 	if err != nil {
 		return nil, err
 	}
+	if _, err := s.accountManager.GetAccountIDForPeerKey(ctx, peerKey.String()); err != nil {
+		if e, ok := internalStatus.FromError(err); ok && e.Type() == internalStatus.NotFound {
+			return nil, status.Errorf(codes.PermissionDenied, "peer is not registered")
+		}
+		return nil, mapError(ctx, err)
+	}
 	if s.peerConnections != nil {
 		s.peerConnections.Put(peerKey.String(), pcm)
 	}
 	return &proto.Empty{}, nil
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/server.go` around lines 1175-1185: The `SyncPeerConnections` handler currently writes the decrypted `PeerConnectionMap` for any valid WireGuard key; update `Server.SyncPeerConnections` to resolve the decrypted `peerKey` to an account/peer before storing: call the same lookup used in `Sync` (e.g. `GetAccountIDForPeerKey` or `GetPeerByPeerPubKey`) with the `peerKey` returned from `s.parseRequest`, return an error (permission denied / not found) if no account/peer is found, and only then call `s.peerConnections.Put`; ensure you reuse the same error-handling pattern as in `Sync`/`SyncMeta`.

client/ui/client_ui.go (1)
741-750: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalid timeout input is still silently coerced to 0. `parseUint32Field` (Line 656) returns `0` on empty, malformed, or out-of-range input. At Lines 745-750 the result is unconditionally written into `req.RelayTimeoutSeconds`/`P2PTimeoutSeconds`/`P2PRetryMaxSeconds`, so a typo (e.g. `30s`, `-5`, `abc`) ends up clearing the local override without any user feedback — and `validateSettings` (Line 577) does not check these fields. `hasConnectionModeChanges` (Lines 638-640) likewise normalizes via the same lossy parser, so the change-detection path can also miss real edits.

Validate the three entries in `validateSettings` (or have `parseUint32Field` return `(value, ok)`) and surface an error before the window closes, so non-empty text either parses or aborts the save.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/ui/client_ui.go` around lines 741-750: The code silently coerces invalid timeout text to 0 because `parseUint32Field` returns 0 on error and the values are unconditionally assigned to `req.RelayTimeoutSeconds` / `req.P2PTimeoutSeconds` / `req.P2PRetryMaxSeconds` (used also by `hasConnectionModeChanges`), so invalid input clears overrides without feedback; fix by changing parsing/validation so non-empty invalid input is rejected: either update `parseUint32Field` to return `(uint32, bool/error)` and propagate the error back to the caller, or add explicit checks in `validateSettings` that `iRelayTimeout.Text`, `iP2pTimeout.Text`, and `iP2pRetryMax.Text` either are empty or successfully parse to a `uint32`, and if parsing fails surface a user-facing error and abort the save rather than writing 0 into `req.*` and closing the window.

client/internal/conn_state_pusher.go (2)
107-112: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Coalesce snapshot requests to the newest nonce. Current behavior can drop the newest nonce when the queue is full and then flush an older nonce first, which breaks the "latest nonce" coalescing contract.

Proposed fix

```diff
 func (p *connStatePusher) OnSnapshotRequest(nonce uint64) {
 	select {
 	case p.snapshotReq <- nonce:
+		return
 	default:
 	}
+	// Queue full: keep only the latest nonce.
+	for {
+		select {
+		case <-p.snapshotReq:
+		default:
+			select {
+			case p.snapshotReq <- nonce:
+			default:
+			}
+			return
+		}
+	}
 }
@@
 case nonce := <-p.snapshotReq:
+	// Coalesce pending requests and honor the latest nonce.
+	for {
+		select {
+		case newer := <-p.snapshotReq:
+			nonce = newer
+		default:
+			goto flushSnapshot
+		}
+	}
+flushSnapshot:
 	if p.source != nil {
 		p.flushFull(p.source.SnapshotAllRemotePeers(), nonce)
 	}
```

Also applies to: 174-177

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/conn_state_pusher.go` around lines 107-112: `OnSnapshotRequest` in `connStatePusher` currently drops the newest nonce when the `snapshotReq` channel is full; change the send logic so that if `p.snapshotReq` would block you first drain one value from the channel (non-blocking receive) and then send the new nonce, ensuring the newest nonce is always enqueued. Implement this replace-oldest behavior in both occurrences (lines ~107-112 and the similar block at ~174-177) by attempting a non-blocking send, and on failure performing a non-blocking receive from `p.snapshotReq` before sending the new nonce.
261-275: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Reconcile removed peers during source-delta computation. Removed peers never generate an update here, so management can retain stale peer entries until TTL or an unrelated full snapshot.
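The missing reconciliation step can be sketched as follows. The event and state types here are illustrative stand-ins for the client's actual delta types: keys present in the last pushed map but absent from the current snapshot yield an explicit removal event.

```go
package main

import "fmt"

type peerState struct{ Status string }

// event marks a peer's new state, or its removal when Removed is true.
type event struct {
	PubKey  string
	Removed bool
	State   peerState
}

// computeDelta emits changed/new peers from the current snapshot, then
// reconciles removals: anything previously pushed but now gone gets an
// explicit removal event, so the server-side map does not go stale.
func computeDelta(last, current map[string]peerState) []event {
	delta := make([]event, 0, len(current)+len(last))
	for k, st := range current {
		if prev, ok := last[k]; !ok || prev != st {
			delta = append(delta, event{PubKey: k, State: st})
		}
	}
	for k := range last {
		if _, ok := current[k]; !ok {
			delta = append(delta, event{PubKey: k, Removed: true})
		}
	}
	return delta
}

func main() {
	last := map[string]peerState{"a": {"connected"}, "b": {"idle"}}
	curr := map[string]peerState{"a": {"connected"}}
	for _, e := range computeDelta(last, curr) {
		fmt.Printf("%s removed=%v\n", e.PubKey, e.Removed)
	}
	// prints "b removed=true": peer "a" is unchanged, peer "b" was removed
}
```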
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/conn_state_pusher.go` around lines 261-275: `computeDeltaFromSource` currently only emits events for peers present in the source snapshot and misses peers that were removed; update `computeDeltaFromSource` to also detect pubkeys present in `p.lastPushed` but missing from `p.source.SnapshotAllRemotePeers()` and append corresponding removal events so stale entries are reconciled. Specifically: inside `computeDeltaFromSource` (while holding `p.mu`), build a set/map of current pubkeys, then iterate `p.lastPushed` to find keys not in that set and append a `PeerStateChangeEvent` representing removal for each (use the project's canonical removed status/flag, e.g. `PeerStateRemoved`); ensure the delta capacity accounts for these extra events and keep the locking semantics (`p.mu.Lock()`/`defer Unlock`) as in the original function.

client/internal/engine.go (1)
354-357: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unregister conn-state callbacks before clearing the pusher. `Stop` nils `e.connStatePusher` while callbacks installed in `Start` can still fire. That leaves a nil-deref path during teardown races.

Proposed fix

```diff
 if e.connStatePusher != nil {
+	e.statusRecorder.SetConnStateListener(nil)
+	e.mgmClient.SetSnapshotRequestHandler(nil)
 	e.connStatePusher.Stop()
 	e.connStatePusher = nil
 }
```

Also applies to: 632-637

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/engine.go` around lines 354-357: The teardown clears `e.connStatePusher` while callbacks registered in `Start` can still fire, creating a nil-deref race; update the shutdown sequence so it first unregisters any conn-state callbacks (the same callbacks registered in `Start`) while the pusher is still valid, then calls `e.connStatePusher.Stop()`, and only after `Stop` returns sets `e.connStatePusher = nil`; apply the same change to the other occurrence around lines 632-637 to ensure callbacks are removed before nil-ing the pusher.
🧹 Nitpick comments (7)

client/internal/lazyconn/support.go (1)

26-29: 💤 Low value

`-dev-`/`-ci-` embedded marker check could match non-dev release strings. The `strings.Contains(agentVersion, "-dev-")` and `strings.Contains(agentVersion, "-ci-")` guards will return `true` for any version string that happens to contain those substrings — e.g. a hypothetical `"1.0.0-preview-ci-build"` or a version emitted by a third-party packaging pipeline. Because these cases short-circuit before the `minVersion` comparison, a peer running an older build with such a version string would be incorrectly classified as lazy-conn-capable.
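A tightened check, requiring the marker to be the second `-`-separated segment, could look like this. The helper name is illustrative, not the shipped function:

```go
package main

import (
	"fmt"
	"strings"
)

// isDevOrCIBuild requires the dev/ci marker to be the second '-'-separated
// segment (e.g. "0.0.0-dev-abc123"), so a release string that merely
// contains "-ci-" elsewhere is no longer misclassified.
func isDevOrCIBuild(agentVersion string) bool {
	parts := strings.SplitN(agentVersion, "-", 3)
	if len(parts) < 2 {
		return false
	}
	return parts[1] == "dev" || parts[1] == "ci"
}

func main() {
	fmt.Println(isDevOrCIBuild("0.0.0-dev-20240101"))     // true
	fmt.Println(isDevOrCIBuild("0.38.0-ci-a1b2c3"))       // true
	fmt.Println(isDevOrCIBuild("1.0.0-preview-ci-build")) // false
	fmt.Println(isDevOrCIBuild("0.38.2"))                 // false
}
```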
"0.0.0-dev-*").🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/lazyconn/support.go` around lines 26 - 29, The current substring checks on agentVersion (strings.Contains(agentVersion, "-dev-") / "-ci-") can match unintended versions; update the guard in support.go to parse agentVersion by '-' and require the marker to be the second segment (e.g. split := strings.SplitN(agentVersion, "-", 3) and test split[1] == "dev" or "ci") so only versions like "0.0.0-dev-*" or "0.0.0-ci-*" are treated as dev/ci, leaving the minVersion comparison intact for all others.client/internal/peer/guard/guard.go (1)
144-147: ⚡ Quick win

Synchronous callback can stall the reconnect select loop. `g.onNetworkChange()` is called inline inside the `srReconnectedChan` select branch. The callback is documented as "reset iceBackoff + recreate workerICE," which may acquire locks, send on channels, or do other non-trivial work. While it blocks, the guard cannot process `relayedConnDisconnected`, `iCEConnDisconnected`, or ticker events, potentially delaying reconnection.

Dispatch it in a goroutine so the select loop stays live:

♻️ Proposed fix

```diff
-	// Phase 3.5 (`#5989`): notify Conn to reset iceBackoff + recreate workerICE
-	if g.onNetworkChange != nil {
-		g.onNetworkChange()
-	}
+	// Phase 3.5 (`#5989`): notify Conn to reset iceBackoff + recreate workerICE.
+	// Fire in a goroutine so the select loop is not stalled.
+	if g.onNetworkChange != nil {
+		go g.onNetworkChange()
+	}
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@client/internal/peer/guard/guard.go` around lines 144-147: The select branch handling `srReconnectedChan` currently calls `g.onNetworkChange()` synchronously, which can block the guard's select loop and delay handling of `relayedConnDisconnected`, `iCEConnDisconnected`, and ticker events; change this to dispatch the callback in a new goroutine (e.g. `go g.onNetworkChange()`) inside the `srReconnectedChan` branch so the select loop remains responsive, and consider wrapping the call with a recover so a panic in the callback doesn't crash the goroutine.

management/server/peer/peer_test.go (1)
145-152: ⚡ Quick win

Extend coverage to the three numeric `Effective*Secs` fields. The new test only exercises `EffectiveConnectionMode`. The three `uint32` fields (`EffectiveRelayTimeoutSecs`, `EffectiveP2PTimeoutSecs`, `EffectiveP2PRetryMaxSecs`) are added to `isEqual` in the same change — a regression that drops any of them from the comparison would not be caught.

Suggested table-driven extension

```diff
 func TestPeerSystemMeta_isEqual_ChecksEffectiveFields(t *testing.T) {
-	base := PeerSystemMeta{Hostname: "h", EffectiveConnectionMode: "p2p-dynamic"}
-	other := base
-	other.EffectiveConnectionMode = "p2p"
-	if base.isEqual(other) {
-		t.Error("isEqual should return false when EffectiveConnectionMode differs")
-	}
+	base := PeerSystemMeta{
+		Hostname:                  "h",
+		EffectiveConnectionMode:   "p2p-dynamic",
+		EffectiveRelayTimeoutSecs: 30,
+		EffectiveP2PTimeoutSecs:   45,
+		EffectiveP2PRetryMaxSecs:  900,
+	}
+	tests := []struct {
+		name   string
+		mutate func(*PeerSystemMeta)
+	}{
+		{"EffectiveConnectionMode", func(m *PeerSystemMeta) { m.EffectiveConnectionMode = "p2p" }},
+		{"EffectiveRelayTimeoutSecs", func(m *PeerSystemMeta) { m.EffectiveRelayTimeoutSecs = 60 }},
+		{"EffectiveP2PTimeoutSecs", func(m *PeerSystemMeta) { m.EffectiveP2PTimeoutSecs = 90 }},
+		{"EffectiveP2PRetryMaxSecs", func(m *PeerSystemMeta) { m.EffectiveP2PRetryMaxSecs = 1800 }},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			other := base
+			tc.mutate(&other)
+			if base.isEqual(other) {
+				t.Errorf("isEqual should return false when %s differs", tc.name)
+			}
+		})
+	}
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer/peer_test.go` around lines 145-152: The test `TestPeerSystemMeta_isEqual_ChecksEffectiveFields` currently only verifies `EffectiveConnectionMode`; extend it to also cover the three numeric fields on `PeerSystemMeta` (`EffectiveRelayTimeoutSecs`, `EffectiveP2PTimeoutSecs`, `EffectiveP2PRetryMaxSecs`) by converting the test to a small table-driven loop that starts from a base `PeerSystemMeta`, clones it to `other`, mutates each field one at a time (including the `uint32` fields) to a different value, and asserts that `base.isEqual(other)` returns false for each case.

management/internals/server/boot.go (1)
111-125: Document the replica-local nature of the new peer-connection state.

`PeerConnStore()` and `PeerConnRouter()` are in-process only, so GET/refresh works predictably only when the dashboard request and the peer's gRPC stream land on the same management replica. If multi-instance deployments are supported, this feature will need sticky routing or a shared backend for consistent behavior.

Also applies to: 127-129, 192-192

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/server/boot.go` around lines 111-125: Update the comments for `PeerConnStore` and `PeerConnRouter` to explicitly state that the returned peer-connection state is in-process and replica-local (i.e., only valid for requests and gRPC streams handled by the same management replica) and that multi-replica deployments require sticky routing or a shared backend for consistent GET/refresh behavior; apply the same clarified wording to the other occurrences noted (around the referenced locations) so all mentions consistently document the replica-local constraint and recommended mitigations.

management/server/peer_connections/snapshot_router_test.go (1)
3-3: ⚡ Quick win: Avoid blocking forever on the close assertions.
Both unregister checks do a plain receive. If the router stops closing the channel, this test hangs until the package timeout instead of failing here.
Suggested change

```diff
-import "testing"
+import (
+	"testing"
+	"time"
+)
@@
 	r := NewSnapshotRouter()
 	ch := r.Register("peerA")
 	r.Unregister("peerA", ch)
-	if _, ok := <-ch; ok {
-		t.Error("channel should be closed after Unregister")
-	}
+	select {
+	case _, ok := <-ch:
+		if ok {
+			t.Error("channel should be closed after Unregister")
+		}
+	case <-time.After(100 * time.Millisecond):
+		t.Fatal("timed out waiting for channel close")
+	}
@@
 	// Proper Unregister with the fresh token tears it down.
 	r.Unregister("peerA", fresh)
-	if _, ok := <-fresh; ok {
-		t.Error("fresh channel should be closed after its own Unregister")
-	}
+	select {
+	case _, ok := <-fresh:
+		if ok {
+			t.Error("fresh channel should be closed after its own Unregister")
+		}
+	case <-time.After(100 * time.Millisecond):
+		t.Fatal("timed out waiting for fresh channel close")
+	}
 }
```

Also applies to: 28-34, 57-60
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/server/peer_connections/snapshot_router_test.go` at line 3, the two plain receives used to assert channel closes in snapshot_router_test.go (the unregister checks that currently do `<-ch` without timeout) can hang forever; change each to a select that waits for the close/receive case and a time.After case that calls t.Fatalf/t.Fatal on timeout (e.g., select { case <-unregisterCh: /* ok */ case <-time.After(time.Second): t.Fatalf("unregister channel not closed") }). Replace both unregister/close assertions with this pattern so the test fails fast instead of blocking.

management/internals/shared/grpc/server.go (2)
435-437: 💤 Low value: Guard against a nil `snapshotRouter`.
`handleUpdates` dereferences `s.snapshotRouter` unconditionally (`s.snapshotRouter.Register(...)`), but the rest of the file is defensive about the companion store (`if s.peerConnections != nil` at Line 1181). If a future caller (test harness, alternate `boot.go`) ever constructs `Server` without a router, every `Sync` will panic. A nil-check around the register/unregister + select case keeps the two fields consistent in defensiveness.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/server.go` around lines 435 - 437, handleUpdates currently dereferences s.snapshotRouter unguarded; wrap the Register/Unregister and related select-case logic so it only registers when s.snapshotRouter != nil. Specifically, in handleUpdates, check s.snapshotRouter != nil before calling s.snapshotRouter.Register(peerKey.String()) and defer s.snapshotRouter.Unregister(...), and ensure the select branch that uses snapshotCh is only added/used when snapshotCh is non-nil (or replace it with a guarded case). This keeps snapshotRouter usage consistent with the existing nil-guards around s.peerConnections and prevents panics when Server is constructed without a router.
487-498: ⚡ Quick win: Snapshot send-error path is inconsistent with the regular update path.
The regular update branch at Line 457–460 returns the `sendUpdate` error (which already calls `cancelPeerRoutines`), terminating the loop. This new snapshot branch (Line 496–498) only logs and falls through, but `sendUpdate` has already cancelled the peer routines and the encrypted-stream send has failed, so the next iteration is operating on a half-broken stream and any subsequent update will hit the same failure. Returning the error keeps the two paths uniform and lets the stream tear down cleanly.

♻️ Suggested fix
```diff
 if err := s.sendUpdate(ctx, accountID, peerKey, peer, snapMsg, srv, streamStartTime); err != nil {
-	log.WithContext(ctx).Warnf("send snapshot request to %s: %v", peerKey.String(), err)
+	log.WithContext(ctx).Warnf("send snapshot request to %s: %v", peerKey.String(), err)
+	return err
 }
```
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@management/internals/shared/grpc/server.go` around lines 487 - 498, The snapshot branch that reads from snapshotCh logs sendUpdate errors and continues, but sendUpdate already cancels peer routines (cancelPeerRoutines) and leaves the stream half-broken; make the snapshot branch mirror the regular update branch by returning the error from sendUpdate (propagate the err returned by s.sendUpdate(...) instead of only logging) so the surrounding peer loop/handler (the function that reads snapshotCh and calls sendUpdate) terminates and the stream is torn down cleanly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 370076ef-ff00-496b-9477-7545b47086ae
⛔ Files ignored due to path filters (3)
- client/proto/daemon.pb.go is excluded by !**/*.pb.go
- shared/management/proto/management.pb.go is excluded by !**/*.pb.go
- shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (59)
- client/android/client.go
- client/android/peer_notifier.go
- client/android/preferences.go
- client/cmd/service.go
- client/cmd/service_installer.go
- client/internal/conn_mgr.go
- client/internal/conn_mgr_test.go
- client/internal/conn_state_pusher.go
- client/internal/conn_state_pusher_test.go
- client/internal/debouncer/debouncer.go
- client/internal/engine.go
- client/internal/engine_pusher_adapters.go
- client/internal/engine_test.go
- client/internal/lazyconn/inactivity/manager.go
- client/internal/lazyconn/inactivity/manager_test.go
- client/internal/lazyconn/manager/manager.go
- client/internal/lazyconn/support.go
- client/internal/peer/conn.go
- client/internal/peer/guard/guard.go
- client/internal/peer/handshaker.go
- client/internal/peer/ice_backoff.go
- client/internal/peer/ice_backoff_test.go
- client/internal/peer/status.go
- client/internal/peer/status_test.go
- client/internal/peer/worker_ice.go
- client/proto/daemon.proto
- client/server/server.go
- client/ui/client_ui.go
- client/ui/const.go
- client/ui/event_handler.go
- client/ui/network.go
- client/ui/peers_tab.go
- management/internals/controllers/network_map/controller/controller.go
- management/internals/server/boot.go
- management/internals/shared/grpc/conversion.go
- management/internals/shared/grpc/server.go
- management/server/account/manager.go
- management/server/account/manager_mock.go
- management/server/http/handler.go
- management/server/http/handlers/peer_connections/handler.go
- management/server/http/handlers/peer_connections/handler_test.go
- management/server/http/testing/testing_tools/channel/channel.go
- management/server/management_proto_test.go
- management/server/management_test.go
- management/server/mock_server/account_mock.go
- management/server/peer.go
- management/server/peer/peer.go
- management/server/peer/peer_test.go
- management/server/peer_connections/snapshot_router.go
- management/server/peer_connections/snapshot_router_test.go
- management/server/peer_connections/store.go
- management/server/peer_connections/store_test.go
- management/server/peer_test.go
- management/server/store/sql_store.go
- shared/management/client/client.go
- shared/management/client/client_test.go
- shared/management/client/grpc.go
- shared/management/client/mock.go
- shared/management/proto/management.proto
💤 Files with no reviewable changes (2)
- client/ui/event_handler.go
- client/ui/const.go
🚧 Files skipped from review as they are similar to previous changes (2)
- shared/management/client/client.go
- management/server/http/handlers/peer_connections/handler.go
```go
if err := applyConnectionModeFlagsToProfile(cmd); err != nil {
	cmd.PrintErrf("Warning: failed to persist connection-mode flags: %v\n", err)
}
```
Don't downgrade explicit profile-write failures to warnings.
If applyConnectionModeFlagsToProfile() returns an error here, the user explicitly passed one of the new mode/timeout flags and that requested config will not reach the daemon. Letting install/reconfigure continue silently applies defaults instead of the requested settings.
Suggested fix
- if err := applyConnectionModeFlagsToProfile(cmd); err != nil {
- cmd.PrintErrf("Warning: failed to persist connection-mode flags: %v\n", err)
- }
+ if err := applyConnectionModeFlagsToProfile(cmd); err != nil {
+ return fmt.Errorf("persist connection-mode flags: %w", err)
+ }

Also applies to: 263-265
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/cmd/service_installer.go` around lines 137 - 139, The call to
applyConnectionModeFlagsToProfile currently logs errors as warnings and
continues, which silently loses user-specified settings; change both call sites
that invoke applyConnectionModeFlagsToProfile (the install/reconfigure command
handlers) to treat a non-nil error as fatal by propagating/returning the error
(or exiting with non-zero) instead of printing a warning so the
install/reconfigure aborts and the daemon never runs with defaulted settings.
```go
func NewConnMgr(engineConfig *EngineConfig, statusRecorder *peer.Status, peerStore *peerstore.Store, iface lazyconn.WGIface) *ConnMgr {
	envMode, envRelayTimeout := peer.ResolveModeFromEnv()

	// First-pass resolution without server input -- updated later when
	// the first NetworkMap arrives via UpdatedRemotePeerConfig.
	mode, relayTimeout, p2pTimeout, p2pRetryMax := resolveConnectionMode(
		envMode, envRelayTimeout,
		engineConfig.ConnectionMode, engineConfig.RelayTimeoutSeconds,
		engineConfig.P2pTimeoutSeconds,
		engineConfig.P2pRetryMaxSeconds,
		nil,
	)

	return &ConnMgr{
		peerStore:        peerStore,
		statusRecorder:   statusRecorder,
		iface:            iface,
		rosenpassEnabled: engineConfig.RosenpassEnabled,
		mode:             mode,
		relayTimeoutSecs: relayTimeout,
		p2pTimeoutSecs:   p2pTimeout,
		p2pRetryMaxSecs:  p2pRetryMax,
		envMode:          envMode,
		envRelayTimeout:  envRelayTimeout,
		cfgMode:          engineConfig.ConnectionMode,
		cfgRelayTimeout:  engineConfig.RelayTimeoutSeconds,
		cfgP2pTimeout:    engineConfig.P2pTimeoutSeconds,
		cfgP2pRetryMax:   engineConfig.P2pRetryMaxSeconds,
	}
}
```
Don't keep p2p-lazy/p2p-dynamic as the effective mode when Rosenpass blocks the manager.
Lines 181-187 and 317-319 skip starting the lazy manager, but the resolved mode is still stored in `e.mode`. That means later mode-gated paths like `ActivatePeer`, `DeactivatePeer`, and the new effective-mode reporting still behave as if lazy/dynamic is active even though the daemon stayed on the eager path.
Suggested fix
+func normalizeEffectiveMode(mode connectionmode.Mode, rosenpassEnabled bool) connectionmode.Mode {
+ if rosenpassEnabled && modeUsesLazyMgr(mode) {
+ return connectionmode.ModeP2P
+ }
+ return mode
+}
+
func NewConnMgr(engineConfig *EngineConfig, statusRecorder *peer.Status, peerStore *peerstore.Store, iface lazyconn.WGIface) *ConnMgr {
envMode, envRelayTimeout := peer.ResolveModeFromEnv()
@@
mode, relayTimeout, p2pTimeout, p2pRetryMax := resolveConnectionMode(
envMode, envRelayTimeout,
engineConfig.ConnectionMode, engineConfig.RelayTimeoutSeconds,
engineConfig.P2pTimeoutSeconds,
engineConfig.P2pRetryMaxSeconds,
nil,
)
+ mode = normalizeEffectiveMode(mode, engineConfig.RosenpassEnabled)
@@
newMode, newRelay, newP2P, newP2pRetry := resolveConnectionMode(
e.envMode, e.envRelayTimeout, e.cfgMode, e.cfgRelayTimeout,
e.cfgP2pTimeout, e.cfgP2pRetryMax, pc,
)
+ newMode = normalizeEffectiveMode(newMode, e.rosenpassEnabled)

Also applies to: 176-187, 281-319
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/internal/conn_mgr.go` around lines 78 - 107, NewConnMgr currently
records the resolved mode (possibly "p2p-lazy"/"p2p-dynamic") even when
rosenpass blocks starting the lazy manager, causing the ConnMgr.mode to claim
lazy/dynamic while the daemon stayed on the eager path; change NewConnMgr to
detect when rosenpass blocks the lazy manager (rosenpassEnabled true / the same
condition used later to skip starting the lazy manager) and override the stored
mode to the eager equivalent before returning the ConnMgr (update the mode field
set in NewConnMgr after calling resolveConnectionMode), so code paths that read
ConnMgr.mode (e.g., ActivatePeer/DeactivatePeer and effective-mode reporting)
reflect the actual running mode.
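The proposed normalization is easy to exercise in isolation. A self-contained sketch, assuming a string-backed stand-in for `connectionmode.Mode` and the lazy/dynamic gating the review describes (everything except the `normalizeEffectiveMode`/`modeUsesLazyMgr` shape is illustrative):

```go
package main

import "fmt"

// Mode is a stand-in for the connectionmode.Mode enum in the PR.
type Mode string

const (
	ModeP2P        Mode = "p2p"
	ModeP2PLazy    Mode = "p2p-lazy"
	ModeP2PDynamic Mode = "p2p-dynamic"
)

// modeUsesLazyMgr mirrors the gate the review references: only the
// lazy and dynamic modes start the lazy connection manager.
func modeUsesLazyMgr(m Mode) bool {
	return m == ModeP2PLazy || m == ModeP2PDynamic
}

// normalizeEffectiveMode downgrades lazy/dynamic to plain p2p when
// Rosenpass is enabled, so the stored mode matches what actually runs.
func normalizeEffectiveMode(mode Mode, rosenpassEnabled bool) Mode {
	if rosenpassEnabled && modeUsesLazyMgr(mode) {
		return ModeP2P
	}
	return mode
}

func main() {
	fmt.Println(normalizeEffectiveMode(ModeP2PLazy, true))    // p2p
	fmt.Println(normalizeEffectiveMode(ModeP2PLazy, false))   // p2p-lazy
	fmt.Println(normalizeEffectiveMode(ModeP2PDynamic, true)) // p2p
	fmt.Println(normalizeEffectiveMode(ModeP2P, true))        // p2p
}
```

Applying the same normalization in both `NewConnMgr` and `UpdatedRemotePeerConfig` keeps the constructor and the server-push path from disagreeing.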
```go
relaySecs := parseUint32Field(s.iRelayTimeout.Text)
p2pSecs := parseUint32Field(s.iP2pTimeout.Text)
retrySecs := parseUint32Field(s.iP2pRetryMax.Text)
req.RelayTimeoutSeconds = &relaySecs
req.P2PTimeoutSeconds = &p2pSecs
req.P2PRetryMaxSeconds = &retrySecs
```
Stale timeout overrides are still sent when switching to "Follow server".
`updateTimeoutEntriesEnabled` (Line 902–905) only `Disable()`s the entries for the non-lazy/dynamic modes; it never clears their `Text`. So if a user previously had p2p-lazy with RelayTimeout=60 and now picks "Follow server", `buildSetConfigRequest` at Line 745–750 still parses the disabled entry and writes RelayTimeoutSeconds=60 into the request. The override silently persists on the daemon and reappears the next time the user switches back to a lazy/dynamic mode.
Consider clearing the entry text (or skipping the assignment) in modes where the override is meaningless:
♻️ Suggested fix
func (s *serviceClient) updateTimeoutEntriesEnabled() {
if s.iRelayTimeout == nil {
return
}
switch s.sConnectionMode.Selected {
case "p2p-lazy":
s.iRelayTimeout.Enable()
s.iP2pTimeout.Disable()
+ s.iP2pTimeout.SetText("")
s.iP2pRetryMax.Disable()
+ s.iP2pRetryMax.SetText("")
case "p2p-dynamic":
s.iRelayTimeout.Enable()
s.iP2pTimeout.Enable()
s.iP2pRetryMax.Enable()
default:
s.iRelayTimeout.Disable()
+ s.iRelayTimeout.SetText("")
s.iP2pTimeout.Disable()
+ s.iP2pTimeout.SetText("")
s.iP2pRetryMax.Disable()
+ s.iP2pRetryMax.SetText("")
}
}Also applies to: 889-907
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@client/ui/client_ui.go` around lines 745 - 750, The disabled timeout fields
(iRelayTimeout.Text, iP2pTimeout.Text, iP2pRetryMax.Text) are still parsed in
buildSetConfigRequest via parseUint32Field and written into req.* even when
updateTimeoutEntriesEnabled has disabled them; modify buildSetConfigRequest to
check the current mode (use the same mode/source that
updateTimeoutEntriesEnabled reads) and skip assigning RelayTimeoutSeconds,
P2PTimeoutSeconds, and P2PRetryMaxSeconds when the UI is in "Follow server" (or
other modes where overrides are meaningless), or alternatively clear the Text
values when disabling in updateTimeoutEntriesEnabled so parseUint32Field returns
empty; reference functions/fields: buildSetConfigRequest,
updateTimeoutEntriesEnabled, parseUint32Field, iRelayTimeout.Text,
iP2pTimeout.Text, iP2pRetryMax.Text, and the
req.RelayTimeoutSeconds/P2PTimeoutSeconds/P2PRetryMaxSeconds assignments.
```go
func (s *MemoryStore) GetWithNonceCheck(peerPubKey string, since uint64) (*mgmProto.PeerConnectionMap, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.maps[peerPubKey]
	if !ok {
		return nil, false
	}
	if since > 0 && e.m.GetInResponseToNonce() < since {
		return nil, false
	}
	if s.clock.Now().Sub(e.updatedAt) > s.ttl {
		delete(s.maps, peerPubKey)
		return nil, false
	}
	return proto.Clone(e.m).(*mgmProto.PeerConnectionMap), true
```
Evict expired entries before the nonce gate.
`GetWithNonceCheck` returns on the nonce comparison before running TTL cleanup. An expired entry with an older nonce then stays in `s.maps` indefinitely if callers only poll with `sinceNonce`.
Suggested change
func (s *MemoryStore) GetWithNonceCheck(peerPubKey string, since uint64) (*mgmProto.PeerConnectionMap, bool) {
s.mu.Lock()
defer s.mu.Unlock()
e, ok := s.maps[peerPubKey]
if !ok {
return nil, false
}
+ if s.clock.Now().Sub(e.updatedAt) > s.ttl {
+ delete(s.maps, peerPubKey)
+ return nil, false
+ }
if since > 0 && e.m.GetInResponseToNonce() < since {
return nil, false
}
- if s.clock.Now().Sub(e.updatedAt) > s.ttl {
- delete(s.maps, peerPubKey)
- return nil, false
- }
return proto.Clone(e.m).(*mgmProto.PeerConnectionMap), true
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@management/server/peer_connections/store.go` around lines 122 - 136,
GetWithNonceCheck currently checks the nonce before evicting TTL-expired
entries, allowing expired entries to persist if callers only use since/nonce
polling; update the function (GetWithNonceCheck) to perform the TTL expiry check
immediately after loading e from s.maps (using s.clock.Now().Sub(e.updatedAt) >
s.ttl), delete the map entry and return nil,false if expired, and only then
perform the since/nonce comparison and final return; keep the existing mutex
usage (s.mu.Lock()/defer s.mu.Unlock()) and the proto.Clone(e.m) return
behavior.
```go
if sRelayTimeoutSeconds.Valid {
	v := uint32(sRelayTimeoutSeconds.Int64)
	account.Settings.RelayTimeoutSeconds = &v
}
if sP2pTimeoutSeconds.Valid {
	v := uint32(sP2pTimeoutSeconds.Int64)
	account.Settings.P2pTimeoutSeconds = &v
}
if sP2pRetryMaxSeconds.Valid {
	v := uint32(sP2pRetryMaxSeconds.Int64)
	account.Settings.P2pRetryMaxSeconds = &v
```
🧩 Analysis chain
🏁 Scripts executed:

```shell
#!/bin/bash
set -euo pipefail
# 1) Find schema/migration definitions for the new settings columns.
rg -n "settings_(connection_mode|relay_timeout_seconds|p2p_timeout_seconds|p2p_retry_max_seconds)" --type sql --type go
# 2) Find all read/write paths using these settings fields.
rg -n "ConnectionMode|RelayTimeoutSeconds|P2pTimeoutSeconds|P2pRetryMaxSeconds" --type go
# 3) Verify existing validation/constraints for non-negative and bounded values.
rg -n "CHECK\\s*\\(|>=\\s*0|validate|clamp|sanitize" --type sql --type go

# Find the handler validation logic for these timeout fields
rg -n "RelayTimeoutSeconds|P2pTimeoutSeconds|P2pRetryMaxSeconds" management/server/http/handlers/accounts/accounts_handler.go -B 2 -A 2

# Check migration files for any schema constraints
fd -e sql -e go | xargs grep -l "settings_relay_timeout_seconds\|settings_p2p_timeout_seconds\|settings_p2p_retry_max_seconds" 2>/dev/null | head -20

# Look for validation functions in the types package
rg -n "func.*Settings.*Validate\|func.*validate.*timeout" management/server/types/ --type go

# Check the API types that define the request structure
rg -n "type AccountSettings" shared/management/http/api/types.gen.go -A 30 | head -60

# Look for migration or schema files
find . -type f \( -name "*.sql" -o -name "*migration*" -o -name "*schema*" \) 2>/dev/null | grep -v node_modules | head -20

# Search for database initialization or schema setup
rg -n "CREATE TABLE.*settings|ALTER TABLE.*settings" --type sql -A 5

# Look for migrations that define the timeout settings columns
rg -n "settings_relay_timeout_seconds|settings_p2p_timeout_seconds|settings_p2p_retry_max_seconds" management/cmd/migration_up.go -B 3 -A 3

# Check sql_store.go for migration/schema setup
rg -n "ALTER TABLE.*accounts.*ADD\|CREATE TABLE" management/server/store/sql_store.go -A 2 | head -100

# Look for GORM migration in sql_store.go
rg -n "AutoMigrate|migration|CreateTable" management/server/store/sql_store.go | head -30

# Check the Settings struct in types for GORM tags and constraints
cat -n management/server/types/settings.go | sed -n '60,85p'

# Check handler for any validation before the cast
cat -n management/server/http/handlers/accounts/accounts_handler.go | sed -n '215,245p'

# Search for tests with negative or invalid timeout values
rg -n "RelayTimeoutSeconds|P2pTimeoutSeconds|P2pRetryMaxSeconds" management/server/http/handlers/accounts/ --type go -B 2 -A 2 | grep -i "negative\|invalid\|-[0-9]" | head -20

# Check if there's any request validation middleware or OpenAPI schema constraints
rg -n "validate|constraint|minimum|maximum" management/server/http/handlers/accounts/accounts_handler.go -i | head -20

# Look for any other safeguards before reaching the store
rg -n "func.*UpdateSettings\|func.*SetSettings" management/server/ --type go | head -5

# Find the handler method that processes the update settings request
rg -n "func.*Update.*Account.*Settings\|HandleUpdateSettings" management/server/http/handlers/accounts/accounts_handler.go

# Check the complete flow - find where the handler calls the store
rg -n "\.SaveAccount\|UpdateAccount" management/server/http/handlers/accounts/accounts_handler.go -A 3 | head -40

# Find the function that handles account updates in the handler
rg -n "func.*handler.*\(.*Account" management/server/http/handlers/accounts/accounts_handler.go | head -10

# Search for where settings are saved after being validated
rg -n "SaveAccount\|StoreEvent\|return.*settings" management/server/http/handlers/accounts/accounts_handler.go | head -20

# Find where updateAccountRequestSettings is called
rg -n "updateAccountRequestSettings" management/server/http/handlers/accounts/accounts_handler.go -B 5 -A 5

# Look at the full updateAccountRequestSettings method
cat -n management/server/http/handlers/accounts/accounts_handler.go | sed -n '170,260p'

# Confirm the same unsafe cast exists in the store layer at the exact lines mentioned
cat -n management/server/store/sql_store.go | sed -n '1630,1650p'
```
Add bounds checks before casting int64 timeout values to uint32.
Lines 1636, 1640, and 1644 perform unchecked casts from `sql.NullInt64.Int64` to `uint32`, which allows negative or out-of-range values from the database to silently wrap. The same issue exists in the HTTP handler (management/server/http/handlers/accounts/accounts_handler.go lines 230, 234, 238). Add validation to reject invalid values before assignment.
🛠️ Suggested fix
if sRelayTimeoutSeconds.Valid {
+ if sRelayTimeoutSeconds.Int64 < 0 || sRelayTimeoutSeconds.Int64 > int64(^uint32(0)) {
+ return nil, status.NewGetAccountFromStoreError(
+ fmt.Errorf("invalid settings_relay_timeout_seconds: %d", sRelayTimeoutSeconds.Int64),
+ )
+ }
v := uint32(sRelayTimeoutSeconds.Int64)
account.Settings.RelayTimeoutSeconds = &v
}
if sP2pTimeoutSeconds.Valid {
+ if sP2pTimeoutSeconds.Int64 < 0 || sP2pTimeoutSeconds.Int64 > int64(^uint32(0)) {
+ return nil, status.NewGetAccountFromStoreError(
+ fmt.Errorf("invalid settings_p2p_timeout_seconds: %d", sP2pTimeoutSeconds.Int64),
+ )
+ }
v := uint32(sP2pTimeoutSeconds.Int64)
account.Settings.P2pTimeoutSeconds = &v
}
if sP2pRetryMaxSeconds.Valid {
+ if sP2pRetryMaxSeconds.Int64 < 0 || sP2pRetryMaxSeconds.Int64 > int64(^uint32(0)) {
+ return nil, status.NewGetAccountFromStoreError(
+ fmt.Errorf("invalid settings_p2p_retry_max_seconds: %d", sP2pRetryMaxSeconds.Int64),
+ )
+ }
v := uint32(sP2pRetryMaxSeconds.Int64)
account.Settings.P2pRetryMaxSeconds = &v
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@management/server/store/sql_store.go` around lines 1635 - 1645, The code
casts sql.NullInt64 values (sRelayTimeoutSeconds, sP2pTimeoutSeconds,
sP2pRetryMaxSeconds) directly to uint32 and may wrap negative or out-of-range
values; validate each Int64 before assigning to account.Settings.*: check that
value is >= 0 and <= math.MaxUint32, otherwise return/propagate an error (or
skip assignment) and do not perform the cast; then assign via v := uint32(val)
and account.Settings.RelayTimeoutSeconds = &v (and similarly for
P2pTimeoutSeconds and P2pRetryMaxSeconds). Apply the same validation in the HTTP
handler that reads timeouts (the accounts handler functions that set
RelayTimeoutSeconds/P2pTimeoutSeconds/P2pRetryMaxSeconds) and return a 400 error
on invalid values. Ensure checks reference the same symbols
(sRelayTimeoutSeconds, sP2pTimeoutSeconds, sP2pRetryMaxSeconds and
account.Settings.*) so reviewers can locate and verify the fixes.
|
@MichaelUray can you open a discussion for us to talk about the PRs you are submitting? We have a few things coming that might overlap with your proposals, and we are also focusing on stability for the 1.0 release this quarter. |
|
Hi @mlsmaycon, I actually already opened an issue for a discussion, but I got no response to it so far. The changes in these PRs shouldn't require a major version change, since they should be compatible with older versions as well. |
Force-pushed from 5b18f3e to fbd0df5
…nfig Additive change for issue netbirdio#5989 Phase 1. New fields use new tag numbers (11, 12, 13); existing fields (including LazyConnectionEnabled tag 6) are unchanged so old clients ignore the additions and old servers send UNSPECIFIED, which the new client maps back via the legacy boolean. Note: the regenerated pb.go files now report protoc v5.29.3 in their header (this branch was generated with locally-installed protoc 29.3 instead of upstream's v7.34.1). Functionally identical; header diff is the only delta beyond the actual schema additions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defines Mode enum (relay-forced, p2p, p2p-lazy, p2p-dynamic plus the client-only sentinels Unspecified and FollowServer), ParseString for CLI/env input, ToProto/FromProto for wire translation, and the two backwards-compat helpers ResolveLegacyLazyBool / ToLazyConnectionEnabled that bridge the old Settings.LazyConnectionEnabled boolean. Phase 1 of issue netbirdio#5989. Pure addition -- no existing callers touched in this commit; the engine/conn_mgr migration follows in subsequent commits in the same PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on warns NB_CONNECTION_MODE wins over the legacy pair (NB_FORCE_RELAY, NB_ENABLE_EXPERIMENTAL_LAZY_CONN); when the legacy pair is set together, NB_FORCE_RELAY wins (most-restrictive, mirrors the group-conflict rule from issue netbirdio#5990). Each legacy var emits a one-shot deprecation warning when it actually contributes to the resolved mode. NB_LAZY_CONN_INACTIVITY_THRESHOLD becomes an alias for the future relay_timeout setting and warns once. IsForceRelayed() is kept for callers that have not yet been migrated (conn.go, statusrecorder); they will be updated in the engine/conn refactor commits later in this PR. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
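The precedence described in this commit can be sketched as a pure function. Inputs stand in for `os.Getenv` lookups of NB_CONNECTION_MODE, NB_FORCE_RELAY, and NB_ENABLE_EXPERIMENTAL_LAZY_CONN; the real implementation's mode enum and one-shot deprecation warnings are omitted, so treat the string returns as illustrative:

```go
package main

import "fmt"

// resolveEnvMode applies the documented precedence: NB_CONNECTION_MODE
// beats the legacy pair, and when both legacy vars are set together,
// NB_FORCE_RELAY (the most restrictive) wins.
func resolveEnvMode(connMode string, forceRelay, lazyConn bool) string {
	if connMode != "" {
		return connMode // explicit new-style setting always wins
	}
	if forceRelay {
		return "relay-forced" // beats lazy when both legacy vars are set
	}
	if lazyConn {
		return "p2p-lazy"
	}
	return "" // unspecified: fall through to config/server resolution
}

func main() {
	fmt.Println(resolveEnvMode("p2p-dynamic", true, true)) // p2p-dynamic
	fmt.Println(resolveEnvMode("", true, true))            // relay-forced
	fmt.Println(resolveEnvMode("", false, true))           // p2p-lazy
	fmt.Println(resolveEnvMode("", false, false) == "")    // true
}
```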
Three new CLI flags map onto the new connection-mode plumbing: - --connection-mode <relay-forced|p2p|p2p-lazy|p2p-dynamic|follow-server> - --relay-timeout <seconds> - --p2p-timeout <seconds> Plumbed through three sites in cmd/up.go (SetConfigRequest, ConfigInput, LoginRequest), persisted in profilemanager.Config, and added as new fields on the daemon.proto IPC messages. Empty / not-changed flags fall back to the server-pushed value (which itself falls back to the legacy lazy_connection_enabled boolean for old servers). Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EngineConfig gains ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds. ConnMgr now stores the resolved Mode plus the raw inputs (env, config) so it can re-resolve when the server pushes a new PeerConfig. UpdatedRemoteFeatureFlag is renamed to UpdatedRemotePeerConfig and takes the full PeerConfig pointer; a thin shim with the old name delegates to it for callers that haven't been updated yet. connect.go copies the three new fields from profilemanager.Config into the EngineConfig builder, with a tolerant parser that logs and falls through to Unspecified on invalid input. Phase 1 of issue netbirdio#5989. peer/conn.go forwarding follows in C4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ConnConfig gains a Mode field forwarded from the engine. Open() now checks Mode == ModeRelayForced instead of calling the global env-reader IsForceRelayed(). The local 'forceRelay' variable name is renamed to 'skipICE' to make the new branching intent explicit. The PeerStateUpdate block at the end of Open() also reads from conn.config.Mode now, so the StatusRecorder sees the per-peer mode rather than the global env var. A single remaining caller of IsForceRelayed() (srWatcher.Start in engine.go) is left for a follow-up; that path uses a process-wide flag not per-peer state, so it can be migrated in Phase 2 once srWatcher itself learns about ConnectionMode. Phase 1 of issue netbirdio#5989. Engine forwarding (C5) follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
createPeerConn now reads ConnMgr.Mode() and copies it into peer.ConnConfig, so the per-peer Open() loop in conn.go can take the ModeRelayForced skip-ICE branch without reading the global env var. This is the last wiring commit for the client side of Phase 1; the server-side mgmt changes (Settings + OpenAPI + handler + audit + NetworkMap-build) follow in Section D. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three fields are nullable to distinguish 'use built-in default' (NULL) from explicit values (incl. 0 = never tear down). Copy() now deep-clones the new pointer fields via two small helpers. GORM AutoMigrate creates the new columns at first start; existing accounts have NULL in all three columns and resolve via the legacy LazyConnectionEnabled boolean. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tings Three new optional, nullable fields with descriptions of the NULL = built-in-default semantics and the Phase-1-vs-Phase-2 status of p2p-dynamic. Regenerated types.gen.go via the existing oapi-codegen tooling. The generated AccountSettingsConnectionMode enum has the canonical values relay-forced / p2p / p2p-lazy / p2p-dynamic, plus a Valid() helper for handler-side validation. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PUT /api/accounts/{id} now accepts connection_mode (validated against
the four-value enum via the generated AccountSettingsConnectionMode.
Valid()), p2p_timeout_seconds and relay_timeout_seconds. NULL in the
JSON body keeps the existing value untouched (= "no client-side
override on this round-trip"); explicit NULL-clear via API uses a
distinct PATCH-style call which is out-of-scope for Phase 1.
Response payload mirrors the input fields back as nullable so the
dashboard can distinguish "use default" from "explicit value".
Phase 1 of issue netbirdio#5989. Audit-event emission follows in D5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AccountConnectionModeChanged (121), AccountRelayTimeoutChanged (122), AccountP2pTimeoutChanged (123) -- emitted from account.go when settings change. Per-peer / per-group event codes are reserved for Phase 3 (issue netbirdio#5990). Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
handleConnectionModeSettings is invoked from the same diff-detection block as handleLazyConnectionSettings; emits one StoreEvent per changed field (ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds) with old/new values in the meta payload. Four small ptr-equality / deref helpers are added for nullable string and uint32 fields. They are package-private and named after the existing convention used elsewhere in the package. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…elds Two changes in one commit because they're inseparable: 1. Move client/internal/peer/connectionmode/ to shared/connectionmode/. The package now needs to be importable from BOTH client/ and management/ (which is impossible while it lives under client/internal/ per Go's internal-package rule). All imports updated; tests pass on both sides. 2. Extend management/internals/shared/grpc/conversion.go::toPeerConfig to populate the three new PeerConfig fields (ConnectionMode, P2PTimeoutSeconds, RelayTimeoutSeconds) using the connectionmode helpers. The legacy LazyConnectionEnabled boolean is now derived from the resolved Mode via ToLazyConnectionEnabled() rather than copied verbatim from Settings -- this is the central backwards-compat contract: old clients see only the boolean, new clients prefer the explicit enum and ignore the bool. Resolution rules (Phase 1, account-wide only): - Settings.ConnectionMode != nil and parses -> wins - Otherwise -> ResolveLegacyLazyBool(LazyConnectionEnabled) - timeouts: Settings.RelayTimeoutSeconds / P2pTimeoutSeconds when non-NULL, else 0 (= server has no preference; client uses built-in default) Per-peer / per-group resolution comes in Phase 3 (netbirdio#5990). Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
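The Phase-1 resolution order above can be condensed into a sketch. Types and names here are illustrative, not the actual shared/connectionmode API; in particular, the mapping of p2p-dynamic to the legacy boolean is an assumption based on the "lazy-like modes map to true" contract, while relay-forced mapping to false is stated in the commit.

```go
package main

import "fmt"

// settings is an illustrative stand-in for the relevant Settings fields.
type settings struct {
	ConnectionMode        *string // nil = not set in the account settings
	LazyConnectionEnabled bool
}

// resolveMode sketches the documented order: an explicit, parseable
// ConnectionMode wins; anything else falls back to the legacy boolean.
func resolveMode(s settings) string {
	if s.ConnectionMode != nil {
		switch *s.ConnectionMode {
		case "relay-forced", "p2p", "p2p-lazy", "p2p-dynamic":
			return *s.ConnectionMode
		}
		// unparseable value: tolerant fallback to the legacy boolean
	}
	if s.LazyConnectionEnabled {
		return "p2p-lazy"
	}
	return "p2p"
}

// legacyWireBool derives the boolean old clients see. relay-forced cannot
// be expressed via the legacy bool, so it maps to false (assumption for
// p2p-dynamic: treated as lazy-like, hence true).
func legacyWireBool(mode string) bool {
	return mode == "p2p-lazy" || mode == "p2p-dynamic"
}

func main() {
	m := "relay-forced"
	fmt.Println(resolveMode(settings{ConnectionMode: &m}), legacyWireBool(m))
}
```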
Nine sub-cases cover the Phase-1 resolution matrix from spec section 3: - no settings -> default (P2P + lazy=false) - legacy bool only -> mapped via ResolveLegacyLazyBool - explicit ConnectionMode -> wins over the legacy bool - timeouts propagate - garbage ConnectionMode value -> tolerant fallback to legacy bool Particular attention to the structural compat gap: relay-forced cannot be expressed via the legacy boolean, so the wire field for old clients is sent as false. Documented in the spec, asserted here. Existing TestAccount_GetPeerNetworkMap remains green: existing test peers have ConnectionMode=NULL in Settings, falls through to the legacy ResolveLegacyLazyBool(false) -> ModeP2P -> wire bool false. Phase 1 of issue netbirdio#5989. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of netbirdio#5989 needs to dynamically detach the ICE listener at runtime (when p2p-dynamic mode hits its ICE-inactivity threshold). Adds the RemoveICEListener method and extends the Add/RemoveICEListener pair to hold h.mu; Listen() now reads h.iceListener through readICEListener under the same mutex, fixing a latent race that didn't matter while AddICEListener was the only writer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i of netbirdio#5989. Each NetworkMap update copies the RemotePeerConfig mode/timeouts/groups/last-seen into peer.State via UpdatePeerRemoteMeta. Both online (RemotePeers) and offline (OfflinePeers) lists get populated so the daemon-RPC StatusResponse reflects everything. peer.TimestampOrZero exported so the engine can nil-guard last_seen_at_server cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i of netbirdio#5989. Mirrors existing peers_handler RBAC: extracts userAuth via nbcontext.GetUserAuthFromContext, looks up peer through accountManager.GetPeer (which already enforces account-isolation + permissions). Tests cover 200/401/404/202 and confirm cross-account isolation. Enriches remote_fqdn via GetPeerByPubKey lookup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i of netbirdio#5989. pcAccountManagerAdapter bridges the real AccountManager into the small interface peer_connections.Handler uses. GetPeerByPubKey added to AccountManager when missing — used by REST handlers to enrich PeerConnectionMap entries with FQDNs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i of netbirdio#5989. Adds a 4th tab to the existing Networks AppTabs in client/ui/network.go. Tab content: counter widget on top (configured/online/breakdown), sorted accordion list (P2P -> Relayed -> Idle -> Offline), Show-Full checkbox + Refresh button at bottom. 30 s polling via ctx-respecting goroutine; uses fyne.Do for thread-safe widget updates. ASCII glyphs for status (avoids Unicode render quirks on Fyne Windows). Real proto field names: GetLocalIceCandidateEndpoint, GetRemoteIceCandidateEndpoint, GetRelayAddress. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i (netbirdio#5989). Exposes ConfiguredPeersTotal, ServerOnlinePeers, P2PConnectedPeers, RelayedConnectedPeers, IdleOnlinePeers, ServerOfflinePeers from recorder.GetFullStatus() as int64 methods so gomobile can generate the corresponding Java getters on Client. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i of netbirdio#5989 (follow-up to Phase 6.3). PeerInfo now exposes Relayed, ServerOnline, ICE-candidate endpoints, RelayServerAddress, LatencyMs, LastWireguardHandshake (RFC3339 string), LastSeenAtServer (RFC3339 string), EffectiveConnectionMode, ConfiguredConnectionMode, Groups (CSV), AgentVersion, OsVersion (last two currently empty — peer.State has no Version/Os fields yet). Strings instead of time.Time / []string for gomobile compatibility. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i UX polish: - Tray menu entry "Networks" -> "Peers und Netzwerke" so the new Peers tab is discoverable from the menu name. - "Peers" tab moves from 4th to 1st position in the Networks window (most-used per Phase 3.7i). User opens window -> sees Peers immediately. - Accordion list wrapped in container.NewStack so VScroll expands to the center area's full size instead of collapsing to the Accordion's intrinsic content height (was ~1 row tall on a 3-peer network). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i fix for inflated counter (Android showed 27/27 online when 4 peers were actually offline). Root cause: NetworkMap.OfflinePeers contains only LOGIN-EXPIRED peers; peers in OfflinePeers are not "currently unreachable", they're "session needs re-auth". Peers with hardware/network down still live in NetworkMap.RemotePeers as long as their login is valid. So the previous counter math (online = len(d.peers); offline = len(d.offlinePeers)) mapped login-validity to liveness, which over-counts online. Fix: - Add bool live_online = 16 to RemotePeerConfig in mgmt proto. - Mgmt server fills it from rPeer.Status.Connected (the actual liveness flag the server already tracks via heartbeat). - Daemon: peer.RemoteMeta + State.RemoteLiveOnline carry the value. - GetFullStatus counter: peer counted as online iff isLive (= RemoteLiveOnline || !mgmtKnowsLiveness for backward compat with pre-Phase-3.7i mgmt servers — fallback assumes online when mgmt doesn't populate LastSeenAtServer). Backward compat: pre-Phase-3.7i mgmt servers leave both LastSeenAtServer zero and LiveOnline false. The fallback condition (LastSeenAtServer empty → assume online) preserves the legacy counter behaviour. Once the upgraded mgmt server starts populating both fields, the daemon honours the actual liveness flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
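The counter rule described above reduces to a small predicate. The struct and field names below are illustrative, not the actual peer.State fields: a peer counts as online iff the server reports it live, or the server is too old to report liveness at all.

```go
package main

import "fmt"

// remoteMeta is an illustrative stand-in for the server-pushed peer metadata.
type remoteMeta struct {
	LiveOnline          bool
	LastSeenAtServerSet bool // false on pre-Phase-3.7i management servers
}

// isLive implements the documented fallback: trust LiveOnline when the
// server populates LastSeenAtServer; otherwise assume online (legacy
// behaviour for old management servers).
func isLive(m remoteMeta) bool {
	mgmtKnowsLiveness := m.LastSeenAtServerSet
	return m.LiveOnline || !mgmtKnowsLiveness
}

func main() {
	// New server, peer actually down: counted offline.
	fmt.Println(isLive(remoteMeta{LiveOnline: false, LastSeenAtServerSet: true}))
	// Old server, no liveness info: counted online for backward compat.
	fmt.Println(isLive(remoteMeta{LastSeenAtServerSet: false}))
}
```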
User feedback after Phase 3.7i deploy: - Tray menu entry was German "Peers und Netzwerke"; should be English "Peers and Networks" to match the rest of the app. - Same name applies to the window title (was "Networks"). - Peer list area was still small; the previous NewStack wrap inside buildPeersTabContent was not enough because the AppTabs container doesn't expand its children unless they request to fill. Fix: - network.go:38 -> "Peers and Networks" window title - client_ui.go:1242 -> "Peers and Networks" tray entry - network.go: wrap the Peers tab content in container.NewStack at the TabItem level so the tab area gives Border the full tab content rectangle. - peers_tab.go: SetMinSize(400, 500) on the VScroll as a floor so the layout never collapses below a reasonable size; the parent containers can grow it beyond. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…are actions
User feedback round 2:
- "Show full peer details" checkbox should live in the WINDOW footer
(next to Refresh / Select all / Deselect All), not duplicated inside
the Peers tab.
- The duplicate Refresh button inside the tab should go away; the
outer Refresh button now triggers Peers refresh when on the Peers
tab and Networks refresh otherwise.
- Expanded peer rows wasted vertical space (Accordion distributes
children across the available height when there are few of them).
- Multiple peers should be expandable simultaneously.
Refactor:
- buildPeersTabContent now returns a peersTabBundle{Content, ShowFull,
Refresh}; network.go places ShowFull + Refresh in the outer footer
and switches Select-all / Deselect-All visibility based on the
active tab (hidden on Peers, shown on Networks tabs).
- Replaces widget.Accordion with a VBox of newPeerRow() elements:
each row is a header-button + hidden detail-label that toggles on
tap. Multi-expand naturally allowed (each row is independent).
VBox packs tightly — no inter-row gap when only some are expanded.
- Counter format aligned with Android: "N of M peers online (server)"
+ "n P2P | n relayed | n idle | n offline".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes for the Win UI Peers tab: 1. newPeerRow now adds/removes the detail Label dynamically on tap instead of pre-creating it Hidden. Some Fyne layouts still allocate space for hidden children, which made collapsed rows take ~150px each (inflating the VBox to where only 1-2 rows fit visibly). Dynamic add/remove guarantees collapsed rows are header-only. 2. scroll.SetMinSize(400, 600) restores the floor so the scroll area fills most of the window even when only a few peers exist. Without this floor the routeCheckContainer (an outer VBox) sized the tabs to MinSize and the Peers tab collapsed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tent Two follow-up bugs reported on Windows after the dynamic add/remove rewrite: - Refresh collapsed all rows the user had opened. The render closure rebuilds listVBox from scratch and newPeerRow defaulted to collapsed. Persist expand state in a mutex-guarded map keyed by pubkey; rows consult it on construction (addDetail() if startExpanded) and update it on every tap. - The inner VScroll inside the Peers tab over-allocated vertical space (scrollbar reaches far past the actual content). The outer Networks window already wraps the tab content in a VScroll, so the inner one was redundant and made the tab content claim a hard 600px floor regardless of how many peers were present. Remove the inner VScroll and let listVBox sit directly in container.NewBorder's center -- it now sizes to its content and the outer scroll handles overflow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rant version parsing
Fixes A/B/C reported after the first p2p-dynamic deployment:
A) When management activates lazy/dynamic mode at runtime, the previous
path called lazyConnMgr.AddActivePeers() on every existing peer,
keeping the open WireGuard tunnels running until their inactivity
timers fired hours later. That contradicted the user-visible promise
of lazy/dynamic ("idle until traffic"). Replaced with
resetPeersToLazyIdle() which Close()s every supported peer and re-
registers it via lazyConnMgr.AddPeer (idle entry). Mode changes are
rare and the brief 1-2 s tunnel rebuild per peer is acceptable.
B) IsSupported() rejected any agent version without a "." in its name.
Custom dev/CI builds tagged dev-<sha> or ci-<sha> by build-android-
lib.sh and similar build scripts therefore got eager-opened by the
lazy manager (treated like a pre-0.45 daemon). Now also accept
"dev-" / "ci-" prefixes — those builds come from the same source
tree as the special-cased "development" string and have the same
lazy support guarantees.
C) startModeSideEffects() only set statusRecorder.UpdateLazyConnection(true)
for ModeP2PLazy. Under p2p-dynamic the status recorder still
reported "Lazy connection: false" even though peers were managed by
the inactivity manager. Now both ModeP2PLazy and ModeP2PDynamic
flip the flag, since they're the same to the end user.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex#1: store.go merge dropped a non-zero refresh-nonce when a normal delta with InResponseToNonce=0 followed a snapshot pushed in response to nonce N. Subsequent GET ?since=N then returned 404 until the next snapshot. Take max(prev, new) so the latest non-zero nonce sticks. Codex#2: SnapshotRouter.Register overwrote any existing channel for the same peer; a stale stream calling Unregister(peerKey) afterwards could close the new stream's channel. Register now closes a previous channel inline (fast-reconnect case), and Unregister takes the channel token returned by Register and only acts on the matching channel. Added regression test for the stale-Unregister race. Codex#9: PostRefresh always returned 202 even when no active Sync stream existed for the target peer (offline / between connections). Response now includes "dispatched": bool so the caller can distinguish "snapshot will arrive" from "no live stream — fall back to cache". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
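The Codex#1 nonce fix is a one-line max, sketched below with hypothetical names: a delta carrying InResponseToNonce=0 must not erase the non-zero nonce recorded by an earlier snapshot, or a subsequent GET ?since=N 404s until the next snapshot.

```go
package main

import "fmt"

// mergeNonce keeps the latest non-zero refresh nonce when merging a delta
// into a stored snapshot: max(prev, incoming), so a zero-nonce delta
// cannot clobber an earlier snapshot's nonce.
func mergeNonce(prev, incoming uint64) uint64 {
	if incoming > prev {
		return incoming
	}
	return prev
}

func main() {
	// Snapshot answered nonce 42; a normal delta (nonce 0) follows.
	fmt.Println(mergeNonce(42, 0)) // 42 sticks, GET ?since=42 keeps working
}
```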
Codex#3: flushDelta and flushFull marked events as lastPushed BEFORE calling sink.Push and ignored its error. A failed push (mgmt reconnect, transient gRPC error) silently lost state — the affected peers stayed stale until their next state change or the 60 s heartbeat. Now lastPushed is updated only AFTER a successful Push; on failure the events stay dirty so the next computeDeltaFromSource will re-include them. Codex#4: connStatePusher.loop() emitted the initial flushFull unconditionally on goroutine entry. The pusher is created in engine.Start, BEFORE the first NetworkMap arrives and addNewPeers populates the status recorder, so the initial snapshot was an empty map. Real peers only surfaced to management on the next per-peer state change or via the 60 s heartbeat. Loop now waits on initialReady which the engine closes via TriggerInitialSnapshot() right after the first FinishPeerListModifications(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
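The Codex#3 ordering fix is essentially "commit only after a successful push". A minimal sketch, with a hypothetical pusher type standing in for the real conn_state_pusher and a plain function for sink.Push:

```go
package main

import (
	"errors"
	"fmt"
)

var errDown = errors.New("grpc transport down")

// pusher is an illustrative model: dirty holds the latest per-peer state,
// lastPushed what management has acknowledged.
type pusher struct {
	dirty      map[string]string
	lastPushed map[string]string
	push       func(map[string]string) error // stands in for sink.Push
}

// flushDelta records lastPushed only AFTER a successful Push; on failure
// the events stay dirty and the next delta re-includes them.
func (p *pusher) flushDelta() error {
	delta := map[string]string{}
	for k, v := range p.dirty {
		if p.lastPushed[k] != v {
			delta[k] = v
		}
	}
	if len(delta) == 0 {
		return nil
	}
	if err := p.push(delta); err != nil {
		return err // nothing marked pushed: state is not silently lost
	}
	for k, v := range delta {
		p.lastPushed[k] = v
	}
	return nil
}

func main() {
	p := &pusher{
		dirty:      map[string]string{"peerA": "connected"},
		lastPushed: map[string]string{},
		push:       func(map[string]string) error { return errDown },
	}
	_ = p.flushDelta()
	fmt.Println(len(p.lastPushed)) // failed push left the event dirty
}
```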
Codex#5: UpdatePeerICEState and UpdatePeerRelayedState only invoked notifyConnStateChange when ConnStatus or Relayed flipped. Endpoint roams (NAT rebind, ICE candidate type change host->srflx) and relay-server moves did not surface to management until the next 60 s heartbeat tick, defeating the "endpoint flips immediate" UX promise. Added hasMaterialICEChange / hasMaterialRelayChange helpers that include local/remote endpoint, candidate type, and relay-server fields in the change-detection. Both Update* paths now use them so any material movement enqueues an immediate push. Codex#6: added a server_liveness_known proto field on RemotePeerConfig + State.RemoteServerLivenessKnown. Phase-3.7i+ management servers set it; the daemon's GetFullStatus counter trusts LiveOnline directly when known instead of guessing from LastSeenAtServer-zero (the legacy heuristic still applies when ServerLivenessKnown is false, so old servers keep working). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
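The widened change detection from Codex#5 can be sketched as a field-wise comparison. The struct below is illustrative (the real code compares fields on peer.State); the point is that endpoint roams and relay-server moves now count as material, not just ConnStatus/Relayed flips.

```go
package main

import "fmt"

// iceState is an illustrative subset of the compared peer-state fields.
type iceState struct {
	Status         string
	Relayed        bool
	LocalEndpoint  string
	RemoteEndpoint string
	CandidateType  string
	RelayServer    string
}

// hasMaterialICEChange returns true for any movement that should trigger
// an immediate push rather than waiting for the heartbeat tick.
func hasMaterialICEChange(prev, cur iceState) bool {
	return prev.Status != cur.Status ||
		prev.Relayed != cur.Relayed ||
		prev.LocalEndpoint != cur.LocalEndpoint ||
		prev.RemoteEndpoint != cur.RemoteEndpoint ||
		prev.CandidateType != cur.CandidateType ||
		prev.RelayServer != cur.RelayServer
}

func main() {
	a := iceState{Status: "connected", CandidateType: "host"}
	b := a
	b.CandidateType = "srflx" // NAT rebind: status unchanged, candidate moved
	fmt.Println(hasMaterialICEChange(a, b))
}
```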
Phase-3.7i+ management server now explicitly tells clients "I track per-peer liveness authoritatively" via the new server_liveness_known field on RemotePeerConfig. New clients (engine.go) plumb it through RemoteMeta -> State.RemoteServerLivenessKnown so the GetFullStatus counter can trust LiveOnline directly instead of falling back to the LastSeenAtServer-zero heuristic. Old servers leave the field at default (false); old clients ignore it. Net effect: counter accuracy improves on the new-server / new-client combination without breaking either backwards-compat path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex#7: peerLatencyStr() called p.GetLatency().AsDuration() without a nil check. The local daemon usually populates the field but a defensive nil guard avoids a panic when a remote peer reports a status without the latency field set. Codex#8a: onICEConnected() paused wgProxyRelay BEFORE configuring the new WG endpoint, opening a 1-2 s window where relay packets were dropped while WG still pointed at the relay address. Reordered: bring new ICE proxy up, configure WG endpoint, redirect leftover relay traffic, THEN pause the relay. The redirect+pause swap covers the remaining bytes that were in flight on the relay. Codex#8b: onICEStateDisconnected() in the no-relay-fallback branch called RemoveEndpointAddress, creating a black-hole window for any brief ICE flap. Removed the active removal: a transient disconnect is followed within 1-2 s by a fresh ICE-connected callback that re-points WG; a permanent disconnect surfaces via WG's own keepalive timeout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the postgres pool is configured, GetAccount uses the pgx fast path (getAccountPgx) instead of the GORM Take() path. The pgx SELECT listed every settings_* column EXCEPT the four Phase-3.7i additions: - settings_connection_mode - settings_relay_timeout_seconds - settings_p2p_timeout_seconds - settings_p2p_retry_max_seconds So account.Settings.ConnectionMode came back nil regardless of the DB value, and toPeerConfig fell back to ResolveLegacyLazyBool — clients only ever saw ModeP2P / ModeP2PLazy via the bool, never relay-forced or p2p-dynamic. Manifested as: dashboard saved p2p-dynamic, postgres stored it, but the FWR daemon's first sync after deploy logged "lazy/dynamic connection manager disabled by management push (mode=p2p)" and tore down its already-correctly-initialized lazy manager. The GORM path (getAccountGorm) loaded these columns automatically via Take(&account); only the hand-written pgx SELECT was missing them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…meout to 24h Two unrelated fixes that surfaced during user verification: 1) Android UI showed 0 B / 0 B for every peer's transfer counters regardless of actual traffic. Root cause: BytesRx/BytesTx in peer State are only updated by RefreshWireGuardStats(), which the desktop daemon's Status RPC calls before serving CLI status output. The Android gomobile bridge's PeersList() snapshotted GetFullStatus() directly without that refresh, so the cached counters stayed at the value last written when the peer was opened/closed (typically 0). Now PeersList() calls RefreshWireGuardStats() first; transfer counters update on every UI poll (~30 s in the new peers tab). 2) DefaultInactivityThreshold (the relay-tunnel idle teardown fallback when neither client config nor server-pushed setting provides one) bumped 15 min -> 24 h. The 15-min default was aggressive: peers exchanging only NAT-keepalives got torn down every 15 min and had to re-handshake on the next packet. The new 24 h default matches the dashboard placeholder change and what most users actually want for a mesh topology. The two p2p-lazy / p2p-dynamic test cases that hard-coded 20 min of past-activity were updated to 25 h to remain triggered. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… form Fix B in 853d26c only handled the bare "dev-<sha>" / "ci-<sha>" prefix. The Android build script (build-android-lib.sh) and the infrastructure build wrappers (scripts/netbird-windows-build/, build-and-deploy.sh) write a SEMVER-PADDED form like "0.0.0-dev-1b923aad9" so version.NewVersion can parse it cleanly. normalizeVersion() splits on "-" and takes index 0 = "0.0.0", which is < the 0.45.0 minVersion, so IsSupported returned FALSE and every peer running a custom dev build was eager-opened by the lazy manager. This made p2p-dynamic ineffective on Bad Mitterndorf routers + S21 + W11-test1 because they all run the dev build right now. Now also accept any version containing "-dev-" or "-ci-". The bare- prefix form is still accepted for backwards compatibility with older build scripts. Random short-hash versions ("a6c5960") still get rejected because they have no marker at all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
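The accepted version shapes can be summarized in a sketch. This is an assumption-laden simplification: the real IsSupported parses the version and compares against a minimum (0.45.0 per the commit), which is elided here; only the marker-matching logic from the two fixes is shown.

```go
package main

import (
	"fmt"
	"strings"
)

// isSupported (illustrative) accepts the special "development" string, the
// bare "dev-"/"ci-" prefixes (first fix), and the semver-padded
// "0.0.0-dev-<sha>" / "0.0.0-ci-<sha>" forms (this fix). Bare short hashes
// carry no marker and are rejected.
func isSupported(v string) bool {
	if v == "development" ||
		strings.HasPrefix(v, "dev-") || strings.HasPrefix(v, "ci-") ||
		strings.Contains(v, "-dev-") || strings.Contains(v, "-ci-") {
		return true
	}
	// Release builds: the real code parses and compares >= minVersion;
	// only the "must look like a version" part is sketched here.
	return strings.Contains(v, ".")
}

func main() {
	fmt.Println(isSupported("0.0.0-dev-1b923aad9")) // padded dev build: accepted
	fmt.Println(isSupported("a6c5960"))             // bare hash: rejected
}
```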
…Listeners helpers Phase 3.7i UI-push helpers used by the peer-status-visibility code above. Thin wrappers around the existing notifier.peerListChanged and snapshotRouterPeersLocked + dispatchRouterPeers paths.
…onnections.Store + SnapshotRouter) Phase 3.7i added the peer-connections Store and SnapshotRouter to nbgrpc.NewServer's signature, but the test fixture in client/cmd wasn't updated, so the package fails to compile against PR-C's management code.
…xture Phase 3.7i added peer_connections.Store and *peer_connections.SnapshotRouter to nbgrpc.NewServer(). Production callers (BaseServer.PeerConnStore / PeerConnRouter) always pass non-nil values, but several test fixtures pass nil. Without nil-guards the Sync handler nil-derefs on snapshotRouter.Register(peerKey). - mgmt/grpc/server.go: NewServer now substitutes a fresh in-memory store + a fresh SnapshotRouter when either argument is nil. Cheap defaults; no behavioural change for production. - client/server/server_test.go: pass non-nil store + router to the 13-arg signature so the package compiles against the new signature on top of all this.
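The nil-guard pattern is simple enough to sketch with hypothetical types standing in for peer_connections.Store and SnapshotRouter: substitute cheap in-memory defaults when a (test) caller passes nil, so the Sync handler never nil-derefs.

```go
package main

import "fmt"

// store and router are illustrative stand-ins for the real types.
type store map[string]string

type router struct {
	chans map[string]chan struct{}
}

// newServer substitutes fresh defaults for nil arguments; production
// callers always pass non-nil, so this changes nothing for them.
func newServer(s store, r *router) (store, *router) {
	if s == nil {
		s = store{} // cheap in-memory default for test fixtures
	}
	if r == nil {
		r = &router{chans: map[string]chan struct{}{}}
	}
	return s, r
}

func main() {
	s, r := newServer(nil, nil) // fixture passing nil, as in the old tests
	fmt.Println(s != nil, r != nil)
}
```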
- mgmt/internals/server/boot.go: collapse PeerConnRouter() lambda to Create(s, peer_connections.NewSnapshotRouter) (unlambda staticcheck) - client/internal/conn_mgr.go: //nolint:unused on addPeersToLazyConnManager (deliberately retained for the eventual snapshot-import path; documented inline) - client/internal/peer/status.go: //nolint:unused on notifyPeerListChanged (defined here for the UpdatePeerRemoteMeta caller that lands in a follow-up commit on PR-D)
Fixes a panic on Engine.Stop teardown:
Engine.Stop()
e.connStatePusher.Stop()
e.connStatePusher = nil ← nilled here
...
e.removeAllPeers() ← still triggers
Conn.Close
UpdatePeerState ← fires status listener
listener-closure
e.connStatePusher. ← nil receiver
OnPeerStateChange() ← deref panic at p.events <- ev
Test_Engine_SSH on the netbirdio CI runners hit this reliably because
the runners can actually create the wg interface (sandbox cannot, so
the test fails earlier locally and never reaches the Stop path).
Stack: client/internal/conn_state_pusher.go:99 → engine.go:633 →
peer/status.go:300 (notify closure) → peer/conn.go:736 → conn.go:329
→ conn_mgr.go:454 → engine.go:864 → engine.go:852 → engine.go:391.
Defending in the pusher itself (vs. nil-checking each callsite) keeps
the engine cleanup path simple and matches the no-op-on-nil pattern
already used elsewhere in the package.
[client+management] Phase 3.7i of #5989: peer-status visibility (RemotePeerConfig + conn-state pusher + Peers UI)
Branch: `pr/c-phase3.7i-visibility` → base `pr/b-phase3.5-network-change` (stacked PR, depends on PR-B landing first).
Compare: https://github.com/netbirdio/netbird/compare/MichaelUray:netbird:pr/b-phase3.5-network-change...MichaelUray:netbird:pr/c-phase3.7i-visibility?expand=1
Summary
This PR delivers the peer-status visibility layer of #5989: a peer can now see the configured + effective connection mode of its remotes, the remotes' liveness status as known to the management server, and per-peer aggregate counters. It introduces a unary `SyncPeerConnections` RPC plus an in-memory store with TTL on the management side, a client-side `conn_state_pusher` with adaptive heartbeat that batches updates, and a new "Peers" tab in the desktop client UI listing every peer with a status badge and a per-peer accordion. This is the third of four stacked PRs implementing #5989. It builds on PR-B.
Why
For diagnosing "why is this peer on relay?", a user needs three things visible from one peer's daemon:
- the remote's configured + effective connection mode (is `relay-forced` set? Is it on `p2p-dynamic`?)
- the remote's liveness status as known to the management server
- per-peer aggregate counters

Pre-Phase-3.7i, only the first piece was indirectly available, by parsing the peer-list endpoint on the management API. There was no per-peer visibility into the lifecycle a remote was in. The "Peers" tab in the desktop UI used to be a flat list with just status colours.
Issue #5990 ("Per-peer and per-group connection-mode and inactivity-threshold override on the management server") depends on this visibility layer landing first.
What's in this PR
Wire schema
- proto/management: `RemotePeerConfig` + `PeerSystemMeta` extensions for peer status (effective + configured connection mode, P2P/relay timeouts, last-seen-at-server, server-liveness-known marker, groups).
- proto/management: `SyncPeerConnections` unary RPC + `PeerConnectionMap` messages.
- proto/daemon: `FullStatus` aggregate counters + `PeerState` enrichment.

Management
- mgmt/peer_connections: `Store` interface + `MemoryStore` implementation with TTL.
- mgmt/peer_connections: `SnapshotRouter` for on-demand refresh dispatch (one channel per stream, registered when the SyncPeerConnections client subscribes).
- mgmt: shared `peer_connections` bootstrap + `SyncPeerConnections` handler routing.
- mgmt/grpc: route `SnapshotRequest` via the existing Sync server-stream, bypassing the per-account debouncer (critical: the snapshot needs to arrive within seconds, not after the next account-update cycle).
- mgmt/peer: store `effective_connection_mode` in `PeerSystemMeta`.
- mgmt/conversion: `appendRemotePeerConfig` fills effective + configured + groups + last-seen.
- mgmt/http: `GET` + `POST /api/peers/{id}/connections` handlers with RBAC (so a user can only see remotes they are policy-permitted to reach, and only an admin can trigger a refresh).
- mgmt/router: register `/api/peers/{id}/connections` + `/refresh` routes.

Client
- client/internal: `conn_state_pusher` with adaptive heartbeat + snapshot. Batches updates, escalates frequency on dirty state, de-escalates back to the configured heartbeat when steady.
- client/internal: wire `conn_state_pusher` into Engine + the management Sync receiver loop.
- mgmt/client + engine: report `effective_connection_mode` + `SyncMeta` debounce.
- mgmt/client: `SyncPeerConnections` unary client method + interface + mock.
- client/peer/status: counters in `FullStatus` + `UpdatePeerRemoteMeta`. New helpers `notifyPeerListChanged`, `notifyPeerStateChangeListeners`.
- client/engine: plumb `RemotePeerConfig` into `peer.UpdatePeerRemoteMeta`.

Desktop UI
- client/ui: new "Peers" tab in the Networks window. Tab is now first; window renamed to "Peers and Networks"; tray menu renamed to match.
- `Show Full` toggle + tab-aware actions.
- `Refresh` + size-to-content.

Android shim
- client/android: gomobile getters for the 6 peer-status aggregate counters.
- client/android: enrich `PeerInfo` with `peer.State` fields for the Android UI to consume.

Other
- client/conn_mgr: lazy/dynamic mode change resets peers to idle + tolerant version parsing.
- mgmt/peer_connections: nonce/router/refresh hardening per code review (TTL semantics, snapshot-router cleanup, refresh rate-limit).
- client/conn_state_pusher: dirty-on-fail + initial-snapshot gating.

Tests
- `go test ./client/internal/peer ./client/internal/peer/guard ./management/server/peer_connections/...` — pass.
- `go build ./client/... ./management/...` — pass on linux/amd64, linux/arm64, windows/amd64.
- `netbird status -d`.

Test plan
- `peer_connections.Store` + `SnapshotRouter` (TTL, expiry, concurrent reads).
- `conn_state_pusher` (adaptive heartbeat escalation, dirty-snapshot gate).
- New status helpers (`notifyPeerListChanged`, `notifyPeerStateChangeListeners`) covered by the existing peer-state-recorder test suite.

Use case
A user opening the desktop client UI clicks the new "Peers" tab and immediately sees:
Today this is information a NetBird dev would gather by SSH'ing to the daemon, running `netbird status -d`, then querying the management API by hand. The Peers tab makes it routine.

Linked work
Maintainers are welcome to push directly to this branch.
Summary by CodeRabbit
New Features
Improvements
Documentation
These changes are internal lifecycle / behavioural improvements; no user-visible API or CLI flag added that warrants new public docs. Existing flags/Settings already documented at netbirdio/docs cover the surface area.