
[client, management] Phase 3.7i of #5989: peer-status visibility — RemotePeerConfig + conn-state pusher + Peers UI (stack 3/4) #6083

Open
MichaelUray wants to merge 108 commits into netbirdio:main from MichaelUray:pr/c-phase3.7i-visibility

Conversation

@MichaelUray
Contributor

@MichaelUray MichaelUray commented May 5, 2026

[client+management] Phase 3.7i of #5989: peer-status visibility (RemotePeerConfig + conn-state pusher + Peers UI)

Branch: pr/c-phase3.7i-visibility → base pr/b-phase3.5-network-change (stacked PR — depends on PR-B landing first)
Compare: https://github.com/netbirdio/netbird/compare/MichaelUray:netbird:pr/b-phase3.5-network-change...MichaelUray:netbird:pr/c-phase3.7i-visibility?expand=1


Summary

This PR delivers the peer-status visibility layer of #5989: a peer can now see the configured + effective connection-mode of its remotes, the remotes' liveness status as known to the management server, and per-peer aggregate counters. It introduces a unary SyncPeerConnections RPC plus an in-memory store with TTL on the management side, a client-side conn_state_pusher with adaptive heartbeat that batches updates, and a new "Peers" tab in the desktop client UI listing every peer with status badge + per-peer accordion.

This is the third of four stacked PRs implementing #5989. It builds on PR-B.

Why

For diagnosing "why is this peer on relay?", a user needs three things visible from one peer's daemon:

  1. What does the management server know about my remote peers' connectivity? (Are they reachable? When did they last sync?)
  2. What is the configured / effective connection-mode of each remote? (Does the remote have relay-forced set? Is it on p2p-dynamic?)
  3. What is the per-peer ICE-backoff state? (Is this peer in hourly retry?)

Before Phase 3.7i, only the first piece was available, and then only indirectly, by parsing the peer-list endpoint on the management API. There was no per-peer visibility into the lifecycle state a remote was in. The "Peers" tab in the desktop UI used to be a flat list with just status colours.

Issue #5990 ("Per-peer and per-group connection-mode and inactivity-threshold override on the management server") depends on this visibility layer landing first.

What's in this PR

Wire schema

  • proto/management: RemotePeerConfig + PeerSystemMeta extensions for peer-status (effective + configured connection mode, P2P/relay timeouts, last-seen-at-server, server-liveness-known marker, groups).
  • proto/management: SyncPeerConnections unary RPC + PeerConnectionMap messages.
  • proto/daemon: FullStatus aggregate counters + PeerState enrichment.

Management

  • mgmt/peer_connections: Store interface + MemoryStore implementation with TTL (a minimal sketch follows this list).
  • mgmt/peer_connections: SnapshotRouter for on-demand refresh dispatch (one channel per stream, registered when the SyncPeerConnections client subscribes).
  • mgmt: shared peer_connections bootstrap + SyncPeerConnections handler routing.
  • mgmt/grpc: route SnapshotRequest via the existing Sync server-stream, bypassing the per-account debouncer (critical: the snapshot needs to arrive within seconds, not after the next accountUpdate cycle).
  • mgmt/peer: store effective_connection_mode in PeerSystemMeta.
  • mgmt/conversion: appendRemotePeerConfig fills effective + configured + groups + last-seen.
  • mgmt/http: GET + POST /api/peers/{id}/connections handlers with RBAC (so a user can only see remotes they are policy-permitted to reach, and only an admin can trigger a refresh).
  • mgmt/router: register /api/peers/{id}/connections + /refresh routes.
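
To make the MemoryStore TTL semantics concrete, here is a minimal, hypothetical sketch. Type, field, and method names are assumptions for illustration only; the real mgmt/peer_connections store also performs the delta-merging and nonce handling discussed in the review below.

package peerconnections

import (
	"sync"
	"time"
)

type memEntry struct {
	m         any       // the stored connection map
	updatedAt time.Time // last write; drives TTL expiry
}

type MemoryStore struct {
	mu   sync.RWMutex
	ttl  time.Duration
	maps map[string]*memEntry // keyed by peer public key
}

func NewMemoryStore(ttl time.Duration) *MemoryStore {
	return &MemoryStore{ttl: ttl, maps: make(map[string]*memEntry)}
}

// Put stores a peer's map (the real code also delta-merges) and
// restarts its TTL clock.
func (s *MemoryStore) Put(peerPubKey string, m any) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.maps[peerPubKey] = &memEntry{m: m, updatedAt: time.Now()}
}

// Get treats entries older than the TTL as absent, so stale liveness
// data is never served to the visibility endpoints.
func (s *MemoryStore) Get(peerPubKey string) (any, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	e, ok := s.maps[peerPubKey]
	if !ok || time.Since(e.updatedAt) > s.ttl {
		return nil, false
	}
	return e.m, true
}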

Client

  • client/internal: conn_state_pusher with adaptive heartbeat + snapshot. Batches updates, escalates frequency on dirty state, de-escalates back to the configured heartbeat when steady (see the sketch after this list).
  • client/internal: wire conn_state_pusher into Engine + the management Sync receiver loop.
  • mgmt/client+engine: report effective_connection_mode + SyncMeta debounce.
  • mgmt/client: SyncPeerConnections unary client method + interface + mock.
  • client/peer/status: counters in FullStatus + UpdatePeerRemoteMeta. New helpers notifyPeerListChanged, notifyPeerStateChangeListeners.
  • client/engine: plumb RemotePeerConfig into peer.UpdatePeerRemoteMeta.
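
The adaptive-heartbeat idea in a minimal sketch. The interval values, the dirty channel, and the flushBatchedDelta helper are assumptions made for illustration, not the actual conn_state_pusher implementation:

package internal

import "time"

// pusher is a stand-in for the real connStatePusher; only the pieces
// needed to show the escalate/de-escalate shape are included.
type pusher struct {
	dirty chan struct{} // signalled when a peer-state change is batched
}

func (p *pusher) flushBatchedDelta() bool { return false } // placeholder

func (p *pusher) run(done <-chan struct{}) {
	const (
		steadyInterval    = 60 * time.Second // assumed configured heartbeat
		escalatedInterval = 5 * time.Second  // assumed fast interval while dirty
	)
	interval := steadyInterval
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-done:
			return
		case <-p.dirty:
			// Escalate: state is changing, push batches more often.
			if interval != escalatedInterval {
				interval = escalatedInterval
				ticker.Reset(interval)
			}
		case <-ticker.C:
			pushed := p.flushBatchedDelta()
			// De-escalate back to the configured heartbeat once steady.
			if !pushed && interval != steadyInterval {
				interval = steadyInterval
				ticker.Reset(interval)
			}
		}
	}
}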

Desktop UI

  • client/ui: new "Peers" tab in the Networks window. Tab is now first; window renamed to "Peers and Networks"; tray menu rename.
  • Per-peer expandable rows (accordion) with status badge, connection details, latency, ICE-backoff state.
  • Outer-footer Show Full toggle + tab-aware actions.
  • Peer rows use dynamic add/remove + scroll has 600px floor.
  • Persist per-peer expand state across Refresh + size-to-content.
  • Endpoint+RelayServer changes trigger an immediate state push (so the UI doesn't wait for the next heartbeat to reflect a relay-server change).

Android shim

  • client/android: gomobile getters for the 6 peer-status aggregate counters.
  • client/android: enrich PeerInfo with peer.State fields for the Android UI to consume.

Other

  • client/conn_mgr: lazy/dynamic mode change resets peers to Idle + tolerant version parsing.
  • mgmt/peer_connections: nonce/router/refresh hardening per code review (TTL semantics, snapshot router cleanup, refresh-rate-limit).
  • client/conn_state_pusher: dirty-on-fail + initial-snapshot gating.

Tests

  • go test ./client/internal/peer ./client/internal/peer/guard ./management/server/peer_connections/... — pass.
  • go build ./client/... ./management/... — pass on linux/amd64, linux/arm64, windows/amd64.
  • Hardware-validated end-to-end on a 4-peer testbed: management server (NetBird mgmt + dashboard) on Linux, daemon + UI on Windows 11, three OpenWrt routers as remote peers (LAN, Internet srflx, LTE srflx). The new Peers tab correctly shows each peer's connection mode, liveness, ICE-backoff state, and the per-peer counters add up to the totals shown in netbird status -d.

Test plan

  • Mgmt unit tests for peer_connections.Store + SnapshotRouter (TTL, expiry, concurrent reads).
  • Client unit tests for conn_state_pusher (adaptive heartbeat escalation, dirty-snapshot gate).
  • Status-recorder helpers (notifyPeerListChanged, notifyPeerStateChangeListeners) covered by the existing peer-state-recorder test suite.
  • Maintainer review of the new HTTP routes (RBAC behaviour for non-admin users).
  • Maintainer review of the unary RPC bypass of the per-account debouncer (intended; needs explicit acknowledgement).

Use case

A user opening the desktop client UI clicks the new "Peers" tab and immediately sees:

  • Which of their remotes are currently online (server-side liveness).
  • Which connection mode each remote is configured for and what it actually settled on.
  • Whether any remote is in ICE-backoff (with the next-retry timestamp).

Today this is information a NetBird dev would gather by SSH'ing to the daemon, running netbird status -d, then querying the management API by hand. The Peers tab makes it routine.

Linked work

Maintainers are welcome to push directly to this branch.

Summary by CodeRabbit

  • New Features

    • Connection Mode controls (Follow server, relay-forced, p2p, p2p-lazy, p2p-dynamic) surfaced in UI, CLI, installer and per-device preferences; server-pushed defaults shown as hints.
    • New Peers tab with expandable per-peer details, endpoints, latency, byte counters, ICE backoff info, agent/OS and group info.
    • On-demand peer-connection snapshots and refresh API for live per-peer connection maps.
  • Improvements

    • Richer status output and aggregate counters (online/offline, p2p/relayed/idle) for UI and status commands.

Documentation

  • Documentation is not needed

These changes are internal lifecycle / behavioural improvements; no user-visible API or CLI flags were added that warrant new public docs. Existing flags and settings already documented at netbirdio/docs cover the surface area.

@coderabbitai
Contributor

coderabbitai Bot commented May 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds connection-mode and timeout controls (env/CLI/profile/server precedence), per-peer ICE backoff and two-timer inactivity, a conn-state pusher with SyncPeerConnections RPC + in-memory store/router, expanded status/proto fields, engine→management wiring for effective config and snapshots, UI/Android exposure, persistence, and tests.

Changes

Connection-mode, per-peer state, and sync (single DAG)

  • Proto / Data Shape (shared/management/proto/management.proto, client/proto/daemon.proto): Added ConnectionMode enum/fields, P2P/Relay timeout fields, server-pushed GetConfigResponse fields, PeerConnectionMap/PeerConnectionEntry/ConnType/PeerSnapshotRequest, SyncPeerConnections RPC, PeerState enrichment and FullStatus aggregate counters.
  • Core mode type & parsing (shared/connectionmode/mode.go, shared/connectionmode/mode_test.go): New connectionmode.Mode type with String/ParseString/ToProto/FromProto, legacy lazy-compat helpers and tests.
  • Env/CLI/Profile wiring (client/internal/peer/env.go, client/cmd/*, client/internal/profilemanager/config.go, client/cmd/service_installer.go, client/android/preferences.go): ResolveModeFromEnv added (NB_CONNECTION_MODE support and deprecation path), CLI flags registered, profile persistence for connection-mode/timeouts, profile manager fields added, Android preferences getters/setters for local overrides.
  • Conn manager resolution & lifecycle (client/internal/conn_mgr.go, client/internal/conn_mgr_test.go): ConnMgr resolves mode/timeout precedence (env, config, server), stores server-pushed state with mutex, integrates lazy/dynamic lifecycle, exposes accessors (Mode/RelayTimeout/P2pTimeout/P2pRetryMax and ServerPushed* getters), and tests.
  • Per-peer ICE/backoff & Conn wiring (client/internal/peer/ice_backoff.go, client/internal/peer/conn.go, client/internal/peer/worker_ice.go, client/internal/peer/handshaker.go, tests): Per-peer exponential iceBackoff with grace window; Conn gains Mode and P2pRetryMaxSeconds, AttachICE/DetachICE, network-change re-arm, worker ICE state hooks (IsConnected/onICEFailed/onICEConnected), and thread-safe ICE listener APIs plus tests.
  • Inactivity two-timer model (client/internal/lazyconn/inactivity/manager.go, client/internal/lazyconn/manager/manager.go, tests): Inactivity manager supports separate ICE and Relay timers, exposes ICEInactiveChan and RelayInactiveChan, new constructors and config mapping; default threshold adjusted and tests added.
  • Status shape, listener & WG stats (client/internal/peer/status.go, client/internal/peer/status_test.go, client/status/status.go): Peer State extended with ICE-backoff and remote meta (effective/configured modes/timeouts, groups, last-seen, server-liveness), FullStatus counters added, SetConnStateListener and RefreshWireGuardStats added, ToProto updated and status outputs extended; tests updated.
  • Conn-state pusher & engine adapters (client/internal/conn_state_pusher.go, client/internal/conn_state_pusher_test.go, client/internal/engine_pusher_adapters.go): New connStatePusher batching/delta/full snapshot logic, PushSink/PeerStateSource interfaces, engine adapters translating status→PeerConnectionEntry, and unit tests for delta/full behavior.
  • Engine integration & debounced SyncMeta (client/internal/engine.go, client/internal/debouncer/debouncer.go): Engine wires connStatePusher and syncMetaDebouncer, mirrors RemotePeerConfig into status, pushes EffectiveConnConfig to MGM, triggers initial snapshots and debounced SyncMeta; debouncer utility added.
  • Management server: router/store & HTTP (management/server/peer_connections/snapshot_router.go, management/server/peer_connections/store.go, management/server/http/handlers/peer_connections/handler.go): SnapshotRouter coordinates per-peer snapshot requests, MemoryStore provides TTL + delta-merge, new REST endpoints GET/POST /peers/{peerId}/connections and refresh with enrichment, and tests for router/store/handler.
  • Management gRPC & conversion (management/internals/shared/grpc/server.go, management/internals/shared/grpc/conversion.go, management/internals/controllers/network_map/controller/controller.go): gRPC server accepts peerConnStore/snapshotRouter, added SyncPeerConnections RPC handler, ToSyncResponse/appendRemotePeerConfig accept groupNamesByPeerID/context, BuildGroupNamesByPeerID helper, and conversion tests.
  • Account settings, persistence & auditing (management/server/types/settings.go, management/server/store/sql_store.go, management/server/account.go, management/server/http/handlers/accounts/accounts_handler.go): Settings gain nullable fields ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds, P2pRetryMaxSeconds; DB scan/load wired, API schema updated, and auditing emitted on changes; tests updated.
  • Client UI / Android / status exposure (client/ui/*, client/android/*): Desktop UI gains a Connection Mode selector, timeout inputs, new Peers tab and peers UI, and server-pushed hints; Android PeersList and PeerInfo enriched and new getters exposed on the android Client.
  • Shared management client & mocks (shared/management/client/grpc.go, shared/management/client/client.go, shared/management/client/mock.go): Added EffectiveConnConfig type, SetEffectiveConnConfig, SetSnapshotRequestHandler and SyncPeerConnections methods; infoToMetaData now includes effective config; mocks updated.
  • Tests & supporting utilities (multiple *_test.go, client/internal/debouncer/debouncer.go): Many new and updated unit tests across conn_mgr, conn_state_pusher, ice_backoff, lazyconn, peer env resolution, snapshot router/store, conversion, and HTTP/handler tests; debouncer utility added.
  • Small wiring / call-site updates (various call sites: nbgrpc.NewServer, API handler constructors, test utilities): nbgrpc.NewServer and API/NewAPIHandler call sites updated to pass the new peer-connections dependencies (store/router); corresponding test call sites adjusted.

Sequence Diagram(s)

sequenceDiagram
    participant Engine as Engine
    participant ConnMgr as ConnMgr
    participant Status as Status/PeerRecorder
    participant Grpc as GrpcClient
    participant MGM as Management
    participant Store as PeerConnStore

    Note over Engine,ConnMgr: resolve mode & timeouts
    Engine->>ConnMgr: resolveConnectionMode(env,cfg,server)
    ConnMgr-->>Engine: Mode/Timeouts/P2PRetryMax

    Note over Status,Engine: peer events & deltas
    Status->>Engine: Peer state changes / ICE events
    Engine->>Engine: connStatePusher computes delta/full
    Engine->>Grpc: SyncPeerConnections(encrypted PeerConnectionMap)
    Grpc->>MGM: SyncPeerConnections RPC
    MGM->>Store: Persist PeerConnectionMap
sequenceDiagram
    participant UI as Desktop UI
    participant Daemon as Daemon/Engine
    participant Profile as ProfileManager
    participant MGM as Management

    UI->>Daemon: SetConfigRequest(connection-mode/timeouts)
    Daemon->>Profile: applyConnectionModeFlagsToProfile
    Daemon->>Daemon: ConnMgr applies local overrides
    MGM->>Daemon: UpdatedRemotePeerConfig (server-pushed defaults)
    Daemon->>UI: GetConfigResponse(server_pushed_*)
    UI->>UI: Render server hints and user overrides

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


Suggested reviewers

  • mlsmaycon
  • pascal-fischer

"I hopped and I nudged the modes anew,
Timers ticked, snapshots flew,
Backoff counted, peers took flight,
Server hints glowed soft and bright,
A rabbit cheers the network's view 🐇"


@MichaelUray
Contributor Author

Stack: #6081 → #6082 → #6083 (this PR) → #6084. Depends on #6082 landing first.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 18

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
client/internal/peer/worker_ice.go (1)

549-563: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard lastKnownState with muxAgent in the state callback.

IsConnected() reads lastKnownState under muxAgent, but this callback writes it without the same lock. That is a real data race, and it can make network-change handling observe stale connection state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/worker_ice.go` around lines 549 - 563, The callback
updates w.lastKnownState without holding w.muxAgent while IsConnected() reads it
under that lock, causing a data race; fix by acquiring the same mutex
(w.muxAgent) around all reads/writes of lastKnownState inside the state callback
(the branch handling ice.ConnectionStateConnected and the branch handling
ice.ConnectionStateFailed/Disconnected/Closed) so the assignments to
w.lastKnownState and the subsequent calls that depend on it (e.g.,
w.logSuccessfulPaths, w.conn.onICEConnected, w.conn.onICEStateDisconnected, and
the sessionChanged handling from w.closeAgent) are performed while holding the
lock to match IsConnected()'s protection.
client/status/status.go (1)

605-659: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep this duplicate FullStatus -> proto mapper in sync with the expanded schema.

daemon.proto now includes aggregate counters plus the per-peer server_online / remote-mode / timeout fields, and client/internal/peer/status.go already serializes them. This copy only emits the legacy fields plus ICE backoff, so any path that goes through status.ToProtoFullStatus() will silently drop the new Peers-tab metadata.

Suggested fix
 	pbFullStatus.NumberOfForwardingRules = int32(fullStatus.NumOfForwardingRules)
 	pbFullStatus.LazyConnectionEnabled = fullStatus.LazyConnectionEnabled
+	pbFullStatus.ConfiguredPeersTotal = fullStatus.ConfiguredPeersTotal
+	pbFullStatus.ServerOnlinePeers = fullStatus.ServerOnlinePeers
+	pbFullStatus.P2PConnectedPeers = fullStatus.P2PConnectedPeers
+	pbFullStatus.RelayedConnectedPeers = fullStatus.RelayedConnectedPeers
+	pbFullStatus.IdleOnlinePeers = fullStatus.IdleOnlinePeers
+	pbFullStatus.ServerOfflinePeers = fullStatus.ServerOfflinePeers

 	for _, peerState := range fullStatus.Peers {
 		pbPeerState := &proto.PeerState{
 			IP:                         peerState.IP,
 			PubKey:                     peerState.PubKey,
@@
 			SshHostKey:                 peerState.SSHHostKey,
 			IceBackoffFailures:         int32(peerState.IceBackoffFailures),
 			IceBackoffNextRetry:        timestamppb.New(peerState.IceBackoffNextRetry),
 			IceBackoffSuspended:        peerState.IceBackoffSuspended,
+			ServerOnline:               peerState.ServerOnline,
+			Groups:                     peerState.RemoteGroups,
+			EffectiveConnectionMode:    peerState.RemoteEffectiveConnectionMode,
+			ConfiguredConnectionMode:   peerState.RemoteConfiguredConnectionMode,
+			EffectiveRelayTimeoutSecs:  peerState.RemoteEffectiveRelayTimeoutSecs,
+			EffectiveP2PTimeoutSecs:    peerState.RemoteEffectiveP2PTimeoutSecs,
+			EffectiveP2PRetryMaxSecs:   peerState.RemoteEffectiveP2PRetryMaxSecs,
+			ConfiguredRelayTimeoutSecs: peerState.RemoteConfiguredRelayTimeoutSecs,
+			ConfiguredP2PTimeoutSecs:   peerState.RemoteConfiguredP2PTimeoutSecs,
+			ConfiguredP2PRetryMaxSecs:  peerState.RemoteConfiguredP2PRetryMaxSecs,
 		}
+		if !peerState.RemoteLastSeenAtServer.IsZero() {
+			pbPeerState.LastSeenAtServer = timestamppb.New(peerState.RemoteLastSeenAtServer)
+		}
 		pbFullStatus.Peers = append(pbFullStatus.Peers, pbPeerState)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/status/status.go` around lines 605 - 659, ToProtoFullStatus currently
maps only the legacy Peer fields and ICE backoff, so it drops newly added
per-peer fields (server_online, remote_mode, timeout) and aggregate counters
from the expanded daemon.proto; update ToProtoFullStatus (in
client/status/status.go) to copy the same fields that
client/internal/peer/status.go emits for each peer and the top-level aggregate
counters from peer.FullStatus (e.g., add mapping for PeerState.ServerOnline,
PeerState.RemoteMode, PeerState.Timeout and set pbFullStatus aggregate counters
like NumPeersOnline/NumRemote/NumTimedOut or whatever aggregate names exist on
peer.FullStatus), ensuring pbPeerState includes the new proto fields and
pbFullStatus includes the new counters so paths using ToProtoFullStatus no
longer drop the new metadata.
management/server/account.go (1)

332-337: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Push connection-mode setting changes through the peer update path.

These new settings are now audited, but they still don't participate in the updateAccountPeers condition above. Changing ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds, or P2pRetryMaxSeconds will therefore persist the account settings without bumping the network serial or notifying connected peers, so clients can keep running with stale effective config until some unrelated update happens.

Suggested fix
-		if oldSettings.RoutingPeerDNSResolutionEnabled != newSettings.RoutingPeerDNSResolutionEnabled ||
-			oldSettings.LazyConnectionEnabled != newSettings.LazyConnectionEnabled ||
-			oldSettings.DNSDomain != newSettings.DNSDomain ||
-			oldSettings.AutoUpdateVersion != newSettings.AutoUpdateVersion ||
-			oldSettings.AutoUpdateAlways != newSettings.AutoUpdateAlways {
+		if oldSettings.RoutingPeerDNSResolutionEnabled != newSettings.RoutingPeerDNSResolutionEnabled ||
+			oldSettings.LazyConnectionEnabled != newSettings.LazyConnectionEnabled ||
+			oldSettings.DNSDomain != newSettings.DNSDomain ||
+			oldSettings.AutoUpdateVersion != newSettings.AutoUpdateVersion ||
+			oldSettings.AutoUpdateAlways != newSettings.AutoUpdateAlways ||
+			!equalStringPtr(oldSettings.ConnectionMode, newSettings.ConnectionMode) ||
+			!equalUint32Ptr(oldSettings.RelayTimeoutSeconds, newSettings.RelayTimeoutSeconds) ||
+			!equalUint32Ptr(oldSettings.P2pTimeoutSeconds, newSettings.P2pTimeoutSeconds) ||
+			!equalUint32Ptr(oldSettings.P2pRetryMaxSeconds, newSettings.P2pRetryMaxSeconds) {
 			updateAccountPeers = true
 		}

Also applies to: 372-375

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/account.go` around lines 332 - 337, The current comparison
that sets updateAccountPeers (comparing oldSettings vs newSettings) misses
connection-mode related fields so changes to ConnectionMode,
RelayTimeoutSeconds, P2pTimeoutSeconds, and P2pRetryMaxSeconds do not trigger
the peer update path; update the conditional that sets updateAccountPeers (the
block comparing oldSettings.RoutingPeerDNSResolutionEnabled,
LazyConnectionEnabled, DNSDomain, AutoUpdateVersion, AutoUpdateAlways) to also
compare oldSettings.ConnectionMode, oldSettings.RelayTimeoutSeconds,
oldSettings.P2pTimeoutSeconds, and oldSettings.P2pRetryMaxSeconds against
newSettings and set updateAccountPeers = true when any of those differ (and make
the same change in the other comparable conditional later in the same file that
does the peer-update decision).
client/ui/client_ui.go (1)

1944-1945: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the popup-scoped context for the status poll.

The goroutine is meant to be cancelled by the CancelFunc returned from showLoginURL, but this RPC uses s.ctx. If the daemon stalls here, closing the popup will not cancel the in-flight poll.

Suggested fix
-				status, err := conn.Status(s.ctx, &proto.StatusRequest{})
+				pollCtx, cancel := context.WithTimeout(ctx, failFastTimeout)
+				status, err := conn.Status(pollCtx, &proto.StatusRequest{})
+				cancel()
 				if err != nil {
 					continue
 				}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/client_ui.go` around lines 1944 - 1945, The status RPC is using the
long-lived s.ctx instead of the popup-scoped context returned from showLoginURL,
so cancelling the popup doesn't cancel the in-flight conn.Status call; change
the call to use the popup's context (the ctx returned alongside the CancelFunc
from showLoginURL) instead of s.ctx (i.e., replace conn.Status(s.ctx, ...) with
conn.Status(popupCtx, ...) or the local ctx variable tied to the CancelFunc),
ensuring the goroutine is canceled when the popup is closed.
client/cmd/up.go (1)

438-453: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Disallow mixing --connection-mode with the legacy lazy-connection flag.

This builder can now send both LazyConnectionEnabled and ConnectionMode in the same request. Modes like p2p-lazy/p2p-dynamic already encode the lazy-connection behavior, so netbird up --enable-lazy-connection ... --connection-mode ... produces conflicting instructions and leaves precedence to the daemon. The same ambiguity is repeated in setupConfig and setupLoginRequest below.
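
A minimal sketch of such a guard, assuming cobra/pflag and flag names taken from this comment (the actual flag constants in the codebase may differ); calling it before the builder mutates req keeps both fields unset on conflict:

package cmd

import (
	"fmt"

	"github.com/spf13/cobra"
)

// validateConnectionModeFlags is a hypothetical guard for the conflict
// described above; the flag names are assumptions.
func validateConnectionModeFlags(cmd *cobra.Command) error {
	lazy := cmd.Flag("enable-lazy-connection")
	mode := cmd.Flag("connection-mode")
	if lazy != nil && mode != nil && lazy.Changed && mode.Changed {
		return fmt.Errorf("--enable-lazy-connection cannot be combined with --connection-mode: " +
			"modes such as p2p-lazy and p2p-dynamic already encode the lazy behavior")
	}
	return nil
}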

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/cmd/up.go` around lines 438 - 453, Detect when both
enableLazyConnectionFlag and connectionModeFlag are provided and fail fast: in
the request-building code that sets req.LazyConnectionEnabled and
req.ConnectionMode (and in the analogous setupConfig and setupLoginRequest
functions), check if cmd.Flag(enableLazyConnectionFlag).Changed &&
cmd.Flag(connectionModeFlag).Changed and return an error (or print a user-facing
message and exit) explaining that --enable-lazy-connection cannot be combined
with --connection-mode because connection modes encode lazy behavior; do this
before mutating req so neither field is set when the flags conflict.
🟡 Minor comments (14)
client/internal/debouncer/debouncer.go-28-53 (1)

28-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Generation guard against time.AfterFunc + Stop race.

There is a small window where Trigger (or Stop) cannot actually cancel the previously scheduled callback: once the runtime has started the AfterFunc goroutine but before it acquires d.mu, d.timer.Stop() returns false and the goroutine will still proceed to read d.fn and invoke it. Concretely:

  1. Trigger(A) schedules T1.
  2. T1 fires → goroutine G1 is spawned but not yet on d.mu.
  3. Trigger(B) acquires mu, sets d.fn = B, T1.Stop() returns false, schedules T2.
  4. G1 acquires mu, reads d.fn = B, invokes B().
  5. Later T2 fires and invokes B() again.

Stop() has the same issue: a callback already past the timer-fire boundary will still run because d.fn is not cleared.

For the SyncMeta caller this only causes an extra push, but it does defeat the debounce in racy windows. A generation counter fixes both cases:

♻️ Proposed fix using a generation counter
 type Debouncer struct {
 	delay time.Duration
 	mu    sync.Mutex
 	timer *time.Timer
 	fn    func()
+	gen   uint64
 }

 func (d *Debouncer) Trigger(fn func()) {
 	d.mu.Lock()
 	defer d.mu.Unlock()
 	d.fn = fn
 	if d.timer != nil {
 		d.timer.Stop()
 	}
+	d.gen++
+	myGen := d.gen
 	d.timer = time.AfterFunc(d.delay, func() {
 		d.mu.Lock()
+		if myGen != d.gen {
+			d.mu.Unlock()
+			return
+		}
 		f := d.fn
+		d.fn = nil
 		d.mu.Unlock()
 		if f != nil {
 			f()
 		}
 	})
 }

 func (d *Debouncer) Stop() {
 	d.mu.Lock()
 	defer d.mu.Unlock()
 	if d.timer != nil {
 		d.timer.Stop()
 		d.timer = nil
 	}
+	d.gen++
+	d.fn = nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/debouncer/debouncer.go` around lines 28 - 53, The
Trigger/Stop race allows an AfterFunc goroutine that has already started to read
a stale d.fn and invoke it; fix by adding a generation counter on Debouncer
(e.g., gen uint64) and use it to validate callbacks: increment d.gen whenever
you replace/clear d.fn (in Trigger before scheduling and in Stop to cancel),
capture the current generation into a local variable when creating
time.AfterFunc, and inside the AfterFunc acquire d.mu and compare the captured
generation to d.gen — only call the captured fn if generations match and the fn
is still non-nil; keep existing mutex usage for d.mu, and still call
d.timer.Stop() as before but rely on the generation check to prevent the
"already fired" goroutine from executing a stale fn.
client/internal/peer/ice_backoff.go-137-155 (1)

137-155: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reset stale retry metadata on success/reset.

markSuccess() and Reset() leave nextRetry untouched. Snapshot() will then keep exposing an old retry timestamp even though Suspended is false, which can leak stale data into the new peer-status surfaces.

Suggested fix
 func (s *iceBackoffState) markSuccess() {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	s.failures = 0
 	s.suspended = false
+	s.nextRetry = time.Time{}
 	s.bo.Reset()
 }
@@
 func (s *iceBackoffState) Reset() {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	s.failures = 0
 	s.suspended = false
+	s.nextRetry = time.Time{}
 	s.bo.Reset()
 	s.lastResetAt = time.Now()
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/ice_backoff.go` around lines 137 - 155, markSuccess()
and Reset() currently clear failures/suspended and backoff but leave nextRetry
untouched, so Snapshot() can expose stale retry timestamps; update
iceBackoffState.markSuccess and iceBackoffState.Reset to also clear/reset the
nextRetry field (e.g., set nextRetry to the zero time) after resetting bo and
suspended so that Snapshot() no longer returns stale retry metadata; ensure you
modify both methods that update lastResetAt (Reset) and the success path
(markSuccess) to reset nextRetry consistently.
client/internal/peer/env_test.go-25-25 (1)

25-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the misspelling in this test case name.

unparseable trips the codespell check here; changing it to unparsable should clear the failure.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/env_test.go` at line 25, Rename the test case name
string "connection_mode unparseable falls through to legacy" to "connection_mode
unparsable falls through to legacy" in the test table (the literal present in
the slice/array entry) so the misspelling "unparseable" is corrected to
"unparsable" and the codespell check will pass.
client/internal/conn_state_pusher.go-104-112 (1)

104-112: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

OnSnapshotRequest does not actually coalesce — doc claim is wrong.

The comment says "multiple requests in flight result in a single full snapshot with the latest nonce echoed", but the loop reads one nonce per iteration (line 174) and emits a full snapshot per nonce. With buffer size 4, a burst of refresh requests results in up to four back-to-back full snapshots, each echoing a different (older-first) nonce, instead of one snapshot with the latest. Mgmt's store handles it (newer seq wins), but it's wasteful and the doc misleads anyone reasoning about it.

Either fix the doc or actually coalesce by draining the queue (mirrors the events drain pattern at lines 161–169):

🔧 Drain queued nonces and emit one snapshot
 		case nonce := <-p.snapshotReq:
+			// Coalesce any additional refresh requests buffered behind us;
+			// only the latest nonce needs to be echoed back to mgmt.
+			drain := true
+			for drain {
+				select {
+				case n2 := <-p.snapshotReq:
+					if n2 > nonce {
+						nonce = n2
+					}
+				default:
+					drain = false
+				}
+			}
 			if p.source != nil {
 				p.flushFull(p.source.SnapshotAllRemotePeers(), nonce)
 			}

Also applies to: 174-180

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_state_pusher.go` around lines 104 - 112, The comment
claims coalescing but the processing loop currently emits one snapshot per
queued nonce; update the processing logic in connStatePusher to actually
coalesce by draining p.snapshotReq: when you receive a nonce from p.snapshotReq
(the same queue written by OnSnapshotRequest), run a non-blocking loop to read
and discard any additional pending nonces, keeping the latest value, then emit a
single full snapshot with that latest nonce (mirror the existing events-drain
pattern used elsewhere); reference connStatePusher, OnSnapshotRequest, and
p.snapshotReq when locating where to implement the drain-and-keep-latest change.
client/internal/conn_state_pusher.go-234-253 (1)

234-253: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The flushFull "dirty-retain" comment overstates what happens after a failed full snapshot.

The comment claims "the next push (or the next snapshot request) will see every peer as dirty". That's not what computeDeltaFromSource does — it diffs the latest source snapshot against p.lastPushed, and lastPushed still contains whatever was successfully pushed before this failed full snapshot. So if a full snapshot fails:

  • Peers whose state genuinely changed since the last successful push will be re-included in the next delta tick (good).
  • Peers whose state hasn't changed since the last successful delta/full will not be re-pushed — they still match lastPushed and isMaterialChange returns false.

If the intent is "a failed full snapshot causes the next tick to re-send the entire snapshot to repair mgmt", you'd need to either invalidate lastPushed on full-snapshot failure, or set a flag that forces the next tick to call flushFull instead of flushDelta. Otherwise please tone the comment down to match reality:

📝 More accurate comment
 	}); err != nil {
-		// Same dirty-retain semantics as flushDelta. A failed full
-		// snapshot leaves lastPushed unchanged so the next push (or
-		// the next snapshot request) will see every peer as dirty.
+		// Leave lastPushed unchanged so peers that have changed since
+		// the previous successful push remain dirty on the next tick.
+		// NOTE: peers that haven't changed will NOT be retransmitted —
+		// mgmt's view of those peers stays stale until either another
+		// flushFull succeeds or those peers change. Consider invalidating
+		// lastPushed here if full mgmt-side repair is required.
 		return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_state_pusher.go` around lines 234 - 253, The comment in
flushFull overstates behavior: when a full snapshot Push fails, p.lastPushed is
left unchanged so computeDeltaFromSource will only re-include peers whose
current state differs from p.lastPushed (isMaterialChange), not "every peer";
update the comment in connStatePusher.flushFull to state this exact behavior and
note two alternatives if the intent is to force a repair: either invalidate
p.lastPushed (so every peer looks dirty) or set a flag to force the next tick to
call flushFull instead of flushDelta; reference the functions/fields flushFull,
computeDeltaFromSource, p.lastPushed, isMaterialChange, and flushDelta so
readers can find the relevant logic.
management/server/peer_connections/store.go-73-101 (1)

73-101: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Full snapshot path drops prev.InResponseToNonce when the new snapshot's nonce is 0.

The merge path on line 83-85 is careful to preserve the latest non-zero InResponseToNonce so a refresh poller (GET ?since=N) doesn't lose track of an in-flight nonce when a delta with InResponseToNonce=0 arrives. The full-snapshot path on line 73 just clones m and replaces, so a follow-up full snapshot from the client (e.g., reconnect / initial-after-restart) with InResponseToNonce=0 overwrites a previously-stored nonce=N. Any client that was polling ?since=N will then see GetWithNonceCheck refuse the entry until the next refresh request lands.

Apply the same "keep latest non-zero nonce" rule on the replacement path:

🔧 Suggested fix
 	stored := proto.Clone(m).(*mgmProto.PeerConnectionMap)
-	if !m.GetFullSnapshot() && prev != nil {
+	if m.GetFullSnapshot() && prev != nil {
+		// Preserve a previously-recorded refresh nonce when the new
+		// full snapshot wasn't pushed in response to a refresh request.
+		if stored.GetInResponseToNonce() == 0 && prev.m.GetInResponseToNonce() > 0 {
+			stored.InResponseToNonce = prev.m.GetInResponseToNonce()
+		}
+	} else if prev != nil {
 		merged := proto.Clone(prev.m).(*mgmProto.PeerConnectionMap)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer_connections/store.go` around lines 73 - 101, The
full-snapshot path currently overwrites a previously-stored non-zero
InResponseToNonce with a new snapshot that has InResponseToNonce==0; after
cloning m into stored (stored := proto.Clone(m)...), if prev != nil and
stored.GetInResponseToNonce() == 0 and prev.m.GetInResponseToNonce() > 0 then
copy the nonce from prev (stored.InResponseToNonce =
prev.m.GetInResponseToNonce()); keep this check just before s.maps[peerPubKey] =
&memEntry{...} so stored retains the latest non-zero nonce while leaving other
fields from m intact.
shared/management/http/api/types.gen.go-1482-1487 (1)

1482-1487: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the generated connection_mode docs to match current behavior.

These comments still say p2p-dynamic is reserved and behaves like p2p in “Phase 1”, but this PR already exposes dynamic-mode-specific timeout/backoff settings. The published OpenAPI/SDK docs will advertise obsolete semantics unless the source schema description is refreshed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shared/management/http/api/types.gen.go` around lines 1482 - 1487, The
generated doc comment for the ConnectionMode field is outdated; update the
comment on ConnectionMode (*AccountSettingsConnectionMode) to remove the
"reserved" / "Phase 1" language and instead state the current behavior: that
p2p-dynamic is supported and exposes dynamic-mode-specific timeout and backoff
settings (so it no longer just falls back to p2p), while NULL still falls back
to lazy_connection_enabled for backwards compatibility; ensure the new
description briefly names the supported modes (relay-forced, p2p, p2p-lazy,
p2p-dynamic) and mentions where dynamic-specific settings apply so OpenAPI/SDK
docs reflect current semantics.
client/cmd/service.go-60-76 (1)

60-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

These install/reconfigure flags won't be populated from NB_/WT_ env vars.

They are registered on subcommands, but SetFlagsFromEnvVars only walks cmd.Root().PersistentFlags(). Headless installs that rely on environment-based config will silently ignore NB_CONNECTION_MODE, NB_RELAY_TIMEOUT, NB_P2P_TIMEOUT, and NB_P2P_RETRY_MAX unless the env loader is updated to visit the active command's flags too.
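
One possible shape for that env-loader extension, as a sketch only: the NB_ prefix and hyphen-to-underscore mapping are assumed from this comment, and the real SetFlagsFromEnvVars may map names differently.

package cmd

import (
	"os"
	"strings"

	"github.com/spf13/cobra"
	"github.com/spf13/pflag"
)

// setFlagsFromEnvVars visits both the root persistent flags (existing
// behavior) and the active command's local flags, so install/reconfigure
// flags can also be populated from the environment.
func setFlagsFromEnvVars(cmd *cobra.Command) {
	apply := func(fs *pflag.FlagSet) {
		fs.VisitAll(func(f *pflag.Flag) {
			if f.Changed {
				return // an explicit CLI value always wins over the environment
			}
			envName := "NB_" + strings.ToUpper(strings.ReplaceAll(f.Name, "-", "_"))
			if v, ok := os.LookupEnv(envName); ok {
				_ = f.Value.Set(v)
			}
		})
	}
	apply(cmd.Root().PersistentFlags()) // existing behavior
	apply(cmd.Flags())                  // also cover install/reconfigure local flags
}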

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/cmd/service.go` around lines 60 - 76, The install/reconfigure flags
(connectionModeFlag, relayTimeoutFlag, p2pTimeoutFlag, p2pRetryMaxFlag) are
registered only on installCmd/reconfigureCmd so SetFlagsFromEnvVars doesn't pick
up NB_/WT_ env vars; either make these flags persistent on the root command or
update SetFlagsFromEnvVars to also iterate the active command's local flags
(visit installCmd.Flags() and reconfigureCmd.Flags() or the current command's
Flags()) so NB_CONNECTION_MODE, NB_RELAY_TIMEOUT, NB_P2P_TIMEOUT and
NB_P2P_RETRY_MAX are loaded from environment when running install/reconfigure;
update references in SetFlagsFromEnvVars and ensure the env-to-flag mapping
logic still uses the same flag names/vars.
management/server/http/handlers/peer_connections/handler_test.go-40-45 (1)

40-45: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the fake enforce accountID.

This stub ignores the new accountID parameter, so these tests would still pass if the handler accidentally enriched entries with peer data from another account. Checking p.AccountID here keeps the test double aligned with the production contract.

Suggested fix
 func (a *fakeAM) GetPeerByPubKey(_ context.Context, _, pubKey string) (*nbpeer.Peer, error) {
 	p, ok := a.peersByKey[pubKey]
-	if !ok {
+	if !ok {
 		return nil, errors.New("not found")
 	}
+	if p.AccountID != accountID {
+		return nil, errors.New("not found")
+	}
 	return p, nil
 }
-func (a *fakeAM) GetPeerByPubKey(_ context.Context, _, pubKey string) (*nbpeer.Peer, error) {
+func (a *fakeAM) GetPeerByPubKey(_ context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handlers/peer_connections/handler_test.go` around
lines 40 - 45, The fake GetPeerByPubKey stub currently ignores the accountID
parameter; update its signature to name the account id parameter (e.g.,
accountID string) and enforce it by checking that the found peer's AccountID
matches the passed accountID, returning the "not found" error if there is no
peer or the AccountID mismatches; keep the rest of the lookup logic (lookup by
pubKey in a.peersByKey and return p,nil when both pubKey exists and
p.AccountID==accountID) so the test double mirrors production behavior.
management/server/peer/peer.go-142-147 (1)

142-147: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Treat runtime-config-only metadata as non-empty.

UpdateMetaIfNew() exits early when meta.isEmpty(). Since isEmpty() still ignores EffectiveConnectionMode and the three timeout fields, an update that only changes these new runtime values is discarded before isEqual() ever runs.

Suggested fix
 func (p PeerSystemMeta) isEmpty() bool {
 	return p.Hostname == "" &&
 		p.GoOS == "" &&
 		p.Kernel == "" &&
 		p.Core == "" &&
 		p.Platform == "" &&
 		p.OS == "" &&
 		p.OSVersion == "" &&
 		p.WtVersion == "" &&
 		p.UIVersion == "" &&
 		p.KernelVersion == "" &&
 		len(p.NetworkAddresses) == 0 &&
 		p.SystemSerialNumber == "" &&
 		p.SystemProductName == "" &&
 		p.SystemManufacturer == "" &&
 		p.Environment.Cloud == "" &&
 		p.Environment.Platform == "" &&
+		p.EffectiveConnectionMode == "" &&
+		p.EffectiveRelayTimeoutSecs == 0 &&
+		p.EffectiveP2PTimeoutSecs == 0 &&
+		p.EffectiveP2PRetryMaxSecs == 0 &&
 		len(p.Files) == 0
 }

Also applies to: 256-272

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer/peer.go` around lines 142 - 147, UpdateMetaIfNew()
currently returns early when meta.isEmpty(), but isEmpty() ignores the new
runtime-only fields (EffectiveConnectionMode, EffectiveRelayTimeoutSecs,
EffectiveP2PTimeoutSecs, EffectiveP2PRetryMaxSecs) so runtime-only updates are
dropped; fix by updating the isEmpty() implementation (or its use in
UpdateMetaIfNew()) to treat those four fields as making meta non-empty (i.e.,
consider EffectiveConnectionMode and the three Effective* timeout fields when
deciding emptiness) so that UpdateMetaIfNew() proceeds to isEqual() and applies
runtime-only metadata changes.
client/ui/peers_tab.go-45-57 (1)

45-57: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clear the previous snapshot when refresh fails.

Both error paths only update summary. breakdown and the peer list keep the last successful snapshot, so the tab can show stale peer state underneath an error banner. Clear the old content in the same UI update so the screen can't be read as current data.

Suggested fix
 		conn, err := s.getSrvClient(failFastTimeout)
 		if err != nil {
-			fyne.Do(func() { summary.SetText("Error: " + err.Error()) })
+			fyne.Do(func() {
+				summary.SetText("Error: " + err.Error())
+				breakdown.SetText("")
+				listVBox.Objects = nil
+				listVBox.Refresh()
+			})
 			return
 		}
@@
 		st, err := conn.Status(callCtx, &proto.StatusRequest{GetFullPeerStatus: true})
 		if err != nil {
-			fyne.Do(func() { summary.SetText("Error: " + err.Error()) })
+			fyne.Do(func() {
+				summary.SetText("Error: " + err.Error())
+				breakdown.SetText("")
+				listVBox.Objects = nil
+				listVBox.Refresh()
+			})
 			return
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/peers_tab.go` around lines 45 - 57, When a refresh fails in the
render closure (errors from s.getSrvClient or conn.Status), clear the old
snapshot in the same fyne.Do UI update that sets summary so stale information
isn't shown: in the error branches update summary.SetText(...) AND also clear
the breakdown widget (breakdown.SetText("") or equivalent) and clear/refresh the
peers list UI (e.g., peersList.Clear() or peersContainer.Objects = nil followed
by Refresh()) so both breakdown and peer list are emptied when an error occurs.
management/server/http/handlers/accounts/accounts_handler.go-218-228 (1)

218-228: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Misleading comment about null-clearing semantics for ConnectionMode.

The comment (lines 223–225) says "Clients clear an override by sending JSON null" but then immediately says "leaving the existing value untouched" — these two statements are contradictory. "Clearing an override" and "leaving the value untouched" are opposite outcomes.

The actual runtime behavior depends on whether UpdateAccountSettings performs a full replace or a partial merge of nil fields. If it's a full replace, null propagates as nil and does clear the override. If it's a merge, null is ignored and the old value persists. The comment obscures this distinction and will mislead future maintainers about the expected contract.

Consider replacing with a comment that accurately describes whichever semantic is actually implemented.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handlers/accounts/accounts_handler.go` around lines
218 - 228, The comment about "Clients clear an override by sending JSON null" is
contradictory to "leaving the existing value untouched" — inspect the update
flow (the UpdateAccountSettings implementation) to determine whether
req.Settings.ConnectionMode being nil means "ignore/merge" or whether a JSON
null value is propagated to clear the field; then replace the misleading comment
above the block that references req.Settings.ConnectionMode and
returnSettings.ConnectionMode with a concise, accurate statement that describes
the actual behavior (e.g., "A JSON null for ConnectionMode is treated as X by
UpdateAccountSettings, so this branch only runs when a non-null value is
provided" or "A JSON null clears the stored value; this branch only handles
non-null overrides"), and remove the contradictory language.
client/server/server.go-1518-1538 (1)

1518-1538: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

spP2pRetMax is never zero when cm != nil, contradicting the inline comment.

ServerPushedP2pRetryMaxSecs() always returns a non-zero value: the pushed value when available, otherwise peer.DefaultP2PRetryMax / time.Second. The other three accessors (ServerPushedRelayTimeoutSecs, ServerPushedP2pTimeoutSecs, ServerPushedMode) return their zero value when the server hasn't pushed anything yet.

The comment on line 1521–1522 says "All zero/empty when the engine has not received PeerConfig yet", which is false for spP2pRetMax. Any UI consumer of ServerPushedP2PRetryMaxSeconds cannot distinguish "server explicitly set this" from "this is just the client default", whereas it can make that distinction for the other three fields.

Either the accessor should drop the default fallback (return 0 when unpushed), or the comment and UI handling need to be updated to account for this field always carrying a non-zero value.
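
A sketch of the zero-when-unpushed variant, with a hypothetical struct layout standing in for ConnMgr's real server-pushed state:

package internal

import "sync"

type serverPushedConfig struct {
	P2pRetryMaxSeconds uint32
}

type ConnMgr struct {
	mu           sync.Mutex
	serverPushed *serverPushedConfig
}

// ServerPushedP2pRetryMaxSecs returns 0 until the server has pushed a
// value, matching the other ServerPushed* accessors so UI consumers can
// tell "explicitly set by the server" apart from "client default".
func (m *ConnMgr) ServerPushedP2pRetryMaxSecs() uint32 {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.serverPushed == nil {
		return 0
	}
	return m.serverPushed.P2pRetryMaxSeconds
}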

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/server/server.go` around lines 1518 - 1538, The comment claims all
four sp* values are zero/empty before PeerConfig arrives but spP2pRetMax never
is because ConnMgr.ServerPushedP2pRetryMaxSecs() currently falls back to
peer.DefaultP2PRetryMax; fix this by changing ServerPushedP2pRetryMaxSecs to
return 0 when no server value was pushed (matching
ServerPushedRelayTimeoutSecs/ServerPushedP2pTimeoutSecs/ServerPushedMode
semantics) so spP2pRetMax will be zero when unpushed, and keep callers (e.g.,
the block that sets spP2pRetMax) unchanged. Ensure the ConnMgr method's
signature/contract and any tests reflect the new zero-return behavior.
client/ui/network.go-97-103 (1)

97-103: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clear the tab being left, not the newly selected tab.

OnUnselected resolves the grid through tabs.Selected(), so when switching from Overlapping/Exit-node to Peers it can clear the wrong grid. Pick the grid from item.Text instead.

Suggested fix
 	tabs.OnUnselected = func(item *container.TabItem) {
 		// Only reset network grids when leaving a network tab; the
 		// peers VBox manages its own state.
 		if item != nil && item.Text != peersText {
-			grid, _ := getGridAndFilterFromTab(tabs, allGrid, overlappingGrid, exitNodeGrid)
+			var grid *fyne.Container
+			switch item.Text {
+			case overlappingNetworksText:
+				grid = overlappingGrid
+			case exitNodeNetworksText:
+				grid = exitNodeGrid
+			default:
+				grid = allGrid
+			}
 			grid.Objects = nil
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/network.go` around lines 97 - 103, OnUnselected currently uses
tabs.Selected() when resolving which grid to clear and thus can clear the newly
selected tab's grid; change the logic so the grid is picked based on the leaving
tab's identifier (item.Text) instead. In the tabs.OnUnselected handler (symbol:
OnUnselected) call or update getGridAndFilterFromTab to accept the leaving tab
text (item.Text) and return the corresponding grid among allGrid,
overlappingGrid, exitNodeGrid (and ignore peersText), then set that grid's
Objects = nil; ensure peersText is still treated as a no-op so the peers tab
never gets cleared.
🧹 Nitpick comments (8)
management/server/store/sql_store.go (1)

1502-1571: ⚡ Quick win

Add a pgx regression test for these nullable settings fields.

This path manually keeps the SELECT list, Scan(...) order, and assignments in sync. A focused round-trip test for ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds, and P2pRetryMaxSeconds would pay off here because a mismatch only breaks the Postgres fast path.

Also applies to: 1631-1646

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/store/sql_store.go` around lines 1502 - 1571, Add a focused
pgx regression test that round-trips the nullable settings fields used in
accountQuery/Scan to catch fast-path column/order regressions: insert an account
row with explicit values and NULLs for settings_connection_mode,
settings_relay_timeout_seconds, settings_p2p_timeout_seconds,
settings_p2p_retry_max_seconds (and optionally settings_lazy_connection_enabled
and related extra settings), then load it via the same code path (the QueryRow +
Scan that fills sConnectionMode, sRelayTimeoutSeconds, sP2pTimeoutSeconds,
sP2pRetryMaxSeconds, sLazyConnectionEnabled, etc.) and assert the scanned
sql.NullString/sql.NullInt64/sql.NullBool values match the inserted ones; add
the same test for the other Scan block referenced around the 1631-1646 area to
ensure both SELECT/Scan orders remain in sync.
management/internals/shared/grpc/conversion.go (1)

161-161: ⚖️ Poor tradeoff

SonarCloud: 16 parameters on ToSyncResponse. Worth folding into an options struct.

Static analysis flags this and the function has accreted enough orthogonal inputs (config trio, peer/credentials, network state, dnsName, checks/cache, settings/extraSettings, peerGroups, dnsFwdPort, groupNamesByPeerID) that a single ToSyncResponseInput struct would make call-sites self-documenting and stop further parameter creep — especially since this PR is already adding a groupNamesByPeerID arg. The newly-introduced AppendRemotePeerConfigContext shows the codebase is already moving toward that pattern; same treatment one level up would be consistent.

Not strictly required for this PR; flagging as a recommended follow-up.
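
For illustration, a sketch of what the options struct could look like. The field list is assembled from the parameters named above, and the empty placeholder types are stand-ins for the real management types:

package grpc

type (
	Config         struct{}
	Peer           struct{}
	Token          struct{}
	NetworkMap     struct{}
	Settings       struct{}
	ExtraSettings  struct{}
	Check          struct{}
	DNSConfigCache struct{}
	SyncResponse   struct{}
)

// ToSyncResponseInput groups the orthogonal inputs so call sites become
// self-documenting and adding an input no longer changes the signature.
type ToSyncResponseInput struct {
	Config             *Config
	Peer               *Peer
	TurnCredentials    *Token
	RelayCredentials   *Token
	NetworkMap         *NetworkMap
	DNSName            string
	Checks             []*Check
	DNSCache           *DNSConfigCache
	Settings           *Settings
	ExtraSettings      *ExtraSettings
	PeerGroups         []string
	DNSFwdPort         int
	GroupNamesByPeerID map[string][]string
}

func ToSyncResponse(in *ToSyncResponseInput) *SyncResponse {
	_ = in // build the response from the grouped inputs
	return &SyncResponse{}
}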

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/conversion.go` at line 161, The
ToSyncResponse function has grown too many parameters; introduce a
ToSyncResponseInput struct to group the orthogonal inputs (e.g., Config,
HttpServerConfig, DeviceAuthorizationFlow, Peer, Token turnCredentials, Token
relayCredentials, NetworkMap, dnsName, Checks slice, DNSConfigCache, Settings,
ExtraSettings, peerGroups, dnsFwdPort, groupNamesByPeerID) and change the
signature of ToSyncResponse to accept a single *ToSyncResponseInput; update all
call sites to construct and pass this struct (optionally provide a
constructor/helper to build it for readability) and adapt any tests, and follow
the same pattern used by AppendRemotePeerConfigContext to keep usage consistent.
shared/management/proto/management.proto (1)

235-241: 💤 Low value

Reserve the tag gap 18–49 in PeerSystemMeta for consistency with PeerConfig.

PeerConfig carefully calls out its tag gap with reserved 9, 10; plus a comment so reviewers don't accidentally land a future field on a reused tag. PeerSystemMeta jumps from Flags flags = 17; straight to effective_connection_mode = 50; with no reserved 18 to 49; and no comment, so a future small addition here is one careless commit away from colliding. Cheap to make defensive now.

♻️ Suggested defensive marker
   Flags flags = 17;
+
+  // Tags 18–49 are intentionally left unused so future small additions
+  // can land without re-numbering the new connection-mode/effective-*
+  // fields below. Reserved here to make the gap explicit. Phase 3.7i (`#5989`).
+  reserved 18 to 49;

   // Phase 3.7i (`#5989`): connection mode/timeouts this peer is actually
   // running with. Mgmt copies into RemotePeerConfig of every other peer.
   string effective_connection_mode    = 50;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shared/management/proto/management.proto` around lines 235 - 241, Add a
defensive reserved field range to PeerSystemMeta to prevent accidental tag
reuse: after the existing "Flags flags = 17;" declaration, add a `reserved 18 to
49;` with a short comment mirroring PeerConfig's style so future reviewers know
that tags 18–49 are intentionally skipped; update the proto in the same message
(PeerSystemMeta) that currently jumps to `effective_connection_mode = 50` to
include this reserved declaration.
management/internals/shared/grpc/conversion_test.go (1)

21-180: ⚡ Quick win

Add one test that exercises the new groupNamesByPeerID path.

These additions cover connection-mode and retry-max wiring well, but they still don’t pin the new remote-peer group-name serialization introduced in this PR. A small ToSyncResponse case with one peer in multiple groups would catch regressions in both propagation and ordering.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/conversion_test.go` around lines 21 - 180,
Add a new unit test that exercises the groupNamesByPeerID path by creating a
minimal Network and two or more Group objects where a single Peer appears in
multiple groups, then call ToSyncResponse (or the public wrapper that invokes
groupNamesByPeerID) with those groups and peers and assert the returned
SyncResponse encodes the peer's group names (both presence and stable ordering)
on the wire; locate the test near the other toPeerConfig tests (use the existing
network/peer fixtures or toPeerConfigForTest for setup), call ToSyncResponse
with one peer in multiple groups, and add assertions that the response contains
the expected group-name slice for that peer and that repeated runs preserve
ordering.
management/server/peer/peer_test.go (1)

145-152: ⚡ Quick win

Cover the other effective config fields in this regression test.

PeerSystemMeta.isEqual now compares more than EffectiveConnectionMode. A regression in EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, or EffectiveP2PRetryMaxSecs would still pass this test, so this is worth making table-driven while the change is fresh.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer/peer_test.go` around lines 145 - 152, The test
TestPeerSystemMeta_isEqual_ChecksEffectiveFields only checks
EffectiveConnectionMode but isEqual now compares EffectiveRelayTimeoutSecs,
EffectiveP2PTimeoutSecs, and EffectiveP2PRetryMaxSecs as well; convert this into
a table-driven test that iterates cases where a single effective field differs
(use PeerSystemMeta instances and call isEqual) for each of
EffectiveConnectionMode, EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, and
EffectiveP2PRetryMaxSecs and assert isEqual returns false for each case, and
true for an identical base case to ensure all effective fields are covered.
management/server/http/handlers/accounts/accounts_handler_test.go (1)

340-414: ⚡ Quick win

Round out this API test with the other new connection settings.

This only verifies p2p_retry_max_seconds. The same request/response mapping was added for connection_mode, relay_timeout_seconds, and p2p_timeout_seconds, so a wiring bug in any of those fields would still merge cleanly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handlers/accounts/accounts_handler_test.go` around
lines 340 - 414, TestAccountsHandler_PutSettings_P2pRetryMax only asserts
P2pRetryMaxSeconds but the new request/response mapping also includes
connection_mode, relay_timeout_seconds, and p2p_timeout_seconds; update the test
(TestAccountsHandler_PutSettings_P2pRetryMax) to include those fields in the
JSON request body sent to handler.updateAccount and add the corresponding fields
to expectedSettings (e.g., ConnectionMode, RelayTimeoutSeconds,
P2pTimeoutSeconds in api.AccountSettings) with the expected values so the test
verifies all new connection settings are round-tripped.
shared/management/http/api/openapi.yml (2)

367-393: ⚡ Quick win

Avoid rollout-phase notes in public API descriptions.

The phase-specific wording (Phase 1/2/3, rollout behavior notes) is likely to become stale in generated API docs. Prefer stable behavior-focused descriptions so the contract remains accurate across releases.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shared/management/http/api/openapi.yml` around lines 367 - 393, Remove
phase-specific rollout wording from the descriptions for p2p_mode,
p2p_timeout_seconds, and p2p_retry_max_seconds and replace with stable,
behavior-focused text that describes current runtime effects and defaults; e.g.,
state what NULL means, the effective modes for each setting (without mentioning
"Phase 1/2/3" or future rollout plans), the default values (e.g., built-in
default 180 minutes, 900 seconds), and when a value is ignored or treated as 0,
so the public API docs stay accurate across releases.

373-405: ⚡ Quick win

Add explicit upper bounds for timeout settings.

These properties currently only enforce minimum: 0, so very large values still pass schema validation and can drift from practical backend limits. Add maximum constraints aligned with server-side validation for p2p_timeout_seconds, p2p_retry_max_seconds, and relay_timeout_seconds.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shared/management/http/api/openapi.yml` around lines 373 - 405, OpenAPI
schema properties p2p_timeout_seconds, p2p_retry_max_seconds, and
relay_timeout_seconds lack upper bounds; add a maximum field to each to match
the server-side validation limits. For each property (p2p_timeout_seconds,
p2p_retry_max_seconds, relay_timeout_seconds) add a "maximum" integer (format:
int64) value equal to the backend's allowed max, ensure the maximum is >= the
existing minimum (0), keep nullable: true and existing descriptions/examples
(update example if needed, e.g., p2p_retry_max_seconds), and regenerate/validate
the spec so schema validation enforces the same limits as the server.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15ca1bee-2a94-4ad8-9f78-233908bf91ea

📥 Commits

Reviewing files that changed from the base of the PR and between b19b746 and d48893f.

⛔ Files ignored due to path filters (5)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • client/proto/daemon_grpc.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/proxy_service.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (81)
  • client/android/client.go
  • client/android/peer_notifier.go
  • client/android/preferences.go
  • client/cmd/root.go
  • client/cmd/service.go
  • client/cmd/service_installer.go
  • client/cmd/up.go
  • client/internal/conn_mgr.go
  • client/internal/conn_mgr_test.go
  • client/internal/conn_state_pusher.go
  • client/internal/conn_state_pusher_test.go
  • client/internal/connect.go
  • client/internal/debouncer/debouncer.go
  • client/internal/engine.go
  • client/internal/engine_pusher_adapters.go
  • client/internal/engine_test.go
  • client/internal/lazyconn/env.go
  • client/internal/lazyconn/inactivity/manager.go
  • client/internal/lazyconn/inactivity/manager_test.go
  • client/internal/lazyconn/manager/manager.go
  • client/internal/lazyconn/support.go
  • client/internal/peer/conn.go
  • client/internal/peer/conn_test.go
  • client/internal/peer/env.go
  • client/internal/peer/env_test.go
  • client/internal/peer/guard/guard.go
  • client/internal/peer/handshaker.go
  • client/internal/peer/handshaker_test.go
  • client/internal/peer/ice_backoff.go
  • client/internal/peer/ice_backoff_test.go
  • client/internal/peer/status.go
  • client/internal/peer/status_test.go
  • client/internal/peer/worker_ice.go
  • client/internal/profilemanager/config.go
  • client/proto/daemon.proto
  • client/server/server.go
  • client/status/status.go
  • client/status/status_test.go
  • client/ui/client_ui.go
  • client/ui/const.go
  • client/ui/event_handler.go
  • client/ui/network.go
  • client/ui/peers_tab.go
  • management/internals/controllers/network_map/controller/controller.go
  • management/internals/server/boot.go
  • management/internals/shared/grpc/conversion.go
  • management/internals/shared/grpc/conversion_test.go
  • management/internals/shared/grpc/server.go
  • management/server/account.go
  • management/server/account/manager.go
  • management/server/account/manager_mock.go
  • management/server/activity/codes.go
  • management/server/http/handler.go
  • management/server/http/handlers/accounts/accounts_handler.go
  • management/server/http/handlers/accounts/accounts_handler_test.go
  • management/server/http/handlers/peer_connections/handler.go
  • management/server/http/handlers/peer_connections/handler_test.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/management_proto_test.go
  • management/server/management_test.go
  • management/server/mock_server/account_mock.go
  • management/server/peer.go
  • management/server/peer/peer.go
  • management/server/peer/peer_test.go
  • management/server/peer_connections/snapshot_router.go
  • management/server/peer_connections/snapshot_router_test.go
  • management/server/peer_connections/store.go
  • management/server/peer_connections/store_test.go
  • management/server/peer_test.go
  • management/server/store/sql_store.go
  • management/server/types/settings.go
  • management/server/types/settings_test.go
  • shared/connectionmode/mode.go
  • shared/connectionmode/mode_test.go
  • shared/management/client/client.go
  • shared/management/client/client_test.go
  • shared/management/client/grpc.go
  • shared/management/client/mock.go
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
  • shared/management/proto/management.proto
💤 Files with no reviewable changes (2)
  • client/ui/const.go
  • client/ui/event_handler.go

Comment on lines +349 to +392
func (p *Preferences) SetRelayTimeoutSeconds(secs int64) {
	v := uint32(secs)
	p.configInput.RelayTimeoutSeconds = &v
}

// GetP2pTimeoutSeconds returns the locally configured ICE-worker
// inactivity timeout in seconds (only effective in p2p-dynamic mode),
// or 0 if no override is set.
func (p *Preferences) GetP2pTimeoutSeconds() (int64, error) {
	if p.configInput.P2pTimeoutSeconds != nil {
		return int64(*p.configInput.P2pTimeoutSeconds), nil
	}
	cfg, err := profilemanager.ReadConfig(p.configInput.ConfigPath)
	if err != nil {
		return 0, err
	}
	return int64(cfg.P2pTimeoutSeconds), nil
}

// SetP2pTimeoutSeconds stores a local override for the p2p timeout.
// Pass 0 to clear the override.
func (p *Preferences) SetP2pTimeoutSeconds(secs int64) {
	v := uint32(secs)
	p.configInput.P2pTimeoutSeconds = &v
}

// GetP2pRetryMaxSeconds returns the locally configured cap on the
// per-peer ICE-failure backoff schedule, or 0 if no override is set.
func (p *Preferences) GetP2pRetryMaxSeconds() (int64, error) {
	if p.configInput.P2pRetryMaxSeconds != nil {
		return int64(*p.configInput.P2pRetryMaxSeconds), nil
	}
	cfg, err := profilemanager.ReadConfig(p.configInput.ConfigPath)
	if err != nil {
		return 0, err
	}
	return int64(cfg.P2pRetryMaxSeconds), nil
}

// SetP2pRetryMaxSeconds stores a local override for the backoff cap.
// Pass 0 to clear the override.
func (p *Preferences) SetP2pRetryMaxSeconds(secs int64) {
	v := uint32(secs)
	p.configInput.P2pRetryMaxSeconds = &v
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the int64 → uint32 conversions.

These setters silently wrap invalid inputs; for example, -1 becomes 4294967295. That stores a bogus timeout/backoff override instead of clearing or rejecting it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/android/preferences.go` around lines 349 - 392, The setters
(SetRelayTimeoutSeconds, SetP2pTimeoutSeconds, SetP2pRetryMaxSeconds) currently
cast int64→uint32 and silently wrap negative or too-large values; change each to
guard inputs: treat secs <= 0 as "clear the override" by setting the
corresponding p.configInput.RelayTimeoutSeconds / P2pTimeoutSeconds /
P2pRetryMaxSeconds to nil, if secs > math.MaxUint32 clamp to math.MaxUint32
before converting, otherwise convert safely to uint32 and store the pointer;
implement these checks so negative values do not wrap into large uint32s and
zero clears the override.
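
A sketch of what one guarded setter could look like (the other two setters get the same shape; needs "math" in the imports):

func (p *Preferences) SetP2pTimeoutSeconds(secs int64) {
	if secs <= 0 {
		p.configInput.P2pTimeoutSeconds = nil // zero/negative clears the override
		return
	}
	if secs > math.MaxUint32 {
		secs = math.MaxUint32 // clamp instead of silently wrapping
	}
	v := uint32(secs)
	p.configInput.P2pTimeoutSeconds = &v
}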

Comment thread client/internal/conn_mgr.go
Comment on lines +83 to +95
// NewManagerWithTwoTimers is the Phase-2 constructor. Pass 0 for either
// timeout to disable that teardown path. Both 0 leaves the manager
// running but inert (no channel ever fires) -- used by p2p / relay-forced
// modes that don't tear down workers.
func NewManagerWithTwoTimers(iface WgInterface, iceTimeout, relayTimeout time.Duration) *Manager {
	if iceTimeout > 0 {
		log.Infof("ICE inactivity timeout: %v", iceTimeout)
	}
	if relayTimeout > 0 {
		log.Infof("relay inactivity timeout: %v", relayTimeout)
	}
	return newManager(iface, iceTimeout, relayTimeout)
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate the new two-timer inputs before constructing the manager.

NewManagerWithTwoTimers() bypasses validateInactivityThreshold, so negative or sub-minute values are now accepted even though the old constructor rejected them. That can make peers tear down on the first ticker pass or behave inconsistently across constructors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/lazyconn/inactivity/manager.go` around lines 83 - 95,
NewManagerWithTwoTimers currently skips input validation, so call
validateInactivityThreshold for both iceTimeout and relayTimeout before calling
newManager (the same validation used by the old constructor) and reject invalid
values; if validation fails, log an error and return nil (or otherwise propagate
the validation error) to avoid constructing a Manager with negative or
sub-minute timeouts. Ensure you reference NewManagerWithTwoTimers,
validateInactivityThreshold, and newManager when making the change so behavior
is consistent across constructors.
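
A sketch under the assumption that validateInactivityThreshold returns an error for out-of-range durations; whether to propagate the error or log-and-return-nil is the maintainers' call:

func NewManagerWithTwoTimers(iface WgInterface, iceTimeout, relayTimeout time.Duration) (*Manager, error) {
	for _, d := range []time.Duration{iceTimeout, relayTimeout} {
		if d == 0 {
			continue // 0 keeps that teardown path disabled, as documented
		}
		if err := validateInactivityThreshold(d); err != nil {
			return nil, fmt.Errorf("invalid inactivity timeout %v: %w", d, err)
		}
	}
	return newManager(iface, iceTimeout, relayTimeout), nil
}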

Comment on lines +202 to +212
// Phase 3: initialize per-peer ICE-failure backoff. The cap comes
// from the resolved P2pRetryMaxSeconds. 0 means "use built-in default".
backoffCap := time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
if backoffCap == 0 {
	backoffCap = DefaultP2PRetryMax
}
if conn.iceBackoff == nil {
	conn.iceBackoff = newIceBackoff(backoffCap)
} else {
	conn.iceBackoff.SetMaxBackoff(backoffCap)
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the explicit “disable ICE backoff” state across reopens.

SetIceBackoffMax(0) stores config.P2pRetryMaxSeconds = 0, but Open() interprets zero as “use DefaultP2PRetryMax”. After the next close/reopen, a peer that was explicitly configured to disable retry backoff comes back with the default 15-minute cap instead.

Suggested fix
 // Open()
-	backoffCap := time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
-	if backoffCap == 0 {
-		backoffCap = DefaultP2PRetryMax
-	}
+	var backoffCap time.Duration
+	switch conn.config.P2pRetryMaxSeconds {
+	case ^uint32(0):
+		backoffCap = 0
+	case 0:
+		backoffCap = DefaultP2PRetryMax
+	default:
+		backoffCap = time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second
+	}

 // SetIceBackoffMax()
-	conn.config.P2pRetryMaxSeconds = uint32(d / time.Second)
+	if d == 0 {
+		conn.config.P2pRetryMaxSeconds = ^uint32(0)
+	} else {
+		conn.config.P2pRetryMaxSeconds = uint32(d / time.Second)
+	}

Also applies to: 1166-1172

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/conn.go` around lines 202 - 212, Open() currently treats
config.P2pRetryMaxSeconds == 0 as "use DefaultP2PRetryMax", which loses an
explicit "disabled" setting from SetIceBackoffMax(0); add a persistent flag
(e.g. conn.iceBackoffDisabled bool) that SetIceBackoffMax toggles when passed 0,
and update Open()'s initialization logic (the block using
config.P2pRetryMaxSeconds, backoffCap, newIceBackoff and
iceBackoff.SetMaxBackoff) to check conn.iceBackoffDisabled first and, if true,
initialize or set conn.iceBackoff to a disabled state (backoffCap == 0 /
SetMaxBackoff(0)) instead of substituting DefaultP2PRetryMax so the explicit
"disable" choice survives close/reopen.

Comment on lines +51 to +83
func ResolveModeFromEnv() (connectionmode.Mode, uint32) {
	mode := connectionmode.ModeUnspecified

	if raw := os.Getenv(EnvKeyNBConnectionMode); raw != "" {
		parsed, err := connectionmode.ParseString(raw)
		if err != nil {
			log.Warnf("ignoring %s=%q: %v", EnvKeyNBConnectionMode, raw, err)
		} else if parsed != connectionmode.ModeUnspecified {
			mode = parsed
		}
	}

	if mode == connectionmode.ModeUnspecified {
		if strings.EqualFold(os.Getenv(EnvKeyNBForceRelay), "true") {
			warnDeprecated(EnvKeyNBForceRelay, EnvKeyNBConnectionMode+"=relay-forced")
			mode = connectionmode.ModeRelayForced
		} else if isLazyEnvTrue() {
			warnDeprecated(envEnableLazyConn, EnvKeyNBConnectionMode+"=p2p-lazy")
			mode = connectionmode.ModeP2PLazy
		}
	}

	timeoutSecs := uint32(0)
	if raw := os.Getenv(envInactivityThreshold); raw != "" {
		if d, err := time.ParseDuration(raw); err == nil {
			timeoutSecs = uint32(d.Seconds())
			warnDeprecated(envInactivityThreshold, "the relay_timeout setting on the management server")
		} else {
			log.Warnf("ignoring %s=%q: %v", envInactivityThreshold, raw, err)
		}
	}

	return mode, timeoutSecs
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the js relay-only special case in the new resolver.

IsForceRelayed() still forces relay on runtime.GOOS == "js", but ResolveModeFromEnv() does not. Any caller migrated to the new API can now resolve to a non-relay mode in browsers, which breaks the old compatibility guarantee.

Suggested fix
 func ResolveModeFromEnv() (connectionmode.Mode, uint32) {
+	if runtime.GOOS == "js" {
+		return connectionmode.ModeRelayForced, 0
+	}
+
 	mode := connectionmode.ModeUnspecified
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func ResolveModeFromEnv() (connectionmode.Mode, uint32) {
+	if runtime.GOOS == "js" {
+		return connectionmode.ModeRelayForced, 0
+	}
+
 	mode := connectionmode.ModeUnspecified

 	if raw := os.Getenv(EnvKeyNBConnectionMode); raw != "" {
 		parsed, err := connectionmode.ParseString(raw)
 		if err != nil {
 			log.Warnf("ignoring %s=%q: %v", EnvKeyNBConnectionMode, raw, err)
 		} else if parsed != connectionmode.ModeUnspecified {
 			mode = parsed
 		}
 	}

 	if mode == connectionmode.ModeUnspecified {
 		if strings.EqualFold(os.Getenv(EnvKeyNBForceRelay), "true") {
 			warnDeprecated(EnvKeyNBForceRelay, EnvKeyNBConnectionMode+"=relay-forced")
 			mode = connectionmode.ModeRelayForced
 		} else if isLazyEnvTrue() {
 			warnDeprecated(envEnableLazyConn, EnvKeyNBConnectionMode+"=p2p-lazy")
 			mode = connectionmode.ModeP2PLazy
 		}
 	}

 	timeoutSecs := uint32(0)
 	if raw := os.Getenv(envInactivityThreshold); raw != "" {
 		if d, err := time.ParseDuration(raw); err == nil {
 			timeoutSecs = uint32(d.Seconds())
 			warnDeprecated(envInactivityThreshold, "the relay_timeout setting on the management server")
 		} else {
 			log.Warnf("ignoring %s=%q: %v", envInactivityThreshold, raw, err)
 		}
 	}

 	return mode, timeoutSecs
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/env.go` around lines 51 - 83, ResolveModeFromEnv no
longer preserves the browser ("js") special-case that forces relay-only mode;
update ResolveModeFromEnv so it checks runtime.GOOS == "js" (or calls
IsForceRelayed if that encapsulates the check) and if true sets mode =
connectionmode.ModeRelayForced (and emits the same deprecation/warning path if
appropriate) before returning so callers in JS still get relay-only behavior;
locate the logic in ResolveModeFromEnv and apply the check either early (before
env parsing) or after parsing but prior to returning.

Comment on lines +143 to +149
peerConnHandler := peer_connections_http.NewHandler(
peerConnStore,
&pcAccountManagerAdapter{am: accountManager, nmc: networkMapController},
peerConnRouter,
)
router.HandleFunc("/peers/{peerId}/connections", peerConnHandler.GetPeerConnections).Methods("GET", "OPTIONS")
router.HandleFunc("/peers/{peerId}/connections/refresh", peerConnHandler.PostRefresh).Methods("POST", "OPTIONS")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== peer_connections handler ==="
sed -n '1,260p' management/server/http/handlers/peer_connections/handler.go

echo
echo "=== peers handler/auth patterns ==="
rg -n -C3 --type=go 'HasPermission|permissions\.|Permission|UserAuth|Admin|Owner|RegularUser|GetUserFromUserAuth|Check.*Permission' \
  management/server/http/handlers/peer_connections \
  management/server/http/handlers/peers

echo
echo "=== constructor signatures ==="
rg -n -C2 --type=go 'func NewHandler|func AddEndpoints' \
  management/server/http/handlers/peer_connections \
  management/server/http/handlers/peers

Repository: netbirdio/netbird

Length of output: 24412


Add explicit RBAC checks to the peer-connections routes.

The peer_connections handler does not enforce the same permission validation as the existing peers endpoints:

  • peers handler calls h.permissionsManager.ValidateUserPermissions(r.Context(), accountID, userID, modules.Peers, operations.Read) before returning peer data
  • peer_connections handler calls only h.account.GetPeer(r.Context(), userAuth.AccountId, peerID, userAuth.UserId) and relies implicitly on that method for authorization

The NewHandler constructor for peer_connections does not receive permissionsManager at all, so explicit permission checks cannot be added without refactoring.

Since these endpoints expose per-peer state and trigger network changes (refresh), they should enforce the same Peers read/write permissions as the existing /peers routes before merge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handler.go` around lines 143 - 149, The
peer-connections routes lack explicit RBAC checks; refactor
peer_connections_http.NewHandler to accept the permissionsManager (same type
used by the peers handler), then in the handler methods (GetPeerConnections and
PostRefresh) call permissionsManager.ValidateUserPermissions(ctx, accountID,
userID, modules.Peers, operations.Read) before returning data and
ValidateUserPermissions(..., modules.Peers, operations.Write) (or the
appropriate operation) before performing refresh/network changes; replace the
current reliance on h.account.GetPeer for authorization with these explicit
checks and use the pcAccountManagerAdapter/peerConnStore as before.
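
The check itself is small once the dependency is wired through; a sketch, with the module/operation identifiers taken from the peers handler quoted above:

// Inside GetPeerConnections (use operations.Write for PostRefresh):
if err := h.permissionsManager.ValidateUserPermissions(
	r.Context(), userAuth.AccountId, userAuth.UserId,
	modules.Peers, operations.Read,
); err != nil {
	http.Error(w, "forbidden", http.StatusForbidden)
	return
}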

Comment on lines +229 to +240
if req.Settings.P2pTimeoutSeconds != nil {
	v := uint32(*req.Settings.P2pTimeoutSeconds)
	returnSettings.P2pTimeoutSeconds = &v
}
if req.Settings.P2pRetryMaxSeconds != nil {
	v := uint32(*req.Settings.P2pRetryMaxSeconds)
	returnSettings.P2pRetryMaxSeconds = &v
}
if req.Settings.RelayTimeoutSeconds != nil {
	v := uint32(*req.Settings.RelayTimeoutSeconds)
	returnSettings.RelayTimeoutSeconds = &v
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing bounds validation before int64 → uint32 truncation for timeout fields.

All three timeout values come in as *int64 from the API but are stored as uint32. There is no check that the value is non-negative or within [0, math.MaxUint32]. In Go, uint32(int64(-1)) silently produces 4294967295 (~136-year timeout) rather than an error. A client submitting -1 would end up with a de-facto infinite peer timeout with no diagnostic.

🛡️ Proposed fix — add a bounds-check helper
+func toUint32Seconds(fieldName string, v int64) (uint32, error) {
+	if v < 0 {
+		return 0, fmt.Errorf("%s must be non-negative, got %d", fieldName, v)
+	}
+	if v > math.MaxUint32 {
+		return 0, fmt.Errorf("%s value %d exceeds maximum allowed (%d)", fieldName, v, math.MaxUint32)
+	}
+	return uint32(v), nil
+}

Then in updateAccountRequestSettings:

 	if req.Settings.P2pTimeoutSeconds != nil {
-		v := uint32(*req.Settings.P2pTimeoutSeconds)
-		returnSettings.P2pTimeoutSeconds = &v
+		v, err := toUint32Seconds("p2p_timeout_seconds", *req.Settings.P2pTimeoutSeconds)
+		if err != nil {
+			return nil, err
+		}
+		returnSettings.P2pTimeoutSeconds = &v
 	}
 	if req.Settings.P2pRetryMaxSeconds != nil {
-		v := uint32(*req.Settings.P2pRetryMaxSeconds)
-		returnSettings.P2pRetryMaxSeconds = &v
+		v, err := toUint32Seconds("p2p_retry_max_seconds", *req.Settings.P2pRetryMaxSeconds)
+		if err != nil {
+			return nil, err
+		}
+		returnSettings.P2pRetryMaxSeconds = &v
 	}
 	if req.Settings.RelayTimeoutSeconds != nil {
-		v := uint32(*req.Settings.RelayTimeoutSeconds)
-		returnSettings.RelayTimeoutSeconds = &v
+		v, err := toUint32Seconds("relay_timeout_seconds", *req.Settings.RelayTimeoutSeconds)
+		if err != nil {
+			return nil, err
+		}
+		returnSettings.RelayTimeoutSeconds = &v
 	}

You'll also need "math" in the imports.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handlers/accounts/accounts_handler.go` around lines
229 - 240, The timeout conversions in updateAccountRequestSettings
(req.Settings.P2pTimeoutSeconds, P2pRetryMaxSeconds, RelayTimeoutSeconds →
returnSettings.*) currently cast *int64 to uint32 without bounds checking; add a
small helper (e.g., validateUint32FromInt64) that accepts *int64, returns
(uint32, error) and ensures the input is non-nil, >= 0 and <= math.MaxUint32,
then use that helper for P2pTimeoutSeconds, P2pRetryMaxSeconds and
RelayTimeoutSeconds and propagate/return an error if validation fails; also add
"math" to imports.

Comment on lines +85 to +88
peer, err := h.account.GetPeer(r.Context(), userAuth.AccountId, peerID, userAuth.UserId)
if err != nil {
	http.Error(w, "peer not found", http.StatusNotFound)
	return

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't turn every GetPeer failure into 404.

Both endpoints return "peer not found" for all GetPeer errors. That masks transient/store failures as a false 404 and makes clients behave as if the peer disappeared. Distinguish not-found from everything else and return 5xx for internal failures.

Also applies to: 137-140

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handlers/peer_connections/handler.go` around lines 85
- 88, The handlers currently treat any h.account.GetPeer error as 404; change
both call sites that invoke h.account.GetPeer in handler.go to distinguish "not
found" from other errors by using the store's sentinel or error type check
(e.g., errors.Is(err, store.ErrNotFound) or a typed NotFound error returned by
GetPeer). If the error is a NotFound return http.StatusNotFound; for any other
error return http.StatusInternalServerError, log the error with context
(including peerID/accountID/userID) and then write a 5xx response. Ensure you
update both occurrences of h.account.GetPeer in this file.
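
A sketch of the split, assuming store.ErrNotFound (or an equivalent typed error) is the sentinel the prompt refers to:

peer, err := h.account.GetPeer(r.Context(), userAuth.AccountId, peerID, userAuth.UserId)
if err != nil {
	if errors.Is(err, store.ErrNotFound) {
		http.Error(w, "peer not found", http.StatusNotFound)
		return
	}
	log.Errorf("get peer %s in account %s: %v", peerID, userAuth.AccountId, err)
	http.Error(w, "internal error", http.StatusInternalServerError)
	return
}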

Comment on lines +64 to +71
func (s *MemoryStore) Put(peerPubKey string, m *mgmProto.PeerConnectionMap) {
	s.mu.Lock()
	defer s.mu.Unlock()

	prev := s.maps[peerPubKey]
	if prev != nil && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
		return
	}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Full snapshots can be silently dropped after a client engine restart (seq counter resets).

The pusher's seq is an in-memory counter that resets to 0 on every engine start (see connStatePusher.flushDelta/flushFull incrementing p.seq from 0). After a peer process/engine restart:

  1. Client reconnects and pushes its initial full snapshot with seq=1.
  2. Mgmt still holds the old prev (TTL hasn't expired), with e.g. prev.Seq=137.
  3. Line 69: m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() → 1 <= 137 → drop.

Mgmt's view of that peer's connections then stays stale until the in-memory TTL expires, even though the client has been pushing fresh data the whole time. Subsequent deltas hit the same wall. This is exactly the case full snapshots are supposed to authoritatively repair.

Bypass the seq check on full snapshots — they're always authoritative replacements:

🔧 Suggested fix
 	prev := s.maps[peerPubKey]
-	if prev != nil && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
+	// Full snapshots are authoritative replacements (e.g., after a client
+	// engine restart the pusher's seq counter resets to 0, so its first
+	// snapshot has seq=1 even if prev.Seq is much higher). Only out-of-order
+	// *deltas* should be dropped.
+	if prev != nil && !m.GetFullSnapshot() && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
 		return
 	}

Also consider adding a regression test in store_test.go covering "full snapshot with lower seq after restart replaces prev".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func (s *MemoryStore) Put(peerPubKey string, m *mgmProto.PeerConnectionMap) {
 	s.mu.Lock()
 	defer s.mu.Unlock()

 	prev := s.maps[peerPubKey]
-	if prev != nil && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
+	// Full snapshots are authoritative replacements (e.g., after a client
+	// engine restart the pusher's seq counter resets to 0, so its first
+	// snapshot has seq=1 even if prev.Seq is much higher). Only out-of-order
+	// *deltas* should be dropped.
+	if prev != nil && !m.GetFullSnapshot() && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
 		return
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer_connections/store.go` around lines 64 - 71, In
MemoryStore.Put, the current seq gating can silently drop authoritative full
snapshots after a client restart; modify MemoryStore.Put so that if the incoming
mgmProto.PeerConnectionMap is a full snapshot (check m.GetFull() or the
equivalent "full" boolean accessor on PeerConnectionMap) you bypass the seq
comparison and always replace prev (i.e., skip the m.GetSeq() <= prev.m.GetSeq()
return path for full snapshots), then proceed to store the new map and update
TTL as usual; add a regression test in store_test.go asserting that a full
snapshot with a lower seq replaces an existing prev with a higher seq after
restart.
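
The suggested regression test could look roughly like this; the constructor, the Get return shape, and the FullSnapshot field name are assumptions based on the diff above:

func TestMemoryStore_FullSnapshotWithLowerSeqReplacesPrev(t *testing.T) {
	s := NewMemoryStore()
	s.Put("peerA", &mgmProto.PeerConnectionMap{Seq: 137})

	// Simulates a client restart: the pusher's counter starts over at 1.
	s.Put("peerA", &mgmProto.PeerConnectionMap{Seq: 1, FullSnapshot: true})

	got, ok := s.Get("peerA")
	if !ok || got.GetSeq() != 1 {
		t.Fatalf("full snapshot should replace prev regardless of seq, got %+v", got)
	}
}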

Comment thread management/server/peer.go
Comment on lines +1228 to +1235
func (am *DefaultAccountManager) GetPeerByPubKey(ctx context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
	p, err := am.Store.GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, pubKey)
	if err != nil {
		return nil, err
	}
	if p.AccountID != accountID {
		return nil, fmt.Errorf("peer with pubkey %s not in account %s", pubKey, accountID)
	}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return a typed peer/account error here.

On Lines 1233-1235 this switches to a plain fmt.Errorf, while the other peer lookup paths return status.NewPeerNotPartOfAccountError() for the same account-mismatch case. Keeping the typed status error here preserves consistent handling in the HTTP layer.

Suggested fix
 func (am *DefaultAccountManager) GetPeerByPubKey(ctx context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
 	p, err := am.Store.GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, pubKey)
 	if err != nil {
 		return nil, err
 	}
 	if p.AccountID != accountID {
-		return nil, fmt.Errorf("peer with pubkey %s not in account %s", pubKey, accountID)
+		return nil, status.NewPeerNotPartOfAccountError()
 	}
 	return p, nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 func (am *DefaultAccountManager) GetPeerByPubKey(ctx context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
 	p, err := am.Store.GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, pubKey)
 	if err != nil {
 		return nil, err
 	}
 	if p.AccountID != accountID {
-		return nil, fmt.Errorf("peer with pubkey %s not in account %s", pubKey, accountID)
+		return nil, status.NewPeerNotPartOfAccountError()
 	}
 	return p, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer.go` around lines 1228 - 1235, In GetPeerByPubKey
(DefaultAccountManager) replace the plain fmt.Errorf returned when p.AccountID
!= accountID with the typed status error used elsewhere — call
status.NewPeerNotPartOfAccountError(...) with the same peer pubKey and accountID
(matching other lookup paths) so the HTTP layer receives the consistent
peer/account error type.

@MichaelUray MichaelUray changed the title [3/4 stacked] Phase 3.7i of #5989: peer-status visibility (RemotePeerConfig + conn-state pusher + Peers UI) [client, management] Phase 3.7i of #5989: peer-status visibility — RemotePeerConfig + conn-state pusher + Peers UI (stack 3/4) May 6, 2026
@MichaelUray MichaelUray force-pushed the pr/c-phase3.7i-visibility branch from d48893f to 907bc1c on May 6, 2026 06:22
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 13

♻️ Duplicate comments (9)
client/internal/peer/ice_backoff.go (1)

161-169: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear any active suspension when backoff is disabled.

SetMaxBackoff(0) rebuilds the scheduler, but an already-suspended peer keeps honoring the old nextRetry. That means a server push disabling ICE backoff can stay ineffective until the stale timer expires.

Suggested fix
 func (s *iceBackoffState) SetMaxBackoff(d time.Duration) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	if d == s.maxBackoff {
 		return
 	}
 	s.maxBackoff = d
 	s.bo = buildBackoff(d)
+	if d == 0 {
+		s.suspended = false
+		s.nextRetry = time.Time{}
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/ice_backoff.go` around lines 161 - 169, SetMaxBackoff
currently rebuilds the scheduler but does not clear an existing suspension, so
calling SetMaxBackoff(0) can leave a peer suspended until the old nextRetry
elapses; update SetMaxBackoff (in iceBackoffState) to, under the same lock, when
d == 0 clear any active suspension by resetting s.suspended (and s.nextRetry to
zero/empty) before rebuilding s.bo and assigning s.maxBackoff so the peer is
immediately unsuspended when backoff is disabled.
management/server/peer.go (1)

1228-1235: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return the typed peer/account error here.

This account-mismatch path should use the same typed status error as the other peer lookup APIs; fmt.Errorf makes the HTTP layer handle it inconsistently.

Suggested fix
 	if p.AccountID != accountID {
-		return nil, fmt.Errorf("peer with pubkey %s not in account %s", pubKey, accountID)
+		return nil, status.NewPeerNotPartOfAccountError()
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer.go` around lines 1228 - 1235, GetPeerByPubKey
currently returns a plain fmt.Errorf when the peer.AccountID doesn't match,
causing inconsistent HTTP handling; replace that fmt.Errorf with the same typed
status/account-mismatch error used by the other peer lookup APIs so the HTTP
layer treats it consistently. Locate GetPeerByPubKey (and compare with the other
peer lookup functions that call am.Store.GetPeerByPeerPubKey) and return the
same typed error value/function (the project’s account/peer status error helper)
instead of fmt.Errorf, preserving the error message/context about pubKey and
accountID.
management/server/peer_connections/store.go (1)

68-70: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not seq-drop authoritative full snapshots.

After a client restart, the pusher's counter starts over, so its first full snapshot can legitimately have a lower seq than the cached map. This gate drops that repair snapshot and leaves the old state in place until TTL expiry.

Suggested fix
-	if prev != nil && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
+	if prev != nil && !m.GetFullSnapshot() && m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() {
 		return
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer_connections/store.go` around lines 68 - 70, The
current seq-check in the peer map update (using s.maps, prev, m.GetSeq(),
prev.m.GetSeq()) wrongly drops authoritative full snapshots with lower seqs;
change the guard so you only return when the incoming map is not an
authoritative full snapshot — e.g. replace the condition "if prev != nil &&
m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() { return }" with a check that
also requires the map to be non-full/ non-authoritative (something like "&&
!m.IsFullSnapshot()" or "&& !m.GetMeta().GetAuthoritative()") before returning;
ensure you use the actual predicate your Map type exposes to detect an
authoritative/full snapshot so full repairs are accepted even if seq is lower.
management/internals/shared/grpc/server.go (2)

487-489: ⚠️ Potential issue | 🟠 Major

Disable the snapshot select case once the router channel is closed.

continue on a closed snapshotCh leaves this case permanently ready and can spin the goroutine at 100% CPU. Set snapshotCh = nil or return when ok is false.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/server.go` around lines 487 - 489, The
select case reading from snapshotCh currently does "if !ok { continue }" which
leaves the case permanently ready and can spin the goroutine; update the
handling in the select branch that reads "case nonce, ok := <-snapshotCh:" to
either set snapshotCh = nil when ok is false (to disable that select case) or
return/exit the goroutine instead, ensuring you reference and modify the
snapshotCh variable in the scope of the goroutine where it is used (adjust logic
in the surrounding loop/select to handle snapshotCh being nil).
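
The nil-channel idiom makes this a two-line fix; a generic sketch (handleSnapshot stands in for the real dispatch):

for {
	select {
	case nonce, ok := <-snapshotCh:
		if !ok {
			snapshotCh = nil // a nil channel blocks forever, disabling this case
			continue
		}
		handleSnapshot(nonce)
	case <-ctx.Done():
		return
	}
}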

1175-1183: ⚠️ Potential issue | 🟠 Major

Reject connection-map uploads from unregistered peers before storing them.

parseRequest() only proves the sender owns the WireGuard key used for encryption. It does not prove that this pubkey belongs to a registered peer, so this path still allows arbitrary keypairs to create entries in peerConnections. Resolve the peer/account first and only call Put for registered peers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/server.go` around lines 1175 - 1183, After
parsing the request in SyncPeerConnections (using parseRequest) avoid blindly
writing to peerConnections; instead resolve the sender to a registered
peer/account (e.g. call your existing lookup/resolve method using peerKey or the
peer public key) and if that lookup fails return an error, only calling
s.peerConnections.Put(peerKey.String(), pcm) when the peer/account is confirmed
registered; ensure you reference peerKey and pcm and return a clear
permission/unauthorized error for unregistered senders.
client/internal/peer/status.go (2)

422-465: ⚠️ Potential issue | 🟠 Major

Build the listener/router notifications before releasing d.mux.

These helpers are explicitly documented to snapshot under the lock, but this path unlocks first and only then calls notifyConnStateChange(). The same post-unlock pattern is repeated in the other updatePeer*Locked helpers below, including notifyPeerStateChangeListeners(), so listener/subscription changes can race with notification construction. Capture the closures/snapshots while locked, then unlock and invoke them.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/status.go` around lines 422 - 465, The code in
updatePeerStateLocked builds router/peer notifications after releasing d.mux
which allows races; while still holding the lock you must construct/capture any
closures and snapshots (call snapshotRouterPeersLocked,
notifyConnStateChange(receivedState.PubKey, peerState) and any
notifyPeerStateChangeListeners equivalents) into local variables, then release
d.mux and invoke those captured closures (and call d.notifier.peerListChanged)
outside the lock; update the other updatePeer*Locked helpers similarly so all
listener/router notification closures and snapshots are created under the lock
before unlocking.
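
The general shape of the fix, shown on a self-contained stand-in rather than the real Status type: capture the listener set while locked, invoke after unlocking.

type notifier struct {
	mu        sync.Mutex
	listeners []func(state string)
}

func (n *notifier) update(state string) {
	n.mu.Lock()
	// ... mutate the guarded state here ...
	snapshot := make([]func(string), len(n.listeners))
	copy(snapshot, n.listeners) // snapshot under the lock
	n.mu.Unlock()

	for _, fn := range snapshot { // dispatch without holding the lock
		fn(state)
	}
}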

506-540: ⚠️ Potential issue | 🟠 Major

Remote-meta-only changes still never notify subscribers.

This method updates RemoteLiveOnline, groups, and the effective/configured mode fields, then returns without firing either the per-peer conn-state listener or the peer-list/router notifications. A management-only liveness or config change will stay stale until some unrelated ICE/relay transition happens. Snapshot the required notifications under the lock and dispatch them after unlock in both branches.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/status.go` around lines 506 - 540, In
UpdatePeerRemoteMeta, you currently update Remote* fields for an online peer
(d.peers[pubKey]) or an offline entry (d.offlinePeers[i]) and return without
notifying subscribers; capture (snapshot) the required notifications — e.g., the
per-peer connection-state listener callback and any peer-list/router change
events — while still holding d.mux (after mutating the peer record), then
release the lock and dispatch those notifications outside the lock in both the
online branch (when st exists) and the offline branch (when
offlinePeers[i].PubKey matches); ensure you record which peer key and the
minimal changed state needed for the notifications so the post-unlock dispatch
logic can run safely.
management/server/http/handler.go (1)

143-149: ⚠️ Potential issue | 🟠 Major

Wire the peer-connections handler through the permissions manager.

This route registration still constructs the handler without any RBAC dependency, so these endpoints cannot perform the same explicit Peers read/write permission checks as peers.AddEndpoints(...). Since they expose per-peer connection state and trigger refresh actions, that authorization should be explicit here before merge.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handler.go` around lines 143 - 149, Construct the
peer-connections handler via the permissions manager so the endpoints enforce
Peers RBAC: instead of registering peer_connections_http.NewHandler(...)
directly, create the handler with
peer_connections_http.NewHandler(peerConnStore, &pcAccountManagerAdapter{am:
accountManager, nmc: networkMapController}, peerConnRouter) and then wrap its
methods with the permissions middleware (or RequirePermission) to enforce Peers
read on GetPeerConnections and Peers write on PostRefresh (similar to how
peers.AddEndpoints does it); register the wrapped functions with
router.HandleFunc("/peers/{peerId}/connections",
wrappedGet).Methods("GET","OPTIONS") and
router.HandleFunc("/peers/{peerId}/connections/refresh",
wrappedPost).Methods("POST","OPTIONS").
management/internals/shared/grpc/conversion.go (1)

263-267: ⚠️ Potential issue | 🟠 Major

Preserve the retry-max sentinel when populating ConfiguredP2PRetryMaxSecs.

derefUint32OrZero collapses nil and explicit 0 here, but toPeerConfig() uses p2pRetryMaxDisabledSentinel to distinguish "unset" from "explicitly disabled backoff". That means remote peers lose the canonical wire-format value for this setting. Mirror the same sentinel encoding here and add a regression test.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/conversion.go` around lines 263 - 267, The
code currently uses derefUint32OrZero to set cfgP2pRetryMax which collapses nil
and explicit 0; instead preserve the p2pRetryMaxDisabledSentinel used by
toPeerConfig so consumers can tell "unset" from "explicit 0". Change the
assignment in conversion.go that sets cfgP2pRetryMax (the variable that
populates ConfiguredP2PRetryMaxSecs) to check c.Cfg.P2pRetryMaxSeconds: if nil
set p2pRetryMaxDisabledSentinel, otherwise use the dereferenced value (keep
explicit 0); replace derefUint32OrZero for this field or add a small helper
(e.g., derefUint32OrSentinel) and add a regression test asserting the sentinel
is preserved in the resulting ConfiguredP2PRetryMaxSecs and that explicit 0
remains 0.
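
The helper the prompt names might look like this (a sketch; the nil-means-sentinel mapping follows the prompt's wording):

func derefUint32OrSentinel(v *uint32, sentinel uint32) uint32 {
	if v == nil {
		return sentinel // "unset" keeps the canonical wire encoding
	}
	return *v // an explicit 0 stays 0
}

// usage at the call site flagged above:
cfgP2pRetryMax := derefUint32OrSentinel(c.Cfg.P2pRetryMaxSeconds, p2pRetryMaxDisabledSentinel)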
🧹 Nitpick comments (9)
management/server/peer/peer_test.go (1)

145-152: ⚡ Quick win

Test name implies all Effective* fields are checked, but only EffectiveConnectionMode is covered.

The PR adds four new fields to isEqual: EffectiveConnectionMode, EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, and EffectiveP2PRetryMaxSecs. This test validates only the first one; the other three would silently pass even if they were accidentally omitted from isEqual. Either extend the test to cover all four fields, or rename it to reflect its narrower scope (e.g., TestPeerSystemMeta_isEqual_ChecksEffectiveConnectionMode).

A straightforward extension using the same table-driven style as TestFlags_IsEqual:

🧪 Proposed extension to cover all `Effective*` fields
-func TestPeerSystemMeta_isEqual_ChecksEffectiveFields(t *testing.T) {
-	base := PeerSystemMeta{Hostname: "h", EffectiveConnectionMode: "p2p-dynamic"}
-	other := base
-	other.EffectiveConnectionMode = "p2p"
-	if base.isEqual(other) {
-		t.Error("isEqual should return false when EffectiveConnectionMode differs")
-	}
-}
+func TestPeerSystemMeta_isEqual_ChecksEffectiveFields(t *testing.T) {
+	tests := []struct {
+		name   string
+		mutate func(*PeerSystemMeta)
+	}{
+		{
+			name:   "EffectiveConnectionMode",
+			mutate: func(m *PeerSystemMeta) { m.EffectiveConnectionMode = "p2p" },
+		},
+		{
+			name:   "EffectiveRelayTimeoutSecs",
+			mutate: func(m *PeerSystemMeta) { m.EffectiveRelayTimeoutSecs = 99 },
+		},
+		{
+			name:   "EffectiveP2PTimeoutSecs",
+			mutate: func(m *PeerSystemMeta) { m.EffectiveP2PTimeoutSecs = 99 },
+		},
+		{
+			name:   "EffectiveP2PRetryMaxSecs",
+			mutate: func(m *PeerSystemMeta) { m.EffectiveP2PRetryMaxSecs = 99 },
+		},
+	}
+	base := PeerSystemMeta{Hostname: "h", EffectiveConnectionMode: "p2p-dynamic"}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			other := base
+			tt.mutate(&other)
+			require.False(t, base.isEqual(other), "isEqual should return false when %s differs", tt.name)
+		})
+	}
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer/peer_test.go` around lines 145 - 152, The test
TestPeerSystemMeta_isEqual_ChecksEffectiveFields currently only asserts a
difference for EffectiveConnectionMode; update the test to either (a) rename it
to TestPeerSystemMeta_isEqual_ChecksEffectiveConnectionMode if you intend to
keep it narrow, or preferably (b) extend it to cover all four Effective* fields
by using a table-driven approach that mutates each of EffectiveConnectionMode,
EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, and EffectiveP2PRetryMaxSecs
on the copied other PeerSystemMeta and asserts isEqual returns false for each;
target the PeerSystemMeta.isEqual method and add one subcase per field so
omissions in isEqual will fail the test.
client/ui/network.go (2)

125-125: 💤 Low value

Auto-refresh keeps polling networks even while the Peers tab is active.

startAutoRefresh (10s) calls updateNetworksBasedOnDisplayTab unconditionally, which repaints the (hidden) network grid behind the Peers tab and triggers daemon ListNetworks RPCs the user can't see. Combined with the new 30s peers ticker in peers_tab.go, this is wasted work. Consider gating the network refresh on tabs.Selected().Text != peersText so each tab only refreshes its own data.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/network.go` at line 125, The auto-refresh is repainting the network
grid even when the Peers tab is active; modify startAutoRefresh (the call site
and/or its implementation that invokes updateNetworksBasedOnDisplayTab) to check
the currently selected tab before refreshing networks: gate the call to
updateNetworksBasedOnDisplayTab (and any network-related refresh work for
allGrid/overlappingGrid/exitNodeGrid) so it only runs when tabs.Selected().Text
!= peersText, leaving the peers_tab.go 30s peers ticker to handle peers updates;
ensure you reference tabs and peersText consistently so only the visible tab's
data is polled.
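
A sketch of the gate inside the refresh ticker (the receiver and field layout are assumed; tabs and peersText are the identifiers referenced above):

for {
	select {
	case <-ctx.Done():
		return
	case <-ticker.C:
		sel := s.tabs.Selected()
		if sel == nil || sel.Text == peersText {
			continue // the Peers tab refreshes itself on its own 30s ticker
		}
		s.updateNetworksBasedOnDisplayTab()
	}
}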

60-76: 💤 Low value

Defensive-only nit: tabs.Selected() can theoretically be nil.

getGridAndFilterFromTab (Line 707) dereferences tabs.Selected().Text directly. The button callbacks here only run after the user interacts with a fully-built window, so in practice Selected() is non-nil — but the Refresh handler at Line 71 already guards against nil, while selectAllBtn/deselectAllBtn do not. Worth aligning for consistency, though not blocking.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/network.go` around lines 60 - 76, The selectAll and deselectAll
button handlers call getGridAndFilterFromTab which dereferences
tabs.Selected().Text; add the same nil-guard used in refreshBtn: check if
tabs.Selected() != nil before calling getGridAndFilterFromTab and otherwise
return (or no-op). Update the selectAllBtn and deselectAllBtn callbacks to
early-return when tabs.Selected() is nil, then call getGridAndFilterFromTab and
proceed to s.selectAllFilteredNetworks/s.deselectAllFilteredNetworks and
s.updateNetworksBasedOnDisplayTab.
client/ui/peers_tab.go (2)

219-273: 💤 Low value

Reduce cognitive complexity of buildPeerDetailText (Sonar).

Sonar reports complexity 24 vs. 20 limit. The function is essentially a sequence of independent "section writers" (basic, endpoints, full-details). Extracting each section into a small helper drops complexity and keeps the overall layout obvious.

♻️ Sketch
func buildPeerDetailText(p *proto.PeerState, full bool) string {
    var sb strings.Builder
    writeBasicPeerFields(&sb, p)
    writeEndpointFields(&sb, p)
    writeMetadataFields(&sb, p)
    if full {
        sb.WriteString("\n--- Full details ---\n")
        writeFullPeerFields(&sb, p)
    }
    return sb.String()
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/peers_tab.go` around lines 219 - 273, Sonar flags
buildPeerDetailText as too complex; split its linear sections into small helpers
to reduce cognitive complexity. Create helper functions (e.g.,
writeBasicPeerFields(sb *strings.Builder, p *proto.PeerState),
writeEndpointFields(sb *strings.Builder, p *proto.PeerState),
writeMetadataFields(sb *strings.Builder, p *proto.PeerState), and
writeFullPeerFields(sb *strings.Builder, p *proto.PeerState)) that contain the
corresponding fmt.Fprintf blocks currently in buildPeerDetailText, then rewrite
buildPeerDetailText to instantiate the strings.Builder and call those helpers
(and only append the "\n--- Full details ---\n" header before calling
writeFullPeerFields when full==true). Ensure all existing uses of p.Get... and
orDashStr/peerLatencyStr/humanBytes remain unchanged inside the helpers.
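
For illustration, one extracted helper might look like the following; the getters shown (GetFqdn, GetIp, GetConnStatus) are assumptions about proto.PeerState rather than verified generated-code names:

func writeBasicPeerFields(sb *strings.Builder, p *proto.PeerState) {
	// Same fmt.Fprintf calls as in buildPeerDetailText today, just relocated.
	fmt.Fprintf(sb, "FQDN: %s\n", orDashStr(p.GetFqdn()))
	fmt.Fprintf(sb, "IP: %s\n", orDashStr(p.GetIp()))
	fmt.Fprintf(sb, "Status: %s\n", p.GetConnStatus())
}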

51-52: ⚡ Quick win

Derive the call context from the parent ctx.

context.WithTimeout(context.Background(), …) ignores the ctx passed into buildPeersTabContent. If the UI/process ctx is cancelled (shutdown, window close), an in-flight Status RPC keeps running until its own 5s timeout instead of being cancelled promptly. Use the parent ctx so cancellation propagates.

-		callCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		callCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
 		defer cancel()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/peers_tab.go` around lines 51 - 52, In buildPeersTabContent, the
temporary call context is created with context.WithTimeout(context.Background(),
5*time.Second) which ignores the parent ctx; change it to derive from the parent
(use context.WithTimeout(ctx, 5*time.Second)) so callCtx and cancel propagate
cancellation from the UI/process ctx (affecting the Status RPC and any in-flight
operations).
management/server/peer_test.go (1)

1159-1159: ⚡ Quick win

Add one assertion path for the new group-enrichment input.

Passing nil here means ToSyncResponse's new groupNamesByPeerID branch is still untested, so regressions in the peer-groups projection would slip through this fixture. A small non-nil map plus one assertion on the emitted peer metadata would close that gap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer_test.go` at line 1159, The test calls ToSyncResponse
with a nil groupNamesByPeerID which leaves the new group-enrichment branch
untested; update the call in peer_test.go to pass a small non-nil map (e.g.
map[peer.ID][]string with the test peer's ID mapped to one or two group names)
into the position currently nil, then add an assertion that the returned
response's peer metadata (from response.Peers or the specific peer entry
asserted elsewhere in this fixture) contains those group names so the
groupNamesByPeerID -> peer-groups projection is exercised.
management/server/peer_connections/store_test.go (1)

38-52: ⚡ Quick win

Cover mutation-after-Put as well, not just mutation-after-Get.

This test only proves the store clones on return. It would still pass if Put kept the caller's protobuf pointer and the caller mutated it afterwards. Add one case that mutates the original PeerConnectionMap after Put and then verifies the stored copy stayed unchanged; a sketch follows the prompt below.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer_connections/store_test.go` around lines 38 - 52, Add a
second assertion in TestMemoryStore_DeepCopyOnReturn to cover
mutation-after-Put: after creating the original mgmProto.PeerConnectionMap (the
value passed into s.Put in the test that uses newStoreWithClock), mutate fields
on that original protobuf (e.g., change Entries[0].RemotePubkey or
LastHandshake) immediately after calling s.Put("peerA", ...), then call
s.Get("peerA") and assert the returned copy still has the original values
("peerB" etc.). This verifies s.Put clones the input rather than retaining the
caller's pointer.
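
♻️ A sketch of the extra case, reusing the fixture's newStoreWithClock helper; the Put/Get signatures and proto field names mirror the prose above and may need adjusting to the real API:

func TestMemoryStore_DeepCopyOnPut(t *testing.T) {
	s := newStoreWithClock(t)
	orig := &mgmProto.PeerConnectionMap{
		Entries: []*mgmProto.PeerConnectionEntry{{RemotePubkey: "peerB"}},
	}
	s.Put("peerA", orig)

	// Mutate the caller's protobuf after Put; the stored copy must not move.
	orig.Entries[0].RemotePubkey = "mutated"

	got := s.Get("peerA")
	if got.GetEntries()[0].GetRemotePubkey() != "peerB" {
		t.Fatal("Put retained the caller's pointer; stored map was mutated")
	}
}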
management/server/http/testing/testing_tools/channel/channel.go (1)

138-138: 💤 Low value

Consider naming the trailing nil parameters.

Both call sites (lines 138 and 267) pass seven consecutive nil arguments after serviceManager. While the compiler catches any count/type mismatch, tracking which nil maps to which parameter (peerConnections store, snapshot router, …) requires cross-referencing NewAPIHandler's signature. A comment block or a thin wrapper helper that names the optional arguments would ease future maintenance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/testing/testing_tools/channel/channel.go` at line 138,
The call to NewAPIHandler passes seven consecutive nils after serviceManager
which makes it unclear which optional parameters are being set to nil; update
the apiHandler construction (the call to http2.NewAPIHandler where apiHandler,
err := ...) to either replace the trailing nils with named inline comments
indicating which parameter each nil corresponds to (e.g., /* peerConnections */,
/* snapshotRouter */, etc.), or create a small helper/wrapper that accepts the
required arguments and supplies named nil defaults before calling NewAPIHandler;
reference the NewAPIHandler signature to use the exact parameter names and
ensure the helper or inline comments map each nil to the correct parameter.
shared/management/client/grpc.go (1)

483-490: ⚡ Quick win

Snapshot callback runs inline on the Sync recv loop.

cb(req.GetNonce()) is invoked synchronously while the goroutine is reading the management Sync stream. If a future implementation of the handler does anything blocking (RPCs, locks held by other goroutines, channel sends without buffer/select), the entire Sync stream stalls — including subsequent NetworkMap updates. Consider dispatching to a goroutine here, or at least documenting that the handler must be non-blocking and bounded.

♻️ Suggested fix
-		if req := decryptedResp.GetSnapshotRequest(); req != nil {
-			c.snapMu.Lock()
-			cb := c.onSnapshotRequest
-			c.snapMu.Unlock()
-			if cb != nil {
-				cb(req.GetNonce())
-			}
-		}
+		if req := decryptedResp.GetSnapshotRequest(); req != nil {
+			c.snapMu.Lock()
+			cb := c.onSnapshotRequest
+			c.snapMu.Unlock()
+			if cb != nil {
+				go cb(req.GetNonce())
+			}
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shared/management/client/grpc.go` around lines 483 - 490, The snapshot
callback is invoked inline on the Sync receive loop (see c.onSnapshotRequest
usage and cb(req.GetNonce())), which can block the stream; change the invocation
to dispatch the callback asynchronously (e.g., spawn a goroutine) after
acquiring and releasing snapMu so the recv loop isn’t blocked, and ensure the
goroutine is safe (recover panics) or document that onSnapshotRequest must be
non-blocking if you choose not to spawn a goroutine.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@client/android/preferences.go`:
- Around line 325-331: SetConnectionMode currently stores a non-nil pointer to
an empty string so calling SetConnectionMode("") doesn’t clear the override;
change SetConnectionMode to set p.configInput.ConnectionMode = nil when mode ==
"" and otherwise set it to a pointer to the provided string (e.g., allocate a
new string variable or use &m) so GetConnectionMode’s nil-check will correctly
fall through to the server-pushed value; update the logic in the
SetConnectionMode method that manipulates p.configInput.ConnectionMode
accordingly.

In `@client/internal/conn_state_pusher.go`:
- Around line 107-112: OnSnapshotRequest currently sends nonce into
p.snapshotReq but if the channel buffer is full it drops the new nonce, causing
older nonces to be used; change the logic in connStatePusher.OnSnapshotRequest
to coalesce by non-blocking enqueue of the newest nonce: when attempting to
send, if the channel is full drain/consume all pending values from p.snapshotReq
and then send the newest nonce so the buffer holds only the latest request;
apply the same drain-and-replace pattern to the analogous handler at the other
occurrence (lines referenced in the comment) so pending snapshot requests always
resolve to the most recent nonce.
- Around line 261-275: computeDeltaFromSource currently only iterates current
SnapshotAllRemotePeers and never emits deletions, so keys present in
connStatePusher.lastPushed but missing from SnapshotAllRemotePeers are never
reconciled; update computeDeltaFromSource in connStatePusher to also iterate
lastPushed and detect pubkeys absent from the current snapshot and emit explicit
removal events (e.g., a PeerStateChangeEvent marked as deletion or a distinct
event type) for those pubkeys; use existing symbols SnapshotAllRemotePeers,
lastPushed, PeerStateChangeEvent and isMaterialChange to decide when to emit
updates vs deletions, ensuring the store receives explicit removal entries
instead of relying on TTL or full snapshots.

In `@client/internal/debouncer/debouncer.go`:
- Around line 28-53: The Debouncer can run the wrong/duplicate fn when an old
timer callback races with a re-arm; fix Trigger and Stop by using a
timer-identity guard and clearing d.fn when cancelling/executing: in Trigger
(method Debouncer.Trigger) after creating the timer store it as a local variable
(e.g. myTimer := d.timer) and capture that in the time.AfterFunc closure; inside
the closure lock d.mu and check if d.timer == myTimer — if not, return;
otherwise read f := d.fn, set d.fn = nil and set d.timer = nil before unlocking
and then call f(); in Stop (method Debouncer.Stop) under d.mu, stop the timer if
present and set d.timer = nil and d.fn = nil so an
already-fired-but-not-yet-executed callback cannot observe a retained fn.

In `@client/internal/engine.go`:
- Around line 354-357: The Stop() path currently sets e.connStatePusher = nil
without unregistering callbacks installed by Start(), which can leave closures
referencing the pusher and cause panics; update Stop() (and/or implement an
Unregister method on the connStatePusher type) to first unregister or remove all
registered conn-state callbacks (the ones installed in Start()) and only then
stop and nil out e.connStatePusher, ensuring any closures cannot dereference a
nil e.connStatePusher (refer to e.connStatePusher, Start(), and Stop() to locate
the affected logic).

In `@client/internal/peer/guard/guard.go`:
- Around line 145-147: The direct call to g.onNetworkChange() in the guard code
can panic and crash the reconnect flow; locate the invocation (inside the guard
type/method where g.onNetworkChange is executed) and wrap the callback
invocation in a panic-safe boundary by deferring a recover to swallow and log
the panic (or handle it) so it doesn't propagate — e.g., invoke onNetworkChange
inside a small closure that defers a recover, captures the panic, and logs it
via the guard's logger or an error path; ensure you leave the original call site
(g.onNetworkChange) intact but protected by this recover wrapper.

In `@client/internal/peer/handshaker.go`:
- Around line 122-129: readICEListener() only snapshots h.iceListener and
releases h.mu before Listen() invokes it, allowing RemoveICEListener() to race
and let a detached listener still be called; fix by adding a removal-generation
check: add a generation field (e.g., iceGen int64) on Handshaker, increment it
inside RemoveICEListener() under h.mu, have readICEListener() return both the
listener and the current generation (or capture the generation in the returned
closure), and in Listen() re-check the current generation under h.mu (or compare
atomically) right before invoking the callback and skip invocation if the
generation changed; update the symbols readICEListener, RemoveICEListener,
Listen, and any code touching h.iceListener to use this generation check so a
detached listener cannot be called after removal.

In `@client/internal/peer/worker_ice.go`:
- Around line 218-231: IsConnected reads w.lastKnownState under w.muxAgent but
onConnectionStateChange updates lastKnownState without taking that lock, causing
a data race; fix by acquiring the same mutex (w.muxAgent.Lock() / Unlock() or
defer Unlock()) around all writes to w.lastKnownState inside
onConnectionStateChange so readers via IsConnected and any other accessors are
synchronized with the callback updates on WorkerICE.

In `@client/server/setconfig_test.go`:
- Around line 204-214: The test is advertising connection-mode and timeout
fields as allowed no-ops in SetConfigRequest which misrepresents the RPC
surface; update the test to stop permitting these fields until the daemon
actually implements them by removing "ConnectionMode", "P2PTimeoutSeconds",
"RelayTimeoutSeconds", and "P2PRetryMaxSeconds" from the allowed-fields table in
setconfig_test.go (and the analogous entries at the later block around lines
279-289), or alternatively implement wiring in the SetConfigRequest handler to
persist/apply those fields and update the test to assert persistence; reference
the SetConfigRequest handling logic and the named fields above when making the
change.

In `@client/ui/peers_tab.go`:
- Line 83: The showFull.OnChanged handler currently calls render() directly on
the Fyne UI goroutine which blocks while render() does a synchronous conn.Status
RPC; change the OnChanged handler so it does not perform RPCs on the UI thread:
either (A) run render() asynchronously off the UI goroutine (spawn a goroutine
or use the same background worker the ticker uses) so conn.Status is executed
off the UI thread, or (B) better, cache the last FullStatus snapshot and update
only the rows/presentation when showFull changes (have showFull.OnChanged
rebuild rows from the cached FullStatus rather than calling conn.Status). Update
references to showFull, render(), conn.Status, and the FullStatus cache/usage
accordingly.

In `@management/server/http/handlers/peer_connections/handler_test.go`:
- Around line 40-46: The fake GetPeerByPubKey currently ignores the account
parameter (signature GetPeerByPubKey(_ context.Context, _, pubKey string)) and
returns any peer by pubkey, allowing cross-account leaks; update the
implementation in fakeAM.GetPeerByPubKey to enforce account scoping by using the
second parameter (accountID) — either change fakeAM.peersByKey to be keyed by
accountID -> pubKey (e.g., map[string]map[string]*nbpeer.Peer) or, if peersByKey
stores Peer objects with an AccountID field, check that p.AccountID == accountID
before returning; if there is no match return nil, errors.New("not found").

In `@management/server/http/handlers/peer_connections/handler.go`:
- Around line 161-189: The buildResponse function performs an N+1 lookup by
calling h.account.GetPeerByPubKey for each entry to set apiEntry.RemoteFQDN;
change this to fetch all peers for the account once (or otherwise build a
request-scoped map/cache) before the loop and then look up remote FQDNs from
that map when populating apiEntry.RemoteFQDN. Locate buildResponse and replace
per-entry GetPeerByPubKey calls with a single batch call (or cache population)
using account APIs, index the results by pubkey, and fall back to the existing
best-effort behavior if a pubkey is missing.

In `@management/server/peer_connections/store.go`:
- Around line 73-100: The code currently accepts a delta push as authoritative
when prev == nil; change the logic in the block around stored/prev/m so that if
!m.GetFullSnapshot() && prev == nil you do NOT store the delta into s.maps
(i.e., reject/ignore the update and return early or skip assigning
s.maps[peerPubKey]). Specifically, in the function handling peer map updates,
detect m.GetFullSnapshot() == false and prev == nil and abort the merge flow
that sets stored and s.maps[peerPubKey] (keep existing memEntry untouched) until
a full snapshot arrives; keep references to stored, prev, merged, m, and
memEntry to locate the code. Ensure updatedAt is not changed for ignored deltas.

---

Duplicate comments:
In `@client/internal/peer/ice_backoff.go`:
- Around line 161-169: SetMaxBackoff currently rebuilds the scheduler but does
not clear an existing suspension, so calling SetMaxBackoff(0) can leave a peer
suspended until the old nextRetry elapses; update SetMaxBackoff (in
iceBackoffState) to, under the same lock, when d == 0 clear any active
suspension by resetting s.suspended (and s.nextRetry to zero/empty) before
rebuilding s.bo and assigning s.maxBackoff so the peer is immediately
unsuspended when backoff is disabled.

In `@client/internal/peer/status.go`:
- Around line 422-465: The code in updatePeerStateLocked builds router/peer
notifications after releasing d.mux which allows races; while still holding the
lock you must construct/capture any closures and snapshots (call
snapshotRouterPeersLocked, notifyConnStateChange(receivedState.PubKey,
peerState) and any notifyPeerStateChangeListeners equivalents) into local
variables, then release d.mux and invoke those captured closures (and call
d.notifier.peerListChanged) outside the lock; update the other updatePeer*Locked
helpers similarly so all listener/router notification closures and snapshots are
created under the lock before unlocking.
- Around line 506-540: In UpdatePeerRemoteMeta, you currently update Remote*
fields for an online peer (d.peers[pubKey]) or an offline entry
(d.offlinePeers[i]) and return without notifying subscribers; capture (snapshot)
the required notifications — e.g., the per-peer connection-state listener
callback and any peer-list/router change events — while still holding d.mux
(after mutating the peer record), then release the lock and dispatch those
notifications outside the lock in both the online branch (when st exists) and
the offline branch (when offlinePeers[i].PubKey matches); ensure you record
which peer key and the minimal changed state needed for the notifications so the
post-unlock dispatch logic can run safely.

In `@management/internals/shared/grpc/conversion.go`:
- Around line 263-267: The code currently uses derefUint32OrZero to set
cfgP2pRetryMax which collapses nil and explicit 0; instead preserve the
p2pRetryMaxDisabledSentinel used by toPeerConfig so consumers can tell "unset"
from "explicit 0". Change the assignment in conversion.go that sets
cfgP2pRetryMax (the variable that populates ConfiguredP2PRetryMaxSecs) to check
c.Cfg.P2pRetryMaxSeconds: if nil set p2pRetryMaxDisabledSentinel, otherwise use
the dereferenced value (keep explicit 0); replace derefUint32OrZero for this
field or add a small helper (e.g., derefUint32OrSentinel) and add a regression
test asserting the sentinel is preserved in the resulting
ConfiguredP2PRetryMaxSecs and that explicit 0 remains 0.

In `@management/internals/shared/grpc/server.go`:
- Around line 487-489: The select case reading from snapshotCh currently does
"if !ok { continue }" which leaves the case permanently ready and can spin the
goroutine; update the handling in the select branch that reads "case nonce, ok
:= <-snapshotCh:" to either set snapshotCh = nil when ok is false (to disable
that select case) or return/exit the goroutine instead, ensuring you reference
and modify the snapshotCh variable in the scope of the goroutine where it is
used (adjust logic in the surrounding loop/select to handle snapshotCh being
nil).
- Around line 1175-1183: After parsing the request in SyncPeerConnections (using
parseRequest) avoid blindly writing to peerConnections; instead resolve the
sender to a registered peer/account (e.g. call your existing lookup/resolve
method using peerKey or the peer public key) and if that lookup fails return an
error, only calling s.peerConnections.Put(peerKey.String(), pcm) when the
peer/account is confirmed registered; ensure you reference peerKey and pcm and
return a clear permission/unauthorized error for unregistered senders.

In `@management/server/http/handler.go`:
- Around line 143-149: Construct the peer-connections handler via the
permissions manager so the endpoints enforce Peers RBAC: instead of registering
peer_connections_http.NewHandler(...) directly, create the handler with
peer_connections_http.NewHandler(peerConnStore, &pcAccountManagerAdapter{am:
accountManager, nmc: networkMapController}, peerConnRouter) and then wrap its
methods with the permissions middleware (or RequirePermission) to enforce Peers
read on GetPeerConnections and Peers write on PostRefresh (similar to how
peers.AddEndpoints does it); register the wrapped functions with
router.HandleFunc("/peers/{peerId}/connections",
wrappedGet).Methods("GET","OPTIONS") and
router.HandleFunc("/peers/{peerId}/connections/refresh",
wrappedPost).Methods("POST","OPTIONS").

In `@management/server/peer_connections/store.go`:
- Around line 68-70: The current seq-check in the peer map update (using s.maps,
prev, m.GetSeq(), prev.m.GetSeq()) wrongly drops authoritative full snapshots
with lower seqs; change the guard so you only return when the incoming map is
not an authoritative full snapshot — e.g. replace the condition "if prev != nil
&& m.GetSeq() > 0 && m.GetSeq() <= prev.m.GetSeq() { return }" with a check that
also requires the map to be non-full/ non-authoritative (something like "&&
!m.IsFullSnapshot()" or "&& !m.GetMeta().GetAuthoritative()") before returning;
ensure you use the actual predicate your Map type exposes to detect an
authoritative/full snapshot so full repairs are accepted even if seq is lower.

In `@management/server/peer.go`:
- Around line 1228-1235: GetPeerByPubKey currently returns a plain fmt.Errorf
when the peer.AccountID doesn't match, causing inconsistent HTTP handling;
replace that fmt.Errorf with the same typed status/account-mismatch error used
by the other peer lookup APIs so the HTTP layer treats it consistently. Locate
GetPeerByPubKey (and compare with the other peer lookup functions that call
am.Store.GetPeerByPeerPubKey) and return the same typed error value/function
(the project’s account/peer status error helper) instead of fmt.Errorf,
preserving the error message/context about pubKey and accountID.

---

Nitpick comments:
In `@client/ui/network.go`:
- Line 125: The auto-refresh is repainting the network grid even when the Peers
tab is active; modify startAutoRefresh (the call site and/or its implementation
that invokes updateNetworksBasedOnDisplayTab) to check the currently selected
tab before refreshing networks: gate the call to updateNetworksBasedOnDisplayTab
(and any network-related refresh work for allGrid/overlappingGrid/exitNodeGrid)
so it only runs when tabs.Selected().Text != peersText, leaving the peers_tab.go
30s peers ticker to handle peers updates; ensure you reference tabs and
peersText consistently so only the visible tab's data is polled.
- Around line 60-76: The selectAll and deselectAll button handlers call
getGridAndFilterFromTab which dereferences tabs.Selected().Text; add the same
nil-guard used in refreshBtn: check if tabs.Selected() != nil before calling
getGridAndFilterFromTab and otherwise return (or no-op). Update the selectAllBtn
and deselectAllBtn callbacks to early-return when tabs.Selected() is nil, then
call getGridAndFilterFromTab and proceed to
s.selectAllFilteredNetworks/s.deselectAllFilteredNetworks and
s.updateNetworksBasedOnDisplayTab.

In `@client/ui/peers_tab.go`:
- Around line 219-273: Sonar flags buildPeerDetailText as too complex; split its
linear sections into small helpers to reduce cognitive complexity. Create helper
functions (e.g., writeBasicPeerFields(sb *strings.Builder, p *proto.PeerState),
writeEndpointFields(sb *strings.Builder, p *proto.PeerState),
writeMetadataFields(sb *strings.Builder, p *proto.PeerState), and
writeFullPeerFields(sb *strings.Builder, p *proto.PeerState)) that contain the
corresponding fmt.Fprintf blocks currently in buildPeerDetailText, then rewrite
buildPeerDetailText to instantiate the strings.Builder and call those helpers
(and only append the "\n--- Full details ---\n" header before calling
writeFullPeerFields when full==true). Ensure all existing uses of p.Get... and
orDashStr/peerLatencyStr/humanBytes remain unchanged inside the helpers.
- Around line 51-52: In buildPeersTabContent, the temporary call context is
created with context.WithTimeout(context.Background(), 5*time.Second) which
ignores the parent ctx; change it to derive from the parent (use
context.WithTimeout(ctx, 5*time.Second)) so callCtx and cancel propagate
cancellation from the UI/process ctx (affecting the Status RPC and any in-flight
operations).

In `@management/server/http/testing/testing_tools/channel/channel.go`:
- Line 138: The call to NewAPIHandler passes seven consecutive nils after
serviceManager which makes it unclear which optional parameters are being set to
nil; update the apiHandler construction (the call to http2.NewAPIHandler where
apiHandler, err := ...) to either replace the trailing nils with named inline
comments indicating which parameter each nil corresponds to (e.g., /*
peerConnections */, /* snapshotRouter */, etc.), or create a small
helper/wrapper that accepts the required arguments and supplies named nil
defaults before calling NewAPIHandler; reference the NewAPIHandler signature to
use the exact parameter names and ensure the helper or inline comments map each
nil to the correct parameter.

In `@management/server/peer_connections/store_test.go`:
- Around line 38-52: Add a second assertion in TestMemoryStore_DeepCopyOnReturn
to cover mutation-after-Put: after creating the original
mgmProto.PeerConnectionMap (the value passed into s.Put in the test that uses
newStoreWithClock), mutate fields on that original protobuf (e.g., change
Entries[0].RemotePubkey or LastHandshake) immediately after calling
s.Put("peerA", ...), then call s.Get("peerA") and assert the returned copy still
has the original values ("peerB" etc.). This verifies s.Put clones the input
rather than retaining the caller's pointer.

In `@management/server/peer_test.go`:
- Line 1159: The test calls ToSyncResponse with a nil groupNamesByPeerID which
leaves the new group-enrichment branch untested; update the call in peer_test.go
to pass a small non-nil map (e.g. map[peer.ID][]string with the test peer's ID
mapped to one or two group names) into the position currently nil, then add an
assertion that the returned response's peer metadata (from response.Peers or the
specific peer entry asserted elsewhere in this fixture) contains those group
names so the groupNamesByPeerID -> peer-groups projection is exercised.

In `@management/server/peer/peer_test.go`:
- Around line 145-152: The test TestPeerSystemMeta_isEqual_ChecksEffectiveFields
currently only asserts a difference for EffectiveConnectionMode; update the test
to either (a) rename it to
TestPeerSystemMeta_isEqual_ChecksEffectiveConnectionMode if you intend to keep
it narrow, or preferably (b) extend it to cover all four Effective* fields by
using a table-driven approach that mutates each of EffectiveConnectionMode,
EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, and EffectiveP2PRetryMaxSecs
on the copied other PeerSystemMeta and asserts isEqual returns false for each;
target the PeerSystemMeta.isEqual method and add one subcase per field so
omissions in isEqual will fail the test.

In `@shared/management/client/grpc.go`:
- Around line 483-490: The snapshot callback is invoked inline on the Sync
receive loop (see c.onSnapshotRequest usage and cb(req.GetNonce())), which can
block the stream; change the invocation to dispatch the callback asynchronously
(e.g., spawn a goroutine) after acquiring and releasing snapMu so the recv loop
isn’t blocked, and ensure the goroutine is safe (recover panics) or document
that onSnapshotRequest must be non-blocking if you choose not to spawn a
goroutine.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7a14726c-0e01-4303-b6cf-2bc41685f606

📥 Commits

Reviewing files that changed from the base of the PR and between d48893f and 907bc1c.

⛔ Files ignored due to path filters (3)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (60)
  • client/android/client.go
  • client/android/peer_notifier.go
  • client/android/preferences.go
  • client/cmd/service.go
  • client/cmd/service_installer.go
  • client/internal/conn_mgr.go
  • client/internal/conn_mgr_test.go
  • client/internal/conn_state_pusher.go
  • client/internal/conn_state_pusher_test.go
  • client/internal/debouncer/debouncer.go
  • client/internal/engine.go
  • client/internal/engine_pusher_adapters.go
  • client/internal/engine_test.go
  • client/internal/lazyconn/inactivity/manager.go
  • client/internal/lazyconn/inactivity/manager_test.go
  • client/internal/lazyconn/support.go
  • client/internal/peer/conn.go
  • client/internal/peer/env_test.go
  • client/internal/peer/guard/guard.go
  • client/internal/peer/handshaker.go
  • client/internal/peer/ice_backoff.go
  • client/internal/peer/ice_backoff_test.go
  • client/internal/peer/status.go
  • client/internal/peer/status_test.go
  • client/internal/peer/worker_ice.go
  • client/proto/daemon.proto
  • client/server/server.go
  • client/server/setconfig_test.go
  • client/ui/client_ui.go
  • client/ui/const.go
  • client/ui/event_handler.go
  • client/ui/network.go
  • client/ui/peers_tab.go
  • management/internals/controllers/network_map/controller/controller.go
  • management/internals/server/boot.go
  • management/internals/shared/grpc/conversion.go
  • management/internals/shared/grpc/server.go
  • management/server/account/manager.go
  • management/server/account/manager_mock.go
  • management/server/http/handler.go
  • management/server/http/handlers/peer_connections/handler.go
  • management/server/http/handlers/peer_connections/handler_test.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/management_proto_test.go
  • management/server/management_test.go
  • management/server/mock_server/account_mock.go
  • management/server/peer.go
  • management/server/peer/peer.go
  • management/server/peer/peer_test.go
  • management/server/peer_connections/snapshot_router.go
  • management/server/peer_connections/snapshot_router_test.go
  • management/server/peer_connections/store.go
  • management/server/peer_connections/store_test.go
  • management/server/peer_test.go
  • management/server/store/sql_store.go
  • shared/management/client/client.go
  • shared/management/client/client_test.go
  • shared/management/client/grpc.go
  • shared/management/client/mock.go
  • shared/management/proto/management.proto
💤 Files with no reviewable changes (2)
  • client/ui/const.go
  • client/ui/event_handler.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • client/cmd/service_installer.go
  • client/android/client.go
  • management/server/peer/peer.go
  • client/internal/conn_state_pusher_test.go

Comment on lines +325 to +331
// SetConnectionMode stores a local override for the connection mode.
// Pass an empty string to clear the override (revert to following the
// server-pushed value).
func (p *Preferences) SetConnectionMode(mode string) {
m := mode
p.configInput.ConnectionMode = &m
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

SetConnectionMode("") does not actually clear the override.

The doc says "Pass an empty string to clear the override (revert to following the server-pushed value)", but the implementation stores &"" (non-nil pointer to an empty string). GetConnectionMode checks p.configInput.ConnectionMode != nil first, so after SetConnectionMode("") the getter returns "" and never falls through to read cfg.ConnectionMode — the local override is set-to-empty rather than cleared, which is observably different from "follow server".

🛡️ Proposed fix
 func (p *Preferences) SetConnectionMode(mode string) {
-	m := mode
-	p.configInput.ConnectionMode = &m
+	if mode == "" {
+		p.configInput.ConnectionMode = nil
+		return
+	}
+	m := mode
+	p.configInput.ConnectionMode = &m
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/android/preferences.go` around lines 325 - 331, SetConnectionMode
currently stores a non-nil pointer to an empty string so calling
SetConnectionMode("") doesn’t clear the override; change SetConnectionMode to
set p.configInput.ConnectionMode = nil when mode == "" and otherwise set it to a
pointer to the provided string (e.g., allocate a new string variable or use &m)
so GetConnectionMode’s nil-check will correctly fall through to the
server-pushed value; update the logic in the SetConnectionMode method that
manipulates p.configInput.ConnectionMode accordingly.

Comment on lines +107 to +112
func (p *connStatePusher) OnSnapshotRequest(nonce uint64) {
select {
case p.snapshotReq <- nonce:
default:
}
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the latest refresh nonce when coalescing snapshot requests.

This currently dequeues the oldest nonce and drops newer ones once the buffer fills, so a client waiting on since=<latest> can time out even though a full snapshot was pushed for an older nonce. If this path is meant to coalesce, it must drain pending requests and respond with the newest nonce; the enqueue side of that pattern is sketched after the prompt below.

Suggested fix
 	case nonce := <-p.snapshotReq:
+		// Drain anything still queued so we answer the newest request.
+	drain:
+		for {
+			select {
+			case newer := <-p.snapshotReq:
+				nonce = newer
+			default:
+				break drain
+			}
+		}
 		if p.source != nil {
 			p.flushFull(p.source.SnapshotAllRemotePeers(), nonce)
 		}

Also applies to: 174-177

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_state_pusher.go` around lines 107 - 112,
OnSnapshotRequest currently sends nonce into p.snapshotReq but if the channel
buffer is full it drops the new nonce, causing older nonces to be used; change
the logic in connStatePusher.OnSnapshotRequest to coalesce by non-blocking
enqueue of the newest nonce: when attempting to send, if the channel is full
drain/consume all pending values from p.snapshotReq and then send the newest
nonce so the buffer holds only the latest request; apply the same
drain-and-replace pattern to the analogous handler at the other occurrence
(lines referenced in the comment) so pending snapshot requests always resolve to
the most recent nonce.
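
♻️ A sketch of the producer-side coalescing the prompt describes, assuming snapshotReq is a small buffered channel (capacity 1 in this sketch):

func (p *connStatePusher) OnSnapshotRequest(nonce uint64) {
	for {
		select {
		case p.snapshotReq <- nonce:
			return // newest nonce queued
		default:
		}
		// Buffer full: discard one stale nonce, then retry the send so
		// the buffer always ends up holding the most recent request.
		select {
		case <-p.snapshotReq:
		default:
		}
	}
}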

Comment on lines +261 to +275
func (p *connStatePusher) computeDeltaFromSource() []PeerStateChangeEvent {
if p.source == nil {
return nil
}
all := p.source.SnapshotAllRemotePeers()
p.mu.Lock()
defer p.mu.Unlock()
delta := make([]PeerStateChangeEvent, 0, len(all))
for _, ev := range all {
prev, had := p.lastPushed[ev.Pubkey]
if !had || isMaterialChange(prev, ev) {
delta = append(delta, ev)
}
}
return delta

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Removed peers never get reconciled out of the pushed state.

computeDeltaFromSource only emits currently present peers, so a peer that disappeared from SnapshotAllRemotePeers() never produces any delta. Because the store's delta path only upserts entries, management keeps showing that stale peer until a manual full snapshot or TTL expiry.

This needs a removal path: either detect missing keys and force a full snapshot, or extend the delta format and store merge so deletions are explicit. A sketch of the detection pass follows the prompt below.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_state_pusher.go` around lines 261 - 275,
computeDeltaFromSource currently only iterates current SnapshotAllRemotePeers
and never emits deletions, so keys present in connStatePusher.lastPushed but
missing from SnapshotAllRemotePeers are never reconciled; update
computeDeltaFromSource in connStatePusher to also iterate lastPushed and detect
pubkeys absent from the current snapshot and emit explicit removal events (e.g.,
a PeerStateChangeEvent marked as deletion or a distinct event type) for those
pubkeys; use existing symbols SnapshotAllRemotePeers, lastPushed,
PeerStateChangeEvent and isMaterialChange to decide when to emit updates vs
deletions, ensuring the store receives explicit removal entries instead of
relying on TTL or full snapshots.
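
♻️ A sketch of the removal pass; the Removed marker on PeerStateChangeEvent is hypothetical and would need a matching store-side merge:

func (p *connStatePusher) computeDeltaFromSource() []PeerStateChangeEvent {
	if p.source == nil {
		return nil
	}
	all := p.source.SnapshotAllRemotePeers()
	p.mu.Lock()
	defer p.mu.Unlock()
	present := make(map[string]struct{}, len(all))
	delta := make([]PeerStateChangeEvent, 0, len(all))
	for _, ev := range all {
		present[ev.Pubkey] = struct{}{}
		prev, had := p.lastPushed[ev.Pubkey]
		if !had || isMaterialChange(prev, ev) {
			delta = append(delta, ev)
		}
	}
	// Anything pushed before that is gone now becomes an explicit removal.
	for pubkey := range p.lastPushed {
		if _, ok := present[pubkey]; !ok {
			delta = append(delta, PeerStateChangeEvent{Pubkey: pubkey, Removed: true})
		}
	}
	return delta
}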

Comment on lines +28 to +53
func (d *Debouncer) Trigger(fn func()) {
d.mu.Lock()
defer d.mu.Unlock()
d.fn = fn
if d.timer != nil {
d.timer.Stop()
}
d.timer = time.AfterFunc(d.delay, func() {
d.mu.Lock()
f := d.fn
d.mu.Unlock()
if f != nil {
f()
}
})
}

// Stop cancels any pending fn. Safe to call multiple times.
func (d *Debouncer) Stop() {
d.mu.Lock()
defer d.mu.Unlock()
if d.timer != nil {
d.timer.Stop()
d.timer = nil
}
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Potential double-execution: in-flight AfterFunc callback can run the latest fn after a fast Trigger re-arm.

Sequence:

  1. Trigger(fnA) schedules timer t1 and stores d.fn=fnA.
  2. t1 fires; the callback goroutine starts but is preempted before acquiring d.mu.
  3. Trigger(fnB) acquires d.mu, sets d.fn=fnB, calls t1.Stop() (returns false, already fired), schedules t2, releases.
  4. The t1 callback finally acquires d.mu, reads f=fnB, calls fnB().
  5. t2 later fires and runs fnB() again.

Result: fnB runs twice (and fnA is silently dropped — this is the documented behaviour, but the duplicate run of fnB is not). For SyncMeta this is just a redundant RPC, but it can mask convergence bugs and breaks the documented "last-write-wins" contract. Similarly, Stop() cannot prevent an already-fired callback from executing because it doesn't nil out d.fn. A usage example of the repaired contract follows the prompt below.

♻️ Proposed fix using a timer-identity guard
 func (d *Debouncer) Trigger(fn func()) {
 	d.mu.Lock()
 	defer d.mu.Unlock()
 	d.fn = fn
 	if d.timer != nil {
 		d.timer.Stop()
 	}
-	d.timer = time.AfterFunc(d.delay, func() {
-		d.mu.Lock()
-		f := d.fn
-		d.mu.Unlock()
-		if f != nil {
-			f()
-		}
-	})
+	var t *time.Timer
+	t = time.AfterFunc(d.delay, func() {
+		d.mu.Lock()
+		// Stale callback from a timer that was Stop()'d/replaced after firing.
+		if d.timer != t {
+			d.mu.Unlock()
+			return
+		}
+		f := d.fn
+		d.fn = nil
+		d.timer = nil
+		d.mu.Unlock()
+		if f != nil {
+			f()
+		}
+	})
+	d.timer = t
 }

 func (d *Debouncer) Stop() {
 	d.mu.Lock()
 	defer d.mu.Unlock()
 	if d.timer != nil {
 		d.timer.Stop()
 		d.timer = nil
 	}
+	d.fn = nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/debouncer/debouncer.go` around lines 28 - 53, The Debouncer
can run the wrong/duplicate fn when an old timer callback races with a re-arm;
fix Trigger and Stop by using a timer-identity guard and clearing d.fn when
cancelling/executing: in Trigger (method Debouncer.Trigger) after creating the
timer store it as a local variable (e.g. myTimer := d.timer) and capture that in
the time.AfterFunc closure; inside the closure lock d.mu and check if d.timer ==
myTimer — if not, return; otherwise read f := d.fn, set d.fn = nil and set
d.timer = nil before unlocking and then call f(); in Stop (method
Debouncer.Stop) under d.mu, stop the timer if present and set d.timer = nil and
d.fn = nil so an already-fired-but-not-yet-executed callback cannot observe a
retained fn.
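
With the guard in place, the last-write-wins contract can be demonstrated directly; NewDebouncer is an assumed constructor name:

d := NewDebouncer(50 * time.Millisecond)
d.Trigger(func() { fmt.Println("fnA") }) // superseded before the delay elapses
d.Trigger(func() { fmt.Println("fnB") })
time.Sleep(200 * time.Millisecond) // prints "fnB" exactly once
d.Stop() // safe no-op afterwards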

Comment thread client/internal/engine.go
Comment on lines +354 to +357
if e.connStatePusher != nil {
e.connStatePusher.Stop()
e.connStatePusher = nil
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear the registered conn-state callbacks before nil-ing the pusher.

Start() installs callbacks that dereference e.connStatePusher, but Stop() tears the pusher down by setting the field to nil without unregistering those callbacks. A late status update or snapshot request during teardown can still hit those closures and panic on a nil receiver.

Suggested fix
 	if e.connStatePusher != nil {
+		e.statusRecorder.SetConnStateListener(nil)
+		e.mgmClient.SetSnapshotRequestHandler(nil)
 		e.connStatePusher.Stop()
 		e.connStatePusher = nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/engine.go` around lines 354 - 357, The Stop() path currently
sets e.connStatePusher = nil without unregistering callbacks installed by
Start(), which can leave closures referencing the pusher and cause panics;
update Stop() (and/or implement an Unregister method on the connStatePusher
type) to first unregister or remove all registered conn-state callbacks (the
ones installed in Start()) and only then stop and nil out e.connStatePusher,
ensuring any closures cannot dereference a nil e.connStatePusher (refer to
e.connStatePusher, Start(), and Stop() to locate the affected logic).

Comment on lines +204 to +214
// Phase 3.7i Connection-Mode fields. Currently in the proto so
// daemons can advertise them via GetConfig, but SetConfig does
// NOT apply them at runtime — they're only persisted via
// `netbird service install/reconfigure --connection-mode/...`
// (writes the active profile file directly; daemon picks up on
// next start). Wiring them through SetConfig is a follow-up
// task. Listed here so the structural test passes.
"ConnectionMode": true,
"P2PTimeoutSeconds": true,
"RelayTimeoutSeconds": true,
"P2PRetryMaxSeconds": true,

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SetConfigRequest now advertises fields that this suite explicitly allows the daemon to ignore.

These comments/documentation lock in a silent no-op for connection_mode and the timeout fields: the RPC accepts them, the structural test passes, and callers still get no runtime effect. That is a misleading public contract for SetConfig. Either wire these fields through SetConfig now and assert persistence here, or keep them out of the request surface until the handler supports them.

Also applies to: 279-289

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/server/setconfig_test.go` around lines 204 - 214, The test is
advertising connection-mode and timeout fields as allowed no-ops in
SetConfigRequest which misrepresents the RPC surface; update the test to stop
permitting these fields until the daemon actually implements them by removing
"ConnectionMode", "P2PTimeoutSeconds", "RelayTimeoutSeconds", and
"P2PRetryMaxSeconds" from the allowed-fields table in setconfig_test.go (and the
analogous entries at the later block around lines 279-289), or alternatively
implement wiring in the SetConfigRequest handler to persist/apply those fields
and update the test to assert persistence; reference the SetConfigRequest
handling logic and the named fields above when making the change.

Comment thread client/ui/peers_tab.go
})
}

showFull.OnChanged = func(_ bool) { render() }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

showFull toggle blocks the UI thread on a gRPC call.

OnChanged is invoked on the Fyne UI goroutine. render() performs a synchronous conn.Status RPC with a 5-second timeout (Lines 51-57) before returning, so a slow/unhealthy daemon can freeze the entire window for up to 5 s every time the user toggles the checkbox. The periodic ticker correctly invokes render() from a background goroutine; the OnChanged path should follow the same pattern (or, since toggling showFull only changes presentation, just re-render rows from the last-known status snapshot without an RPC round-trip).

🛠 Minimal fix: run render() off the UI thread
-	showFull.OnChanged = func(_ bool) { render() }
+	showFull.OnChanged = func(_ bool) { go render() }

A nicer follow-up is to cache the last FullStatus and let the toggle rebuild rows locally, avoiding the daemon round-trip entirely; that variant is sketched after the prompt below.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/peers_tab.go` at line 83, The showFull.OnChanged handler currently
calls render() directly on the Fyne UI goroutine which blocks while render()
does a synchronous conn.Status RPC; change the OnChanged handler so it does not
perform RPCs on the UI thread: either (A) run render() asynchronously off the UI
goroutine (spawn a goroutine or use the same background worker the ticker uses)
so conn.Status is executed off the UI thread, or (B) better, cache the last
FullStatus snapshot and update only the rows/presentation when showFull changes
(have showFull.OnChanged rebuild rows from the cached FullStatus rather than
calling conn.Status). Update references to showFull, render(), conn.Status, and
the FullStatus cache/usage accordingly.
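
♻️ A sketch of option (B); lastStatus, statusMu, and rebuildRows are illustrative names, and render() is assumed to store the snapshot it fetched:

showFull.OnChanged = func(_ bool) {
	statusMu.Lock()
	cached := lastStatus // last *proto.FullStatus saved by render()
	statusMu.Unlock()
	if cached == nil {
		go render() // nothing cached yet: fall back to the async fetch
		return
	}
	rebuildRows(cached) // presentation-only update, no RPC on the UI thread
}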

Comment on lines +40 to +46
func (a *fakeAM) GetPeerByPubKey(_ context.Context, _, pubKey string) (*nbpeer.Peer, error) {
p, ok := a.peersByKey[pubKey]
if !ok {
return nil, errors.New("not found")
}
return p, nil
}

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Scope the fake pubkey lookup by accountID, or the RBAC test can’t catch cross-account enrichment leaks.

Right now this helper returns any peer keyed by pubkey, regardless of the caller's account. That makes the 200-path test pass even if the handler accidentally enriches PeerConnectionMap entries with metadata from another account. Given that RBAC on these routes is a stated review focus, the fake should enforce account membership too; a scoped variant is sketched after the prompt below.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handlers/peer_connections/handler_test.go` around
lines 40 - 46, The fake GetPeerByPubKey currently ignores the account parameter
(signature GetPeerByPubKey(_ context.Context, _, pubKey string)) and returns any
peer by pubkey, allowing cross-account leaks; update the implementation in
fakeAM.GetPeerByPubKey to enforce account scoping by using the second parameter
(accountID) — either change fakeAM.peersByKey to be keyed by accountID -> pubKey
(e.g., map[string]map[string]*nbpeer.Peer) or, if peersByKey stores Peer objects
with an AccountID field, check that p.AccountID == accountID before returning;
if there is no match return nil, errors.New("not found").
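
♻️ One way to scope the fake, using the accountID -> pubKey map shape suggested above (the other option, checking an AccountID field on the stored peers, works equally well):

type fakeAM struct {
	peersByAccount map[string]map[string]*nbpeer.Peer
}

func (a *fakeAM) GetPeerByPubKey(_ context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
	// Indexing a missing account yields a nil inner map, so the lookup
	// safely falls through to "not found" for cross-account requests.
	if p, ok := a.peersByAccount[accountID][pubKey]; ok {
		return p, nil
	}
	return nil, errors.New("not found")
}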

Comment on lines +161 to +189
func (h *Handler) buildResponse(ctx context.Context, accountID, dnsDomain, pubkey string, m *mgmProto.PeerConnectionMap) apiResponse {
resp := apiResponse{
PeerPubkey: pubkey,
Seq: m.GetSeq(),
FullSnapshot: m.GetFullSnapshot(),
InResponseTo: m.GetInResponseToNonce(),
Entries: make([]apiEntry, 0, len(m.GetEntries())),
}
for _, e := range m.GetEntries() {
entry := apiEntry{
RemotePubkey: e.GetRemotePubkey(),
ConnType: connTypeToStr(e.GetConnType()),
LatencyMs: e.GetLatencyMs(),
Endpoint: e.GetEndpoint(),
RelayServer: e.GetRelayServer(),
RxBytes: e.GetRxBytes(),
TxBytes: e.GetTxBytes(),
}
if hs := e.GetLastHandshake(); hs != nil && hs.IsValid() {
entry.LastHandshake = hs.AsTime().Format(time.RFC3339)
}
// Enrich remote_fqdn via account-peer lookup (best-effort).
if rPeer, err := h.account.GetPeerByPubKey(ctx, accountID, e.GetRemotePubkey()); err == nil && rPeer != nil {
entry.RemoteFQDN = rPeer.FQDN(dnsDomain)
}
resp.Entries = append(resp.Entries, entry)
}
return resp
}

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

N+1 peer lookup in buildResponse.

For every entry in the connection map, h.account.GetPeerByPubKey is called once to enrich RemoteFQDN. A peer connected to N other peers triggers N lookups per HTTP request, and this endpoint is intended to be polled (the since= long-poll path can issue one such request every 200 ms while waiting). On accounts with many peers and/or many simultaneous viewers this becomes a noticeable backend hot path.

Consider one of:

  • Batch-fetching all peers for the account once and indexing them by pubkey before the loop.
  • Adding a small in-memory pubkey→FQDN cache scoped to the request.
  • Pushing the FQDN resolution into the store layer so the snapshot already carries enriched names.

A sketch of the first option follows the prompt below.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handlers/peer_connections/handler.go` around lines 161
- 189, The buildResponse function performs an N+1 lookup by calling
h.account.GetPeerByPubKey for each entry to set apiEntry.RemoteFQDN; change this
to fetch all peers for the account once (or otherwise build a request-scoped
map/cache) before the loop and then look up remote FQDNs from that map when
populating apiEntry.RemoteFQDN. Locate buildResponse and replace per-entry
GetPeerByPubKey calls with a single batch call (or cache population) using
account APIs, index the results by pubkey, and fall back to the existing
best-effort behavior if a pubkey is missing.
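
♻️ A sketch of the batch variant inside buildResponse; GetAccountPeers is an assumed listing method, and Key is assumed to be the pubkey field on nbpeer.Peer:

peersByKey := make(map[string]*nbpeer.Peer)
if peers, err := h.account.GetAccountPeers(ctx, accountID); err == nil { // assumed batch API
	for _, rp := range peers {
		peersByKey[rp.Key] = rp
	}
}
for _, e := range m.GetEntries() {
	entry := apiEntry{ /* populated as before */ }
	// Same best-effort enrichment, now a map lookup instead of an RPC per entry.
	if rPeer, ok := peersByKey[e.GetRemotePubkey()]; ok {
		entry.RemoteFQDN = rPeer.FQDN(dnsDomain)
	}
	resp.Entries = append(resp.Entries, entry)
}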

Comment on lines +73 to +100
stored := proto.Clone(m).(*mgmProto.PeerConnectionMap)
if !m.GetFullSnapshot() && prev != nil {
merged := proto.Clone(prev.m).(*mgmProto.PeerConnectionMap)
merged.Seq = m.GetSeq()
merged.FullSnapshot = false
// Keep the latest non-zero refresh-nonce. A snapshot pushed in
// response to nonce N must remain reachable via GET ?since=N
// even when a regular delta with InResponseToNonce=0 arrives
// shortly after; otherwise the refresh polling client gives up
// and falls back to the next sync interval (~60 s gap).
if m.GetInResponseToNonce() > merged.GetInResponseToNonce() {
merged.InResponseToNonce = m.GetInResponseToNonce()
}
byKey := make(map[string]int, len(merged.Entries))
for i, e := range merged.Entries {
byKey[e.GetRemotePubkey()] = i
}
for _, ne := range stored.Entries {
if idx, ok := byKey[ne.GetRemotePubkey()]; ok {
merged.Entries[idx] = ne
} else {
merged.Entries = append(merged.Entries, ne)
byKey[ne.GetRemotePubkey()] = len(merged.Entries) - 1
}
}
stored = merged
}
s.maps[peerPubKey] = &memEntry{m: stored, updatedAt: s.clock.Now()}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not treat a delta as authoritative when there is no cached baseline.

If the cache was evicted or the server restarted, prev is nil and a delta-only push gets stored as the whole map. From that point, reads return only the recently changed peers until some later full snapshot repairs the cache.

At minimum, reject delta updates when prev == nil and wait for a full snapshot before serving data for that peer.

Suggested fix
 	stored := proto.Clone(m).(*mgmProto.PeerConnectionMap)
+	if prev == nil && !m.GetFullSnapshot() {
+		return
+	}
 	if !m.GetFullSnapshot() && prev != nil {
 		merged := proto.Clone(prev.m).(*mgmProto.PeerConnectionMap)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer_connections/store.go` around lines 73 - 100, The code
currently accepts a delta push as authoritative when prev == nil; change the
logic in the block around stored/prev/m so that if !m.GetFullSnapshot() && prev
== nil you do NOT store the delta into s.maps (i.e., reject/ignore the update
and return early or skip assigning s.maps[peerPubKey]). Specifically, in the
function handling peer map updates, detect m.GetFullSnapshot() == false and prev
== nil and abort the merge flow that sets stored and s.maps[peerPubKey] (keep
existing memEntry untouched) until a full snapshot arrives; keep references to
stored, prev, merged, m, and memEntry to locate the code. Ensure updatedAt is
not changed for ignored deltas.

@MichaelUray MichaelUray force-pushed the pr/c-phase3.7i-visibility branch from 907bc1c to 3e758d1 on May 6, 2026 at 06:53
@coderabbitai Bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
management/server/peer/peer.go (1)

199-217: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

isEmpty() not updated for the new Effective* fields — inconsistent with isEqual.

isEqual (Line 192‑195) now compares the four Phase 3.7i Effective* fields, but isEmpty does not include them. Because UpdateMetaIfNew early-returns when isEmpty() is true (Line 257‑259), a hypothetical update that sets only the Effective* fields would be silently dropped. In practice peers always populate Hostname/GoOS/etc., so this is unlikely to matter today, but the asymmetry is a footgun for future call sites that might construct a meta with only runtime fields.

Suggested fix
 func (p PeerSystemMeta) isEmpty() bool {
 	return p.Hostname == "" &&
 		...
 		p.Environment.Cloud == "" &&
 		p.Environment.Platform == "" &&
-		len(p.Files) == 0
+		len(p.Files) == 0 &&
+		p.EffectiveConnectionMode == "" &&
+		p.EffectiveRelayTimeoutSecs == 0 &&
+		p.EffectiveP2PTimeoutSecs == 0 &&
+		p.EffectiveP2PRetryMaxSecs == 0
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer/peer.go` around lines 199 - 217, The isEmpty method on
PeerSystemMeta is missing the four Effective* runtime fields that isEqual now
compares, causing updates that set only those Effective* fields to be treated as
empty; update PeerSystemMeta.isEmpty to include the same Effective* fields
checked by isEqual (i.e., add the four Effective... fields from the struct into
the emptiness check) so that UpdateMetaIfNew's early-return behavior matches
isEqual's comparison logic.
♻️ Duplicate comments (20)
client/internal/lazyconn/inactivity/manager.go (1)

87-95: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

NewManagerWithTwoTimers still bypasses threshold validation.

The Phase-1 path runs validateInactivityThreshold (rejecting sub-minute values), but NewManagerWithTwoTimers accepts any non-zero iceTimeout/relayTimeout directly, so a misconfigured caller that passes e.g. 30*time.Second (or a negative duration) will produce a manager that tears down on the first tick or behaves nonsensically. Consider routing both timeouts through the same minimum-bound check (treating 0 as "disabled" specially) so behavior is consistent across constructors.
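
A minimal sketch of routing both timeouts through the same bound check (helper names follow the comment above; returning nil on rejection is one option, not the project's confirmed convention):

    func NewManagerWithTwoTimers(iceTimeout, relayTimeout time.Duration) *Manager {
    	for _, d := range []time.Duration{iceTimeout, relayTimeout} {
    		if d == 0 {
    			continue // 0 keeps its special meaning: "disabled"
    		}
    		if err := validateInactivityThreshold(d); err != nil {
    			log.Errorf("rejecting inactivity threshold %v: %v", d, err)
    			return nil
    		}
    	}
    	return newManager(iceTimeout, relayTimeout)
    }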

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/lazyconn/inactivity/manager.go` around lines 87 - 95,
NewManagerWithTwoTimers currently accepts any non-zero durations and bypasses
validateInactivityThreshold; update it to validate both iceTimeout and
relayTimeout the same way as the Phase-1 path: for each timeout, if timeout != 0
call validateInactivityThreshold(timeout) (treat 0 as "disabled" and skip
validation), and if validation fails handle it consistently (e.g. log the error
and return nil to reject the constructor) before calling newManager; reference
NewManagerWithTwoTimers, validateInactivityThreshold, and newManager when making
the change.
client/internal/peer/worker_ice.go (1)

545-563: ⚠️ Potential issue | 🟠 Major

Synchronize lastKnownState writes with muxAgent.

IsConnected() reads lastKnownState under w.muxAgent, but this callback still writes it without the same lock on Lines 550 and 561-562. That leaves a real data race and lets callers make network-change decisions from unsynchronized state.

Suggested fix
 	case ice.ConnectionStateConnected:
-		w.lastKnownState = ice.ConnectionStateConnected
+		w.muxAgent.Lock()
+		w.lastKnownState = ice.ConnectionStateConnected
+		w.muxAgent.Unlock()
 		w.logSuccessfulPaths(agent)
 		w.conn.onICEConnected()
 		return
 	case ice.ConnectionStateFailed, ice.ConnectionStateDisconnected, ice.ConnectionStateClosed:
 		sessionChanged := w.closeAgent(agent, dialerCancel)

-		if w.lastKnownState == ice.ConnectionStateConnected {
-			w.lastKnownState = ice.ConnectionStateDisconnected
+		w.muxAgent.Lock()
+		wasConnected := w.lastKnownState == ice.ConnectionStateConnected
+		if wasConnected {
+			w.lastKnownState = ice.ConnectionStateDisconnected
+		}
+		w.muxAgent.Unlock()
+		if wasConnected {
 			w.conn.onICEStateDisconnected(sessionChanged)
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/worker_ice.go` around lines 545 - 563, The
onConnectionStateChange callback in WorkerICE writes w.lastKnownState without
holding w.muxAgent, creating a race with IsConnected which reads under that
mutex; update the callback in the onConnectionStateChange method (the Connected
and the Failed/Disconnected/Closed branches where lastKnownState is set) to
acquire and release w.muxAgent around any writes to w.lastKnownState (same lock
used by IsConnected), keeping other logic (logSuccessfulPaths, closeAgent,
w.conn.onICEStateDisconnected/onICEConnected) unchanged and only surrounding the
assignments to lastKnownState with the mutex to ensure synchronized access.
management/server/peer.go (1)

1228-1237: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return a typed peer/account error instead of fmt.Errorf.

Other peer-lookup paths in this file (e.g. Line 320, 396, 422, 449) return status.NewPeerNotPartOfAccountError() for the same account-mismatch case. The plain fmt.Errorf here will be classified as Internal by the HTTP layer instead of the proper NotFound/account-scoped error, which is inconsistent and will produce 5xx responses for what is really a not-found/forbidden situation — particularly noticeable since the new peer_connections HTTP handler relies on this method.

Suggested fix
 func (am *DefaultAccountManager) GetPeerByPubKey(ctx context.Context, accountID, pubKey string) (*nbpeer.Peer, error) {
 	p, err := am.Store.GetPeerByPeerPubKey(ctx, store.LockingStrengthNone, pubKey)
 	if err != nil {
 		return nil, err
 	}
 	if p.AccountID != accountID {
-		return nil, fmt.Errorf("peer with pubkey %s not in account %s", pubKey, accountID)
+		return nil, status.NewPeerNotPartOfAccountError()
 	}
 	return p, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer.go` around lines 1228 - 1237, Replace the generic
fmt.Errorf in GetPeerByPubKey with the typed account/peer error used elsewhere
so the HTTP layer classifies it correctly; specifically, in
DefaultAccountManager.GetPeerByPubKey, when p.AccountID != accountID return
status.NewPeerNotPartOfAccountError(accountID, pubKey) (matching other callers
like the ones at lines ~320/396/422/449) instead of fmt.Errorf, ensuring the
function returns the standardized error type.
client/ui/peers_tab.go (1)

83-83: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

showFull toggle still performs a synchronous gRPC on the UI goroutine.

Same concern as a previous review: OnChanged runs on Fyne's UI goroutine, and render() performs a conn.Status RPC with a 5 s timeout (Lines 51‑57). A slow daemon will freeze the window for up to 5 s on every toggle. The same applies to the footer Refresh button in network.go (Line 70‑76), since peersBundle.Refresh is render itself; the periodic ticker already runs render() off-thread.

Minimal fix
-	showFull.OnChanged = func(_ bool) { render() }
+	showFull.OnChanged = func(_ bool) { go render() }

A nicer follow-up is to cache the last FullStatus and let the toggle just rebuild rows locally without a daemon round-trip.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/peers_tab.go` at line 83, The showFull.OnChanged handler currently
calls render() on Fyne's UI goroutine which blocks because render() does a
conn.Status RPC with timeout; change the handler to avoid any synchronous RPC on
the UI thread by either (a) spawning a goroutine to perform the RPC/fetch and
then schedule UI updates back on the UI goroutine (e.g., using
fyne.CurrentApp().Driver().RunOnMain or equivalent) or (b) implement a cached
FullStatus in peersBundle and have showFull.OnChanged only rebuild the rows from
that cached data so no daemon round-trip occurs; also update peersBundle.Refresh
and the Refresh button in network.go to follow the same pattern (do network
fetches off-thread and only update UI on the main goroutine).
client/internal/peer/ice_backoff.go (1)

161-169: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SetMaxBackoff(0) still does not clear an in-flight suspension.

When the management server pushes a value of 0 to disable backoff, maxBackoff/bo are swapped but suspended/nextRetry from a prior markFailure() remain. IsSuspended() will then keep honoring the stale timer until it naturally expires, so the live "disable backoff" push has no immediate effect on an already-suspended peer.

Suggested fix
 func (s *iceBackoffState) SetMaxBackoff(d time.Duration) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	if d == s.maxBackoff {
 		return
 	}
 	s.maxBackoff = d
 	s.bo = buildBackoff(d)
+	if d == 0 {
+		s.suspended = false
+		s.nextRetry = time.Time{}
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/ice_backoff.go` around lines 161 - 169, SetMaxBackoff on
iceBackoffState swaps maxBackoff/bo but doesn't clear an already-set suspension,
so calling SetMaxBackoff(0) doesn't immediately unsuspend peers; change
SetMaxBackoff to, under the same mu.Lock(), detect when d == 0 and clear the
suspended state and any pending retry by setting s.suspended = false and
resetting s.nextRetry to the zero time (or otherwise cancelling any in-flight
timer), so IsSuspended() returns false immediately after disabling backoff;
ensure this logic lives alongside the existing maxBackoff/bo assignment and uses
iceBackoffState's mu for thread safety, referencing SetMaxBackoff, suspended,
nextRetry, markFailure, IsSuspended, bo and maxBackoff.
client/internal/peer/handshaker.go (1)

116-153: ⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoff

RemoveICEListener can still dispatch one stale ICE callback.

readICEListener() (Line 125‑129) only snapshots the function pointer under h.mu and releases the mutex before Listen() invokes it (Lines 148‑153 / 173‑178). If RemoveICEListener() runs in the gap between snapshot and call, the offer/answer is still delivered to the just-detached listener. This means DetachICE / ICE-backoff is not a hard stop — at least one extra ICE callback can fire after detachment.

Either hold the mutex across dispatch (acceptable since the listener should be lightweight, e.g. a non-blocking enqueue), or add a generation counter that the dispatcher re-checks under lock immediately before calling.
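
The hold-the-lock variant in isolation (a self-contained sketch; listenerSlot and its fields are illustrative names, not the handshaker's actual ones, and only the sync package is needed):

    type listenerSlot struct {
    	mu sync.Mutex
    	fn func(offer string)
    }

    func (s *listenerSlot) Set(fn func(offer string)) {
    	s.mu.Lock()
    	s.fn = fn
    	s.mu.Unlock()
    }

    func (s *listenerSlot) Remove() {
    	s.mu.Lock()
    	s.fn = nil
    	s.mu.Unlock()
    }

    // dispatch invokes the listener while still holding the mutex, so once
    // Remove returns, no further callback can fire. The listener must be
    // lightweight (e.g. a non-blocking enqueue) to keep the critical section short.
    func (s *listenerSlot) dispatch(offer string) {
    	s.mu.Lock()
    	defer s.mu.Unlock()
    	if s.fn != nil {
    		s.fn(offer)
    	}
    }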

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/handshaker.go` around lines 116 - 153, The
snapshot-and-call race lets RemoveICEListener detach but still have one last
callback delivered; fix by ensuring the mutex protects the actual dispatch: in
Listen, acquire h.mu, read h.iceListener into a local, and invoke that listener
while still holding h.mu (or alternatively implement a generation counter: add
an h.iceGen incremented in RemoveICEListener, have readICEListener return both
listener and gen, then re-lock and verify gen unchanged before calling). Update
RemoveICEListener, readICEListener, and Listen to use the chosen approach
(referencing h.mu, h.iceListener, RemoveICEListener, readICEListener, Listen) so
no ICE callback can be invoked after detachment.
client/android/preferences.go (2)

325-331: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clear the connection-mode override instead of storing &"".

SetConnectionMode("") still leaves p.configInput.ConnectionMode non-nil, so GetConnectionMode() never falls back to the server-pushed value despite the method comment promising that behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/android/preferences.go` around lines 325 - 331, SetConnectionMode
currently assigns a pointer to an empty string which keeps
p.configInput.ConnectionMode non-nil; change it so when mode == "" it clears the
override by setting p.configInput.ConnectionMode = nil, otherwise store a
pointer to the provided value (use the existing Preferences.SetConnectionMode
and p.configInput.ConnectionMode symbols to locate the fix and ensure
GetConnectionMode will then fall back to the server-pushed value).

347-352: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't cast signed timeout values straight to uint32.

These setters still treat 0 as an explicit override instead of “clear”, and negative inputs wrap into huge timeout values. Guard secs <= 0 by clearing the pointer, and clamp oversized values before converting.

Also applies to: 368-373, 388-393
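
One way to apply the guard-and-clamp pattern uniformly across the three setters (a sketch; the math.MaxUint32 cap is an assumption, a domain-specific maximum may be more sensible):

    func (p *Preferences) SetRelayTimeoutSeconds(secs int) {
    	if secs <= 0 {
    		p.configInput.RelayTimeoutSeconds = nil // 0 or negative clears the override
    		return
    	}
    	s := int64(secs)
    	if s > math.MaxUint32 {
    		s = math.MaxUint32
    	}
    	v := uint32(s)
    	p.configInput.RelayTimeoutSeconds = &v
    }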

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/android/preferences.go` around lines 347 - 352, SetRelayTimeoutSeconds
currently casts signed secs directly to uint32 which lets negative values wrap
and treats 0 as an explicit override; change it to treat secs <= 0 as “clear” by
setting p.configInput.RelayTimeoutSeconds = nil, otherwise clamp secs to a safe
maximum (e.g. uint32 max or your configured max constant) before converting and
storing a pointer to the uint32 value; apply the same guard-and-clamp pattern to
the other timeout setter functions in this file (the two other timeout setters
nearby) so none perform direct signed-to-uint32 casts or treat 0 as an override.
client/internal/peer/status.go (2)

453-465: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Build the notification snapshots while d.mux is still held.

These updatePeer*Locked paths still unlock first and then call notifyConnStateChange() / notifyPeerStateChangeListeners(), even though both helpers snapshot listener/subscriber state and are documented to run under d.mux. That reintroduces the race with concurrent listener or subscription changes.

Also applies to: 664-671, 717-724, 769-772, 820-823
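
The shape of the fix, sketched (names and signatures follow the comment and are guesses; the helpers are assumed to snapshot their subscriber lists and fire callbacks asynchronously, which is what makes holding d.mux across the call safe):

    d.mux.Lock()
    d.peers[pubKey] = st
    d.notifyConnStateChange()                // snapshots subscribers under d.mux
    d.notifyPeerStateChangeListeners(pubKey) // likewise
    d.mux.Unlock()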


506-544: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remote-meta-only changes still never trigger an immediate refresh.

UpdatePeerRemoteMeta() updates RemoteLiveOnline, connection-mode fields, timeout fields, and groups, then returns from both branches without notifying peer-state subscribers or refreshing the peer list. A management-only liveness/mode flip will stay stale until some unrelated transport update or the next poll.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/status.go` around lines 506 - 544, UpdatePeerRemoteMeta
currently mutates Remote* fields on d.peers and d.offlinePeers and returns
without signaling any change; update the method to invoke the same immediate
refresh/notification path used when transport/state changes occur so remote-only
changes propagate instantly. After you set the Remote* fields in both the online
branch (where you modify st and assign d.peers[pubKey]) and the offline branch
(where you modify d.offlinePeers[i]), call the existing peer-state
notification/refresh helper the codebase uses for transport updates (the same
function UpdatePeerTransport or the refresh/notify function that other updates
call) passing the pubKey (or the updated peer entry) so subscribers are notified
and the peer list is refreshed immediately. Ensure the call happens before
returning nil in both branches.
client/internal/peer/conn.go (1)

202-212: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the explicit “disable ICE backoff” state across reopen.

SetIceBackoffMax(0) persists 0 into conn.config.P2pRetryMaxSeconds, but Open() interprets 0 as DefaultP2PRetryMax. After the next close/open, a peer that explicitly disabled backoff comes back with the default cap instead.

Also applies to: 1166-1172
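
One way to keep both meanings, sketched under the assumption that the config layer could move to a pointer field (the review's simpler alternative is to drop the default substitution in Open entirely):

    type connConfig struct {
    	// nil = not set (fall back to DefaultP2PRetryMax);
    	// pointer to 0 = backoff explicitly disabled.
    	P2pRetryMaxSeconds *uint32
    }

    func backoffCap(cfg connConfig, def time.Duration) time.Duration {
    	if cfg.P2pRetryMaxSeconds == nil {
    		return def
    	}
    	return time.Duration(*cfg.P2pRetryMaxSeconds) * time.Second // 0 survives reopen as "disabled"
    }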

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/conn.go` around lines 202 - 212, Open() currently treats
conn.config.P2pRetryMaxSeconds == 0 as DefaultP2PRetryMax which overrides an
explicit SetIceBackoffMax(0) disable; change the logic so a stored 0 is
preserved as “disabled” instead of being replaced by DefaultP2PRetryMax: compute
backoffCap := time.Duration(conn.config.P2pRetryMaxSeconds) * time.Second and
remove the branch that substitutes DefaultP2PRetryMax when backoffCap == 0, then
pass that backoffCap unchanged into newIceBackoff(...) or
iceBackoff.SetMaxBackoff(...); apply the same change to the other occurrence
around the SetMaxBackoff/newIceBackoff calls at 1166-1172 so an explicit 0
remains disabled across reopen.
management/internals/shared/grpc/conversion.go (1)

263-289: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the disabled sentinel in ConfiguredP2PRetryMaxSecs.

toPeerConfig() already encodes an explicit 0 retry cap as p2pRetryMaxDisabledSentinel, but Line 267 collapses nil and explicit zero with derefUint32OrZero. That makes ConfiguredP2PRetryMaxSecs lose the “user explicitly disabled backoff” marker and diverge from the wire format used elsewhere.

Suggested fix
 	if c.Cfg != nil {
 		cfgConnMode = derefStringOrEmpty(c.Cfg.ConnectionMode)
 		cfgRelayTO = derefUint32OrZero(c.Cfg.RelayTimeoutSeconds)
 		cfgP2pTO = derefUint32OrZero(c.Cfg.P2pTimeoutSeconds)
-		cfgP2pRetryMax = derefUint32OrZero(c.Cfg.P2pRetryMaxSeconds)
+		if c.Cfg.P2pRetryMaxSeconds != nil {
+			if *c.Cfg.P2pRetryMaxSeconds == 0 {
+				cfgP2pRetryMax = p2pRetryMaxDisabledSentinel
+			} else {
+				cfgP2pRetryMax = *c.Cfg.P2pRetryMaxSeconds
+			}
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/conversion.go` around lines 263 - 289, The
code in toPeerConfig collapses a nil vs explicit-zero P2P retry cap by using
derefUint32OrZero for cfgP2pRetryMax, losing the p2pRetryMaxDisabledSentinel
marker; update the assignment so ConfiguredP2PRetryMaxSecs preserves the
sentinel: when c.Cfg is nil keep cfgP2pRetryMax as zero/empty as before, but
when c.Cfg.P2pRetryMaxSeconds is non-nil set cfgP2pRetryMax to
derefUint32OrZero(c.Cfg.P2pRetryMaxSeconds) and when it is explicitly zero set
it to p2pRetryMaxDisabledSentinel (i.e., check c.Cfg.P2pRetryMaxSeconds for nil
vs zero before assigning ConfiguredP2PRetryMaxSecs) so toPeerConfig retains the
explicit-disabled wire marker.
client/internal/conn_mgr.go (1)

286-324: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restart the lazy manager when only the inactivity thresholds change.

initLazyManager() snapshots relayTimeoutSecs and p2pTimeoutSecs into manager.Config, but this path only restarts on modeChanged. A management push that keeps the mode at p2p-lazy/p2p-dynamic and changes only the relay/ICE inactivity timeouts leaves the running manager enforcing the old thresholds until the daemon restarts.

Suggested fix
-	prev := e.mode
+	prev := e.mode
+	prevRelay := e.relayTimeoutSecs
+	prevP2P := e.p2pTimeoutSecs
 	e.mode = newMode
 	e.relayTimeoutSecs = newRelay
 	e.p2pTimeoutSecs = newP2P
 	e.p2pRetryMaxSecs = newP2pRetry
 	e.propagateP2pRetryMaxToConns()

 	wasManaged := modeUsesLazyMgr(prev)
 	isManaged := modeUsesLazyMgr(newMode)
 	modeChanged := prev != newMode
+	timeoutsChanged := prevRelay != newRelay || prevP2P != newP2P

 	if modeChanged && wasManaged && !isManaged {
 		log.Infof("lazy/dynamic connection manager disabled by management push (mode=%s)", newMode)
 		e.closeManager(ctx)
 		e.statusRecorder.UpdateLazyConnection(false)
 		return nil
 	}

-	if modeChanged && wasManaged && isManaged {
+	if (modeChanged || timeoutsChanged) && wasManaged && isManaged {
 		// Switching between lazy and dynamic at runtime: tear down the
-		// existing manager so initLazyManager picks up the new timeouts.
-		log.Infof("lazy/dynamic mode change %s -> %s, restarting manager", prev, newMode)
+		// existing manager so initLazyManager picks up the new mode/timeouts.
+		log.Infof("lazy/dynamic config change (mode %s -> %s, relay %ds -> %ds, ice %ds -> %ds), restarting manager",
+			prev, newMode, prevRelay, newRelay, prevP2P, newP2P)
 		e.closeManager(ctx)
 		e.statusRecorder.UpdateLazyConnection(false)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_mgr.go` around lines 286 - 324, The code only restarts
the lazy/dynamic manager when mode changes, so updates that only modify
e.relayTimeoutSecs, e.p2pTimeoutSecs or e.p2pRetryMaxSecs leave the running
manager using stale thresholds; change the logic to detect when any of those
timeout fields differ from their previous values while modeUsesLazyMgr(newMode)
is true and e.lazyConnMgr != nil, and in that case call e.closeManager(ctx),
update status via e.statusRecorder.UpdateLazyConnection(false), then
reinitialize via e.initLazyManager(ctx), call e.startModeSideEffects(), and
ensure e.propagateP2pRetryMaxToConns() is called so the new values are picked up
by the manager and active connections (reference symbols: e.relayTimeoutSecs,
e.p2pTimeoutSecs, e.p2pRetryMaxSecs, modeUsesLazyMgr, e.lazyConnMgr,
e.closeManager, e.initLazyManager, e.startModeSideEffects,
e.propagateP2pRetryMaxToConns, e.statusRecorder.UpdateLazyConnection).
management/server/http/handler.go (1)

142-149: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Peer-connections REST routes still missing explicit RBAC.

/peers/{peerId}/connections (read) and /peers/{peerId}/connections/refresh (write — triggers a snapshot dispatch over gRPC) are wired here, but peer_connections_http.NewHandler is not given a permissionsManager, so unlike the sibling peers.AddEndpoints at Line 140 the handler can't call ValidateUserPermissions(..., modules.Peers, operations.Read|Write). The current authorization story relies entirely on accountManager.GetPeer to fail for unauthorized callers, which is a side-effect rather than a deliberate RBAC contract — and the refresh endpoint mutates network state.

Thread permissionsManager into peer_connections_http.NewHandler and gate GetPeerConnections on Peers.Read and PostRefresh on Peers.Write before merging.
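
A sketch of the gating, assuming the ValidateUserPermissions shape quoted above; user extraction and error writing would follow whatever the sibling peers handler does (userIDFromRequest is a hypothetical helper):

    func (h *handler) getPeerConnections(w http.ResponseWriter, r *http.Request) {
    	ctx := r.Context()
    	userID := userIDFromRequest(r) // hypothetical: mirror the peers handler's extraction
    	if err := h.permissionsManager.ValidateUserPermissions(ctx, userID, modules.Peers, operations.Read); err != nil {
    		http.Error(w, "forbidden", http.StatusForbidden)
    		return
    	}
    	// ... existing accountManager.GetPeer lookup and response writing ...
    }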

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/http/handler.go` around lines 142 - 149, The
peer-connections routes lack explicit RBAC: thread the existing
permissionsManager into peer_connections_http.NewHandler (instead of omitting
it) and update the handler so GetPeerConnections calls
ValidateUserPermissions(..., modules.Peers, operations.Read) and PostRefresh
calls ValidateUserPermissions(..., modules.Peers, operations.Write) before
proceeding; adjust the pcAccountManagerAdapter/new handler constructor usage to
accept and store the permissionsManager and enforce these checks (use the same
ValidateUserPermissions pattern used by peers.AddEndpoints and
accountManager.GetPeer to perform explicit authorization).
management/internals/shared/grpc/server.go (2)

487-498: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Closed snapshotCh causes the select to spin.

SnapshotRouter.Register is documented to close the previous channel on fast reconnect (per the Phase 3.7i design). When that happens, nonce, ok := <-snapshotCh immediately returns with ok==false and continue falls back into the same select, which sees the closed channel as permanently ready — pegging this goroutine until srv.Context() is cancelled. Set snapshotCh = nil (or return) on close so the case is disabled.

🛡️ Suggested fix
 		case nonce, ok := <-snapshotCh:
 			if !ok {
-				continue
+				snapshotCh = nil
+				continue
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/server.go` around lines 487 - 498, The
select case reading from snapshotCh spins when the channel is closed because on
receive it returns ok==false and continue leaves the case enabled; update the
handling in the select case for "case nonce, ok := <-snapshotCh:" so that when
ok==false you disable the case by setting snapshotCh = nil (or return) instead
of continue, thereby preventing the select from repeatedly firing on the closed
channel; keep the rest of the logic (building snapMsg and calling s.sendUpdate)
unchanged and reference snapshotCh, s.sendUpdate, and the select block around
srv.Context().

1175-1185: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

SyncPeerConnections accepts uploads from any keypair.

Unlike SyncMeta (Line 1146) and Sync (Line 238), this handler stops at envelope decryption — it never resolves the peerKey against the account/peer store. Any client that owns a WireGuard keypair can encrypt a PeerConnectionMap against the management server's pubkey and have it written into the TTL store under their own pubkey, growing the map unboundedly and seeding state that the dashboard will surface.

Mirror the Sync flow's GetAccountIDForPeerKey (or GetPeerByPeerPubKey) lookup before calling s.peerConnections.Put:

🔒️ Suggested fix
 func (s *Server) SyncPeerConnections(ctx context.Context, req *proto.EncryptedMessage) (*proto.Empty, error) {
 	pcm := &proto.PeerConnectionMap{}
 	peerKey, err := s.parseRequest(ctx, req, pcm)
 	if err != nil {
 		return nil, err
 	}
+	if _, err := s.accountManager.GetAccountIDForPeerKey(ctx, peerKey.String()); err != nil {
+		if e, ok := internalStatus.FromError(err); ok && e.Type() == internalStatus.NotFound {
+			return nil, status.Errorf(codes.PermissionDenied, "peer is not registered")
+		}
+		return nil, mapError(ctx, err)
+	}
 	if s.peerConnections != nil {
 		s.peerConnections.Put(peerKey.String(), pcm)
 	}
 	return &proto.Empty{}, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/server.go` around lines 1175 - 1185, The
SyncPeerConnections handler currently writes decrypted PeerConnectionMap for any
valid WireGuard key; update Server.SyncPeerConnections to resolve the decrypted
peerKey to an account/peer before storing: call the same lookup used in Sync
(e.g. GetAccountIDForPeerKey or GetPeerByPeerPubKey) with the peerKey returned
from s.parseRequest, return an error (permission denied / not found) if no
account/peer is found, and only call s.peerConnections.Put with the resolved
account/peer identifier (peer ID or account ID string) instead of
peerKey.String(); ensure you reuse the same error handling pattern as in
Sync/SyncMeta.
client/ui/client_ui.go (1)

741-750: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalid timeout input is still silently coerced to 0.

parseUint32Field (Line 656) returns 0 on empty, malformed, or out-of-range input. At Line 745–750 the result is unconditionally written into req.RelayTimeoutSeconds / P2PTimeoutSeconds / P2PRetryMaxSeconds, so a typo (e.g. 30s, -5, abc) ends up clearing the local override without any user feedback — and validateSettings (Line 577) does not check these fields. hasConnectionModeChanges (Line 638–640) likewise normalizes via the same lossy parser, so the change-detection path can also miss real edits.

Validate the three entries in validateSettings (or have parseUint32Field return (value, ok)) and surface an error before the window closes, so non-empty text either parses or aborts the save.
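
A sketch of the (value, ok) variant (hypothetical name; empty text stays a valid "no override" case, anything else must parse or the save should abort):

    func parseUint32FieldStrict(text string) (uint32, bool) {
    	text = strings.TrimSpace(text)
    	if text == "" {
    		return 0, true // empty means "no override"
    	}
    	v, err := strconv.ParseUint(text, 10, 32)
    	if err != nil {
    		return 0, false // malformed or out of range: caller should abort the save
    	}
    	return uint32(v), true
    }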

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/client_ui.go` around lines 741 - 750, The code silently coerces
invalid timeout text to 0 because parseUint32Field returns 0 on error and the
values are unconditionally assigned to req.RelayTimeoutSeconds /
req.P2PTimeoutSeconds / req.P2PRetryMaxSeconds (used also by
hasConnectionModeChanges), so invalid input clears overrides without feedback;
fix by changing parsing/validation so non-empty invalid input is rejected:
either update parseUint32Field to return (uint32, bool/error) and propagate the
error back to the caller, or add explicit checks in validateSettings that
iRelayTimeout.Text, iP2pTimeout.Text, and iP2pRetryMax.Text either are empty or
successfully parse to a uint32 (using a new parse helper), and if parsing fails
surface a user-facing error and abort save rather than writing 0 into req.* and
closing the window.
client/internal/conn_state_pusher.go (2)

107-112: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Coalesce snapshot requests to the newest nonce.

Current behavior can drop the newest nonce when the queue is full and then flush an older nonce first, which breaks the “latest nonce” coalescing contract.

Proposed fix
 func (p *connStatePusher) OnSnapshotRequest(nonce uint64) {
 	select {
 	case p.snapshotReq <- nonce:
+		return
 	default:
 	}
+	// Queue full: keep only the latest nonce.
+	for {
+		select {
+		case <-p.snapshotReq:
+		default:
+			select {
+			case p.snapshotReq <- nonce:
+			default:
+			}
+			return
+		}
+	}
 }
@@
 		case nonce := <-p.snapshotReq:
+			// Coalesce pending requests and honor the latest nonce.
+			for {
+				select {
+				case newer := <-p.snapshotReq:
+					nonce = newer
+				default:
+					goto flushSnapshot
+				}
+			}
+		flushSnapshot:
 			if p.source != nil {
 				p.flushFull(p.source.SnapshotAllRemotePeers(), nonce)
 			}

Also applies to: 174-177

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_state_pusher.go` around lines 107 - 112,
OnSnapshotRequest in connStatePusher currently drops the newest nonce when the
snapshotReq channel is full; change the send logic so that if p.snapshotReq
would block you first drain one value from the channel (non-blocking receive)
and then send the new nonce, ensuring the newest nonce is always enqueued.
Implement this replace-oldest behavior in the OnSnapshotRequest
implementation(s) (the occurrence at lines ~107-112 and the similar block at
~174-177) by attempting a non-blocking send, and on failure perform a
non-blocking receive from p.snapshotReq before sending the new nonce.

261-275: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Reconcile removed peers during source-delta computation.

Removed peers never generate an update here, so management can retain stale peer entries until TTL or an unrelated full snapshot.
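
The reconciliation itself is a small set difference; a self-contained sketch (names are illustrative):

    // removedPeers returns pubkeys present in the last pushed set but absent
    // from the current snapshot; each should yield a removal entry in the delta.
    func removedPeers(lastPushed, current map[string]struct{}) []string {
    	var removed []string
    	for key := range lastPushed {
    		if _, ok := current[key]; !ok {
    			removed = append(removed, key)
    		}
    	}
    	return removed
    }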

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_state_pusher.go` around lines 261 - 275,
computeDeltaFromSource currently only emits events for peers present in the
source snapshot and misses peers that were removed; update
computeDeltaFromSource to also detect pubkeys present in p.lastPushed but
missing from p.source.SnapshotAllRemotePeers() and append corresponding removal
events so stale entries are reconciled. Specifically: inside
computeDeltaFromSource (while holding p.mu), build a set/map of current pubkeys
from all, then iterate p.lastPushed to find keys not in that set and append a
PeerStateChangeEvent representing removal for each (use the project’s canonical
removed status/flag for PeerStateChangeEvent, e.g., PeerStateRemoved); ensure
delta capacity accounts for these extra events and keep locking semantics
(p.mu.Lock()/defer Unlock) as in the original function.
client/internal/engine.go (1)

354-357: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unregister conn-state callbacks before clearing the pusher.

Stop nils e.connStatePusher while callbacks installed in Start can still fire. That leaves a nil-deref path during teardown races.

Proposed fix
 	if e.connStatePusher != nil {
+		e.statusRecorder.SetConnStateListener(nil)
+		e.mgmClient.SetSnapshotRequestHandler(nil)
 		e.connStatePusher.Stop()
 		e.connStatePusher = nil
 	}

Also applies to: 632-637

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/engine.go` around lines 354 - 357, The teardown clears
e.connStatePusher while callbacks registered in Start can still fire, creating a
nil-deref race; update the shutdown sequence in the methods around
e.connStatePusher (the Stop/Start pair) to first unregister or remove any
conn-state callbacks (the same callback IDs or unregister API used in Start)
while the pusher is still valid, then call e.connStatePusher.Stop(), and only
after Stop returns set e.connStatePusher = nil; apply the same change to the
other occurrence around lines 632-637 to ensure callbacks are removed before
nil-ing the pusher.
🧹 Nitpick comments (7)
client/internal/lazyconn/support.go (1)

26-29: 💤 Low value

-dev- / -ci- embedded marker check could match non-dev release strings.

The strings.Contains(agentVersion, "-dev-") and strings.Contains(agentVersion, "-ci-") guards will return true for any version string that happens to contain those substrings — e.g. a hypothetical "1.0.0-preview-ci-build" or a version emitted by a third-party packaging pipeline. Because these cases short-circuit before the minVersion comparison, a peer running an older build with such a version string would be incorrectly classified as lazy-conn–capable.

The risk is low in practice (the formats are tightly controlled), but it's worth documenting the assumption explicitly or tightening the match, e.g. requiring the marker to be the second segment only ("0.0.0-dev-*").
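
A sketch of the tightened match, requiring the marker to be the second dash-separated segment as suggested above:

    func hasDevOrCIMarker(agentVersion string) bool {
    	parts := strings.SplitN(agentVersion, "-", 3)
    	return len(parts) >= 2 && (parts[1] == "dev" || parts[1] == "ci")
    }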

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/lazyconn/support.go` around lines 26 - 29, The current
substring checks on agentVersion (strings.Contains(agentVersion, "-dev-") /
"-ci-") can match unintended versions; update the guard in support.go to parse
agentVersion by '-' and require the marker to be the second segment (e.g. split
:= strings.SplitN(agentVersion, "-", 3) and test split[1] == "dev" or "ci") so
only versions like "0.0.0-dev-*" or "0.0.0-ci-*" are treated as dev/ci, leaving
the minVersion comparison intact for all others.
client/internal/peer/guard/guard.go (1)

144-147: ⚡ Quick win

Synchronous callback can stall the reconnect select loop.

g.onNetworkChange() is called inline inside the srReconnectedChan select branch. The callback is documented as "reset iceBackoff + recreate workerICE," which may acquire locks, send on channels, or do other non-trivial work. While it blocks, the guard cannot process relayedConnDisconnected, iCEConnDisconnected, or ticker events, potentially delaying reconnection.

Dispatch it in a goroutine so the select loop stays live:

♻️ Proposed fix
-			// Phase 3.5 (`#5989`): notify Conn to reset iceBackoff + recreate workerICE
-			if g.onNetworkChange != nil {
-				g.onNetworkChange()
-			}
+			// Phase 3.5 (`#5989`): notify Conn to reset iceBackoff + recreate workerICE.
+			// Fire in a goroutine so the select loop is not stalled.
+			if g.onNetworkChange != nil {
+				go g.onNetworkChange()
+			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/peer/guard/guard.go` around lines 144 - 147, The select
branch handling srReconnectedChan currently calls g.onNetworkChange()
synchronously, which can block the guard's select loop and delay handling of
relayedConnDisconnected, iCEConnDisconnected, and ticker events; change this to
dispatch the callback in a new goroutine (e.g., go func() { g.onNetworkChange()
}()) inside the srReconnectedChan branch so the select loop remains responsive,
and consider wrapping the call with a recover to avoid a panic in the callback
from crashing the goroutine.
management/server/peer/peer_test.go (1)

145-152: ⚡ Quick win

Extend coverage to the three numeric Effective*Secs fields.

The new test only exercises EffectiveConnectionMode. The three uint32 fields (EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs, EffectiveP2PRetryMaxSecs) are added to isEqual in the same change — a regression that drops any of them from the comparison would not be caught.

Suggested table-driven extension
 func TestPeerSystemMeta_isEqual_ChecksEffectiveFields(t *testing.T) {
-	base := PeerSystemMeta{Hostname: "h", EffectiveConnectionMode: "p2p-dynamic"}
-	other := base
-	other.EffectiveConnectionMode = "p2p"
-	if base.isEqual(other) {
-		t.Error("isEqual should return false when EffectiveConnectionMode differs")
-	}
+	base := PeerSystemMeta{
+		Hostname:                  "h",
+		EffectiveConnectionMode:   "p2p-dynamic",
+		EffectiveRelayTimeoutSecs: 30,
+		EffectiveP2PTimeoutSecs:   45,
+		EffectiveP2PRetryMaxSecs:  900,
+	}
+	tests := []struct {
+		name   string
+		mutate func(*PeerSystemMeta)
+	}{
+		{"EffectiveConnectionMode", func(m *PeerSystemMeta) { m.EffectiveConnectionMode = "p2p" }},
+		{"EffectiveRelayTimeoutSecs", func(m *PeerSystemMeta) { m.EffectiveRelayTimeoutSecs = 60 }},
+		{"EffectiveP2PTimeoutSecs", func(m *PeerSystemMeta) { m.EffectiveP2PTimeoutSecs = 90 }},
+		{"EffectiveP2PRetryMaxSecs", func(m *PeerSystemMeta) { m.EffectiveP2PRetryMaxSecs = 1800 }},
+	}
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			other := base
+			tc.mutate(&other)
+			if base.isEqual(other) {
+				t.Errorf("isEqual should return false when %s differs", tc.name)
+			}
+		})
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer/peer_test.go` around lines 145 - 152, The test
TestPeerSystemMeta_isEqual_ChecksEffectiveFields currently only verifies
EffectiveConnectionMode; extend it to also cover the three numeric fields on
PeerSystemMeta (EffectiveRelayTimeoutSecs, EffectiveP2PTimeoutSecs,
EffectiveP2PRetryMaxSecs) by converting the test to a small table-driven loop
that starts from a base PeerSystemMeta, clones it to "other", mutates each field
one at a time (including the uint32 fields) to a different value, and asserts
that base.isEqual(other) returns false for each case; use the existing function
name TestPeerSystemMeta_isEqual_ChecksEffectiveFields and the struct
PeerSystemMeta to locate where to update the test.
management/internals/server/boot.go (1)

111-125: Document the replica-local nature of the new peer-connection state.

PeerConnStore() and PeerConnRouter() are in-process only, so GET/refresh works predictably only when the dashboard request and the peer's gRPC stream land on the same management replica. If multi-instance deployments are supported, this feature will need sticky routing or a shared backend for consistent behavior.

Also applies to: 127-129, 192-192

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/server/boot.go` around lines 111 - 125, Update the
comments for PeerConnStore and PeerConnRouter to explicitly state that the
returned peer-connection state is in-process and replica-local (i.e., only valid
for requests and gRPC streams handled by the same management replica) and that
multi-replica deployments require sticky routing or a shared backend for
consistent GET/refresh behavior; apply the same clarified comment wording to the
other occurrences noted (around the blocks at the referenced locations) so all
mentions consistently document the replica-local constraint and recommended
mitigations.
management/server/peer_connections/snapshot_router_test.go (1)

3-3: ⚡ Quick win

Avoid blocking forever on the close assertions.

Both unregister checks do a plain receive. If the router stops closing the channel, this test hangs until the package timeout instead of failing here.

Suggested change
-import "testing"
+import (
+	"testing"
+	"time"
+)
@@
 	r := NewSnapshotRouter()
 	ch := r.Register("peerA")
 	r.Unregister("peerA", ch)
-	if _, ok := <-ch; ok {
-		t.Error("channel should be closed after Unregister")
-	}
+	select {
+	case _, ok := <-ch:
+		if ok {
+			t.Error("channel should be closed after Unregister")
+		}
+	case <-time.After(100 * time.Millisecond):
+		t.Fatal("timed out waiting for channel close")
+	}
@@
 	// Proper Unregister with the fresh token tears it down.
 	r.Unregister("peerA", fresh)
-	if _, ok := <-fresh; ok {
-		t.Error("fresh channel should be closed after its own Unregister")
-	}
+	select {
+	case _, ok := <-fresh:
+		if ok {
+			t.Error("fresh channel should be closed after its own Unregister")
+		}
+	case <-time.After(100 * time.Millisecond):
+		t.Fatal("timed out waiting for fresh channel close")
+	}
 }

Also applies to: 28-34, 57-60

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer_connections/snapshot_router_test.go` at line 3, The
two plain receives used to assert channel closes in snapshot_router_test.go (the
unregister checks that currently do `<-ch` without timeout) can hang forever;
change each to a select that waits for the close/receive case and a time.After
case that calls t.Fatalf/t.Fatal on timeout (e.g., select { case <-unregisterCh:
/* ok */ case <-time.After(time.Second): t.Fatalf("unregister channel not
closed") }). Replace both unregister/close assertions with this pattern so the
test fails fast instead of blocking.
management/internals/shared/grpc/server.go (2)

435-437: 💤 Low value

Guard against a nil snapshotRouter.

handleUpdates dereferences s.snapshotRouter unconditionally (s.snapshotRouter.Register(...)), but the rest of the file is defensive about the companion store (if s.peerConnections != nil at Line 1181). If a future caller (test harness, alternate boot.go) ever constructs Server without a router, every Sync will panic. A nil-check around the register/unregister + select case keeps the two fields consistent in defensiveness.
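
A convenient property here: a receive from a nil channel blocks forever, so the guard can be just a nil check at registration and the snapshot case disables itself (sketch; Register/Unregister shapes follow the router tests in this PR):

    var snapshotCh chan uint64 // stays nil when no router is configured
    if s.snapshotRouter != nil {
    	snapshotCh = s.snapshotRouter.Register(peerKey.String())
    	defer s.snapshotRouter.Unregister(peerKey.String(), snapshotCh)
    }
    // In the select loop, `case nonce, ok := <-snapshotCh:` never fires
    // while snapshotCh is nil, so no other branch needs changing.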

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/server.go` around lines 435 - 437,
handleUpdates currently dereferences s.snapshotRouter unguarded; wrap the
Register/Unregister and related select-case logic so it only registers when
s.snapshotRouter != nil. Specifically, in handleUpdates, check s.snapshotRouter
!= nil before calling s.snapshotRouter.Register(peerKey.String()) and defer
s.snapshotRouter.Unregister(...), and ensure the select branch that uses
snapshotCh is only added/used when snapshotCh is non-nil (or replace it with a
guarded case). This keeps snapshotRouter usage consistent with the existing
nil-guards around s.peerConnections and prevents panics when Server is
constructed without a router.

487-498: ⚡ Quick win

Snapshot send-error path is inconsistent with the regular update path.

The regular update branch at Line 457–460 returns the sendUpdate error (which already calls cancelPeerRoutines), terminating the loop. This new snapshot branch (Line 496–498) only logs and falls through, but sendUpdate has already cancelled the peer routines and the encrypted-stream send has failed — so the next iteration is operating on a half-broken stream and any subsequent update will hit the same failure. Returning the error keeps the two paths uniform and lets the stream tear down cleanly.

♻️ Suggested fix
 			if err := s.sendUpdate(ctx, accountID, peerKey, peer, snapMsg, srv, streamStartTime); err != nil {
-				log.WithContext(ctx).Warnf("send snapshot request to %s: %v", peerKey.String(), err)
+				log.WithContext(ctx).Warnf("send snapshot request to %s: %v", peerKey.String(), err)
+				return err
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/internals/shared/grpc/server.go` around lines 487 - 498, The
snapshot branch that reads from snapshotCh logs sendUpdate errors and continues,
but sendUpdate already cancels peer routines (cancelPeerRoutines) and leaves the
stream half-broken; make the snapshot branch mirror the regular update branch by
returning the error from sendUpdate (propagate the err returned by
s.sendUpdate(...) instead of only logging) so the surrounding peer loop/handler
(the function that reads snapshotCh and calls sendUpdate) terminates and the
stream is torn down cleanly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 370076ef-ff00-496b-9477-7545b47086ae

📥 Commits

Reviewing files that changed from the base of the PR and between 907bc1c and 3e758d1.

⛔ Files ignored due to path filters (3)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (59)
  • client/android/client.go
  • client/android/peer_notifier.go
  • client/android/preferences.go
  • client/cmd/service.go
  • client/cmd/service_installer.go
  • client/internal/conn_mgr.go
  • client/internal/conn_mgr_test.go
  • client/internal/conn_state_pusher.go
  • client/internal/conn_state_pusher_test.go
  • client/internal/debouncer/debouncer.go
  • client/internal/engine.go
  • client/internal/engine_pusher_adapters.go
  • client/internal/engine_test.go
  • client/internal/lazyconn/inactivity/manager.go
  • client/internal/lazyconn/inactivity/manager_test.go
  • client/internal/lazyconn/manager/manager.go
  • client/internal/lazyconn/support.go
  • client/internal/peer/conn.go
  • client/internal/peer/guard/guard.go
  • client/internal/peer/handshaker.go
  • client/internal/peer/ice_backoff.go
  • client/internal/peer/ice_backoff_test.go
  • client/internal/peer/status.go
  • client/internal/peer/status_test.go
  • client/internal/peer/worker_ice.go
  • client/proto/daemon.proto
  • client/server/server.go
  • client/ui/client_ui.go
  • client/ui/const.go
  • client/ui/event_handler.go
  • client/ui/network.go
  • client/ui/peers_tab.go
  • management/internals/controllers/network_map/controller/controller.go
  • management/internals/server/boot.go
  • management/internals/shared/grpc/conversion.go
  • management/internals/shared/grpc/server.go
  • management/server/account/manager.go
  • management/server/account/manager_mock.go
  • management/server/http/handler.go
  • management/server/http/handlers/peer_connections/handler.go
  • management/server/http/handlers/peer_connections/handler_test.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/management_proto_test.go
  • management/server/management_test.go
  • management/server/mock_server/account_mock.go
  • management/server/peer.go
  • management/server/peer/peer.go
  • management/server/peer/peer_test.go
  • management/server/peer_connections/snapshot_router.go
  • management/server/peer_connections/snapshot_router_test.go
  • management/server/peer_connections/store.go
  • management/server/peer_connections/store_test.go
  • management/server/peer_test.go
  • management/server/store/sql_store.go
  • shared/management/client/client.go
  • shared/management/client/client_test.go
  • shared/management/client/grpc.go
  • shared/management/client/mock.go
  • shared/management/proto/management.proto
💤 Files with no reviewable changes (2)
  • client/ui/event_handler.go
  • client/ui/const.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • shared/management/client/client.go
  • management/server/http/handlers/peer_connections/handler.go

Comment on lines +137 to +139
if err := applyConnectionModeFlagsToProfile(cmd); err != nil {
cmd.PrintErrf("Warning: failed to persist connection-mode flags: %v\n", err)
}
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't downgrade explicit profile-write failures to warnings.

If applyConnectionModeFlagsToProfile() returns an error here, the user explicitly passed one of the new mode/timeout flags and that requested config will not reach the daemon. Letting install/reconfigure continue silently applies defaults instead of the requested settings.

Suggested fix
-		if err := applyConnectionModeFlagsToProfile(cmd); err != nil {
-			cmd.PrintErrf("Warning: failed to persist connection-mode flags: %v\n", err)
-		}
+		if err := applyConnectionModeFlagsToProfile(cmd); err != nil {
+			return fmt.Errorf("persist connection-mode flags: %w", err)
+		}

Also applies to: 263-265

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/cmd/service_installer.go` around lines 137 - 139, The call to
applyConnectionModeFlagsToProfile currently logs errors as warnings and
continues, which silently loses user-specified settings; change both call sites
that invoke applyConnectionModeFlagsToProfile (the install/reconfigure command
handlers) to treat a non-nil error as fatal by propagating/returning the error
(or exiting with non-zero) instead of printing a warning so the
install/reconfigure aborts and the daemon never runs with defaulted settings.

Comment on lines 78 to +107
func NewConnMgr(engineConfig *EngineConfig, statusRecorder *peer.Status, peerStore *peerstore.Store, iface lazyconn.WGIface) *ConnMgr {
envMode, envRelayTimeout := peer.ResolveModeFromEnv()

// First-pass resolution without server input -- updated later when
// the first NetworkMap arrives via UpdatedRemotePeerConfig.
mode, relayTimeout, p2pTimeout, p2pRetryMax := resolveConnectionMode(
envMode, envRelayTimeout,
engineConfig.ConnectionMode, engineConfig.RelayTimeoutSeconds,
engineConfig.P2pTimeoutSeconds,
engineConfig.P2pRetryMaxSeconds,
nil,
)

return &ConnMgr{
peerStore: peerStore,
statusRecorder: statusRecorder,
iface: iface,
rosenpassEnabled: engineConfig.RosenpassEnabled,
mode: mode,
relayTimeoutSecs: relayTimeout,
p2pTimeoutSecs: p2pTimeout,
p2pRetryMaxSecs: p2pRetryMax,
envMode: envMode,
envRelayTimeout: envRelayTimeout,
cfgMode: engineConfig.ConnectionMode,
cfgRelayTimeout: engineConfig.RelayTimeoutSeconds,
cfgP2pTimeout: engineConfig.P2pTimeoutSeconds,
cfgP2pRetryMax: engineConfig.P2pRetryMaxSeconds,
}
}
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't keep p2p-lazy/p2p-dynamic as the effective mode when Rosenpass blocks the manager.

Lines 181-187 and 317-319 skip starting the lazy manager, but the resolved mode is still stored in e.mode. That means later mode-gated paths like ActivatePeer, DeactivatePeer, and the new effective-mode reporting still behave as if lazy/dynamic is active even though the daemon stayed on the eager path.

Suggested fix
+func normalizeEffectiveMode(mode connectionmode.Mode, rosenpassEnabled bool) connectionmode.Mode {
+	if rosenpassEnabled && modeUsesLazyMgr(mode) {
+		return connectionmode.ModeP2P
+	}
+	return mode
+}
+
 func NewConnMgr(engineConfig *EngineConfig, statusRecorder *peer.Status, peerStore *peerstore.Store, iface lazyconn.WGIface) *ConnMgr {
 	envMode, envRelayTimeout := peer.ResolveModeFromEnv()
@@
 	mode, relayTimeout, p2pTimeout, p2pRetryMax := resolveConnectionMode(
 		envMode, envRelayTimeout,
 		engineConfig.ConnectionMode, engineConfig.RelayTimeoutSeconds,
 		engineConfig.P2pTimeoutSeconds,
 		engineConfig.P2pRetryMaxSeconds,
 		nil,
 	)
+	mode = normalizeEffectiveMode(mode, engineConfig.RosenpassEnabled)
@@
 	newMode, newRelay, newP2P, newP2pRetry := resolveConnectionMode(
 		e.envMode, e.envRelayTimeout, e.cfgMode, e.cfgRelayTimeout,
 		e.cfgP2pTimeout, e.cfgP2pRetryMax, pc,
 	)
+	newMode = normalizeEffectiveMode(newMode, e.rosenpassEnabled)

Also applies to: 176-187, 281-319

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/internal/conn_mgr.go` around lines 78 - 107, NewConnMgr currently
records the resolved mode (possibly "p2p-lazy"/"p2p-dynamic") even when
rosenpass blocks starting the lazy manager, causing the ConnMgr.mode to claim
lazy/dynamic while the daemon stayed on the eager path; change NewConnMgr to
detect when rosenpass blocks the lazy manager (rosenpassEnabled true / the same
condition used later to skip starting the lazy manager) and override the stored
mode to the eager equivalent before returning the ConnMgr (update the mode field
set in NewConnMgr after calling resolveConnectionMode), so code paths that read
ConnMgr.mode (e.g., ActivatePeer/DeactivatePeer and effective-mode reporting)
reflect the actual running mode.

Comment thread client/internal/lazyconn/manager/manager.go
Comment thread client/ui/client_ui.go
Comment on lines +745 to +750
relaySecs := parseUint32Field(s.iRelayTimeout.Text)
p2pSecs := parseUint32Field(s.iP2pTimeout.Text)
retrySecs := parseUint32Field(s.iP2pRetryMax.Text)
req.RelayTimeoutSeconds = &relaySecs
req.P2PTimeoutSeconds = &p2pSecs
req.P2PRetryMaxSeconds = &retrySecs
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale timeout overrides are still sent when switching to "Follow server".

updateTimeoutEntriesEnabled (Line 902–905) only Disable()s the entries for the non-lazy/dynamic modes — it never clears their Text. So if a user previously had p2p-lazy with RelayTimeout=60 and now picks "Follow server", buildSetConfigRequest at Line 745–750 still parses the disabled entry and writes RelayTimeoutSeconds=60 into the request. The override silently persists on the daemon and reappears the next time the user switches back to a lazy/dynamic mode.

Consider clearing the entry text (or skipping the assignment) in modes where the override is meaningless:

♻️ Suggested fix
 func (s *serviceClient) updateTimeoutEntriesEnabled() {
 	if s.iRelayTimeout == nil {
 		return
 	}
 	switch s.sConnectionMode.Selected {
 	case "p2p-lazy":
 		s.iRelayTimeout.Enable()
 		s.iP2pTimeout.Disable()
+		s.iP2pTimeout.SetText("")
 		s.iP2pRetryMax.Disable()
+		s.iP2pRetryMax.SetText("")
 	case "p2p-dynamic":
 		s.iRelayTimeout.Enable()
 		s.iP2pTimeout.Enable()
 		s.iP2pRetryMax.Enable()
 	default:
 		s.iRelayTimeout.Disable()
+		s.iRelayTimeout.SetText("")
 		s.iP2pTimeout.Disable()
+		s.iP2pTimeout.SetText("")
 		s.iP2pRetryMax.Disable()
+		s.iP2pRetryMax.SetText("")
 	}
 }

Also applies to: 889-907

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/ui/client_ui.go` around lines 745 - 750, The disabled timeout fields
(iRelayTimeout.Text, iP2pTimeout.Text, iP2pRetryMax.Text) are still parsed in
buildSetConfigRequest via parseUint32Field and written into req.* even when
updateTimeoutEntriesEnabled has disabled them; modify buildSetConfigRequest to
check the current mode (use the same mode/source that
updateTimeoutEntriesEnabled reads) and skip assigning RelayTimeoutSeconds,
P2PTimeoutSeconds, and P2PRetryMaxSeconds when the UI is in "Follow server" (or
other modes where overrides are meaningless), or alternatively clear the Text
values when disabling in updateTimeoutEntriesEnabled so parseUint32Field returns
empty; reference functions/fields: buildSetConfigRequest,
updateTimeoutEntriesEnabled, parseUint32Field, iRelayTimeout.Text,
iP2pTimeout.Text, iP2pRetryMax.Text, and the
req.RelayTimeoutSeconds/P2PTimeoutSeconds/P2PRetryMaxSeconds assignments.

Comment on lines +122 to +136
func (s *MemoryStore) GetWithNonceCheck(peerPubKey string, since uint64) (*mgmProto.PeerConnectionMap, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.maps[peerPubKey]
	if !ok {
		return nil, false
	}
	if since > 0 && e.m.GetInResponseToNonce() < since {
		return nil, false
	}
	if s.clock.Now().Sub(e.updatedAt) > s.ttl {
		delete(s.maps, peerPubKey)
		return nil, false
	}
	return proto.Clone(e.m).(*mgmProto.PeerConnectionMap), true
Contributor


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Evict expired entries before the nonce gate.

GetWithNonceCheck returns on the nonce comparison before running TTL cleanup. An expired entry with an older nonce then stays in s.maps indefinitely if callers only poll with sinceNonce.

Suggested change
 func (s *MemoryStore) GetWithNonceCheck(peerPubKey string, since uint64) (*mgmProto.PeerConnectionMap, bool) {
 	s.mu.Lock()
 	defer s.mu.Unlock()
 	e, ok := s.maps[peerPubKey]
 	if !ok {
 		return nil, false
 	}
+	if s.clock.Now().Sub(e.updatedAt) > s.ttl {
+		delete(s.maps, peerPubKey)
+		return nil, false
+	}
 	if since > 0 && e.m.GetInResponseToNonce() < since {
 		return nil, false
 	}
-	if s.clock.Now().Sub(e.updatedAt) > s.ttl {
-		delete(s.maps, peerPubKey)
-		return nil, false
-	}
 	return proto.Clone(e.m).(*mgmProto.PeerConnectionMap), true
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s *MemoryStore) GetWithNonceCheck(peerPubKey string, since uint64) (*mgmProto.PeerConnectionMap, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.maps[peerPubKey]
	if !ok {
		return nil, false
	}
	if s.clock.Now().Sub(e.updatedAt) > s.ttl {
		delete(s.maps, peerPubKey)
		return nil, false
	}
	if since > 0 && e.m.GetInResponseToNonce() < since {
		return nil, false
	}
	return proto.Clone(e.m).(*mgmProto.PeerConnectionMap), true
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/peer_connections/store.go` around lines 122 - 136,
GetWithNonceCheck currently checks the nonce before evicting TTL-expired
entries, allowing expired entries to persist if callers only use since/nonce
polling; update the function (GetWithNonceCheck) to perform the TTL expiry check
immediately after loading e from s.maps (using s.clock.Now().Sub(e.updatedAt) >
s.ttl), delete the map entry and return nil,false if expired, and only then
perform the since/nonce comparison and final return; keep the existing mutex
usage (s.mu.Lock()/defer s.mu.Unlock()) and the proto.Clone(e.m) return
behavior.

Comment on lines +1635 to +1645
if sRelayTimeoutSeconds.Valid {
	v := uint32(sRelayTimeoutSeconds.Int64)
	account.Settings.RelayTimeoutSeconds = &v
}
if sP2pTimeoutSeconds.Valid {
	v := uint32(sP2pTimeoutSeconds.Int64)
	account.Settings.P2pTimeoutSeconds = &v
}
if sP2pRetryMaxSeconds.Valid {
	v := uint32(sP2pRetryMaxSeconds.Int64)
	account.Settings.P2pRetryMaxSeconds = &v
Contributor


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find schema/migration definitions for the new settings columns.
rg -n "settings_(connection_mode|relay_timeout_seconds|p2p_timeout_seconds|p2p_retry_max_seconds)" --type sql --type go

# 2) Find all read/write paths using these settings fields.
rg -n "ConnectionMode|RelayTimeoutSeconds|P2pTimeoutSeconds|P2pRetryMaxSeconds" --type go

# 3) Verify existing validation/constraints for non-negative and bounded values.
rg -n "CHECK\\s*\\(|>=\\s*0|validate|clamp|sanitize" --type sql --type go

Repository: netbirdio/netbird

Length of output: 50374


🏁 Script executed:

# Find the handler validation logic for these timeout fields
rg -n "RelayTimeoutSeconds|P2pTimeoutSeconds|P2pRetryMaxSeconds" management/server/http/handlers/accounts/accounts_handler.go -B 2 -A 2

Repository: netbirdio/netbird

Length of output: 1226


🏁 Script executed:

# Check migration files for any schema constraints
fd -e sql -e go | xargs grep -l "settings_relay_timeout_seconds\|settings_p2p_timeout_seconds\|settings_p2p_retry_max_seconds" 2>/dev/null | head -20

Repository: netbirdio/netbird

Length of output: 98


🏁 Script executed:

# Look for validation functions in the types package
rg -n "func.*Settings.*Validate\|func.*validate.*timeout" management/server/types/ --type go

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Check the API types that define the request structure
rg -n "type AccountSettings" shared/management/http/api/types.gen.go -A 30 | head -60

Repository: netbirdio/netbird

Length of output: 3017


🏁 Script executed:

# Look for migration or schema files
find . -type f \( -name "*.sql" -o -name "*migration*" -o -name "*schema*" \) 2>/dev/null | grep -v node_modules | head -20

Repository: netbirdio/netbird

Length of output: 1114


🏁 Script executed:

# Search for database initialization or schema setup
rg -n "CREATE TABLE.*settings|ALTER TABLE.*settings" --type sql -A 5

Repository: netbirdio/netbird

Length of output: 50373


🏁 Script executed:

# Look for migrations that define the timeout settings columns
rg -n "settings_relay_timeout_seconds|settings_p2p_timeout_seconds|settings_p2p_retry_max_seconds" management/cmd/migration_up.go -B 3 -A 3

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Check sql_store.go for migration/schema setup
rg -n "ALTER TABLE.*accounts.*ADD\|CREATE TABLE" management/server/store/sql_store.go -A 2 | head -100

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Look for GORM migration in sql_store.go
rg -n "AutoMigrate|migration|CreateTable" management/server/store/sql_store.go | head -30

Repository: netbirdio/netbird

Length of output: 186


🏁 Script executed:

# Check the Settings struct in types for GORM tags and constraints
cat -n management/server/types/settings.go | sed -n '60,85p'

Repository: netbirdio/netbird

Length of output: 1471


🏁 Script executed:

# Check handler for any validation before the cast
cat -n management/server/http/handlers/accounts/accounts_handler.go | sed -n '215,245p'

Repository: netbirdio/netbird

Length of output: 1524


🏁 Script executed:

# Search for tests with negative or invalid timeout values
rg -n "RelayTimeoutSeconds|P2pTimeoutSeconds|P2pRetryMaxSeconds" management/server/http/handlers/accounts/ --type go -B 2 -A 2 | grep -i "negative\|invalid\|-[0-9]" | head -20

Repository: netbirdio/netbird

Length of output: 1670


🏁 Script executed:

# Check if there's any request validation middleware or OpenAPI schema constraints
rg -n "validate|constraint|minimum|maximum" management/server/http/handlers/accounts/accounts_handler.go -i | head -20

Repository: netbirdio/netbird

Length of output: 1241


🏁 Script executed:

# Look for any other safeguards before reaching the store
rg -n "func.*UpdateSettings\|func.*SetSettings" management/server/ --type go | head -5

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Find the handler method that processes the update settings request
rg -n "func.*Update.*Account.*Settings\|HandleUpdateSettings" management/server/http/handlers/accounts/accounts_handler.go

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Check the complete flow - find where the handler calls the store
rg -n "\.SaveAccount\|UpdateAccount" management/server/http/handlers/accounts/accounts_handler.go -A 3 | head -40

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Find the function that handles account updates in the handler
rg -n "func.*handler.*\(.*Account" management/server/http/handlers/accounts/accounts_handler.go | head -10

Repository: netbirdio/netbird

Length of output: 187


🏁 Script executed:

# Search for where settings are saved after being validated
rg -n "SaveAccount\|StoreEvent\|return.*settings" management/server/http/handlers/accounts/accounts_handler.go | head -20

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Find where updateAccountRequestSettings is called
rg -n "updateAccountRequestSettings" management/server/http/handlers/accounts/accounts_handler.go -B 5 -A 5

Repository: netbirdio/netbird

Length of output: 929


🏁 Script executed:

# Look at the full updateAccountRequestSettings method
cat -n management/server/http/handlers/accounts/accounts_handler.go | sed -n '170,260p'

Repository: netbirdio/netbird

Length of output: 4644


🏁 Script executed:

# Confirm the same unsafe cast exists in the store layer at the exact lines mentioned
cat -n management/server/store/sql_store.go | sed -n '1630,1650p'

Repository: netbirdio/netbird

Length of output: 812


Add bounds checks before casting int64 timeout values to uint32.

Lines 1636, 1640, and 1644 perform unchecked casts from sql.NullInt64.Int64 to uint32, which allows negative or out-of-range values from the database to silently wrap. The same issue exists in the HTTP handler (management/server/http/handlers/accounts/accounts_handler.go lines 230, 234, 238). Add validation to reject invalid values before assignment.

🛠️ Suggested fix
 if sRelayTimeoutSeconds.Valid {
+	if sRelayTimeoutSeconds.Int64 < 0 || sRelayTimeoutSeconds.Int64 > int64(^uint32(0)) {
+		return nil, status.NewGetAccountFromStoreError(
+			fmt.Errorf("invalid settings_relay_timeout_seconds: %d", sRelayTimeoutSeconds.Int64),
+		)
+	}
 	v := uint32(sRelayTimeoutSeconds.Int64)
 	account.Settings.RelayTimeoutSeconds = &v
 }
 if sP2pTimeoutSeconds.Valid {
+	if sP2pTimeoutSeconds.Int64 < 0 || sP2pTimeoutSeconds.Int64 > int64(^uint32(0)) {
+		return nil, status.NewGetAccountFromStoreError(
+			fmt.Errorf("invalid settings_p2p_timeout_seconds: %d", sP2pTimeoutSeconds.Int64),
+		)
+	}
 	v := uint32(sP2pTimeoutSeconds.Int64)
 	account.Settings.P2pTimeoutSeconds = &v
 }
 if sP2pRetryMaxSeconds.Valid {
+	if sP2pRetryMaxSeconds.Int64 < 0 || sP2pRetryMaxSeconds.Int64 > int64(^uint32(0)) {
+		return nil, status.NewGetAccountFromStoreError(
+			fmt.Errorf("invalid settings_p2p_retry_max_seconds: %d", sP2pRetryMaxSeconds.Int64),
+		)
+	}
 	v := uint32(sP2pRetryMaxSeconds.Int64)
 	account.Settings.P2pRetryMaxSeconds = &v
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if sRelayTimeoutSeconds.Valid {
	if sRelayTimeoutSeconds.Int64 < 0 || sRelayTimeoutSeconds.Int64 > int64(^uint32(0)) {
		return nil, status.NewGetAccountFromStoreError(
			fmt.Errorf("invalid settings_relay_timeout_seconds: %d", sRelayTimeoutSeconds.Int64),
		)
	}
	v := uint32(sRelayTimeoutSeconds.Int64)
	account.Settings.RelayTimeoutSeconds = &v
}
if sP2pTimeoutSeconds.Valid {
	if sP2pTimeoutSeconds.Int64 < 0 || sP2pTimeoutSeconds.Int64 > int64(^uint32(0)) {
		return nil, status.NewGetAccountFromStoreError(
			fmt.Errorf("invalid settings_p2p_timeout_seconds: %d", sP2pTimeoutSeconds.Int64),
		)
	}
	v := uint32(sP2pTimeoutSeconds.Int64)
	account.Settings.P2pTimeoutSeconds = &v
}
if sP2pRetryMaxSeconds.Valid {
	if sP2pRetryMaxSeconds.Int64 < 0 || sP2pRetryMaxSeconds.Int64 > int64(^uint32(0)) {
		return nil, status.NewGetAccountFromStoreError(
			fmt.Errorf("invalid settings_p2p_retry_max_seconds: %d", sP2pRetryMaxSeconds.Int64),
		)
	}
	v := uint32(sP2pRetryMaxSeconds.Int64)
	account.Settings.P2pRetryMaxSeconds = &v
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@management/server/store/sql_store.go` around lines 1635 - 1645, The code
casts sql.NullInt64 values (sRelayTimeoutSeconds, sP2pTimeoutSeconds,
sP2pRetryMaxSeconds) directly to uint32 and may wrap negative or out-of-range
values; validate each Int64 before assigning to account.Settings.*: check that
value is >= 0 and <= math.MaxUint32, otherwise return/propagate an error (or
skip assignment) and do not perform the cast; then assign via v := uint32(val)
and account.Settings.RelayTimeoutSeconds = &v (and similarly for
P2pTimeoutSeconds and P2pRetryMaxSeconds). Apply the same validation in the HTTP
handler that reads timeouts (the accounts handler functions that set
RelayTimeoutSeconds/P2pTimeoutSeconds/P2pRetryMaxSeconds) and return a 400 error
on invalid values. Ensure checks reference the same symbols
(sRelayTimeoutSeconds, sP2pTimeoutSeconds, sP2pRetryMaxSeconds and
account.Settings.*) so reviewers can locate and verify the fixes.

@mlsmaycon
Collaborator

@MichaelUray can you open a discussion for us to talk about the PRs you are submitting? We have a few things coming that might overlap with your proposals, and we are also focusing on stability for the 1.0 release this quarter.

@MichaelUray
Contributor Author

Hi @mlsmaycon,

I actually already opened an issue for discussion, but I have not received a response to it so far.

#5989

The changes in these PRs shouldn't require a major version change, since they should be compatible with older versions as well.

@MichaelUray MichaelUray force-pushed the pr/c-phase3.7i-visibility branch from 5b18f3e to fbd0df5 Compare May 6, 2026 15:01
MichaelUray and others added 15 commits May 6, 2026 16:15
…nfig

Additive change for issue netbirdio#5989 Phase 1. New fields use new tag numbers
(11, 12, 13); existing fields (including LazyConnectionEnabled tag 6) are
unchanged so old clients ignore the additions and old servers send
UNSPECIFIED, which the new client maps back via the legacy boolean.

Note: the regenerated pb.go files now report protoc v5.29.3 in their
header (this branch was generated with locally-installed protoc 29.3
instead of upstream's v7.34.1). Functionally identical; header diff is
the only delta beyond the actual schema additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defines Mode enum (relay-forced, p2p, p2p-lazy, p2p-dynamic plus the
client-only sentinels Unspecified and FollowServer), ParseString for
CLI/env input, ToProto/FromProto for wire translation, and the two
backwards-compat helpers ResolveLegacyLazyBool / ToLazyConnectionEnabled
that bridge the old Settings.LazyConnectionEnabled boolean.

Phase 1 of issue netbirdio#5989. Pure addition -- no existing callers touched
in this commit; the engine/conn_mgr migration follows in subsequent
commits in the same PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
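For readers following the stack, a minimal sketch of the two compat helpers named above. Only the helper names and the relay-forced-degrades-to-false behaviour come from the commit; the Mode constants and bodies are illustrative:

package connectionmode

type Mode int

const (
	ModeUnspecified Mode = iota // client-only sentinel
	ModeFollowServer            // client-only sentinel
	ModeRelayForced
	ModeP2P
	ModeP2PLazy
	ModeP2PDynamic
)

// ResolveLegacyLazyBool maps the old Settings.LazyConnectionEnabled
// boolean onto the enum: true meant "lazy", false meant plain p2p.
func ResolveLegacyLazyBool(lazyEnabled bool) Mode {
	if lazyEnabled {
		return ModeP2PLazy
	}
	return ModeP2P
}

// ToLazyConnectionEnabled projects a resolved Mode back onto the legacy
// wire boolean for old clients. relay-forced has no boolean equivalent
// and degrades to false; mapping p2p-dynamic to true is an assumption
// here, based on both modes sharing the lazy manager.
func ToLazyConnectionEnabled(m Mode) bool {
	return m == ModeP2PLazy || m == ModeP2PDynamic
}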
…on warns

NB_CONNECTION_MODE wins over the legacy pair (NB_FORCE_RELAY,
NB_ENABLE_EXPERIMENTAL_LAZY_CONN); when the legacy pair is set
together, NB_FORCE_RELAY wins (most-restrictive, mirrors the
group-conflict rule from issue netbirdio#5990).

Each legacy var emits a one-shot deprecation warning when it actually
contributes to the resolved mode. NB_LAZY_CONN_INACTIVITY_THRESHOLD
becomes an alias for the future relay_timeout setting and warns once.

IsForceRelayed() is kept for callers that have not yet been migrated
(conn.go, statusrecorder); they will be updated in the engine/conn
refactor commits later in this PR.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
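The precedence rules read more easily as code. A sketch under two assumptions: ParseString returns (Mode, error), and the legacy booleans are set as the literal string "true". The real resolver also emits the one-shot deprecation warnings, omitted here; the import path anticipates the shared/connectionmode move later in this stack:

package internal // illustrative placement

import (
	"os"

	"github.com/netbirdio/netbird/shared/connectionmode"
)

func resolveModeFromEnv() connectionmode.Mode {
	// The new variable always wins when it parses.
	if v := os.Getenv("NB_CONNECTION_MODE"); v != "" {
		if m, err := connectionmode.ParseString(v); err == nil {
			return m
		}
	}
	// Legacy pair: when both are set, NB_FORCE_RELAY wins
	// (most-restrictive, mirroring the group-conflict rule).
	if os.Getenv("NB_FORCE_RELAY") == "true" {
		return connectionmode.ModeRelayForced
	}
	if os.Getenv("NB_ENABLE_EXPERIMENTAL_LAZY_CONN") == "true" {
		return connectionmode.ModeP2PLazy
	}
	return connectionmode.ModeUnspecified
}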
Three new CLI flags map onto the new connection-mode plumbing:
- --connection-mode <relay-forced|p2p|p2p-lazy|p2p-dynamic|follow-server>
- --relay-timeout <seconds>
- --p2p-timeout <seconds>

Plumbed through three sites in cmd/up.go (SetConfigRequest, ConfigInput,
LoginRequest), persisted in profilemanager.Config, and added as new
fields on the daemon.proto IPC messages. Empty / not-changed flags fall
back to the server-pushed value (which itself falls back to the legacy
lazy_connection_enabled boolean for old servers).

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EngineConfig gains ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds.
ConnMgr now stores the resolved Mode plus the raw inputs (env, config)
so it can re-resolve when the server pushes a new PeerConfig.

UpdatedRemoteFeatureFlag is renamed to UpdatedRemotePeerConfig and
takes the full PeerConfig pointer; a thin shim with the old name
delegates to it for callers that haven't been updated yet.

connect.go copies the three new fields from profilemanager.Config into
the EngineConfig builder, with a tolerant parser that logs and falls
through to Unspecified on invalid input.

Phase 1 of issue netbirdio#5989. peer/conn.go forwarding follows in C4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ConnConfig gains a Mode field forwarded from the engine. Open() now
checks Mode == ModeRelayForced instead of calling the global env-reader
IsForceRelayed(). The local 'forceRelay' variable name is renamed to
'skipICE' to make the new branching intent explicit.

The PeerStateUpdate block at the end of Open() also reads from
conn.config.Mode now, so the StatusRecorder sees the per-peer mode
rather than the global env var.

A single remaining caller of IsForceRelayed() (srWatcher.Start in
engine.go) is left for a follow-up; that path uses a process-wide flag
not per-peer state, so it can be migrated in Phase 2 once srWatcher
itself learns about ConnectionMode.

Phase 1 of issue netbirdio#5989. Engine forwarding (C5) follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
createPeerConn now reads ConnMgr.Mode() and copies it into
peer.ConnConfig, so the per-peer Open() loop in conn.go can take the
ModeRelayForced skip-ICE branch without reading the global env var.

This is the last wiring commit for the client side of Phase 1; the
server-side mgmt changes (Settings + OpenAPI + handler + audit +
NetworkMap-build) follow in Section D.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three fields are nullable to distinguish 'use built-in default'
(NULL) from explicit values (incl. 0 = never tear down). Copy() now
deep-clones the new pointer fields via two small helpers.

GORM AutoMigrate creates the new columns at first start; existing
accounts have NULL in all three columns and resolve via the legacy
LazyConnectionEnabled boolean.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
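The "two small helpers" are presumably the standard clone-a-pointer shape; a sketch with hypothetical names:

// cloneUint32Ptr deep-copies an optional numeric setting so Copy()
// doesn't alias the original account's pointer.
func cloneUint32Ptr(v *uint32) *uint32 {
	if v == nil {
		return nil
	}
	c := *v
	return &c
}

// cloneStringPtr does the same for the nullable connection-mode string.
func cloneStringPtr(v *string) *string {
	if v == nil {
		return nil
	}
	c := *v
	return &c
}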
…tings

Three new optional, nullable fields with descriptions of the NULL =
built-in-default semantics and the Phase-1-vs-Phase-2 status of
p2p-dynamic. Regenerated types.gen.go via the existing oapi-codegen
tooling.

The generated AccountSettingsConnectionMode enum has the canonical
values relay-forced / p2p / p2p-lazy / p2p-dynamic, plus a Valid()
helper for handler-side validation.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PUT /api/accounts/{id} now accepts connection_mode (validated against
the four-value enum via the generated AccountSettingsConnectionMode.
Valid()), p2p_timeout_seconds and relay_timeout_seconds. NULL in the
JSON body keeps the existing value untouched (= "no client-side
override on this round-trip"); explicit NULL-clear via API uses a
distinct PATCH-style call which is out-of-scope for Phase 1.

Response payload mirrors the input fields back as nullable so the
dashboard can distinguish "use default" from "explicit value".

Phase 1 of issue netbirdio#5989. Audit-event emission follows in D5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AccountConnectionModeChanged (121), AccountRelayTimeoutChanged (122),
AccountP2pTimeoutChanged (123) -- emitted from account.go when settings
change. Per-peer / per-group event codes are reserved for Phase 3
(issue netbirdio#5990).

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
handleConnectionModeSettings is invoked from the same diff-detection
block as handleLazyConnectionSettings; emits one StoreEvent per
changed field (ConnectionMode, RelayTimeoutSeconds, P2pTimeoutSeconds)
with old/new values in the meta payload.

Four small ptr-equality / deref helpers are added for nullable string
and uint32 fields. They are package-private and named after the
existing convention used elsewhere in the package.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…elds

Two changes in one commit because they're inseparable:

1. Move client/internal/peer/connectionmode/ to shared/connectionmode/.
   The package now needs to be importable from BOTH client/ and
   management/ (which is impossible while it lives under client/internal/
   per Go's internal-package rule). All imports updated; tests pass on
   both sides.

2. Extend management/internals/shared/grpc/conversion.go::toPeerConfig
   to populate the three new PeerConfig fields (ConnectionMode,
   P2PTimeoutSeconds, RelayTimeoutSeconds) using the connectionmode
   helpers. The legacy LazyConnectionEnabled boolean is now derived
   from the resolved Mode via ToLazyConnectionEnabled() rather than
   copied verbatim from Settings -- this is the central backwards-compat
   contract: old clients see only the boolean, new clients prefer the
   explicit enum and ignore the bool.

Resolution rules (Phase 1, account-wide only):
- Settings.ConnectionMode != nil and parses -> wins
- Otherwise -> ResolveLegacyLazyBool(LazyConnectionEnabled)
- timeouts: Settings.RelayTimeoutSeconds / P2pTimeoutSeconds when non-NULL,
  else 0 (= server has no preference; client uses built-in default)

Per-peer / per-group resolution comes in Phase 3 (netbirdio#5990).

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
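Spelled out as code, the Phase-1 resolution above might look like the following sketch. The settings struct is reduced to the relevant fields, helper names follow the commit message, and everything else (including the import path) is assumed:

package grpc // illustrative placement

import "github.com/netbirdio/netbird/shared/connectionmode"

type accountSettings struct {
	ConnectionMode        *string // nullable column
	RelayTimeoutSeconds   *uint32
	P2pTimeoutSeconds     *uint32
	LazyConnectionEnabled bool
}

func resolveAccountMode(s accountSettings) (mode connectionmode.Mode, relayTimeout, p2pTimeout uint32) {
	// 1. An explicit, parseable ConnectionMode wins.
	if s.ConnectionMode != nil {
		if m, err := connectionmode.ParseString(*s.ConnectionMode); err == nil {
			mode = m
		}
	}
	// 2. Otherwise fall back to the legacy boolean.
	if mode == connectionmode.ModeUnspecified {
		mode = connectionmode.ResolveLegacyLazyBool(s.LazyConnectionEnabled)
	}
	// 3. Timeouts: a non-NULL value wins; 0 means "server has no
	//    preference; client uses its built-in default".
	if s.RelayTimeoutSeconds != nil {
		relayTimeout = *s.RelayTimeoutSeconds
	}
	if s.P2pTimeoutSeconds != nil {
		p2pTimeout = *s.P2pTimeoutSeconds
	}
	return mode, relayTimeout, p2pTimeout
}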
Nine sub-cases cover the Phase-1 resolution matrix from spec section 3:
- no settings -> default (P2P + lazy=false)
- legacy bool only -> mapped via ResolveLegacyLazyBool
- explicit ConnectionMode -> wins over the legacy bool
- timeouts propagate
- garbage ConnectionMode value -> tolerant fallback to legacy bool

Particular attention to the structural compat gap: relay-forced cannot
be expressed via the legacy boolean, so the wire field for old clients
is sent as false. Documented in the spec, asserted here.

Existing TestAccount_GetPeerNetworkMap remains green: existing test
peers have ConnectionMode=NULL in Settings, falls through to the legacy
ResolveLegacyLazyBool(false) -> ModeP2P -> wire bool false.

Phase 1 of issue netbirdio#5989.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of netbirdio#5989 needs to dynamically detach the ICE listener at runtime
(when p2p-dynamic mode hits its ICE-inactivity threshold). Adds the
RemoveICEListener method and extends the Add/RemoveICEListener pair to
hold h.mu; Listen() now reads h.iceListener through readICEListener
under the same mutex, fixing a latent race that didn't matter while
AddICEListener was the only writer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MichaelUray and others added 28 commits May 6, 2026 16:16
Phase 3.7i of netbirdio#5989. Each NetworkMap update copies the RemotePeerConfig
mode/timeouts/groups/last-seen into peer.State via UpdatePeerRemoteMeta.
Both online (RemotePeers) and offline (OfflinePeers) lists get
populated so the daemon-RPC StatusResponse reflects everything.
peer.TimestampOrZero exported so the engine can nil-guard
last_seen_at_server cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i of netbirdio#5989. Mirrors existing peers_handler RBAC: extracts
userAuth via nbcontext.GetUserAuthFromContext, looks up peer through
accountManager.GetPeer (which already enforces account-isolation +
permissions). Tests cover 200/401/404/202 and confirm cross-account
isolation. Enriches remote_fqdn via GetPeerByPubKey lookup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i of netbirdio#5989. pcAccountManagerAdapter bridges the real
AccountManager into the small interface peer_connections.Handler uses.
GetPeerByPubKey added to AccountManager when missing — used by REST
handlers to enrich PeerConnectionMap entries with FQDNs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i of netbirdio#5989. Adds a 4th tab to the existing Networks AppTabs
in client/ui/network.go. Tab content: counter widget on top
(configured/online/breakdown), sorted accordion list (P2P -> Relayed
-> Idle -> Offline), Show-Full checkbox + Refresh button at bottom.
30 s polling via ctx-respecting goroutine; uses fyne.Do for
thread-safe widget updates. ASCII glyphs for status (avoids Unicode
render quirks on Fyne Windows). Real proto field names:
GetLocalIceCandidateEndpoint, GetRemoteIceCandidateEndpoint,
GetRelayAddress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i (netbirdio#5989). Exposes ConfiguredPeersTotal, ServerOnlinePeers,
P2PConnectedPeers, RelayedConnectedPeers, IdleOnlinePeers,
ServerOfflinePeers from recorder.GetFullStatus() as int64 methods so
gomobile can generate the corresponding Java getters on Client.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i of netbirdio#5989 (follow-up to Phase 6.3). PeerInfo now exposes
Relayed, ServerOnline, ICE-candidate endpoints, RelayServerAddress,
LatencyMs, LastWireguardHandshake (RFC3339 string),
LastSeenAtServer (RFC3339 string), EffectiveConnectionMode,
ConfiguredConnectionMode, Groups (CSV), AgentVersion, OsVersion
(last two currently empty — peer.State has no Version/Os fields yet).

Strings instead of time.Time / []string for gomobile compatibility.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i UX polish:
- Tray menu entry "Networks" -> "Peers und Netzwerke" so the new
  Peers tab is discoverable from the menu name.
- "Peers" tab moves from 4th to 1st position in the Networks window
  (most-used per Phase 3.7i). User opens window -> sees Peers
  immediately.
- Accordion list wrapped in container.NewStack so VScroll expands to
  the center area's full size instead of collapsing to the
  Accordion's intrinsic content height (was ~1 row tall on a
  3-peer network).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3.7i fix for inflated counter (Android showed 27/27 online when
4 peers were actually offline).

Root cause: NetworkMap.OfflinePeers contains only LOGIN-EXPIRED peers;
peers in OfflinePeers are not "currently unreachable", they're "session
needs re-auth". Peers with hardware/network down still live in
NetworkMap.RemotePeers as long as their login is valid. So the previous
counter math (online = len(d.peers); offline = len(d.offlinePeers))
mapped login-validity to liveness, which over-counts online.

Fix:
- Add bool live_online = 16 to RemotePeerConfig in mgmt proto.
- Mgmt server fills it from rPeer.Status.Connected (the actual
  liveness flag the server already tracks via heartbeat).
- Daemon: peer.RemoteMeta + State.RemoteLiveOnline carry the value.
- GetFullStatus counter: peer counted as online iff isLive
  (= RemoteLiveOnline || !mgmtKnowsLiveness for backward compat with
  pre-Phase-3.7i mgmt servers — fallback assumes online when mgmt
  doesn't populate LastSeenAtServer).

Backward compat: pre-Phase-3.7i mgmt servers leave both LastSeenAtServer
zero and LiveOnline false. The fallback condition (LastSeenAtServer
empty → assume online) preserves the legacy counter behaviour. Once
the upgraded mgmt server starts populating both fields, the daemon
honours the actual liveness flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User feedback after Phase 3.7i deploy:
- Tray menu entry was German "Peers und Netzwerke"; should be English
  "Peers and Networks" to match the rest of the app.
- Same name applies to the window title (was "Networks").
- Peer list area was still small; the previous NewStack wrap inside
  buildPeersTabContent was not enough because the AppTabs container
  doesn't expand its children unless they request to fill.

Fix:
- network.go:38 -> "Peers and Networks" window title
- client_ui.go:1242 -> "Peers and Networks" tray entry
- network.go: wrap the Peers tab content in container.NewStack at the
  TabItem level so the tab area gives Border the full tab content
  rectangle.
- peers_tab.go: SetMinSize(400, 500) on the VScroll as a floor so
  the layout never collapses below a reasonable size; the parent
  containers can grow it beyond.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…are actions

User feedback round 2:
- "Show full peer details" checkbox should live in the WINDOW footer
  (next to Refresh / Select all / Deselect All), not duplicated inside
  the Peers tab.
- The duplicate Refresh button inside the tab should go away; the
  outer Refresh button now triggers Peers refresh when on the Peers
  tab and Networks refresh otherwise.
- Expanded peer rows wasted vertical space (Accordion distributes
  children across the available height when there are few of them).
- Multiple peers should be expandable simultaneously.

Refactor:
- buildPeersTabContent now returns a peersTabBundle{Content, ShowFull,
  Refresh}; network.go places ShowFull + Refresh in the outer footer
  and switches Select-all / Deselect-All visibility based on the
  active tab (hidden on Peers, shown on Networks tabs).
- Replaces widget.Accordion with a VBox of newPeerRow() elements:
  each row is a header-button + hidden detail-label that toggles on
  tap. Multi-expand naturally allowed (each row is independent).
  VBox packs tightly — no inter-row gap when only some are expanded.
- Counter format aligned with Android: "N of M peers online (server)"
  + "n P2P | n relayed | n idle | n offline".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes for the Win UI Peers tab:
1. newPeerRow now adds/removes the detail Label dynamically on tap
   instead of pre-creating it Hidden. Some Fyne layouts still allocate
   space for hidden children, which made collapsed rows take ~150px
   each (inflating the VBox to where only 1-2 rows fit visibly).
   Dynamic add/remove guarantees collapsed rows are header-only.
2. scroll.SetMinSize(400, 600) restores the floor so the scroll area
   fills most of the window even when only a few peers exist. Without
   this floor the routeCheckContainer (an outer VBox) sized the tabs
   to MinSize and the Peers tab collapsed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tent

Two follow-up bugs reported on Windows after the dynamic add/remove
rewrite:
- Refresh collapsed all rows the user had opened. The render closure
  rebuilds listVBox from scratch and newPeerRow defaulted to collapsed.
  Persist expand state in a mutex-guarded map keyed by pubkey; rows
  consult it on construction (addDetail() if startExpanded) and update
  it on every tap.
- The inner VScroll inside the Peers tab over-allocated vertical space
  (scrollbar reaches far past the actual content). The outer Networks
  window already wraps the tab content in a VScroll, so the inner one
  was redundant and made the tab content claim a hard 600px floor
  regardless of how many peers were present. Remove the inner VScroll
  and let listVBox sit directly in container.NewBorder's center -- it
  now sizes to its content and the outer scroll handles overflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rant version parsing

Fixes A/B/C reported after the first p2p-dynamic deployment:

A) When management activates lazy/dynamic mode at runtime, the previous
   path called lazyConnMgr.AddActivePeers() on every existing peer,
   keeping the open WireGuard tunnels running until their inactivity
   timers fired hours later. That contradicted the user-visible promise
   of lazy/dynamic ("idle until traffic"). Replaced with
   resetPeersToLazyIdle() which Close()s every supported peer and re-
   registers it via lazyConnMgr.AddPeer (idle entry). Mode changes are
   rare and the brief 1-2 s tunnel rebuild per peer is acceptable.

B) IsSupported() rejected any agent version without a "." in its name.
   Custom dev/CI builds tagged dev-<sha> or ci-<sha> by build-android-
   lib.sh and similar build scripts therefore got eager-opened by the
   lazy manager (treated like a pre-0.45 daemon). Now also accept
   "dev-" / "ci-" prefixes — those builds come from the same source
   tree as the special-cased "development" string and have the same
   lazy support guarantees.

C) startModeSideEffects() only set statusRecorder.UpdateLazyConnection(true)
   for ModeP2PLazy. Under p2p-dynamic the status recorder still
   reported "Lazy connection: false" even though peers were managed by
   the inactivity manager. Now both ModeP2PLazy and ModeP2PDynamic
   flip the flag, since they're the same to the end user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex#1: store.go merge dropped a non-zero refresh-nonce when a normal
delta with InResponseToNonce=0 followed a snapshot pushed in response
to nonce N. Subsequent GET ?since=N then returned 404 until the next
snapshot. Take max(prev, new) so the latest non-zero nonce sticks.
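The Codex#1 rule is tiny but easy to lose inside a larger merge; isolated, it is just this (names illustrative):

package peerconnections // illustrative

// mergeRefreshNonce keeps the latest non-zero nonce across merges: a
// delta carrying InResponseToNonce=0 must not erase the nonce of the
// snapshot it follows, or GET ?since=N starts returning 404.
func mergeRefreshNonce(prev, incoming uint64) uint64 {
	if incoming > prev {
		return incoming
	}
	return prev
}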

Codex#2: SnapshotRouter.Register overwrote any existing channel for
the same peer; a stale stream calling Unregister(peerKey) afterwards
could close the new stream's channel. Register now closes a previous
channel inline (fast-reconnect case), and Unregister takes the channel
token returned by Register and only acts on the matching channel.
Added regression test for the stale-Unregister race.

Codex#9: PostRefresh always returned 202 even when no active Sync
stream existed for the target peer (offline / between connections).
Response now includes "dispatched": bool so the caller can distinguish
"snapshot will arrive" from "no live stream — fall back to cache".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex#3: flushDelta and flushFull marked events as lastPushed BEFORE
calling sink.Push and ignored its error. A failed push (mgmt
reconnect, transient gRPC error) silently lost state — the affected
peers stayed stale until their next state change or the 60 s heartbeat.
Now lastPushed is updated only AFTER a successful Push; on failure the
events stay dirty so the next computeDeltaFromSource will re-include
them.

Codex#4: connStatePusher.loop() emitted the initial flushFull
unconditionally on goroutine entry. The pusher is created in
engine.Start, BEFORE the first NetworkMap arrives and addNewPeers
populates the status recorder, so the initial snapshot was an empty
map. Real peers only surfaced to management on the next per-peer
state change or via the 60 s heartbeat. Loop now waits on initialReady
which the engine closes via TriggerInitialSnapshot() right after the
first FinishPeerListModifications().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex#5: UpdatePeerICEState and UpdatePeerRelayedState only invoked
notifyConnStateChange when ConnStatus or Relayed flipped. Endpoint
roams (NAT rebind, ICE candidate type change host->srflx) and
relay-server moves did not surface to management until the next 60 s
heartbeat tick, defeating the "endpoint flips immediate" UX promise.

Added hasMaterialICEChange / hasMaterialRelayChange helpers that
include local/remote endpoint, candidate type, and relay-server fields
in the change-detection. Both Update* paths now use them so any
material movement enqueues an immediate push.

Codex#6: added a server_liveness_known proto field on RemotePeerConfig
+ State.RemoteServerLivenessKnown. Phase-3.7i+ management servers set
it; the daemon's GetFullStatus counter trusts LiveOnline directly when
known instead of guessing from LastSeenAtServer-zero (the legacy
heuristic still applies when ServerLivenessKnown is false, so old
servers keep working).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase-3.7i+ management server now explicitly tells clients "I track
per-peer liveness authoritatively" via the new server_liveness_known
field on RemotePeerConfig. New clients (engine.go) plumb it through
RemoteMeta -> State.RemoteServerLivenessKnown so the GetFullStatus
counter can trust LiveOnline directly instead of falling back to the
LastSeenAtServer-zero heuristic.

Old servers leave the field at default (false); old clients ignore it.
Net effect: counter accuracy improves on the new-server / new-client
combination without breaking either backwards-compat path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
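Combining this with the LastSeenAtServer fallback from the earlier counter fix, the daemon-side decision presumably reduces to something like the following; field names follow the commit messages, the signature is hypothetical:

package peer // illustrative

import "time"

// peerCountsAsOnline folds the three liveness signals into the
// GetFullStatus counter decision.
func peerCountsAsOnline(liveOnline, serverLivenessKnown bool, lastSeenAtServer time.Time) bool {
	if serverLivenessKnown {
		// Phase-3.7i+ server: trust the flag directly.
		return liveOnline
	}
	if lastSeenAtServer.IsZero() {
		// A pre-Phase-3.7i server never populates the field;
		// assume online to preserve the legacy counter behaviour.
		return true
	}
	return liveOnline
}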
Codex#7: peerLatencyStr() called p.GetLatency().AsDuration() without a
nil check. The local daemon usually populates the field but a defensive
nil guard avoids a panic when a remote peer reports a status without
the latency field set.

Codex#8a: onICEConnected() paused wgProxyRelay BEFORE configuring the
new WG endpoint, opening a 1-2 s window where relay packets were
dropped while WG still pointed at the relay address. Reordered: bring
new ICE proxy up, configure WG endpoint, redirect leftover relay
traffic, THEN pause the relay. The redirect+pause swap covers the
remaining bytes that were in flight on the relay.

Codex#8b: onICEStateDisconnected() in the no-relay-fallback branch
called RemoveEndpointAddress, creating a black-hole window for any
brief ICE flap. Removed the active removal: a transient disconnect is
followed within 1-2 s by a fresh ICE-connected callback that re-points
WG; a permanent disconnect surfaces via WG's own keepalive timeout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the postgres pool is configured, GetAccount uses the pgx fast
path (getAccountPgx) instead of the GORM Take() path. The pgx SELECT
listed every settings_* column EXCEPT the four Phase-3.7i additions:
  - settings_connection_mode
  - settings_relay_timeout_seconds
  - settings_p2p_timeout_seconds
  - settings_p2p_retry_max_seconds

So account.Settings.ConnectionMode came back nil regardless of the
DB value, and toPeerConfig fell back to ResolveLegacyLazyBool —
clients only ever saw ModeP2P / ModeP2PLazy via the bool, never
relay-forced or p2p-dynamic.

Manifested as: dashboard saved p2p-dynamic, postgres stored it, but
the FWR daemon's first sync after deploy logged "lazy/dynamic
connection manager disabled by management push (mode=p2p)" and tore
down its already-correctly-initialized lazy manager.

The GORM path (getAccountGorm) loaded these columns automatically via
Take(&account); only the hand-written pgx SELECT was missing them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…meout to 24h

Two unrelated fixes that surfaced during user verification:

1) Android UI showed 0 B / 0 B for every peer's transfer counters
   regardless of actual traffic. Root cause: BytesRx/BytesTx in peer
   State are only updated by RefreshWireGuardStats(), which the desktop
   daemon's Status RPC calls before serving CLI status output. The
   Android gomobile bridge's PeersList() snapshotted GetFullStatus()
   directly without that refresh, so the cached counters stayed at the
   value last written when the peer was opened/closed (typically 0).
   Now PeersList() calls RefreshWireGuardStats() first; transfer
   counters update on every UI poll (~30 s in the new peers tab).

2) DefaultInactivityThreshold (the relay-tunnel idle teardown
   fallback when neither client config nor server-pushed setting
   provides one) bumped 15 min -> 24 h. The 15-min default was
   aggressive: peers exchanging only NAT-keepalives got torn down
   every 15 min and had to re-handshake on the next packet. The new
   24 h default matches the dashboard placeholder change and what
   most users actually want for a mesh topology. The two p2p-lazy /
   p2p-dynamic test cases that hard-coded 20 min of past-activity
   were updated to 25 h to remain triggered.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… form

Fix B in 853d26c only handled the bare "dev-<sha>" / "ci-<sha>"
prefix. The Android build script (build-android-lib.sh) and the
infrastructure build wrappers (scripts/netbird-windows-build/,
build-and-deploy.sh) write a SEMVER-PADDED form like
"0.0.0-dev-1b923aad9" so version.NewVersion can parse it cleanly.

normalizeVersion() splits on "-" and takes index 0 = "0.0.0", which
is < the 0.45.0 minVersion, so IsSupported returned FALSE and every
peer running a custom dev build was eager-opened by the lazy manager.
This made p2p-dynamic ineffective on Bad Mitterndorf routers + S21 +
W11-test1 because they all run the dev build right now.

Now also accept any version containing "-dev-" or "-ci-". The bare-
prefix form is still accepted for backwards compatibility with older
build scripts. Random short-hash versions ("a6c5960") still get
rejected because they have no marker at all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
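A sketch of the resulting gate, using github.com/hashicorp/go-version (which the commit message suggests is already in use via version.NewVersion); the minimum-version constant and the function name are assumptions:

package lazyconn // illustrative

import (
	"strings"

	version "github.com/hashicorp/go-version"
)

var minLazyVersion = version.Must(version.NewVersion("0.45.0"))

// isLazySupported accepts release builds >= 0.45.0 plus any build
// carrying a dev/CI marker, in bare ("dev-<sha>") or semver-padded
// ("0.0.0-dev-<sha>") form. Bare short hashes carry no marker and do
// not parse as a version, so they stay rejected.
func isLazySupported(agent string) bool {
	if strings.HasPrefix(agent, "dev-") || strings.HasPrefix(agent, "ci-") ||
		strings.Contains(agent, "-dev-") || strings.Contains(agent, "-ci-") {
		return true
	}
	v, err := version.NewVersion(agent)
	if err != nil {
		return false
	}
	return v.GreaterThanOrEqual(minLazyVersion)
}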
…Listeners helpers

Phase 3.7i UI-push helpers used by the peer-status-visibility code
above. Thin wrappers around the existing notifier.peerListChanged and
snapshotRouterPeersLocked + dispatchRouterPeers paths.
…onnections.Store + SnapshotRouter)

Phase 3.7i added the peer-connections Store and SnapshotRouter to
nbgrpc.NewServer's signature, but the test fixture in client/cmd
wasn't updated, so the package fails to compile against PR-C's
management code.
…xture

Phase 3.7i added peer_connections.Store and *peer_connections.SnapshotRouter
to nbgrpc.NewServer(). Production callers (BaseServer.PeerConnStore /
PeerConnRouter) always pass non-nil values, but several test fixtures
pass nil. Without nil-guards the Sync handler nil-derefs on
snapshotRouter.Register(peerKey).

- mgmt/grpc/server.go: NewServer now substitutes a fresh in-memory
  store + a fresh SnapshotRouter when either argument is nil. Cheap
  defaults; no behavioural change for production.
- client/server/server_test.go: pass non-nil store + router to the
  13-arg signature so the package compiles against the new signature
  on top of all this.
- mgmt/internals/server/boot.go: collapse PeerConnRouter() lambda to
  Create(s, peer_connections.NewSnapshotRouter) (unlambda staticcheck)
- client/internal/conn_mgr.go: //nolint:unused on
  addPeersToLazyConnManager (deliberately retained for the eventual
  snapshot-import path; documented inline)
- client/internal/peer/status.go: //nolint:unused on
  notifyPeerListChanged (defined here for the UpdatePeerRemoteMeta
  caller that lands in a follow-up commit on PR-D)
Fixes a panic on Engine.Stop teardown:

  Engine.Stop()
    e.connStatePusher.Stop()
    e.connStatePusher = nil          ← nilled here
    ...
    e.removeAllPeers()                ← still triggers
      Conn.Close
        UpdatePeerState               ← fires status listener
          listener-closure
            e.connStatePusher.        ← nil receiver
              OnPeerStateChange()     ← deref panic at p.events <- ev

Test_Engine_SSH on the netbirdio CI runners hit this reliably because
the runners can actually create the wg interface (sandbox cannot, so
the test fails earlier locally and never reaches the Stop path).

Stack: client/internal/conn_state_pusher.go:99 → engine.go:633 →
peer/status.go:300 (notify closure) → peer/conn.go:736 → conn.go:329
→ conn_mgr.go:454 → engine.go:864 → engine.go:852 → engine.go:391.

Defending in the pusher itself (vs. nil-checking each callsite) keeps
the engine cleanup path simple and matches the no-op-on-nil pattern
already used elsewhere in the package.
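For reference, the no-op-on-nil pattern in its bare form: Go happily invokes a pointer-receiver method on a nil receiver, so the guard turns the Stop-time race into a silent drop instead of a panic at the channel send. Types are reduced for illustration:

package internal // illustrative

type connStateEvent struct{ peerKey string }

type connStatePusher struct {
	events chan connStateEvent
}

// OnPeerStateChange is safe to call through a nil pusher: the engine
// nils the field during Stop() while status listeners can still fire.
func (p *connStatePusher) OnPeerStateChange(ev connStateEvent) {
	if p == nil {
		return
	}
	p.events <- ev
}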
@MichaelUray MichaelUray force-pushed the pr/c-phase3.7i-visibility branch from fbd0df5 to c8fcd96 Compare May 6, 2026 16:16
@sonarqubecloud

sonarqubecloud Bot commented May 6, 2026

Quality Gate failed

Failed conditions
24 New issues
24 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE
