[client] Fix exit node menu not refreshing on Windows #5553
Conversation
TrayOpenedCh is not implemented in the systray library on Windows, so exit nodes were never refreshed after the initial connect. Combined with the management sync not having populated routes yet when the Connected status fires, this caused the exit node menu to remain empty permanently after disconnect/reconnect cycles. Add a background poller on Windows that refreshes exit nodes while connected, with fast initial polling to catch routes from management sync followed by a steady 10s interval. On macOS/Linux, TrayOpenedCh continues to handle refreshes on each tray open. Also fix a data race on connectClient assignment in the server's connect() method and add nil checks in CleanState/DeleteState to prevent panics when connectClient is nil.
📝 Walkthrough
Adds mutex-protected assignment for the server's connectClient, guards state cleanup against a nil connectClient, adds tests for concurrent connectClient access and edge cases, and refactors the exit-node refresh to use a cancelable, OS-aware poller and a boolean-returning update flow.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/ui/network.go (1)
368-419: ⚠️ Potential issue | 🟠 Major: Cancellation doesn't stop the in-flight refresh from mutating the tray.
`pollExitNodes()` checks `ctx.Done()` only between iterations. If disconnect cancels the poller while `updateExitNodes()` is already fetching routes, that call still recreates the menu and can re-enable stale exit nodes after `cancelExitNodeRetry()` ran. Pass the polling context through to `updateExitNodes()` and bail again before touching the menu.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/ui/network.go` around lines 368 - 419, Pass the polling context into updateExitNodes and abort menu mutation if the context is cancelled: change pollExitNodes to call s.updateExitNodes(ctx) and updateExitNodes to accept ctx context.Context; after long-running steps (after getSrvClient/getExitNodes return) check ctx.Done()/ctx.Err() and if cancelled return false without touching s.recreateExitNodeMenu or changing mExitNode/mExitNodeItems; keep the s.exitNodeMu.Lock/Unlock around only the menu mutation and ensure you bail before acquiring the lock when ctx is cancelled.
🧹 Nitpick comments (1)
client/server/server_connect_test.go (1)
28-88: These tests won't fail if `Server.connect()` regresses.
Both tests reimplement the mutex assignment inline instead of exercising `connect()`, and `TestConcurrentConnectClientAccess` only proves that 50 goroutines returned. An unlocked write in production would still leave this file green. Please drive the real publish path or extract it behind a small helper shared by prod and tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/server/server_connect_test.go` around lines 28 - 88, Both tests bypass Server.connect() and directly mutate s.connectClient, so regressions in connect() won't be caught; update tests to exercise the real publish path by calling s.connect(ctx) (or extract a small exported helper like Server.setConnectClientLocked(client) used by both connect() and tests) instead of assigning s.connectClient directly; change TestConnectSetsClientWithMutex and TestConcurrentConnectClientAccess to invoke connect() (or the new helper) with newDummyConnectClient/newTestServer so the mutex behavior in connect() is actually exercised and concurrent readers observe the real write path.
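The suggested extraction can be sketched like this. The types and the helper names (`setConnectClient`, `getConnectClient`) are hypothetical stand-ins for the real `Server` and `internal.ConnectClient`; the point is that tests and `connect()` share one locked publish path, so an unlocked write cannot slip through unexercised.

```go
package main

import (
	"fmt"
	"sync"
)

// Minimal stand-ins; the real types live in client/server and client/internal.
type ConnectClient struct{ id int }

type Server struct {
	mutex         sync.Mutex
	connectClient *ConnectClient
}

// setConnectClient is the extracted publish step shared by connect() and the
// tests, so a test exercises the same locked write path as production.
func (s *Server) setConnectClient(c *ConnectClient) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	s.connectClient = c
}

// getConnectClient is the matching locked read used by concurrent readers.
func (s *Server) getConnectClient() *ConnectClient {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	return s.connectClient
}

func main() {
	s := &Server{}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.setConnectClient(&ConnectClient{id: i})
			_ = s.getConnectClient() // race-free read under -race
		}(i)
	}
	wg.Wait()
	fmt.Println("final client non-nil:", s.getConnectClient() != nil)
}
```

Run under `go test -race` (or `go run -race`), an unlocked write in the helper would be flagged, which is exactly what the inline reimplementation in the current tests cannot catch.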
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/server/server_connect_test.go`:
- Around line 131-179: The test TestDownThenUp_StaleRunningChan codifies the
stale-channel bug by expecting waitForUp() to succeed while clientRunningChan is
already closed and connectClient is nil; either mark this reproducer as
skipped/TODO (use t.Skip or add a TODO comment) or change the assertions to
verify the desired reconnect behavior: after calling cleanupConnection() and
launching waitForUp(), assert that waitForUp() only returns success when
s.connectClient is non-nil (or returns an error if appropriate), and remove the
current assertion that treats success with nil connectClient as correct; update
references to waitForUp(), clientRunningChan, connectClient,
cleanupConnection(), and TestDownThenUp_StaleRunningChan accordingly.
In `@client/server/state.go`:
- Line 42: The guard reads s.connectClient unsafely and calls Status() multiple
times; fix by taking a local snapshot of s.connectClient under s.mutex (the same
mutex used in connect()), then release the mutex and call client.Status() only
once to decide the guard. Apply this change in the checks inside CleanState and
DeleteState (where the current if uses s.connectClient and Status()), ensuring
you reference the same s.mutex and s.connectClient field and replace the
duplicated Status() calls with a single status check on the snapped local
variable.
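The snapshot-under-mutex guard this prompt describes can be sketched as follows. The types and the guard name `canCleanState` are illustrative; the real check lives in `CleanState`/`DeleteState` in `client/server/state.go`. The client pointer is copied once under the same mutex `connect()` uses, the lock is released, and `Status()` is called exactly once on the local copy, so a nil client can no longer panic mid-check.

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-ins for the server types used in the guard.
type connectClient struct{ state string }

func (c *connectClient) Status() string { return c.state }

type Server struct {
	mutex         sync.Mutex
	connectClient *connectClient
}

// canCleanState snapshots connectClient under the mutex, releases it, then
// decides the guard with a single Status() call on the local copy.
func (s *Server) canCleanState() bool {
	s.mutex.Lock()
	client := s.connectClient
	s.mutex.Unlock()
	if client == nil {
		return true // nothing running: cleanup is safe, no panic
	}
	return client.Status() != "Connected"
}

func main() {
	s := &Server{} // connectClient is nil
	fmt.Println("clean with nil client:", s.canCleanState())
	s.connectClient = &connectClient{state: "Connected"}
	fmt.Println("clean while connected:", s.canCleanState())
}
```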
📒 Files selected for processing (6)
- client/server/server.go
- client/server/server_connect_test.go
- client/server/state.go
- client/ui/client_ui.go
- client/ui/event_handler.go
- client/ui/network.go
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
client/ui/network.go (1)
391-415: ⚠️ Potential issue | 🟠 Major: Refresh failures still leave stale exit-node actions in the tray.
Both early `return false` paths exit before clearing or disabling the menu. With the new cancel-only disconnect flow, a failed refresh can leave old exit-node items visible and clickable even though the daemon state is gone or changed.
🧹 Proposed fix

```diff
+func (s *serviceClient) clearExitNodeMenu() {
+	s.exitNodeMu.Lock()
+	defer s.exitNodeMu.Unlock()
+	s.recreateExitNodeMenu(nil)
+	s.mExitNode.Disable()
+}
+
 func (s *serviceClient) updateExitNodes() bool {
 	conn, err := s.getSrvClient(defaultFailTimeout)
 	if err != nil {
 		log.Errorf("get client: %v", err)
+		s.clearExitNodeMenu()
 		return false
 	}
 	exitNodes, err := s.getExitNodes(conn)
 	if err != nil {
 		log.Errorf("get exit nodes: %v", err)
+		s.clearExitNodeMenu()
 		return false
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/ui/network.go` around lines 391 - 415, The early returns in updateExitNodes (when getSrvClient or getExitNodes fail) skip clearing or disabling the exit-node menu, leaving stale clickable items; acquire s.exitNodeMu before those error checks (or ensure cleanup runs on error) and call s.recreateExitNodeMenu(nil) then disable s.mExitNode and clear s.mExitNodeItems before returning false so the tray is always cleared when updateExitNodes fails; keep references to getSrvClient, getExitNodes, s.exitNodeMu, s.recreateExitNodeMenu, s.mExitNode and s.mExitNodeItems to locate and update the function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/ui/network.go`:
- Around line 336-345: The Windows poller swap is not atomic because
cancelExitNodeRetry() unlocks exitNodeMu before startExitNodeRefresh stores the
new cancel func, allowing a disconnect to observe a nil cancel and let the new
goroutine survive; fix by performing the swap under exitNodeMu: change
startExitNodeRefresh to create the ctx/cancel, then lock exitNodeMu and
atomically replace exitNodeRetryCancel (calling the old cancel if non-nil)
before unlocking, and only after that launch the goroutine with the new ctx when
calling pollExitNodes; ensure cancelExitNodeRetry still uses exitNodeMu
consistently so there is no gap between clearing and setting
exitNodeRetryCancel.


