Skip to content

[client] Fix exit node menu not refreshing on Windows#5553

Merged
mlsmaycon merged 3 commits intomainfrom
fix/exit-menu-refresh-windows
Mar 9, 2026
Merged

[client] Fix exit node menu not refreshing on Windows#5553
mlsmaycon merged 3 commits intomainfrom
fix/exit-menu-refresh-windows

Conversation

@mlsmaycon
Copy link
Copy Markdown
Collaborator

@mlsmaycon mlsmaycon commented Mar 9, 2026

Describe your changes

TrayOpenedCh is not implemented in the systray library on Windows, so exit nodes were never refreshed after the initial connect. Combined with the management sync not having populated routes yet when the Connected status fires, this caused the exit node menu to remain empty permanently after disconnect/reconnect cycles.

Add a background poller on Windows that refreshes exit nodes while connected, with fast initial polling to catch routes from management sync followed by a steady 10s interval. On macOS/Linux, TrayOpenedCh continues to handle refreshes on each tray open.

Also fix a data race on connectClient assignment in the server's connect() method and add nil checks in CleanState/DeleteState to prevent panics when connectClient is nil.

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • Bug Fixes

    • Prevented nil-pointer issues by guarding connection state checks
    • Ensured cleanup clears connection state safely and cancels exit-node retries on disconnect
    • Improved concurrent access safety for connection client
  • New Features

    • OS-aware exit-node refresh with optimized polling and more reliable menu refresh
    • Safer connection initialization and assignment flow
  • Tests

    • Added tests covering connection synchronization, concurrent access, nil/edge cases, and exit-node refresh behavior

TrayOpenedCh is not implemented in the systray library on Windows,
so exit nodes were never refreshed after the initial connect. Combined
with the management sync not having populated routes yet when the
Connected status fires, this caused the exit node menu to remain empty
permanently after disconnect/reconnect cycles.
Add a background poller on Windows that refreshes exit nodes while
connected, with fast initial polling to catch routes from management
sync followed by a steady 10s interval. On macOS/Linux, TrayOpenedCh
continues to handle refreshes on each tray open.
Also fix a data race on connectClient assignment in the server's connect()
method and add nil checks in CleanState/DeleteState to prevent panics
when connectClient is nil.
@mlsmaycon mlsmaycon requested a review from lixmal March 9, 2026 15:39
lixmal
lixmal previously approved these changes Mar 9, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

Adds mutex-protected assignment for server connectClient, guards state cleanup against nil connectClient, adds tests for concurrent connectClient access and edge cases, and refactors exit-node refresh to use a cancelable OS-aware poller and boolean-returning update flow.

Changes

Cohort / File(s) Summary
Server connection & tests
client/server/server.go, client/server/state.go, client/server/server_connect_test.go
Assigns newly created ConnectClient to a local variable, sets persistence, then mutex-protects assignment to s.connectClient; switches to running the local client. Adds nil-checks in CleanState/DeleteState. Introduces tests covering mutex protection, concurrent access, cleanup, nil handling, and stale running-channel scenarios.
Exit-node refresh refactor (UI)
client/ui/client_ui.go, client/ui/event_handler.go, client/ui/network.go
Replaces exitNodeStates slice with exitNodeRetryCancel cancel func; adds startExitNodeRefresh, cancelExitNodeRetry, and pollExitNodes; changes updateExitNodes() to return bool; implements OS-specific polling (Windows background poller vs. direct update), removes state-diff logic, and updates disconnect handling to cancel ongoing retries.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • pappz
  • braginini

Poem

🐰 I tunneled through mutexes, neat and swift,
Assigned with care, no racing drift.
Exit nodes now poll with a thoughtful beat,
Cancel when we part, and rest when we meet.
Hop, hop—synchrony complete! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description covers the bug, root cause, solution approach, and additional fixes, with the bug fix checkbox marked; however, no issue ticket link is provided as required by the template. Add the issue ticket number and link in the 'Issue ticket number and link' section of the description.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main bugfix: exit node menu not refreshing on Windows, which is the primary issue addressed in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/exit-menu-refresh-windows

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/ui/network.go (1)

368-419: ⚠️ Potential issue | 🟠 Major

Cancellation doesn't stop the in-flight refresh from mutating the tray.

pollExitNodes() checks ctx.Done() only between iterations. If disconnect cancels the poller while updateExitNodes() is already fetching routes, that call still recreates the menu and can re-enable stale exit nodes after cancelExitNodeRetry() ran. Pass the polling context through to updateExitNodes() and bail again before touching the menu.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/ui/network.go` around lines 368 - 419, Pass the polling context into
updateExitNodes and abort menu mutation if the context is cancelled: change
pollExitNodes to call s.updateExitNodes(ctx) and updateExitNodes to accept ctx
context.Context; after long-running steps (after getSrvClient/getExitNodes
return) check ctx.Done()/ctx.Err() and if cancelled return false without
touching s.recreateExitNodeMenu or changing mExitNode/mExitNodeItems; keep the
s.exitNodeMu.Lock/Unlock around only the menu mutation and ensure you bail
before acquiring the lock when ctx is cancelled.
🧹 Nitpick comments (1)
client/server/server_connect_test.go (1)

28-88: These tests won't fail if Server.connect() regresses.

Both tests reimplement the mutex assignment inline instead of exercising connect(), and TestConcurrentConnectClientAccess only proves that 50 goroutines returned. An unlocked write in production would still leave this file green. Please drive the real publish path or extract it behind a small helper shared by prod and tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/server/server_connect_test.go` around lines 28 - 88, Both tests bypass
Server.connect() and directly mutate s.connectClient, so regressions in
connect() won't be caught; update tests to exercise the real publish path by
calling s.connect(ctx) (or extract a small exported helper like
Server.setConnectClientLocked(client) used by both connect() and tests) instead
of assigning s.connectClient directly; change TestConnectSetsClientWithMutex and
TestConcurrentConnectClientAccess to invoke connect() (or the new helper) with
newDummyConnectClient/newTestServer so the mutex behavior in connect() is
actually exercised and concurrent readers observe the real write path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/server/server_connect_test.go`:
- Around line 131-179: The test TestDownThenUp_StaleRunningChan codifies the
stale-channel bug by expecting waitForUp() to succeed while clientRunningChan is
already closed and connectClient is nil; either mark this reproducer as
skipped/TODO (use t.Skip or add a TODO comment) or change the assertions to
verify the desired reconnect behavior: after calling cleanupConnection() and
launching waitForUp(), assert that waitForUp() only returns success when
s.connectClient is non-nil (or returns an error if appropriate), and remove the
current assertion that treats success with nil connectClient as correct; update
references to waitForUp(), clientRunningChan, connectClient,
cleanupConnection(), and TestDownThenUp_StaleRunningChan accordingly.

In `@client/server/state.go`:
- Line 42: The guard reads s.connectClient unsafely and calls Status() multiple
times; fix by taking a local snapshot of s.connectClient under s.mutex (the same
mutex used in connect()), then release the mutex and call client.Status() only
once to decide the guard. Apply this change in the checks inside CleanState and
DeleteState (where the current if uses s.connectClient and Status()), ensuring
you reference the same s.mutex and s.connectClient field and replace the
duplicated Status() calls with a single status check on the snapped local
variable.

---

Outside diff comments:
In `@client/ui/network.go`:
- Around line 368-419: Pass the polling context into updateExitNodes and abort
menu mutation if the context is cancelled: change pollExitNodes to call
s.updateExitNodes(ctx) and updateExitNodes to accept ctx context.Context; after
long-running steps (after getSrvClient/getExitNodes return) check
ctx.Done()/ctx.Err() and if cancelled return false without touching
s.recreateExitNodeMenu or changing mExitNode/mExitNodeItems; keep the
s.exitNodeMu.Lock/Unlock around only the menu mutation and ensure you bail
before acquiring the lock when ctx is cancelled.

---

Nitpick comments:
In `@client/server/server_connect_test.go`:
- Around line 28-88: Both tests bypass Server.connect() and directly mutate
s.connectClient, so regressions in connect() won't be caught; update tests to
exercise the real publish path by calling s.connect(ctx) (or extract a small
exported helper like Server.setConnectClientLocked(client) used by both
connect() and tests) instead of assigning s.connectClient directly; change
TestConnectSetsClientWithMutex and TestConcurrentConnectClientAccess to invoke
connect() (or the new helper) with newDummyConnectClient/newTestServer so the
mutex behavior in connect() is actually exercised and concurrent readers observe
the real write path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d72faf2c-45a8-4255-a334-c7aa11aaf8c1

📥 Commits

Reviewing files that changed from the base of the PR and between 11eb725 and d7f357f.

📒 Files selected for processing (6)
  • client/server/server.go
  • client/server/server_connect_test.go
  • client/server/state.go
  • client/ui/client_ui.go
  • client/ui/event_handler.go
  • client/ui/network.go

Comment thread client/server/server_connect_test.go
Comment thread client/server/state.go
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Mar 9, 2026

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/ui/network.go (1)

391-415: ⚠️ Potential issue | 🟠 Major

Refresh failures still leave stale exit-node actions in the tray.

Both early return false paths exit before clearing or disabling the menu. With the new cancel-only disconnect flow, a failed refresh can leave old exit-node items visible and clickable even though the daemon state is gone or changed.

🧹 Proposed fix
+func (s *serviceClient) clearExitNodeMenu() {
+	s.exitNodeMu.Lock()
+	defer s.exitNodeMu.Unlock()
+	s.recreateExitNodeMenu(nil)
+	s.mExitNode.Disable()
+}
+
 func (s *serviceClient) updateExitNodes() bool {
 	conn, err := s.getSrvClient(defaultFailTimeout)
 	if err != nil {
 		log.Errorf("get client: %v", err)
+		s.clearExitNodeMenu()
 		return false
 	}
 	exitNodes, err := s.getExitNodes(conn)
 	if err != nil {
 		log.Errorf("get exit nodes: %v", err)
+		s.clearExitNodeMenu()
 		return false
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/ui/network.go` around lines 391 - 415, The early returns in
updateExitNodes (when getSrvClient or getExitNodes fail) skip clearing or
disabling the exit-node menu, leaving stale clickable items; acquire
s.exitNodeMu before those error checks (or ensure cleanup runs on error) and
call s.recreateExitNodeMenu(nil) then disable s.mExitNode and clear
s.mExitNodeItems before returning false so the tray is always cleared when
updateExitNodes fails; keep references to getSrvClient, getExitNodes,
s.exitNodeMu, s.recreateExitNodeMenu, s.mExitNode and s.mExitNodeItems to locate
and update the function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/ui/network.go`:
- Around line 336-345: The Windows poller swap is not atomic because
cancelExitNodeRetry() unlocks exitNodeMu before startExitNodeRefresh stores the
new cancel func, allowing a disconnect to observe a nil cancel and let the new
goroutine survive; fix by performing the swap under exitNodeMu: change
startExitNodeRefresh to create the ctx/cancel, then lock exitNodeMu and
atomically replace exitNodeRetryCancel (calling the old cancel if non-nil)
before unlocking, and only after that launch the goroutine with the new ctx when
calling pollExitNodes; ensure cancelExitNodeRetry still uses exitNodeMu
consistently so there is no gap between clearing and setting
exitNodeRetryCancel.

---

Outside diff comments:
In `@client/ui/network.go`:
- Around line 391-415: The early returns in updateExitNodes (when getSrvClient
or getExitNodes fail) skip clearing or disabling the exit-node menu, leaving
stale clickable items; acquire s.exitNodeMu before those error checks (or ensure
cleanup runs on error) and call s.recreateExitNodeMenu(nil) then disable
s.mExitNode and clear s.mExitNodeItems before returning false so the tray is
always cleared when updateExitNodes fails; keep references to getSrvClient,
getExitNodes, s.exitNodeMu, s.recreateExitNodeMenu, s.mExitNode and
s.mExitNodeItems to locate and update the function.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f08ff84-6a76-40af-9e17-171ab3a61c80

📥 Commits

Reviewing files that changed from the base of the PR and between d7f357f and bd1d153.

📒 Files selected for processing (1)
  • client/ui/network.go

Comment thread client/ui/network.go
@mlsmaycon mlsmaycon merged commit 15aa6ba into main Mar 9, 2026
57 of 58 checks passed
@mlsmaycon mlsmaycon deleted the fix/exit-menu-refresh-windows branch March 9, 2026 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants