[client] scope macOS DNS state keys by WireGuard interface name#5504
[client] scope macOS DNS state keys by WireGuard interface name#5504siriobalmelli wants to merge 2 commits intonetbirdio:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughScopes macOS DNS state to a specific network interface by adding Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
a664a05 to
4b97324
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
client/internal/dns/host_darwin_test.go (1)
620-654: Strengthen multi-instance isolation assertions.This test currently validates key separation and domain counts, but not exact per-instance payload integrity. Using distinct domain sets and asserting full read-back equality would better catch cross-instance leakage.
♻️ Suggested test hardening
@@ - domains1 := generateShortDomains(60) // forces 2 batches - domains2 := generateShortDomains(60) + domains1 := generateShortDomains(60) // forces 2 batches + domains2 := generateLongDomains(60) // distinct payload for stronger isolation checks @@ - var got1, got2 []string - for key := range cfg1.createdKeys { + var got1, got2 []string + keys1 := make([]string, 0, len(cfg1.createdKeys)) + for key := range cfg1.createdKeys { + keys1 = append(keys1, key) + } + sort.Strings(keys1) + for _, key := range keys1 { got1 = append(got1, readDomainsFromKey(t, key)...) } - for key := range cfg2.createdKeys { + keys2 := make([]string, 0, len(cfg2.createdKeys)) + for key := range cfg2.createdKeys { + keys2 = append(keys2, key) + } + sort.Strings(keys2) + for _, key := range keys2 { got2 = append(got2, readDomainsFromKey(t, key)...) } - assert.Equal(t, len(domains1), len(got1), "all domains from instance 1 should be readable") - assert.Equal(t, len(domains2), len(got2), "all domains from instance 2 should be readable") + assert.Equal(t, domains1, got1, "instance 1 should read back only its own domains") + assert.Equal(t, domains2, got2, "instance 2 should read back only its own domains")@@ import ( @@ + "sort" )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/host_darwin_test.go` around lines 620 - 654, The test should not only check counts and key separation but verify payload integrity per instance: for each instance use cfg1.createdKeys and cfg2.createdKeys with readDomainsFromKey to collect all domains, normalize into sets, and assert that the set read from cfg1 equals the original domains1 set and the set read from cfg2 equals domains2; additionally assert that the intersection of those two read sets is empty to catch any cross-instance leakage (use addBatchedDomains, createdKeys, readDomainsFromKey, and the domains1/domains2 variables to locate the logic).client/internal/dns/host_darwin.go (1)
199-207: Indexed key discovery assumes contiguous indices.The
for i := 0; ; i++ { ... break on first miss }pattern can miss higher stale keys when there is an index gap. Consider scanning all returned DNS keys for matching indexed prefixes instead of stopping at the first missing slot.Also applies to: 218-227
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/host_darwin.go` around lines 199 - 207, The current loop that uses for i := 0; ; i++ with key := fmt.Sprintf(netbirdDNSStateKeyIndexedFormat, s.interfaceName, suffix, i) stops at the first missing index and can miss higher-numbered stale keys; change the discovery to scan the full dnsKeys string/list and collect keys whose names match the indexed prefix for each suffix instead of breaking on the first miss. Concretely, for each suffix (searchSuffix and matchSuffix) build the static prefix using s.interfaceName and suffix (based on netbirdDNSStateKeyIndexedFormat) and iterate over dnsKeys, appending any entry that has that prefix and the expected index pattern (or use strings.HasPrefix + index parsing/regex) to keys; apply the same replacement to the similar block around lines 218-227.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@client/internal/dns/host_darwin_test.go`:
- Around line 620-654: The test should not only check counts and key separation
but verify payload integrity per instance: for each instance use
cfg1.createdKeys and cfg2.createdKeys with readDomainsFromKey to collect all
domains, normalize into sets, and assert that the set read from cfg1 equals the
original domains1 set and the set read from cfg2 equals domains2; additionally
assert that the intersection of those two read sets is empty to catch any
cross-instance leakage (use addBatchedDomains, createdKeys, readDomainsFromKey,
and the domains1/domains2 variables to locate the logic).
In `@client/internal/dns/host_darwin.go`:
- Around line 199-207: The current loop that uses for i := 0; ; i++ with key :=
fmt.Sprintf(netbirdDNSStateKeyIndexedFormat, s.interfaceName, suffix, i) stops
at the first missing index and can miss higher-numbered stale keys; change the
discovery to scan the full dnsKeys string/list and collect keys whose names
match the indexed prefix for each suffix instead of breaking on the first miss.
Concretely, for each suffix (searchSuffix and matchSuffix) build the static
prefix using s.interfaceName and suffix (based on
netbirdDNSStateKeyIndexedFormat) and iterate over dnsKeys, appending any entry
that has that prefix and the expected index pattern (or use strings.HasPrefix +
index parsing/regex) to keys; apply the same replacement to the similar block
around lines 218-227.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 761f0703-bb96-4cee-9b42-018f6a237f23
📒 Files selected for processing (4)
client/internal/dns/host_darwin.goclient/internal/dns/host_darwin_test.goclient/internal/dns/server_darwin.goclient/internal/dns/unclean_shutdown_darwin.go
There was a problem hiding this comment.
🧹 Nitpick comments (2)
client/internal/dns/host_darwin_test.go (1)
548-553: Consider adding a negative test case for empty interface name.The test validates the happy path, but
newHostManageralso returns an error wheninterfaceNameis empty. Consider adding a test case to verify the error handling.📝 Optional test addition
func TestNewHostManagerWithEmptyInterfaceName(t *testing.T) { manager, err := newHostManager("") require.Error(t, err) assert.Nil(t, manager) assert.Contains(t, err.Error(), "interfaceName must not be empty") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/host_darwin_test.go` around lines 548 - 553, Add a negative test in host_darwin_test.go that calls newHostManager with an empty string and asserts it returns an error and a nil manager; specifically, create a TestNewHostManagerWithEmptyInterfaceName that calls newHostManager("") and uses require.Error to check an error was returned, assert.Nil on the manager, and assert.Contains on err.Error() to check the message mentions the empty interfaceName (e.g., "interfaceName must not be empty"), so the error-handling branch of newHostManager is covered.client/internal/dns/unclean_shutdown_darwin.go (1)
20-26: Clarify comment about legacy key cleanup path.When
InterfaceNameis empty, returningnilcauses the state manager to delete this state entry (percleanupSingleStatebehavior instatemanager/manager.go). The comment says "legacy keys will be cleaned up on next successful start," which is accurate sincediscoverExistingKeys()independently discovers and removes legacy-format keys.Consider making the comment more explicit about the cleanup mechanism:
📝 Suggested comment clarification
func (s *ShutdownState) Cleanup() error { if s.InterfaceName == "" { - // State file from an older version without interface name — skip cleanup - // rather than fail; legacy keys will be cleaned up on next successful start. + // State file from an older version without interface name — skip cleanup. + // Return nil to allow state deletion; legacy keys will be discovered and + // removed by discoverExistingKeys() on next successful start. log.Warn("dns shutdown state has no interface name, skipping cleanup of legacy scutil keys") return nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/unclean_shutdown_darwin.go` around lines 20 - 26, The comment in ShutdownState.Cleanup is ambiguous about how legacy keys are removed; update it to explicitly state that returning nil causes the state manager (cleanupSingleState in statemanager/manager.go) to delete this state entry and that legacy-format keys are handled separately by discoverExistingKeys() on next successful start, so this code path intentionally skips cleanup when InterfaceName == "" to rely on that discovery-based removal. Reference ShutdownState.Cleanup and the discoverExistingKeys() behavior in your comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@client/internal/dns/host_darwin_test.go`:
- Around line 548-553: Add a negative test in host_darwin_test.go that calls
newHostManager with an empty string and asserts it returns an error and a nil
manager; specifically, create a TestNewHostManagerWithEmptyInterfaceName that
calls newHostManager("") and uses require.Error to check an error was returned,
assert.Nil on the manager, and assert.Contains on err.Error() to check the
message mentions the empty interfaceName (e.g., "interfaceName must not be
empty"), so the error-handling branch of newHostManager is covered.
In `@client/internal/dns/unclean_shutdown_darwin.go`:
- Around line 20-26: The comment in ShutdownState.Cleanup is ambiguous about how
legacy keys are removed; update it to explicitly state that returning nil causes
the state manager (cleanupSingleState in statemanager/manager.go) to delete this
state entry and that legacy-format keys are handled separately by
discoverExistingKeys() on next successful start, so this code path intentionally
skips cleanup when InterfaceName == "" to rely on that discovery-based removal.
Reference ShutdownState.Cleanup and the discoverExistingKeys() behavior in your
comment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f9b0ba2e-92b2-4b96-adf9-868791f42318
📒 Files selected for processing (4)
client/internal/dns/host_darwin.goclient/internal/dns/host_darwin_test.goclient/internal/dns/server_darwin.goclient/internal/dns/unclean_shutdown_darwin.go
|
Thanks for the review @coderabbitai! Addressed nitpicks #3 (negative test for empty interface name) and #4 (clarified comment) in the latest commit. On the two remaining nitpicks: Nitpick #1 — Strengthen multi-instance isolation assertions Skipping intentionally. Nitpick #2 — Indexed key discovery assumes contiguous indices Skipping intentionally. The contiguous-index assumption is correct by construction: |
|
Tip For best results, initiate chat on the files or code changes.
Nitpicks Nitpick Nitpick |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
client/internal/dns/host_darwin_test.go (1)
627-660: Strengthen isolation validation with distinct domain sets and content checks.This case currently uses the same domain list for both instances and only checks counts, so domain-content cross-over could still pass. Use different domain datasets per interface and assert each instance reads back its own expected domains.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/host_darwin_test.go` around lines 627 - 660, The test currently uses identical domain lists and only checks counts; change it to use distinct domain sets (domains1 vs domains2 already present) and replace the length assertions with content assertions: after building got1 and got2 via readDomainsFromKey, assert that got1 contains exactly the elements of domains1 and got2 contains exactly the elements of domains2 (use equality or set comparison), and additionally assert that none of domains1 appear in got2 and none of domains2 appear in got1 to ensure no cross-over; keep references to cfg1.addBatchedDomains, cfg2.addBatchedDomains, cfg1.createdKeys, cfg2.createdKeys and readDomainsFromKey when locating the code to modify.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@client/internal/dns/host_darwin_test.go`:
- Around line 627-660: The test currently uses identical domain lists and only
checks counts; change it to use distinct domain sets (domains1 vs domains2
already present) and replace the length assertions with content assertions:
after building got1 and got2 via readDomainsFromKey, assert that got1 contains
exactly the elements of domains1 and got2 contains exactly the elements of
domains2 (use equality or set comparison), and additionally assert that none of
domains1 appear in got2 and none of domains2 appear in got1 to ensure no
cross-over; keep references to cfg1.addBatchedDomains, cfg2.addBatchedDomains,
cfg1.createdKeys, cfg2.createdKeys and readDomainsFromKey when locating the code
to modify.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a3c65aa9-4802-4d59-9a44-f4e4100742df
📒 Files selected for processing (2)
client/internal/dns/host_darwin_test.goclient/internal/dns/unclean_shutdown_darwin.go
b23e076 to
fc4097a
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
client/internal/dns/host_darwin_test.go (1)
24-102: Add a legacy-state cleanup regression here.
TestDarwinDNSUncleanShutdownCleanupnow exercises only the new state shape. Please add a case that loads a pre-changedns_statepayload withoutInterfaceName(and ideally with the oldCreatedKeysfield name), since that upgrade path is the critical compatibility promise in this PR.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/host_darwin_test.go` around lines 24 - 102, Add a legacy-state regression case to TestDarwinDNSUncleanShutdownCleanup that writes a pre-change dns_state JSON payload (omitting InterfaceName and using the old CreatedKeys field name) into the same stateFile before calling statemanager.New(...) / sm2.LoadState(...); then retrieve the loaded ShutdownState and call shutdownState.Cleanup() and assert created keys are removed using checkDNSKeyExists and cleanup with removeTestDNSKey. Ensure the test constructs the legacy JSON with the same key names the old version used (e.g., "CreatedKeys" array and no "InterfaceName"), writes it to stateFile, and then reuses the existing assertions around ShutdownState.Cleanup to verify backward compatibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@client/internal/dns/host_darwin_test.go`:
- Around line 24-102: Add a legacy-state regression case to
TestDarwinDNSUncleanShutdownCleanup that writes a pre-change dns_state JSON
payload (omitting InterfaceName and using the old CreatedKeys field name) into
the same stateFile before calling statemanager.New(...) / sm2.LoadState(...);
then retrieve the loaded ShutdownState and call shutdownState.Cleanup() and
assert created keys are removed using checkDNSKeyExists and cleanup with
removeTestDNSKey. Ensure the test constructs the legacy JSON with the same key
names the old version used (e.g., "CreatedKeys" array and no "InterfaceName"),
writes it to stateFile, and then reuses the existing assertions around
ShutdownState.Cleanup to verify backward compatibility.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3dd0d1c9-cc89-4967-a75e-d4c2e2a74f0e
📒 Files selected for processing (4)
client/internal/dns/host_darwin.goclient/internal/dns/host_darwin_test.goclient/internal/dns/server_darwin.goclient/internal/dns/unclean_shutdown_darwin.go
🚧 Files skipped from review as they are similar to previous changes (1)
- client/internal/dns/server_darwin.go
fc4097a to
d07f09b
Compare
|
Addressed the nitpick from the review as well: added |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/internal/dns/host_darwin.go`:
- Around line 75-79: The call to
stateManager.UpdateState(&ShutdownState{InterfaceName: s.interfaceName}) must
not be ignored on failure; if UpdateState returns an error you must stop further
execution so the InterfaceName is persisted before creating scoped keys. Modify
the surrounding function (the one invoking UpdateState in host_darwin.go) to
propagate or handle the error immediately (e.g., return the error or abort
initialization) instead of logging and continuing; reference
stateManager.UpdateState, ShutdownState, s.interfaceName and ensure Cleanup will
see a persisted InterfaceName on restart.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d61f5b7d-cc79-4928-b4d2-eeb09e82358a
📒 Files selected for processing (4)
client/internal/dns/host_darwin.goclient/internal/dns/host_darwin_test.goclient/internal/dns/server_darwin.goclient/internal/dns/unclean_shutdown_darwin.go
| if err := stateManager.UpdateState(&ShutdownState{ | ||
| InterfaceName: s.interfaceName, | ||
| }); err != nil { | ||
| log.Errorf("failed to update shutdown state: %s", err) | ||
| } |
There was a problem hiding this comment.
Don't continue past a failed interface checkpoint.
This is the only state write that happens before we create any scoped NetBird-<iface>-* keys. If it fails and we keep going, an unclean restart can end up with no persisted InterfaceName, and Cleanup() then falls back to legacy non-scoped discovery only, leaving the new scoped keys behind.
🩹 Proposed fix
if err := stateManager.UpdateState(&ShutdownState{
InterfaceName: s.interfaceName,
}); err != nil {
- log.Errorf("failed to update shutdown state: %s", err)
+ return fmt.Errorf("update dns shutdown state: %w", err)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err := stateManager.UpdateState(&ShutdownState{ | |
| InterfaceName: s.interfaceName, | |
| }); err != nil { | |
| log.Errorf("failed to update shutdown state: %s", err) | |
| } | |
| if err := stateManager.UpdateState(&ShutdownState{ | |
| InterfaceName: s.interfaceName, | |
| }); err != nil { | |
| return fmt.Errorf("update dns shutdown state: %w", err) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@client/internal/dns/host_darwin.go` around lines 75 - 79, The call to
stateManager.UpdateState(&ShutdownState{InterfaceName: s.interfaceName}) must
not be ignored on failure; if UpdateState returns an error you must stop further
execution so the InterfaceName is persisted before creating scoped keys. Modify
the surrounding function (the one invoking UpdateState in host_darwin.go) to
propagate or handle the error immediately (e.g., return the error or abort
initialization) instead of logging and continuing; reference
stateManager.UpdateState, ShutdownState, s.interfaceName and ensure Cleanup will
see a persisted InterfaceName on restart.
There was a problem hiding this comment.
This is a false positive — no code change needed. Here's why:
-
UpdateStatecan only fail if the state was never registered — the sole error path insetState()iserrStateNotRegistered, which is a programming bug, not a runtime condition. If it fails here, every subsequentUpdateStatecall in this function (lines 95, 117, 127 vias.updateState(stateManager)) will also fail, so returning early from this one call doesn't improve the situation. -
Codebase convention is log-and-continue — 9 of 10
UpdateStatecall sites across the codebase (routemanager,engine_ssh,dns/systemd_linux,dns/network_manager_unix,dns/resolvconf_unix,dns/host_windows,updatemanager, etc.) all uselog.Errorf/log.Warnfand continue. Changing just this one call to return an error would be inconsistent. -
The
updateState()helper at line 136 also logs-and-continues — even if we returned early here, the same non-fatal pattern is used for all subsequent state updates in this function. Making only the first call fatal while the rest are non-fatal is incoherent. -
Returning an error here would break DNS entirely —
applyDNSConfigis the function that configures the client's DNS. Returning early because state persistence failed means DNS won't be configured at all, which is strictly worse than the theoretical risk of orphaned keys on an unclean shutdown.
There was a problem hiding this comment.
@siriobalmelli, you're right — I accept all four points. The errStateNotRegistered path is a programming invariant (not a runtime condition), and blocking DNS configuration because of a state-persistence failure would be strictly worse than the theoretical risk of orphaned keys. The log-and-continue pattern is consistent with the rest of the codebase. I'll retract this comment.
✏️ Learnings added
Learnt from: siriobalmelli
Repo: netbirdio/netbird PR: 5504
File: client/internal/dns/host_darwin.go:75-79
Timestamp: 2026-03-13T09:37:32.529Z
Learning: In netbirdio/netbird, `stateManager.UpdateState()` failures are handled with log.Errorf/log.Warnf and execution continues (never return an error). The sole error path is `errStateNotRegistered`, which is a programming bug not a runtime condition. All call sites across routemanager, engine_ssh, dns/systemd_linux, dns/network_manager_unix, dns/resolvconf_unix, dns/host_windows, and updatemanager follow this log-and-continue convention. Do not flag log-and-continue on UpdateState as a bug in this codebase.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: lixmal
Repo: netbirdio/netbird PR: 5202
File: client/internal/portforward/manager.go:219-222
Timestamp: 2026-01-30T14:43:13.106Z
Learning: In client/internal/portforward/manager.go, the state persistence pattern is designed for retry-on-failure: if DeletePortMapping fails in Stop(), the state is intentionally left intact (via early return) so that cleanupResidual() can attempt deletion again on the next startup. This is the intended behavior, not a bug.
Learnt from: pappz
Repo: netbirdio/netbird PR: 5448
File: client/server/server.go:142-146
Timestamp: 2026-02-25T17:44:13.439Z
Learning: In the netbirdio/netbird client/server/server.go file, the update manager is intentionally designed to operate only with GUI flows, not in headless/daemon mode. When the update manager is initialized in the Start() method, only CheckUpdateSuccess is called; the manager's Start() method is deliberately not invoked in the daemon lifecycle and is instead called only for GUI flows via startUpdateManagerForGUI().
</learning]
Learnt from: pappz
Repo: netbirdio/netbird PR: 5512
File: client/internal/metrics/push.go:167-181
Timestamp: 2026-03-13T09:34:29.533Z
Learning: In netbirdio/netbird PR `#5512`, the `NB_METRICS_INTERVAL` env var (`p.envInterval`) in `client/internal/metrics/push.go` does NOT bypass remote config version gating (`isVersionInRange`) or the `RefreshIfNeeded` nil check. The earlier documentation describing this override behavior was deprecated. Do not flag the `resolve()` function for failing to skip version/config checks when `envInterval > 0` — the current behavior is intentional.
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 5441
File: management/server/user.go:745-749
Timestamp: 2026-02-24T19:32:13.189Z
Learning: In Go codebases like netbirdio/netbird, methods returning (T, error) should follow the convention: if an error is returned, propagate it and return early. When the caller handles the error immediately, explicit nil checks on the returned value are unnecessary; rely on the error to guard control flow and only inspect the value when err is nil. This guidance applies broadly to Store methods such as GetUserByUserID in Go code; prefer early return on error and avoid redundant nil checks after a successful error path.
Learnt from: lixmal
Repo: netbirdio/netbird PR: 5530
File: proxy/internal/udp/relay.go:276-278
Timestamp: 2026-03-07T13:59:16.506Z
Learning: For Go 1.25+ projects, use sync.WaitGroup.Go() to run goroutines because it handles Add(1) and defer Done() automatically. Do not flag wg.Go(...) as a compilation error in codebases that require Go 1.25 or newer. If supporting older Go versions, gate such usage with build tags or avoid wg.Go() to maintain compatibility; verify the module targets Go 1.25+ before adopting this pattern.
Multiple netbird instances on the same host (each with its own WireGuard interface and state directory) clobber each other's scutil DNS entries because the key format 'State:/Network/Service/NetBird-%s/DNS' has no per-instance disambiguation. Add the WireGuard interface name as a component in all NetBird scutil key formats: - named keys: NetBird-<iface>-<suffix>/DNS - batched keys: NetBird-<iface>-<suffix>-<index>/DNS This preserves the domain-batching logic from netbirdio#5368 (which prevents silent domain loss with 60+ domains) while adding the interface scoping needed for multi-instance correctness. Also fix a latent bug reintroduced by netbirdio#5368: primaryServiceStateKeyFormat has only one %s placeholder (for the system service UUID), so it must be formatted with fmt.Sprintf directly rather than getKeyWithInput. On upgrade, discoverExistingKeys() also finds and removes legacy-format keys (without interface scope) so stale entries are cleaned up rather than left orphaned in scutil. Fixes: netbirdio#446 Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
…sions When upgrading from an older NetBird version after an unclean shutdown, legacy scutil DNS keys (e.g. NetBird-Match/DNS) were permanently orphaned because Cleanup() returned nil early when InterfaceName was empty, causing the state manager to delete the state entry before any cleanup ran. Two compounding issues: 1. The early return prevented discoverExistingKeys() from ever running. discoverExistingKeys() is only called when createdKeys is empty, which only happens during teardown — but by then the new instance has its own keys in createdKeys, so legacy keys were never discovered. 2. The json:"created_keys" tag introduced in the previous commit does not match the old PascalCase "CreatedKeys" field name (underscore vs camelCase are not equivalent under Go's case-insensitive matching rules), so CreatedKeys always deserialized as nil from old state files. Fix both issues: - Replace the early return with a bare systemConfigurator (empty interfaceName, empty createdKeys) that falls through to the normal cleanup path. discoverExistingKeys() will be called and its hardcoded legacy key probes will find and remove old non-scoped keys. - Add UnmarshalJSON to handle both old "CreatedKeys" and new "created_keys" JSON field names, with new-format taking precedence when both are present. Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
d07f09b to
1c8d5ce
Compare
|



Summary
Fixes a bug where multiple netbird instances on the same macOS host clobber each other's DNS entries in
scutil, because the key formatState:/Network/Service/NetBird-%s/DNShas no per-instance disambiguation.Alleviates #446: while one instance wouldn't connect to multiple networks, removes the barrier to having multiple instances running simultaneously.
Problem
When two netbird instances run on the same host with different WireGuard interfaces (e.g.
utun0andutun1), they both write to the same scutil keys (NetBird-Match/DNS,NetBird-Search/DNS). The second instance silently overwrites the first's DNS configuration, breaking DNS for the first instance's network.Solution
Add the WireGuard interface name as a component in all NetBird scutil key formats:
NetBird-Match/DNSNetBird-utun0-Match/DNSNetBird-Match-0/DNSNetBird-utun0-Match-0/DNSThis builds on top of the domain-batching work from #5368 (which prevents silent domain loss with 60+ domains) — both features are now combined.
Upgrade migration
discoverExistingKeys()now discovers and removes legacy-format keys (without interface scope) on startup, so stale scutil entries from older versions are cleaned up automatically rather than left orphaned.Changes
host_darwin.go: Updated key format constants to include interface name; addedinterfaceNamefield tosystemConfigurator; updated all key generation inaddBatchedDomains,addLocalDNS,discoverExistingKeys; fixedprimaryServiceStateKeyFormatbug; updatedgetKeyWithInputsignatureserver_darwin.go: Passs.wgInterface.Name()tonewHostManager()unclean_shutdown_darwin.go: AddedInterfaceNamefield toShutdownStatefor proper cleanup scoping after unclean shutdownhost_darwin_test.go: Updated all existing tests for new key format; addedTestGetKeyWithInput,TestNewHostManagerWithInterfaceName,TestMultipleInterfacesGenerateDifferentKeys,TestShutdownStateIncludesInterfaceName,TestPrimaryServiceKeyFormatNotAffected,TestMultipleInstancesBatchedIsolationTesting
Unit tests (no scutil required, any platform):
Integration tests (macOS with scutil, requires admin):
This supersedes PR #4633 (which was opened from
mainand became unmergeable after #5368 landed).cc @mlsmaycon and @lixmal
Summary by CodeRabbit
Bug Fixes
Tests
Documentation
Select exactly one:
Internal implementation detail: scopes macOS scutil DNS state keys by WireGuard interface name to prevent multi-instance conflicts. No user-facing API, CLI, or configuration changes.