[client] Batch macOS DNS domains to avoid truncation#5368
Conversation
… truncation scutil has undocumented limits: 99-element cap on d.add arrays and ~2048 byte value buffer for SupplementalMatchDomains. Users with 60+ domains hit silent domain loss. This applies the same batching approach used on Windows (nrptMaxDomainsPerRule=50), splitting domains into indexed resolver keys (NetBird-Match-0, NetBird-Match-1, etc.) with 50-element and 1500-byte limits per key.
|
No actionable comments were generated in the recent review. 🎉 📝 WalkthroughWalkthroughReworks Darwin DNS state storage to use indexed, batched scutil keys instead of single-key entries; adds domain-splitting by count and byte-size limits, discovery/removal of existing keys, batched add/remove flows, and expanded tests for batching and cleanup. (≤50 words) Changes
Sequence Diagram(s)sequenceDiagram
participant Configurator
participant scutil
participant SystemDNS
Configurator->>Configurator: start / restore
Configurator->>SystemDNS: discoverExistingKeys (getSystemDNSKeys)
alt existing keys found
Configurator->>scutil: removeKeysContaining(old-suffixes)
scutil->>SystemDNS: delete legacy resolver entries
end
Configurator->>Configurator: splitDomainsIntoBatches(domains)
loop for each batch i
Configurator->>scutil: add batched resolver key (indexed i)
scutil->>SystemDNS: create/update resolver entry with domains batch i
scutil-->>Configurator: result (success/error with batch index)
end
Configurator->>Configurator: track createdKeys map
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
client/internal/dns/host_darwin_test.go (1)
161-248: The "empty" test case passes vacuously — consider an explicit assertion.When
tc.expectedCountis 0, the length check at line 219 is skipped. Combined withcheckAllPresent: false, no assertions run for the empty case. Add an explicit check:Suggested fix
{ name: "empty", domains: nil, expectedCount: 0, + checkAllPresent: false, },And in the test body, after line 217:
+ if tc.domains == nil { + assert.Nil(t, batches) + return + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/host_darwin_test.go` around lines 161 - 248, TestSplitDomainsIntoBatches currently skips all assertions for the "empty" case because tc.expectedCount == 0 and checkAllPresent is false; add an explicit assertion that splitDomainsIntoBatches(nil) returns zero batches. Concretely, inside TestSplitDomainsIntoBatches after calling batches := splitDomainsIntoBatches(tc.domains) add an assertion when tc.expectedCount == 0 (or when tc.name == "empty") such as assert.Len(t, batches, 0) to ensure the empty input case is actually validated; keep existing checks (element/byte limits and aggregation) unchanged for other cases.client/internal/dns/host_darwin.go (1)
329-343:removeKeysContainingalways returnsnil— error return is misleading.The function logs individual removal failures as warnings but never propagates them. Callers (lines 96-98, 106-108) check the returned error, but that branch is dead code. Either collect/return errors or change the signature to not return one.
Also, the parameter is named
suffixbut the function usesstrings.Contains, notstrings.HasSuffix— consider renaming tosubstringorpatternfor clarity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/host_darwin.go` around lines 329 - 343, The function removeKeysContaining currently always returns nil and uses parameter name suffix while calling strings.Contains; update removeKeysContaining to either (A) aggregate and return errors from s.removeKeyFromSystemConfig (e.g., collect errors into a slice and return a combined error via fmt.Errorf or multierror) so callers can observe failures, or (B) change the signature to return only error-less (remove the error return) and keep logging; also fix the parameter name to reflect behavior (rename suffix to substring) or change the check to strings.HasSuffix if the intent was suffix-matching; reference symbols: removeKeysContaining, s.createdKeys, and removeKeyFromSystemConfig.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/internal/dns/host_darwin.go`:
- Around line 154-162: getRemovableKeysWithDefaults currently returns only index
0 for the indexed key format when s.createdKeys is empty, leaving orphaned
scutil keys; change it to discover all existing indexed keys and return them so
removeKeysContaining can clean them. Specifically, in
getRemovableKeysWithDefaults (and any callers relying on its return), probe
increasing indices for netbirdDNSStateKeyIndexedFormat (e.g.,
fmt.Sprintf(netbirdDNSStateKeyIndexedFormat, suffix, i)) and stop when scutil
show for that key returns "No such key" (or use a scutil list/glob to find keys
with the NetBird prefix), accumulate all found indexed keys plus the legacy
single-key formats, and return that full list so orphaned indexed keys >0 are
included for removal.
---
Nitpick comments:
In `@client/internal/dns/host_darwin_test.go`:
- Around line 161-248: TestSplitDomainsIntoBatches currently skips all
assertions for the "empty" case because tc.expectedCount == 0 and
checkAllPresent is false; add an explicit assertion that
splitDomainsIntoBatches(nil) returns zero batches. Concretely, inside
TestSplitDomainsIntoBatches after calling batches :=
splitDomainsIntoBatches(tc.domains) add an assertion when tc.expectedCount == 0
(or when tc.name == "empty") such as assert.Len(t, batches, 0) to ensure the
empty input case is actually validated; keep existing checks (element/byte
limits and aggregation) unchanged for other cases.
In `@client/internal/dns/host_darwin.go`:
- Around line 329-343: The function removeKeysContaining currently always
returns nil and uses parameter name suffix while calling strings.Contains;
update removeKeysContaining to either (A) aggregate and return errors from
s.removeKeyFromSystemConfig (e.g., collect errors into a slice and return a
combined error via fmt.Errorf or multierror) so callers can observe failures, or
(B) change the signature to return only error-less (remove the error return) and
keep logging; also fix the parameter name to reflect behavior (rename suffix to
substring) or change the check to strings.HasSuffix if the intent was
suffix-matching; reference symbols: removeKeysContaining, s.createdKeys, and
removeKeyFromSystemConfig.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
client/internal/dns/host_darwin.go (2)
197-205: Consider sanitizing or limiting thescutil listoutput scope.The
list .*DNSregex pattern is broad and could match non-NetBird DNS keys. While this is only used indiscoverExistingKeyswhich then filters for NetBird-prefixed keys, the intermediate string could be large on systems with many network services. This is fine in practice but worth noting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/host_darwin.go` around lines 197 - 205, The getSystemDNSKeys function uses a broad scutil pattern "list .*DNS" which can return a large unrelated output; narrow the scope by changing the command string to only list keys likely to contain NetBird entries (e.g., include the NetBird prefix or other identifying token) or otherwise constrain the regex before calling runSystemConfigCommand so discoverExistingKeys receives a smaller, focused payload; update the command variable in getSystemDNSKeys and ensure any adjusted pattern still covers the keys that discoverExistingKeys expects (refer to getSystemDNSKeys and discoverExistingKeys and the runSystemConfigCommand call).
364-378:removeKeysContainingalways returnsnil, making the error return misleading.The function logs individual key removal failures as warnings but never returns an error. The callers at lines 96 and 106 check the returned error, but it's always
nil. Either propagate errors (e.g., collect and return a combined error) or change the signature to not return an error to avoid misleading callers.Option A: propagate errors
func (s *systemConfigurator) removeKeysContaining(suffix string) error { var toRemove []string + var errs []error for key := range s.createdKeys { if strings.Contains(key, suffix) { toRemove = append(toRemove, key) } } for _, key := range toRemove { if err := s.removeKeyFromSystemConfig(key); err != nil { log.Warnf("failed to remove key %s: %v", key, err) + errs = append(errs, err) } } + if len(errs) > 0 { + return fmt.Errorf("failed to remove %d keys", len(errs)) + } return nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/dns/host_darwin.go` around lines 364 - 378, The removeKeysContaining function currently always returns nil which makes its error return misleading; update removeKeysContaining to collect errors from s.removeKeyFromSystemConfig and return a non-nil error when any removals fail (e.g., accumulate individual errors or build a combined error string) so callers that check the returned error actually receive failure information; keep the logging but also append the error to a slice and at the end return a single error (or wrapped error) if len(errors)>0, leaving the function signature removeKeysContaining(suffix string) error unchanged and ensuring removeKeyFromSystemConfig calls are the source of the propagated errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@client/internal/dns/host_darwin.go`:
- Around line 197-205: The getSystemDNSKeys function uses a broad scutil pattern
"list .*DNS" which can return a large unrelated output; narrow the scope by
changing the command string to only list keys likely to contain NetBird entries
(e.g., include the NetBird prefix or other identifying token) or otherwise
constrain the regex before calling runSystemConfigCommand so
discoverExistingKeys receives a smaller, focused payload; update the command
variable in getSystemDNSKeys and ensure any adjusted pattern still covers the
keys that discoverExistingKeys expects (refer to getSystemDNSKeys and
discoverExistingKeys and the runSystemConfigCommand call).
- Around line 364-378: The removeKeysContaining function currently always
returns nil which makes its error return misleading; update removeKeysContaining
to collect errors from s.removeKeyFromSystemConfig and return a non-nil error
when any removals fail (e.g., accumulate individual errors or build a combined
error string) so callers that check the returned error actually receive failure
information; keep the logging but also append the error to a slice and at the
end return a single error (or wrapped error) if len(errors)>0, leaving the
function signature removeKeysContaining(suffix string) error unchanged and
ensuring removeKeyFromSystemConfig calls are the source of the propagated
errors.
|
Multiple netbird instances on the same host (each with its own WireGuard interface and state directory) clobber each other's scutil DNS entries because the key format 'State:/Network/Service/NetBird-%s/DNS' has no per-instance disambiguation. Add the WireGuard interface name as a component in all NetBird scutil key formats: - named keys: NetBird-<iface>-<suffix>/DNS - batched keys: NetBird-<iface>-<suffix>-<index>/DNS This preserves the domain-batching logic from netbirdio#5368 (which prevents silent domain loss with 60+ domains) while adding the interface scoping needed for multi-instance correctness. Also fix a latent bug reintroduced by netbirdio#5368: primaryServiceStateKeyFormat has only one %s placeholder (for the system service UUID), so it must be formatted with fmt.Sprintf directly rather than getKeyWithInput. On upgrade, discoverExistingKeys() also finds and removes legacy-format keys (without interface scope) so stale entries are cleaned up rather than left orphaned in scutil. Fixes: netbirdio#446 Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
Multiple netbird instances on the same host (each with its own WireGuard interface and state directory) clobber each other's scutil DNS entries because the key format 'State:/Network/Service/NetBird-%s/DNS' has no per-instance disambiguation. Add the WireGuard interface name as a component in all NetBird scutil key formats: - named keys: NetBird-<iface>-<suffix>/DNS - batched keys: NetBird-<iface>-<suffix>-<index>/DNS This preserves the domain-batching logic from netbirdio#5368 (which prevents silent domain loss with 60+ domains) while adding the interface scoping needed for multi-instance correctness. Also fix a latent bug reintroduced by netbirdio#5368: primaryServiceStateKeyFormat has only one %s placeholder (for the system service UUID), so it must be formatted with fmt.Sprintf directly rather than getKeyWithInput. On upgrade, discoverExistingKeys() also finds and removes legacy-format keys (without interface scope) so stale entries are cleaned up rather than left orphaned in scutil. Fixes: netbirdio#446 Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
Multiple netbird instances on the same host (each with its own WireGuard interface and state directory) clobber each other's scutil DNS entries because the key format 'State:/Network/Service/NetBird-%s/DNS' has no per-instance disambiguation. Add the WireGuard interface name as a component in all NetBird scutil key formats: - named keys: NetBird-<iface>-<suffix>/DNS - batched keys: NetBird-<iface>-<suffix>-<index>/DNS This preserves the domain-batching logic from netbirdio#5368 (which prevents silent domain loss with 60+ domains) while adding the interface scoping needed for multi-instance correctness. Also fix a latent bug reintroduced by netbirdio#5368: primaryServiceStateKeyFormat has only one %s placeholder (for the system service UUID), so it must be formatted with fmt.Sprintf directly rather than getKeyWithInput. On upgrade, discoverExistingKeys() also finds and removes legacy-format keys (without interface scope) so stale entries are cleaned up rather than left orphaned in scutil. Fixes: netbirdio#446 Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
Multiple netbird instances on the same host (each with its own WireGuard interface and state directory) clobber each other's scutil DNS entries because the key format 'State:/Network/Service/NetBird-%s/DNS' has no per-instance disambiguation. Add the WireGuard interface name as a component in all NetBird scutil key formats: - named keys: NetBird-<iface>-<suffix>/DNS - batched keys: NetBird-<iface>-<suffix>-<index>/DNS This preserves the domain-batching logic from netbirdio#5368 (which prevents silent domain loss with 60+ domains) while adding the interface scoping needed for multi-instance correctness. Also fix a latent bug reintroduced by netbirdio#5368: primaryServiceStateKeyFormat has only one %s placeholder (for the system service UUID), so it must be formatted with fmt.Sprintf directly rather than getKeyWithInput. On upgrade, discoverExistingKeys() also finds and removes legacy-format keys (without interface scope) so stale entries are cleaned up rather than left orphaned in scutil. Fixes: netbirdio#446 Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
Multiple netbird instances on the same host (each with its own WireGuard interface and state directory) clobber each other's scutil DNS entries because the key format 'State:/Network/Service/NetBird-%s/DNS' has no per-instance disambiguation. Add the WireGuard interface name as a component in all NetBird scutil key formats: - named keys: NetBird-<iface>-<suffix>/DNS - batched keys: NetBird-<iface>-<suffix>-<index>/DNS This preserves the domain-batching logic from netbirdio#5368 (which prevents silent domain loss with 60+ domains) while adding the interface scoping needed for multi-instance correctness. Also fix a latent bug reintroduced by netbirdio#5368: primaryServiceStateKeyFormat has only one %s placeholder (for the system service UUID), so it must be formatted with fmt.Sprintf directly rather than getKeyWithInput. On upgrade, discoverExistingKeys() also finds and removes legacy-format keys (without interface scope) so stale entries are cleaned up rather than left orphaned in scutil. Fixes: netbirdio#446 Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>



Describe your changes
scutil has undocumented limits: 99-element cap on d.add arrays and ~2048
byte value buffer for SupplementalMatchDomains. Users with 60+ domains
hit silent domain loss. This applies the same batching approach used on
Windows (nrptMaxDomainsPerRule=50), splitting domains into indexed
resolver keys (NetBird-Match-0, NetBird-Match-1, etc.) with 50-element
and 1500-byte limits per key.
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
New Features
Bug Fixes
Tests