[client] Batch macOS DNS domains to avoid truncation#5368

Merged
mlsmaycon merged 5 commits into main from fix/macos-domain-setup
Feb 18, 2026
@mlsmaycon mlsmaycon commented Feb 17, 2026

Describe your changes

scutil has undocumented limits: 99-element cap on d.add arrays and ~2048
byte value buffer for SupplementalMatchDomains. Users with 60+ domains
hit silent domain loss. This applies the same batching approach used on
Windows (nrptMaxDomainsPerRule=50), splitting domains into indexed
resolver keys (NetBird-Match-0, NetBird-Match-1, etc.) with 50-element
and 1500-byte limits per key.
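
A minimal sketch of the splitting step described above, assuming each batch is closed as soon as adding the next domain would exceed either limit. The function and constant names here are illustrative and are not the ones used in host_darwin.go:

```go
package main

import "fmt"

// Limits taken from the PR description; the names are hypothetical.
const (
	maxDomains = 50   // elements per resolver key
	maxBytes   = 1500 // bytes of domain text per resolver key
)

// splitIntoBatches groups domains so that each batch holds at most maxDomains
// entries and at most maxBytes bytes of domain text. A single domain longer
// than maxBytes still gets a batch of its own.
func splitIntoBatches(domains []string) [][]string {
	var batches [][]string
	var current []string
	size := 0
	for _, d := range domains {
		// Close the current batch if this domain would break either limit.
		if len(current) > 0 && (len(current) >= maxDomains || size+len(d) > maxBytes) {
			batches = append(batches, current)
			current, size = nil, 0
		}
		current = append(current, d)
		size += len(d)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	domains := make([]string, 0, 120)
	for i := 0; i < 120; i++ {
		domains = append(domains, fmt.Sprintf("svc-%03d.example.com", i))
	}
	// Each batch would be written to its own indexed resolver key.
	for i, b := range splitIntoBatches(domains) {
		fmt.Printf("NetBird-Match-%d: %d domains\n", i, len(b))
	}
}
```

With 120 short domains the count limit is the one that triggers, giving batches of 50, 50, and 20. With long domain names the byte limit closes a batch first.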

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • macOS DNS handling now batches domain entries to respect per-entry count/size limits and detects/restores existing system DNS state at startup.
  • Bug Fixes

    • Cleanup reliably removes legacy and batched DNS entries; improved error reporting and logging for batch operations and removal flows.
  • Tests

    • Expanded tests for batching logic, multi-entry DNS configurations, cleanup, and restoration behavior.

coderabbitai Bot commented Feb 17, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Reworks Darwin DNS state storage to use indexed, batched scutil keys instead of single-key entries; adds domain-splitting by count and byte-size limits, discovery/removal of existing keys, batched add/remove flows, and expanded tests for batching and cleanup.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| DNS batching implementation: client/internal/dns/host_darwin.go | Adds indexed state key format (netbirdDNSStateKeyIndexedFormat), batching limits (maxDomainsPerResolverEntry, maxDomainBytesPerResolverEntry), domain-splitting (splitDomainsIntoBatches), batched add/remove flows (addBatchedDomains, removeKeysContaining, updated addDNSState usage), and system-key discovery (discoverExistingKeys, getSystemDNSKeys). Adjusts startup/removal to handle legacy and indexed keys and updates error/log messages with batch context. |
| Tests & test helpers: client/internal/dns/host_darwin_test.go | Replaces hard-coded key usage with dynamic createdKeys handling, adds domain generators (generateShortDomains, generateLongDomains), key parsing (readDomainsFromKey), and tests for batching (TestSplitDomainsIntoBatches, TestMatchDomainBatching). Expands cleanup to remove legacy-format keys and validate batched writes/reads. |

Sequence Diagram(s)

sequenceDiagram
    participant Configurator
    participant scutil
    participant SystemDNS

    Configurator->>Configurator: start / restore
    Configurator->>SystemDNS: discoverExistingKeys (getSystemDNSKeys)
    alt existing keys found
        Configurator->>scutil: removeKeysContaining(old-suffixes)
        scutil->>SystemDNS: delete legacy resolver entries
    end
    Configurator->>Configurator: splitDomainsIntoBatches(domains)
    loop for each batch i
        Configurator->>scutil: add batched resolver key (indexed i)
        scutil->>SystemDNS: create/update resolver entry with domains batch i
        scutil-->>Configurator: result (success/error with batch index)
    end
    Configurator->>Configurator: track createdKeys map
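
The sequence above can be sketched against an in-memory map standing in for the scutil store. Everything here is an assumption made for illustration: the `store` type, the `keyFormat` layout, and the method names are hypothetical and do not reproduce the code in host_darwin.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// keyFormat is a hypothetical indexed key layout; the real
// netbirdDNSStateKeyIndexedFormat constant may differ.
const keyFormat = "State:/Network/Service/NetBird-%s-%d/DNS"

// store stands in for the scutil dynamic store in this sketch.
type store map[string][]string

// discover returns every key in the store that looks like a NetBird key.
func (s store) discover() []string {
	var keys []string
	for k := range s {
		if strings.Contains(k, "/NetBird-") {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// apply removes previously written NetBird keys, then writes one indexed key
// per batch and returns the keys it created.
func (s store) apply(suffix string, batches [][]string) []string {
	for _, k := range s.discover() {
		delete(s, k)
	}
	created := make([]string, 0, len(batches))
	for i, b := range batches {
		key := fmt.Sprintf(keyFormat, suffix, i)
		s[key] = b
		created = append(created, key)
	}
	return created
}

func main() {
	s := store{
		"State:/Network/Service/NetBird-Match/DNS": {"old.example.com"}, // legacy single key
		"State:/Network/Global/DNS":                {"unrelated"},
	}
	created := s.apply("Match", [][]string{{"a.example.com"}, {"b.example.com"}})
	fmt.Println(created)
	fmt.Println(len(s))
}
```

The legacy key is removed before the indexed keys are written, and the unrelated system key is left alone.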

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Suggested reviewers

  • pappz

Poem

🐇
I hopped through keys both old and new,
I split the domains into tidy queues,
Batches danced in indexed rows,
macOS finds what the scutil knows —
A floppy-eared cheer for cleaner views!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: batching macOS DNS domains to prevent truncation, which directly addresses the core problem solved by the changeset. |
| Description check | ✅ Passed | The PR description provides a clear explanation of the issue, solution approach, and fills in most required template sections including issue summary, checklist selection, and documentation status. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
client/internal/dns/host_darwin_test.go (1)

161-248: The "empty" test case passes vacuously — consider an explicit assertion.

When tc.expectedCount is 0, the length check at line 219 is skipped. Combined with checkAllPresent: false, no assertions run for the empty case. Add an explicit check:

Suggested fix
 		{
 			name:          "empty",
 			domains:       nil,
 			expectedCount: 0,
+			checkAllPresent: false,
 		},

And in the test body, after line 217:

+			if tc.domains == nil {
+				assert.Nil(t, batches)
+				return
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/host_darwin_test.go` around lines 161 - 248,
TestSplitDomainsIntoBatches currently skips all assertions for the "empty" case
because tc.expectedCount == 0 and checkAllPresent is false; add an explicit
assertion that splitDomainsIntoBatches(nil) returns zero batches. Concretely,
inside TestSplitDomainsIntoBatches after calling batches :=
splitDomainsIntoBatches(tc.domains) add an assertion when tc.expectedCount == 0
(or when tc.name == "empty") such as assert.Len(t, batches, 0) to ensure the
empty input case is actually validated; keep existing checks (element/byte
limits and aggregation) unchanged for other cases.
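
To illustrate the point of this nitpick outside the test file, here is a small sketch in which the batch count is compared unconditionally, so the empty case is actually asserted. The `testCase` shape and `checkBatchCount` helper are hypothetical and only loosely follow the fields named in the comment:

```go
package main

import "fmt"

// testCase loosely mirrors the table entry quoted in the review comment.
type testCase struct {
	name          string
	domains       []string
	expectedCount int
}

// checkBatchCount compares the batch count unconditionally, so a case with
// expectedCount == 0 is asserted instead of being skipped.
func checkBatchCount(tc testCase, batches [][]string) error {
	if len(batches) != tc.expectedCount {
		return fmt.Errorf("%s: got %d batches, want %d", tc.name, len(batches), tc.expectedCount)
	}
	return nil
}

func main() {
	empty := testCase{name: "empty", domains: nil, expectedCount: 0}
	// A splitter that returns no batches for nil input passes.
	fmt.Println(checkBatchCount(empty, nil))
	// A hypothetical buggy splitter that emits one empty batch is caught.
	fmt.Println(checkBatchCount(empty, [][]string{{}}))
}
```

In the real test the same effect would come from an unconditional assert.Len(t, batches, tc.expectedCount), as the comment suggests.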
client/internal/dns/host_darwin.go (1)

329-343: removeKeysContaining always returns nil — error return is misleading.

The function logs individual removal failures as warnings but never propagates them. Callers (lines 96-98, 106-108) check the returned error, but that branch is dead code. Either collect/return errors or change the signature to not return one.

Also, the parameter is named suffix but the function uses strings.Contains, not strings.HasSuffix — consider renaming to substring or pattern for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/host_darwin.go` around lines 329 - 343, The function
removeKeysContaining currently always returns nil and uses parameter name suffix
while calling strings.Contains; update removeKeysContaining to either (A)
aggregate and return errors from s.removeKeyFromSystemConfig (e.g., collect
errors into a slice and return a combined error via fmt.Errorf or multierror) so
callers can observe failures, or (B) change the signature to return only
error-less (remove the error return) and keep logging; also fix the parameter
name to reflect behavior (rename suffix to substring) or change the check to
strings.HasSuffix if the intent was suffix-matching; reference symbols:
removeKeysContaining, s.createdKeys, and removeKeyFromSystemConfig.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/dns/host_darwin.go`:
- Around line 154-162: getRemovableKeysWithDefaults currently returns only index
0 for the indexed key format when s.createdKeys is empty, leaving orphaned
scutil keys; change it to discover all existing indexed keys and return them so
removeKeysContaining can clean them. Specifically, in
getRemovableKeysWithDefaults (and any callers relying on its return), probe
increasing indices for netbirdDNSStateKeyIndexedFormat (e.g.,
fmt.Sprintf(netbirdDNSStateKeyIndexedFormat, suffix, i)) and stop when scutil
show for that key returns "No such key" (or use a scutil list/glob to find keys
with the NetBird prefix), accumulate all found indexed keys plus the legacy
single-key formats, and return that full list so orphaned indexed keys >0 are
included for removal.
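
One way to approach the list-based discovery mentioned above is to parse the key listing and keep only NetBird-prefixed entries. This is a sketch under two assumptions: that `scutil` prints listed keys as `subKey [N] = <key>` lines, and that NetBird keys share the prefix shown below. The helper name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// netbirdKeyPrefix is an assumed prefix for keys written by the client.
const netbirdKeyPrefix = "State:/Network/Service/NetBird-"

// netbirdKeys extracts NetBird keys from the text of a key listing.
// It assumes each entry is printed as "subKey [N] = <key>".
func netbirdKeys(listOutput string) []string {
	var keys []string
	for _, line := range strings.Split(listOutput, "\n") {
		_, key, found := strings.Cut(line, " = ")
		if !found {
			continue
		}
		key = strings.TrimSpace(key)
		if strings.HasPrefix(key, netbirdKeyPrefix) {
			keys = append(keys, key)
		}
	}
	return keys
}

func main() {
	out := "  subKey [0] = State:/Network/Global/DNS\n" +
		"  subKey [1] = State:/Network/Service/NetBird-Match-0/DNS\n" +
		"  subKey [2] = State:/Network/Service/NetBird-Match-3/DNS\n"
	for _, k := range netbirdKeys(out) {
		fmt.Println(k)
	}
}
```

Unlike probing indices until the first miss, filtering a listing also finds keys when the indices have gaps, as with Match-0 and Match-3 here.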


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
client/internal/dns/host_darwin.go (2)

197-205: Consider sanitizing or limiting the scutil list output scope.

The list .*DNS regex pattern is broad and could match non-NetBird DNS keys. While this is only used in discoverExistingKeys which then filters for NetBird-prefixed keys, the intermediate string could be large on systems with many network services. This is fine in practice but worth noting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/host_darwin.go` around lines 197 - 205, The
getSystemDNSKeys function uses a broad scutil pattern "list .*DNS" which can
return a large unrelated output; narrow the scope by changing the command string
to only list keys likely to contain NetBird entries (e.g., include the NetBird
prefix or other identifying token) or otherwise constrain the regex before
calling runSystemConfigCommand so discoverExistingKeys receives a smaller,
focused payload; update the command variable in getSystemDNSKeys and ensure any
adjusted pattern still covers the keys that discoverExistingKeys expects (refer
to getSystemDNSKeys and discoverExistingKeys and the runSystemConfigCommand
call).
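
The effect of a narrower pattern can be checked offline with Go's regexp package. The narrow pattern below is an assumption about what a NetBird-only pattern might look like, and Go's RE2 syntax is only a stand-in for whatever regex dialect scutil applies:

```go
package main

import (
	"fmt"
	"regexp"
)

const (
	broadPattern  = `.*DNS`                                 // pattern named in the review
	narrowPattern = `State:/Network/Service/NetBird-.*/DNS` // hypothetical narrower pattern
)

// matching returns the keys that match the given pattern.
func matching(pattern string, keys []string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, k := range keys {
		if re.MatchString(k) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{
		"State:/Network/Global/DNS",
		"Setup:/Network/Service/0A1B2C3D/DNS",
		"State:/Network/Service/NetBird-Match-0/DNS",
		"State:/Network/Service/NetBird-Search/DNS",
	}
	fmt.Println(len(matching(broadPattern, keys)), "keys match the broad pattern")
	fmt.Println(len(matching(narrowPattern, keys)), "keys match the narrow pattern")
}
```

The narrow pattern still covers both the indexed and the non-indexed NetBird keys in this sample while skipping the system entries.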

364-378: removeKeysContaining always returns nil, making the error return misleading.

The function logs individual key removal failures as warnings but never returns an error. The callers at lines 96 and 106 check the returned error, but it's always nil. Either propagate errors (e.g., collect and return a combined error) or change the signature to not return an error to avoid misleading callers.

Option A: propagate errors
 func (s *systemConfigurator) removeKeysContaining(suffix string) error {
 	var toRemove []string
+	var errs []error
 	for key := range s.createdKeys {
 		if strings.Contains(key, suffix) {
 			toRemove = append(toRemove, key)
 		}
 	}
 	for _, key := range toRemove {
 		if err := s.removeKeyFromSystemConfig(key); err != nil {
 			log.Warnf("failed to remove key %s: %v", key, err)
+			errs = append(errs, err)
 		}
 	}
+	if len(errs) > 0 {
+		return fmt.Errorf("failed to remove %d keys", len(errs))
+	}
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/host_darwin.go` around lines 364 - 378, The
removeKeysContaining function currently always returns nil which makes its error
return misleading; update removeKeysContaining to collect errors from
s.removeKeyFromSystemConfig and return a non-nil error when any removals fail
(e.g., accumulate individual errors or build a combined error string) so callers
that check the returned error actually receive failure information; keep the
logging but also append the error to a slice and at the end return a single
error (or wrapped error) if len(errors)>0, leaving the function signature
removeKeysContaining(suffix string) error unchanged and ensuring
removeKeyFromSystemConfig calls are the source of the propagated errors.
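
As a variant of Option A, the individual failures can be preserved rather than only counted. This sketch uses errors.Join from the standard library (Go 1.20 or later). The `configurator` type and its `remove` field are stand-ins for systemConfigurator and removeKeyFromSystemConfig, and deleting the key from the map on success is an assumption about the intended bookkeeping:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errBusy = errors.New("busy") // sample failure for the demo

// configurator is a stand-in for systemConfigurator in this sketch.
type configurator struct {
	createdKeys map[string]struct{}
	remove      func(key string) error // stands in for removeKeyFromSystemConfig
}

// removeKeysContaining removes every tracked key containing substr and
// returns all removal failures joined into one error (nil if none failed).
func (c *configurator) removeKeysContaining(substr string) error {
	var errs []error
	for key := range c.createdKeys {
		if !strings.Contains(key, substr) {
			continue
		}
		if err := c.remove(key); err != nil {
			errs = append(errs, fmt.Errorf("remove key %s: %w", key, err))
			continue
		}
		delete(c.createdKeys, key) // deleting during range is safe in Go
	}
	return errors.Join(errs...)
}

func main() {
	c := &configurator{
		createdKeys: map[string]struct{}{
			"NetBird-Match-0": {},
			"NetBird-Match-1": {},
			"NetBird-Search":  {},
		},
		remove: func(key string) error {
			if key == "NetBird-Match-1" {
				return errBusy
			}
			return nil
		},
	}
	err := c.removeKeysContaining("Match")
	fmt.Println(err)
	fmt.Println(errors.Is(err, errBusy), len(c.createdKeys))
}
```

Because each failure is wrapped with %w before joining, callers can still use errors.Is or errors.As on the combined error.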
@mlsmaycon mlsmaycon requested a review from pappz February 18, 2026 17:40
@mlsmaycon mlsmaycon merged commit d1ead22 into main Feb 18, 2026
40 checks passed
@mlsmaycon mlsmaycon deleted the fix/macos-domain-setup branch February 18, 2026 18:14
siriobalmelli added a commit to siriobalmelli/netbird that referenced this pull request Mar 4, 2026
Multiple netbird instances on the same host (each with its own WireGuard
interface and state directory) clobber each other's scutil DNS entries
because the key format 'State:/Network/Service/NetBird-%s/DNS' has no
per-instance disambiguation.

Add the WireGuard interface name as a component in all NetBird scutil
key formats:
  - named keys:   NetBird-<iface>-<suffix>/DNS
  - batched keys: NetBird-<iface>-<suffix>-<index>/DNS

This preserves the domain-batching logic from netbirdio#5368 (which prevents
silent domain loss with 60+ domains) while adding the interface scoping
needed for multi-instance correctness.

Also fix a latent bug reintroduced by netbirdio#5368: primaryServiceStateKeyFormat
has only one %s placeholder (for the system service UUID), so it must be
formatted with fmt.Sprintf directly rather than getKeyWithInput.

On upgrade, discoverExistingKeys() also finds and removes legacy-format
keys (without interface scope) so stale entries are cleaned up rather
than left orphaned in scutil.

Fixes: netbirdio#446

Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
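
The two key layouts listed in this commit message could be produced with format strings along these lines. The constants and helper names are hypothetical, modeled on the `State:/Network/Service/NetBird-%s/DNS` format quoted above, and may not match the referenced commit:

```go
package main

import "fmt"

// Hypothetical format strings for interface-scoped keys.
const (
	namedKeyFormat   = "State:/Network/Service/NetBird-%s-%s/DNS"    // iface, suffix
	batchedKeyFormat = "State:/Network/Service/NetBird-%s-%s-%d/DNS" // iface, suffix, index
)

// namedKey builds a non-batched key scoped to one WireGuard interface.
func namedKey(iface, suffix string) string {
	return fmt.Sprintf(namedKeyFormat, iface, suffix)
}

// batchedKey builds an indexed key scoped to one WireGuard interface.
func batchedKey(iface, suffix string, index int) string {
	return fmt.Sprintf(batchedKeyFormat, iface, suffix, index)
}

func main() {
	// Two instances with different interfaces no longer share key names.
	fmt.Println(namedKey("utun100", "Search"))
	fmt.Println(batchedKey("utun100", "Match", 0))
	fmt.Println(batchedKey("utun101", "Match", 0))
}
```

Since the interface name is part of every key, two instances writing the same suffix and index end up with distinct entries.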
siriobalmelli added a commit to siriobalmelli/netbird that referenced this pull request Mar 12, 2026
siriobalmelli added a commit to siriobalmelli/netbird that referenced this pull request Mar 13, 2026
siriobalmelli added a commit to siriobalmelli/netbird that referenced this pull request Mar 26, 2026