
[client] Fall through dns chain for custom dns zones#5081

Merged
lixmal merged 3 commits into main from local-resolver-fallthrough
Jan 12, 2026

Conversation

@lixmal
Collaborator

@lixmal lixmal commented Jan 9, 2026

Describe your changes

Custom zones act as "overlays". You define specific records while unmatched queries in that zone fall through to lower-priority handlers (DNS routes, upstream). Without this, the local resolver claims authority over the entire zone and returns NXDOMAIN for any record it doesn't have, blocking CNAMEs from resolving via other handlers.
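
The overlay behavior described above can be sketched roughly as follows. This is a minimal illustration, not the NetBird implementation: the `response`, `handler`, `overlayZone`, `upstream`, `resolve`, and `demoChain` names are hypothetical, the `Fallthrough` field stands in for the internal signal, and the records are made up.

```go
package main

import "fmt"

// DNS response codes as defined by RFC 1035.
const (
	rcodeSuccess  = 0
	rcodeNXDomain = 3
)

// response is a simplified, hypothetical stand-in for a DNS reply.
type response struct {
	Rcode       int
	Answer      string
	Fallthrough bool // stand-in for the internal "try the next handler" signal
}

// handler is a hypothetical resolver in the priority-ordered chain.
type handler func(qname string) response

// overlayZone answers from its own records only. For names it does not
// have, it returns NXDOMAIN and asks the chain to continue instead of
// claiming authority over the whole zone.
func overlayZone(records map[string]string) handler {
	return func(qname string) response {
		if ip, ok := records[qname]; ok {
			return response{Rcode: rcodeSuccess, Answer: ip}
		}
		return response{Rcode: rcodeNXDomain, Fallthrough: true}
	}
}

// upstream is the lowest-priority handler and never asks to fall through.
func upstream(records map[string]string) handler {
	return func(qname string) response {
		if ip, ok := records[qname]; ok {
			return response{Rcode: rcodeSuccess, Answer: ip}
		}
		return response{Rcode: rcodeNXDomain}
	}
}

// resolve walks the chain in priority order and moves on only when a
// handler returns NXDOMAIN together with the fallthrough signal.
func resolve(chain []handler, qname string) response {
	last := response{Rcode: rcodeNXDomain}
	for _, h := range chain {
		last = h(qname)
		if last.Rcode != rcodeNXDomain || !last.Fallthrough {
			break
		}
	}
	last.Fallthrough = false // the signal is internal and never reaches the client
	return last
}

// demoChain builds a two-handler chain with made-up records.
func demoChain() []handler {
	return []handler{
		overlayZone(map[string]string{"db.example.com.": "10.0.0.5"}),
		upstream(map[string]string{"www.example.com.": "192.0.2.10"}),
	}
}

func main() {
	chain := demoChain()
	for _, q := range []string{"db.example.com.", "www.example.com.", "missing.example.com."} {
		r := resolve(chain, q)
		fmt.Printf("%s rcode=%d answer=%q\n", q, r.Rcode, r.Answer)
	}
}
```

In this sketch the overlay serves `db.example.com.` itself, `www.example.com.` falls through to the upstream handler, and a name that no handler knows still ends as NXDOMAIN.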

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

Release Notes

  • New Features

    • Added DNS fallthrough for user-created zones: when a name error occurs in a non-authoritative zone, queries continue to the next resolver instead of failing.
  • Bug Fixes

    • Fixed DNS message header handling to prevent external upstream servers from interfering with internal fallthrough signaling.


@coderabbitai
Contributor

coderabbitai Bot commented Jan 9, 2026

Walkthrough

The changes refactor DNS zone handling by replacing the SkipPTRProcess flag with NonAuthoritative, updating the local resolver's zone storage from a slice to a map, refactoring the Update API to accept CustomZone objects instead of separate records and zones parameters, and introducing fallthrough behavior for non-authoritative zones while clearing DNS Zero bits to prevent external signaling interference.

Changes

| Cohort / File(s) | Summary |
|---|---|
| DNS Zone Definition & Protocol: dns/dns.go, shared/management/proto/management.proto, management/internals/shared/grpc/conversion.go | Replaced SkipPTRProcess bool field with NonAuthoritative bool field in CustomZone struct across proto definition and conversion logic. Updated proto message and grpc conversion to map the new field. |
| DNS PTR Collection Logic: client/internal/dns.go | Changed zone filter condition in collectPTRRecords from checking SkipPTRProcess to checking NonAuthoritative when deciding which zones to evaluate for PTR record creation. |
| DNS Local Resolver Core: client/internal/dns/local/local.go | Transformed zones storage from slice ([]domain.Domain) to map (map[domain.Domain]bool) with bool indicating non-authoritative status. Refactored Update API from Update([]nbdns.SimpleRecord, []domain.Domain) to Update([]nbdns.CustomZone). Added zone lookup helpers (findZone, shouldFallthrough, continueToNext) and introduced fallthrough behavior in ServeDNS for non-authoritative zones via NXDOMAIN responses with Zero bit signaling. |
| DNS Local Resolver Tests: client/internal/dns/local/local_test.go | Updated all Update API calls to use new CustomZone-based signature. Added test coverage for fallthrough behavior with case-insensitive domain matching and benchmark tests for zone lookup performance. |
| DNS Server Configuration: client/internal/dns/server.go | Updated buildLocalHandlerUpdate signature from returning ([]handlerWrapper, []nbdns.SimpleRecord, []domain.Domain, error) to ([]handlerWrapper, []nbdns.CustomZone, error). Simplified local resolver update call to use unified CustomZone objects, removing separate localRecords/zones handling. |
| DNS Server Tests: client/internal/dns/server_test.go | Refactored test inputs from SimpleRecord-based to CustomZone-based structures and updated all resolver Update calls to match new API signature. |
| Upstream DNS Response Handling: client/internal/dns/upstream.go, client/internal/routemanager/dnsinterceptor/handler.go | Added clearing of Zero bit in DNS message header after response processing to prevent external upstream servers from interfering with internal fallthrough signaling. |
| Engine Configuration: client/internal/engine.go | Added backward compatibility logic for single custom zone scenarios: computes NonAuthoritative as zone.GetNonAuthoritative() && !singleZoneCompat, treating single zones as authoritative regardless of flag. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant LocalResolver as Local DNS Resolver
    participant UpstreamResolver as Upstream Resolver
    
    Client->>LocalResolver: Query for non-authoritative zone
    LocalResolver->>LocalResolver: findZone(qname)
    alt Zone found and non-authoritative
        LocalResolver->>LocalResolver: Check NXDOMAIN condition
        LocalResolver->>Client: Return NXDOMAIN + set Zero bit (fallthrough signal)
        Client->>UpstreamResolver: Forward to next handler (Zero bit triggers fallthrough)
        UpstreamResolver->>UpstreamResolver: Clear Zero bit to prevent cascading signals
        UpstreamResolver->>Client: Return upstream response
    else Zone found and authoritative
        LocalResolver->>LocalResolver: Serve from local records
        LocalResolver->>Client: Return response
    else Zone not found
        LocalResolver->>UpstreamResolver: Forward to next handler
        UpstreamResolver->>Client: Return response
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • mlsmaycon

Poem

🐰 A rabbit's DNS hop so spry,
From skipping zones to truth we spy,
NonAuthoritative marks the way,
Fallthrough signals won't delay,
Zero bits cleared, responses fly! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main feature: implementing fall-through behavior for custom DNS zones in the client DNS resolver. |
| Description check | ✅ Passed | The description provides a clear rationale for the change (custom zones as overlays), confirms feature enhancement and test additions, and explains why documentation is not needed. |


Base automatically changed from musl-cname to main January 12, 2026 11:35
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
dns/dns.go (1)

50-51: LGTM! Better semantic naming.

The rename from SkipPTRProcess to NonAuthoritative better conveys the semantic meaning of this field. Consider expanding the comment to mention the fallthrough behavior on NXDOMAIN, similar to the proto definition comment.

📝 Optional: More descriptive comment
-	// NonAuthoritative marks user-created zones
+	// NonAuthoritative marks user-created zones; non-authoritative zones will
+	// fallthrough to lower-priority handlers on NXDOMAIN and skip PTR processing.
 	NonAuthoritative bool
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 394ad19 and e655551.

⛔ Files ignored due to path filters (1)
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (11)
  • client/internal/dns.go
  • client/internal/dns/local/local.go
  • client/internal/dns/local/local_test.go
  • client/internal/dns/server.go
  • client/internal/dns/server_test.go
  • client/internal/dns/upstream.go
  • client/internal/engine.go
  • client/internal/routemanager/dnsinterceptor/handler.go
  • dns/dns.go
  • management/internals/shared/grpc/conversion.go
  • shared/management/proto/management.proto
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • client/internal/engine.go
  • client/internal/dns/upstream.go
  • client/internal/dns/server.go
  • client/internal/dns.go
  • client/internal/dns/local/local.go
  • client/internal/dns/server_test.go
  • client/internal/dns/local/local_test.go
🧬 Code graph analysis (3)
client/internal/engine.go (2)
dns/dns.go (1)
  • CustomZone (43-52)
shared/management/proto/management.pb.go (3)
  • CustomZone (2868-2877)
  • CustomZone (2892-2892)
  • CustomZone (2907-2909)
management/internals/shared/grpc/conversion.go (2)
dns/dns.go (1)
  • SimpleRecord (55-66)
shared/management/proto/management.pb.go (3)
  • SimpleRecord (2940-2950)
  • SimpleRecord (2965-2965)
  • SimpleRecord (2980-2982)
client/internal/dns/server.go (1)
dns/dns.go (3)
  • CustomZone (43-52)
  • SimpleRecord (55-66)
  • DefaultClass (21-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Windows
  • GitHub Check: Linux
  • GitHub Check: Client / Unit
  • GitHub Check: release
  • GitHub Check: release_ui
  • GitHub Check: release_ui_darwin
  • GitHub Check: JS / Lint
  • GitHub Check: Client / Unit
  • GitHub Check: Build Cache
  • GitHub Check: iOS / Build
  • GitHub Check: Android / Build
🔇 Additional comments (20)
client/internal/engine.go (1)

1254-1263: Backward compatibility logic is well-implemented.

The approach of treating single-zone scenarios as authoritative to maintain backward compatibility with older servers that only send the peer FQDN zone is sound. The comment clearly documents the reasoning, and the && !singleZoneCompat condition correctly ensures:

  • Old servers (single zone, no field 4 set) → zone treated as authoritative (existing behavior preserved)
  • New servers (multiple zones, field 4 set) → respects NonAuthoritative flag for fallthrough behavior

The extraction of protoZones before the loop avoids redundant getter calls.
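
A rough sketch of the compatibility rule, for illustration only. The `protoZone` and `customZone` types and the `toCustomZones` helper are hypothetical stand-ins, and the definition of `singleZoneCompat` as "exactly one zone in the update" is an assumption inferred from the description above, not taken from the diff.

```go
package main

import "fmt"

// protoZone is a hypothetical stand-in for the generated proto CustomZone.
type protoZone struct {
	Domain           string
	NonAuthoritative bool
}

// customZone is a hypothetical stand-in for nbdns.CustomZone.
type customZone struct {
	Domain           string
	NonAuthoritative bool
}

// toCustomZones converts proto zones and forces a lone zone to be
// authoritative, regardless of the flag it carries.
func toCustomZones(protoZones []protoZone) []customZone {
	// Assumption: "single zone" means exactly one zone was sent.
	singleZoneCompat := len(protoZones) == 1

	zones := make([]customZone, 0, len(protoZones))
	for _, z := range protoZones {
		zones = append(zones, customZone{
			Domain:           z.Domain,
			NonAuthoritative: z.NonAuthoritative && !singleZoneCompat,
		})
	}
	return zones
}

func main() {
	single := toCustomZones([]protoZone{{Domain: "netbird.cloud.", NonAuthoritative: true}})
	multi := toCustomZones([]protoZone{
		{Domain: "netbird.cloud."},
		{Domain: "example.com.", NonAuthoritative: true},
	})
	fmt.Println(single[0].NonAuthoritative, multi[0].NonAuthoritative, multi[1].NonAuthoritative)
}
```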

client/internal/dns/upstream.go (1)

205-207: LGTM! Good defensive measure.

Clearing the Zero bit from external upstream responses prevents external DNS servers from inadvertently triggering the internal fallthrough signaling mechanism. This is a sensible security hardening for the new handler chain continuation logic.
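
To illustrate why the bit is cleared, here is a small sketch using reduced stand-in types rather than the real DNS library. The `msg` and `msgHdr` types and the `sanitizeUpstream` and `wantsFallthrough` helpers are hypothetical; only the rule itself (NXDOMAIN plus the Zero bit means "continue") comes from this review.

```go
package main

import "fmt"

const rcodeNXDomain = 3 // NXDOMAIN as defined by RFC 1035

// msgHdr is a reduced stand-in for a DNS message header, holding only
// the two fields this sketch needs.
type msgHdr struct {
	Rcode int
	Zero  bool // reserved header bit, reused internally as the fallthrough signal
}

// msg is a reduced stand-in for a DNS message.
type msg struct {
	MsgHdr msgHdr
}

// sanitizeUpstream clears the Zero bit on a reply that came from outside
// the local handler chain, so it cannot be mistaken for the internal signal.
func sanitizeUpstream(m *msg) {
	if m == nil {
		return
	}
	m.MsgHdr.Zero = false
}

// wantsFallthrough reports whether a reply asks the chain to continue:
// NXDOMAIN with the Zero bit set.
func wantsFallthrough(m *msg) bool {
	return m != nil && m.MsgHdr.Rcode == rcodeNXDomain && m.MsgHdr.Zero
}

func main() {
	// An external server happens to (or deliberately) set the reserved bit.
	external := &msg{MsgHdr: msgHdr{Rcode: rcodeNXDomain, Zero: true}}
	fmt.Println("before sanitize:", wantsFallthrough(external))
	sanitizeUpstream(external)
	fmt.Println("after sanitize:", wantsFallthrough(external))
}
```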

management/internals/shared/grpc/conversion.go (1)

375-390: LGTM! Correct proto field mapping.

The NonAuthoritative field is properly propagated from the domain model to the proto message, maintaining consistency with the renamed field in the proto definition.

client/internal/routemanager/dnsinterceptor/handler.go (1)

328-330: LGTM! Consistent with upstream.go change.

Clearing the Zero bit from peer DNS responses provides the same defensive measure as in upstream.go, preventing external sources (routing peers in this case) from manipulating the internal fallthrough signaling mechanism.

shared/management/proto/management.proto (1)

467-469: LGTM! Wire-compatible field rename with good documentation.

The field tag number (4) is preserved, ensuring wire compatibility with existing clients. The comment clearly documents both behaviors: fallthrough to lower-priority handlers on NXDOMAIN and PTR processing skip for user-created zones.
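
The wire-compatibility point can be checked by hand. The sketch below encodes a bool field set to true using the standard protobuf varint layout (key = field number shifted left by 3, OR-ed with the wire type). The `encodeBoolField` and `decodeKey` helpers are hypothetical and exist only for this illustration; real code would use the generated `management.pb.go` types.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const wireVarint = 0 // protobuf wire type used for bool fields

// encodeBoolField encodes a bool field that is set to true as key + value.
// The key carries only the field number and wire type, never the field name.
func encodeBoolField(fieldNumber uint64) []byte {
	tmp := make([]byte, binary.MaxVarintLen64)
	buf := make([]byte, 0, 2*binary.MaxVarintLen64)

	n := binary.PutUvarint(tmp, fieldNumber<<3|wireVarint)
	buf = append(buf, tmp[:n]...)

	n = binary.PutUvarint(tmp, 1) // true is encoded as varint 1
	buf = append(buf, tmp[:n]...)
	return buf
}

// decodeKey splits the leading key back into field number and wire type.
func decodeKey(b []byte) (fieldNumber, wireType uint64) {
	key, _ := binary.Uvarint(b)
	return key >> 3, key & 7
}

func main() {
	// Whether the schema names field 4 SkipPTRProcess or NonAuthoritative,
	// the encoded bytes are identical.
	b := encodeBoolField(4)
	num, wt := decodeKey(b)
	fmt.Printf("% x field=%d wiretype=%d\n", b, num, wt)
}
```

Because only the number 4 appears in the encoding, the rename does not change the bytes exchanged between management server and client.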

client/internal/dns.go (1)

79-81: LGTM!

The field rename from SkipPTRProcess to NonAuthoritative is consistent with the broader refactoring. The logic correctly skips PTR record collection for non-authoritative (user-created) zones.

client/internal/dns/server_test.go (3)

131-131: LGTM!

The test field rename from initLocalRecords to initLocalZones with type []nbdns.CustomZone aligns with the updated API.


388-388: LGTM!

The test correctly passes []nbdns.CustomZone to the updated Update method.


2015-2016: LGTM!

The test correctly calls buildLocalHandlerUpdate with config.CustomZones and expects the updated return signature ([]handlerWrapper, []nbdns.CustomZone, error).

client/internal/dns/server.go (2)

488-501: LGTM!

The updated call site correctly uses the new return signature and passes the CustomZone slice to the local resolver.


661-691: LGTM!

The refactored buildLocalHandlerUpdate function:

  1. Correctly processes CustomZone objects instead of separate records and zone domains
  2. Filters out invalid class types while preserving valid records
  3. Returns the processed zones preserving the NonAuthoritative flag for downstream use

The loop variable customZone is a struct copy, so assigning customZone.Records = localRecords correctly builds a filtered copy without mutating the original input.
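
The struct-copy point is easy to verify in isolation. The following is a sketch with hypothetical stand-in types (`simpleRecord`, `customZone`) and a hypothetical `filterZones` helper; the class value `"IN"` for the default class is an assumption made for this example.

```go
package main

import "fmt"

// Assumption for this sketch: the default record class is "IN".
const defaultClass = "IN"

// simpleRecord and customZone are reduced, hypothetical stand-ins for the
// nbdns types of the same name.
type simpleRecord struct {
	Name  string
	Class string
}

type customZone struct {
	Domain           string
	Records          []simpleRecord
	NonAuthoritative bool
}

// filterZones returns zones with only default-class records, leaving the
// input slice and its record slices untouched.
func filterZones(in []customZone) []customZone {
	out := make([]customZone, 0, len(in))
	for _, zone := range in { // zone is a copy of the slice element
		kept := make([]simpleRecord, 0, len(zone.Records))
		for _, rec := range zone.Records {
			if rec.Class != defaultClass {
				continue
			}
			kept = append(kept, rec)
		}
		zone.Records = kept // reassigns the copy's slice field only
		out = append(out, zone)
	}
	return out
}

func main() {
	in := []customZone{{
		Domain:           "example.com.",
		NonAuthoritative: true,
		Records: []simpleRecord{
			{Name: "a.example.com.", Class: "IN"},
			{Name: "b.example.com.", Class: "CH"},
		},
	}}
	out := filterZones(in)
	fmt.Println(len(in[0].Records), len(out[0].Records), out[0].NonAuthoritative)
}
```

Allocating a fresh `kept` slice matters here: filtering in place with `zone.Records[:0]` would share the backing array with the input and overwrite its records.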

client/internal/dns/local/local_test.go (4)

127-143: LGTM!

Test correctly updated to use []nbdns.CustomZone with proper structure containing Domain and Records fields.


1094-1188: Comprehensive fallthrough test coverage.

This test properly validates:

  1. Authoritative zones return NXDOMAIN without fallthrough (Zero bit = false)
  2. Non-authoritative zones trigger fallthrough (Zero bit = true)
  3. Record matches in both zone types return normally without fallthrough

The use of responseMSG.MsgHdr.Zero to signal fallthrough is correctly verified.


1319-1342: Good edge case coverage for case-insensitive domain matching.

This ensures the fallthrough logic correctly handles zone domains with different casing (e.g., EXAMPLE.COM. matches queries for nonexistent.example.com.).


1344-1420: Useful performance benchmarks.

The benchmarks cover:

  • Best case: immediate zone match
  • Worst case: many zones, no match, many labels
  • Typical case: few zones with subdomain match
  • Many zones: 100 zones with mid-list match

These help ensure zone lookup remains efficient as the number of zones grows.

client/internal/dns/local/local.go (5)

30-36: LGTM!

The zones map with bool value for NonAuthoritative status is a clean representation that supports O(1) zone lookup and O(k) suffix matching where k is the number of labels.


102-105: Fallthrough signaling correctly implemented.

When NXDOMAIN is determined for a query in a non-authoritative zone, the resolver signals the handler chain to continue to the next handler. This enables the overlay behavior described in the PR objectives.


130-145: Efficient zone lookup implementation.

The findZone function correctly performs reverse suffix matching with O(k) complexity where k is the number of labels in the query name. The algorithm strips leading labels progressively to find the longest matching zone.

Note: This function must only be called with the read lock held (via shouldFallthrough or isInManagedZone), which is currently the case.
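
For readers unfamiliar with the pattern, a label-stripping lookup of this kind can be sketched as below. This is not the code under review: the `resolver` type uses plain strings instead of `domain.Domain`, locking is omitted, and the three return values are a choice made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// resolver is a reduced stand-in holding only the zone map:
// lowercase zone FQDN -> non-authoritative flag.
type resolver struct {
	zones map[string]bool
}

// findZone strips leading labels from qname one at a time and returns the
// first zone that matches, which is also the longest matching suffix.
// Each step is one map lookup, so the cost grows with the label count.
func (r *resolver) findZone(qname string) (zone string, nonAuthoritative, found bool) {
	name := strings.ToLower(qname)
	if !strings.HasSuffix(name, ".") {
		name += "."
	}
	for name != "" && name != "." {
		if na, ok := r.zones[name]; ok {
			return name, na, true
		}
		// Drop the leftmost label, e.g. "a.example.com." -> "example.com.".
		name = name[strings.Index(name, ".")+1:]
	}
	return "", false, false
}

func main() {
	r := &resolver{zones: map[string]bool{
		"example.com.":      true,  // user-created overlay zone
		"corp.example.com.": false, // authoritative zone
	}}
	for _, q := range []string{"Host.Corp.Example.COM", "www.example.com.", "x.example.org."} {
		zone, na, ok := r.findZone(q)
		fmt.Printf("%s -> zone=%q nonAuthoritative=%v found=%v\n", q, zone, na, ok)
	}
}
```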


383-402: LGTM!

The Update function is cleanly refactored to:

  1. Accept []nbdns.CustomZone as a single parameter
  2. Clear existing state before populating
  3. Properly normalize zone domains (lowercase + FQDN)
  4. Store NonAuthoritative flag per zone
  5. Register individual records within each zone

This aligns with the zone-centric data model introduced by the PR.
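
The five steps above might look roughly like this in a reduced form. The types, the `normalize` helper, and the `zoneFlag` accessor are hypothetical, and records are stored as plain strings rather than DNS resource records.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// simpleRecord and customZone are reduced, hypothetical stand-ins for the
// nbdns types of the same name.
type simpleRecord struct {
	Name  string
	RData string
}

type customZone struct {
	Domain           string
	Records          []simpleRecord
	NonAuthoritative bool
}

type resolver struct {
	mu      sync.RWMutex
	zones   map[string]bool     // zone FQDN -> non-authoritative flag
	records map[string][]string // record FQDN -> record data
}

// normalize lowercases a name and makes it fully qualified.
func normalize(name string) string {
	name = strings.ToLower(name)
	if !strings.HasSuffix(name, ".") {
		name += "."
	}
	return name
}

// Update replaces all zones and records with the given set.
func (r *resolver) Update(zones []customZone) {
	r.mu.Lock()
	defer r.mu.Unlock()

	// Clear existing state before populating.
	r.zones = make(map[string]bool, len(zones))
	r.records = make(map[string][]string)

	for _, z := range zones {
		r.zones[normalize(z.Domain)] = z.NonAuthoritative
		for _, rec := range z.Records {
			key := normalize(rec.Name)
			r.records[key] = append(r.records[key], rec.RData)
		}
	}
}

// zoneFlag looks up a zone under the read lock.
func (r *resolver) zoneFlag(domain string) (nonAuthoritative, found bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	na, ok := r.zones[normalize(domain)]
	return na, ok
}

func main() {
	r := &resolver{}
	r.Update([]customZone{{
		Domain:           "EXAMPLE.COM",
		NonAuthoritative: true,
		Records:          []simpleRecord{{Name: "db.example.com", RData: "10.0.0.5"}},
	}})
	fmt.Println(r.zoneFlag("example.com."))
}
```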


157-164: No action required. The Zero bit mechanism for handler chain fallthrough signaling is correctly implemented. The ResponseWriterChain.WriteMsg() in handler_chain.go properly checks for the signal (NXDOMAIN with Zero bit set) and continues to the next handler, while upstream.go protects against abuse by clearing the bit from external responses.

@lixmal lixmal merged commit b12c084 into main Jan 12, 2026
40 of 41 checks passed
@lixmal lixmal deleted the local-resolver-fallthrough branch January 12, 2026 12:56