Skip to content

[client] Chase CNAMEs in the local resolver to ensure musl compatibility#5046

Merged
lixmal merged 1 commit intomainfrom
musl-cname
Jan 12, 2026
Merged

[client] Chase CNAMEs in the local resolver to ensure musl compatibility#5046
lixmal merged 1 commit intomainfrom
musl-cname

Conversation

@lixmal
Copy link
Copy Markdown
Collaborator

@lixmal lixmal commented Jan 7, 2026

Describe your changes

  • Chase CNAMEs in the local resolver, either in our managed zones or external. This is required to satisfy musl libc, which won't follow up on CNAMEs by itself
  • Refactor and extract some code from the DNS forwarder for use in the local resolver
  • Improve DNS logging across the board:
    • Add request_id to local resolver and DNS forwarder
    • Move some logs to the handler chain and reduce some excessive logging there
    • Add dns_id from the DNS request. This can be tracked across routing client/peers and various tools like tcpdump

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

Release Notes

  • New Features

    • Added request tracing with unique request IDs for improved diagnostics and logging
    • Enhanced CNAME chain resolution with proper handling of domain aliases
    • Implemented external DNS resolution fallback for domains not found locally
    • Added zone-aware DNS query handling
    • Introduced response metadata tracking for better observability
  • Bug Fixes

    • Improved DNS error handling with better distinction between domain-not-found and server errors

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

This PR introduces comprehensive DNS request tracing and metadata management across the DNS resolution stack, implements CNAME chain resolution with external lookup support in the local resolver, centralizes DNS resolution utilities into a new module, and threads per-request logging context throughout handlers and forwarders to enable detailed observability.

Changes

Cohort / File(s) Summary
Request Tracing & Handler Chain
client/internal/dns/handler_chain.go
Added per-request ID generation and metadata tracking; expanded ResponseWriterChain with requestID, response, and meta fields; added RequestID() and SetMeta() methods; implemented logHandlers() and logResponse() for handler visibility and response logging.
DNS Resolution Utilities
client/internal/dns/resutil/resolve.go
New module providing shared DNS helpers: GenerateRequestID(), GetRequestID(), IPsToRRs(), NetworkForQtype(), SetMeta(), LookupIP(), FormatAnswers(), and LookupResult struct for encapsulating resolution outcomes with structured error handling.
Local Resolver with CNAME Support
client/internal/dns/local/local.go
Added CNAME chain resolution via lookupCNAMEChain() and resolveCNAMETarget(); introduced external resolution (resolveExternal()) with configurable timeout; added zone-aware behavior via Update() signature change; implemented Stop() method for lifecycle management and structured DNS error logging via logDNSError().
Local Resolver Tests
client/internal/dns/local/local_test.go
Extended tests with mock resolver injection; added comprehensive test coverage for CNAME chain resolution, max-depth handling, zone-aware resolution, external resolution, authoritative flag behavior, and stop/cancellation semantics.
Server & Config Propagation
client/internal/dns/server.go, client/internal/dns/server_test.go
Updated buildLocalHandlerUpdate() to extract and return zones information; modified call sites to pass zones to localResolver.Update(records, zones) instead of just records.
Upstream Handler Refactoring
client/internal/dns/upstream.go
Replaced internal request-ID generation with centralized resutil.GetRequestID(w); removed GenerateRequestID() function; updated response metadata handling to use resutil.SetMeta() instead of timing logs.
DNS Forwarder Enhancement
client/internal/dns/dnsfwd/forwarder.go, client/internal/dns/dnsfwd/forwarder_test.go
Added logger parameter to handleDNSQuery() and handleDNSError(); replaced direct IP lookup with resutil.LookupIP() abstraction; centralized network type determination via resutil.NetworkForQtype(); updated all test call sites with logger parameter.
Route Manager DNS Handler
client/internal/routemanager/dnsinterceptor/handler.go
Threaded logger parameter through multiple methods (writeMsg(), logPrefixChanges(), updateDomainPrefixes(), addDNATMappings(), removeDNATMappings(), replaceIPsInDNSResponse()); integrated resutil.GetRequestID() and resutil.SetMeta() for request tracking and metadata attachment.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HandlerChain as Handler Chain
    participant LocalResolver as Local Resolver
    participant ExternalResolver as External Resolver
    participant ResponseWriter as Response Writer

    Client->>HandlerChain: DNS Query (domain, type)
    activate HandlerChain
    Note over HandlerChain: Generate requestID<br/>Create logger with request_id
    
    HandlerChain->>LocalResolver: ServeDNS(chainWriter with requestID)
    activate LocalResolver
    LocalResolver->>LocalResolver: lookupRecords(qname)
    
    alt CNAME Record Found
        LocalResolver->>LocalResolver: lookupCNAMEChain(target)
        activate LocalResolver
        loop While CNAME target exists
            LocalResolver->>LocalResolver: resolveCNAMETarget(current)
            alt Target in local zone
                LocalResolver->>LocalResolver: Resolve locally
            else Target external
                LocalResolver->>ExternalResolver: resolveExternal(target)
                activate ExternalResolver
                ExternalResolver-->>LocalResolver: IPs or error
                deactivate ExternalResolver
            end
        end
        LocalResolver->>LocalResolver: buildChainResult(chain records)
        deactivate LocalResolver
    else No CNAME
        LocalResolver->>LocalResolver: Return matched records
    end
    
    LocalResolver->>LocalResolver: determineRcode(result)
    LocalResolver->>ResponseWriter: Write DNS response
    activate ResponseWriter
    Note over ResponseWriter: Capture response<br/>Set metadata (requestID, result info)
    ResponseWriter-->>LocalResolver: Success
    deactivate ResponseWriter
    
    LocalResolver-->>HandlerChain: Response
    deactivate LocalResolver
    
    HandlerChain->>HandlerChain: logResponse(rcode, answers, timing)
    HandlerChain-->>Client: DNS Response
    deactivate HandlerChain
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • pappz

Poem

🐰 Through handler chains we hop so spry,
With request IDs marking our way,
CNAME threads the rabbits tie,
Tracing queries through night and day,
Metadata gardens where logs now play! 🌿✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: adding CNAME chasing to the local resolver for musl compatibility, which is the primary technical change across multiple DNS files.
Description check ✅ Passed The PR description covers the main changes (CNAME chasing, refactoring, improved DNS logging) and confirms proper checklist completion and CLA agreement, meeting the template requirements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lixmal lixmal changed the title [client] Chase CNAMEs in local resolver to ensure musl compatibility [client] Chase CNAMEs in the local resolver to ensure musl compatibility Jan 7, 2026
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jan 7, 2026

Quality Gate Passed Quality Gate passed

Issues
0 New issues
2 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
client/internal/dns/local/local.go (2)

194-230: Consider documenting the magic number for max depth.

The CNAME chain following is well-implemented with depth protection against infinite loops. The max depth of 8 aligns with common resolver implementations.

💡 Optional: Extract maxDepth as a package-level constant for documentation
+// maxCNAMEDepth limits CNAME chain following to prevent infinite loops.
+// This aligns with common resolver implementations (BIND uses 8, Unbound uses 12).
+const maxCNAMEDepth = 8
+
 func (d *Resolver) lookupCNAMEChain(logger *log.Entry, cnameQuestion dns.Question, targetType uint16) lookupResult {
-	const maxDepth = 8
 	var chain []dns.RR
 
-	for range maxDepth {
+	for range maxCNAMEDepth {

256-279: The sentinel value -1 for "keep following chain" is functional but unconventional.

The logic is correct, but using -1 as a magic sentinel for "continue following" could be confusing for future maintainers. Consider using a dedicated field or constant.

💡 Optional: Use a named constant for the sentinel
+// rcodeFollowChain is a sentinel value indicating the CNAME chain should continue.
+// This is an internal-only value, never returned to clients.
+const rcodeFollowChain = -1
+
 func (d *Resolver) resolveCNAMETarget(logger *log.Entry, targetName string, targetType uint16, qclass uint16) lookupResult {
 	// ...
 	// another CNAME, keep following
 	if d.hasRecord(dns.Question{Name: targetName, Qtype: dns.TypeCNAME, Qclass: qclass}) {
-		return lookupResult{rcode: -1}
+		return lookupResult{rcode: rcodeFollowChain}
 	}
client/internal/dns/resutil/resolve.go (1)

27-57: Consider handling IPv4-mapped IPv6 addresses.

If an IPv4-mapped IPv6 address (e.g., ::ffff:192.0.2.1) is passed to IPsToRRs, Is6() returns true, creating an AAAA record with a 16-byte mapped address instead of an A record. While LookupIP does call Unmap() before returning, direct callers of IPsToRRs could encounter unexpected behavior.

🔧 Suggested defensive handling
 func IPsToRRs(name string, ips []netip.Addr, ttl uint32) []dns.RR {
 	var result []dns.RR

 	for _, ip := range ips {
+		ip = ip.Unmap() // Normalize IPv4-mapped IPv6 to IPv4
 		if ip.Is6() {
 			result = append(result, &dns.AAAA{
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f012fb8 and 87c99c3.

📒 Files selected for processing (10)
  • client/internal/dns/handler_chain.go
  • client/internal/dns/local/local.go
  • client/internal/dns/local/local_test.go
  • client/internal/dns/resutil/resolve.go
  • client/internal/dns/server.go
  • client/internal/dns/server_test.go
  • client/internal/dns/upstream.go
  • client/internal/dnsfwd/forwarder.go
  • client/internal/dnsfwd/forwarder_test.go
  • client/internal/routemanager/dnsinterceptor/handler.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.

Applied to files:

  • client/internal/dns/server.go
  • client/internal/dns/server_test.go
🧬 Code graph analysis (5)
client/internal/dns/upstream.go (1)
client/internal/dns/resutil/resolve.go (2)
  • GetRequestID (84-91)
  • SetMeta (94-98)
client/internal/dns/local/local.go (1)
dns/dns.go (1)
  • SimpleRecord (55-66)
client/internal/dns/server.go (2)
dns/dns.go (2)
  • CustomZone (43-52)
  • SimpleRecord (55-66)
management/server/networks/resources/types/resource.go (1)
  • Domain (25-25)
client/internal/dns/local/local_test.go (3)
dns/dns.go (2)
  • SimpleRecord (55-66)
  • DefaultClass (21-21)
client/internal/dns/local/local.go (1)
  • NewResolver (41-49)
client/internal/dns/test/mock.go (1)
  • MockResponseWriter (9-11)
client/internal/routemanager/dnsinterceptor/handler.go (1)
client/internal/dns/resutil/resolve.go (2)
  • GetRequestID (84-91)
  • SetMeta (94-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Client / Unit
  • GitHub Check: release_ui_darwin
  • GitHub Check: FreeBSD Port / Build & Test
  • GitHub Check: release
  • GitHub Check: release_ui
  • GitHub Check: Client / Unit
  • GitHub Check: Build Cache
  • GitHub Check: Android / Build
  • GitHub Check: iOS / Build
  • GitHub Check: Linux
  • GitHub Check: Windows
  • GitHub Check: JS / Lint
  • GitHub Check: Darwin
  • GitHub Check: Client / Unit
🔇 Additional comments (58)
client/internal/dns/server_test.go (4)

388-388: LGTM!

The Update call is correctly updated to include the new zones parameter, passing nil since zone-aware behavior isn't being tested here.


514-514: LGTM!

Consistent with the updated Update API signature.


2016-2016: LGTM!

The test correctly handles the new 4-value return from buildLocalHandlerUpdate, discarding the zones slice with _ since it's not needed for this specific test assertion.


2077-2077: LGTM!

Consistent handling of the updated return signature.

client/internal/dnsfwd/forwarder_test.go (10)

13-13: LGTM!

Import added to support the new logger parameter requirement for handleDNSQuery.


321-321: LGTM!

Test correctly updated to pass a logger entry as the first parameter to handleDNSQuery, matching the new signature.


469-469: LGTM!

Consistent logger parameter usage across test cases.


531-531: LGTM!

Consistent with the updated API.


608-608: LGTM!

Consistent logger parameter usage.


678-688: LGTM!

Both cache-related test calls correctly pass the logger parameter.


718-732: LGTM!

Cache normalization test correctly uses the new signature for both query invocations.


787-787: LGTM!

Multiple overlapping patterns test updated correctly.


908-908: LGTM!

NODATA vs NXDOMAIN test uses the correct signature.


941-941: LGTM!

Empty query test correctly passes logger to handleDNSQuery.

client/internal/dns/upstream.go (3)

23-23: LGTM!

Import added for the new centralized DNS utilities package.


116-116: LGTM!

Good refactoring to use centralized resutil.GetRequestID(w) which extracts the request ID from the ResponseWriterChain if available, ensuring consistent request tracing across the handler chain.


200-211: LGTM!

The timing log has been appropriately replaced with resutil.SetMeta to attach upstream metadata to the response. This enables structured observability in the handler chain where logging now occurs centrally. The function correctly returns true even on write errors since the upstream query itself succeeded.

client/internal/dns/local/local_test.go (9)

22-32: LGTM!

Well-designed mock resolver that implements the resolver interface, enabling clean dependency injection for testing external resolution paths.


132-132: LGTM!

Update calls correctly adapted to the new signature with nil zones.


618-676: LGTM!

Excellent test coverage for CNAME chain resolution. Tests verify:

  • Simple internal CNAME chains produce both CNAME and A records
  • Multi-hop chains are correctly followed
  • CNAME to non-existent internal target returns only the CNAME

679-751: LGTM!

Good coverage for max depth protection:

  • Chain at max depth (7 hops + A record = 8 total) resolves correctly
  • Chain exceeding max depth stops appropriately
  • Circular CNAME references are protected by the depth limit

753-852: LGTM!

Comprehensive external CNAME resolution tests covering:

  • IPv4 and IPv6 external resolution
  • Concurrent external resolution safety
  • Proper CNAME + address record combination in responses

854-886: LGTM!

Zone management tests properly verify:

  • Zones are set correctly via Update
  • isInManagedZone is case-insensitive
  • Zones are cleared when passed nil

888-1055: LGTM!

Excellent RFC-compliant zone-aware CNAME resolution tests:

  • CNAME target in managed zone returns NXDOMAIN per RFC 6604
  • External CNAME targets skip zone check and resolve externally
  • NODATA vs NXDOMAIN distinction is properly handled
  • Table-driven tests cover all external resolution outcomes (NXDOMAIN, SERVFAIL variants, success)

1057-1096: LGTM!

Authoritative flag tests verify correct AA bit behavior:

  • Direct local record lookups are authoritative
  • External resolution results are not authoritative

1098-1169: LGTM!

Thorough Stop/cleanup tests:

  • Stop clears all state (records, domains, zones)
  • Stop is idempotent (safe to call multiple times)
  • Stop cancels in-flight external resolutions properly

The context cancellation test is particularly valuable for ensuring graceful shutdown behavior.

client/internal/dns/server.go (3)

488-488: LGTM!

Call site correctly updated to handle the new 4-value return from buildLocalHandlerUpdate.


502-502: LGTM!

The local resolver now receives zone information, enabling zone-aware CNAME resolution. This is the key integration point for the musl compatibility fix.


662-691: LGTM!

Well-implemented zone extraction from custom zones:

  • New zones slice correctly collects all custom zone domains
  • Each custom zone domain is converted to domain.Domain type
  • Zones are returned alongside mux updates and local records

This enables the local resolver to know which zones it manages, preventing unnecessary external lookups for CNAME targets within managed zones.

client/internal/dns/local/local.go (9)

24-28: LGTM!

Good addition of the timeout constant and resolver interface. The 4-second timeout for external resolution prevents blocking while giving sufficient time for DNS lookups.


30-49: LGTM!

The Resolver struct is properly extended with:

  • zones for zone-aware resolution
  • resolver interface for testability (allows mock injection)
  • ctx/cancel for graceful shutdown support

The constructor correctly initializes the context.


60-71: LGTM!

The Stop method properly:

  1. Cancels the context (stops in-flight external resolutions)
  2. Clears all state under lock
  3. Handles nil cancel gracefully

81-103: LGTM!

ServeDNS properly integrates the new lookup flow:

  • Uses lookupRecords which returns the full result including rcode
  • Sets Authoritative based on whether external data was used
  • Delegates rcode determination to determineRcode for proper handling

105-121: LGTM!

Correct implementation of RFC 6604 rcode determination:

  • CNAME chain rcodes are propagated from final target resolution
  • NODATA case (domain exists but no records of requested type) returns Success
  • Only returns NXDOMAIN when domain truly doesn't exist

132-148: LGTM!

Zone membership check is correctly implemented:

  • Case-insensitive comparison
  • Handles both exact match and subdomain cases
  • Properly FQDNs the names for comparison

294-322: LGTM!

External resolution is well-implemented:

  • Falls back to net.DefaultResolver when no mock is injected
  • Uses context with timeout for cancellation support
  • Properly handles the lookup result and builds RRs from IPs
  • Uses resutil utilities for consistent handling

324-344: LGTM!

Good structured error logging with appropriate log levels:

  • Trace for not-found (expected case)
  • Debug for actual errors with server context when available

346-363: LGTM!

The Update method correctly stores the zones for zone-aware resolution while maintaining the existing record update behavior.

client/internal/dnsfwd/forwarder.go (5)

21-21: LGTM!

Import added for centralized DNS utilities.


193-237: LGTM!

The handleDNSQuery method is well-refactored:

  • Logger parameter enables per-request tracing context
  • resutil.NetworkForQtype simplifies type-to-network mapping
  • resutil.LookupIP provides a clean result object pattern
  • resutil.IPsToRRs standardizes RR construction

The error handling flow is cleaner with the result object.


239-270: LGTM!

UDP handler correctly:

  1. Creates a structured logger with request_id and dns_id for request tracing
  2. Measures response time
  3. Logs a concise response summary with domain, rcode, answers, and duration

272-291: LGTM!

TCP handler follows the same consistent pattern as UDP, ensuring uniform logging across both protocols.


329-387: LGTM!

The handleDNSError method is well-refactored to use the result object pattern:

  • Uses result.Rcode directly for response
  • Properly handles cache fallback on upstream failures
  • Provides contextual logging with server info when available from net.DNSError
  • Distinguishes between cacheable negative results (NXDOMAIN/NODATA) and transient failures
client/internal/dns/handler_chain.go (5)

46-67: LGTM!

The ResponseWriterChain extensions cleanly support per-request metadata propagation. The lazy initialization of the meta map in SetMeta is appropriate since each DNS request gets its own writer instance.


69-77: LGTM!

Response capture at line 75 correctly stores the message before the underlying write, enabling post-handler logging. The continue-signal path (lines 71-74) appropriately skips capture since the chain will try the next handler.


174-189: LGTM!

The trace-level guard at line 176-178 prevents unnecessary work when trace logging is disabled. The use of strings.Builder for the outer structure is appropriate for this diagnostic output.


191-252: Well-structured request tracing implementation.

The per-request logger with request_id and dns_id fields enables effective cross-component correlation. The special-casing to suppress cache-priority continue logs (line 234) is a thoughtful noise reduction.


254-267: LGTM!

The nil-response guard prevents panics in edge cases. The integration with resutil.FormatAnswers maintains consistency with the unified logging format introduced across DNS components.

client/internal/dns/resutil/resolve.go (6)

17-25: LGTM!

Using crypto/rand is appropriate for trace IDs. The empty-string fallback on error is acceptable since request IDs are for observability rather than security-critical operations.


59-70: LGTM!

Clean mapping function. Returning an empty string for unsupported query types allows callers to handle appropriately.


100-139: LGTM!

The LookupResult struct preserving the original error while providing a normalized Rcode is a good pattern. The IPv4-mapped address normalization at lines 117-120 ensures consistent address handling.


141-173: Well-reasoned NXDOMAIN vs NODATA distinction.

The approach of querying the alternative record type to distinguish NXDOMAIN from NODATA is sound for musl compatibility. The conservative fallback to RcodeSuccess on non-DNS errors (line 168) prevents false NXDOMAINs.

One consideration: the alternative query at line 161 reuses the original context, so if the initial query consumed most of the timeout budget, the alternative query might timeout. This is likely acceptable since it fails gracefully to RcodeSuccess.


76-98: LGTM!

The interface-based metadata propagation allows clean integration with existing response writers. The fallback to generate a new ID in GetRequestID ensures tracing works even for non-chained writers.


175-197: LGTM!

Compact and readable log formatting. The type-specific handling for common DNS record types provides useful output while the fallback to dns.TypeToString handles edge cases gracefully.

client/internal/routemanager/dnsinterceptor/handler.go (4)

222-290: LGTM!

The structured logger integration with request_id and dns_id aligns with the handler chain's tracing pattern. Using resutil.GetRequestID(w) correctly reuses the ID from the chain rather than generating a new one, enabling end-to-end request correlation.


323-398: LGTM!

Logger threading through writeMsg and logPrefixChanges ensures trace context is preserved across the call chain. The pattern is applied consistently.


400-518: LGTM!

Consistent logger propagation through the DNAT management methods maintains request tracing throughout the domain prefix update flow.


521-529: LGTM!

Creating a fresh logger entry here is appropriate since cleanupDNATMappings is called from RemoveRoute during teardown, not in the context of a DNS request. The logs will still be useful for debugging cleanup operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants