[client] Chase CNAMEs in the local resolver to ensure musl compatibility#5046
[client] Chase CNAMEs in the local resolver to ensure musl compatibility#5046
Conversation
📝 WalkthroughWalkthroughThis PR introduces comprehensive DNS request tracing and metadata management across the DNS resolution stack, implements CNAME chain resolution with external lookup support in the local resolver, centralizes DNS resolution utilities into a new module, and threads per-request logging context throughout handlers and forwarders to enable detailed observability. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant HandlerChain as Handler Chain
participant LocalResolver as Local Resolver
participant ExternalResolver as External Resolver
participant ResponseWriter as Response Writer
Client->>HandlerChain: DNS Query (domain, type)
activate HandlerChain
Note over HandlerChain: Generate requestID<br/>Create logger with request_id
HandlerChain->>LocalResolver: ServeDNS(chainWriter with requestID)
activate LocalResolver
LocalResolver->>LocalResolver: lookupRecords(qname)
alt CNAME Record Found
LocalResolver->>LocalResolver: lookupCNAMEChain(target)
activate LocalResolver
loop While CNAME target exists
LocalResolver->>LocalResolver: resolveCNAMETarget(current)
alt Target in local zone
LocalResolver->>LocalResolver: Resolve locally
else Target external
LocalResolver->>ExternalResolver: resolveExternal(target)
activate ExternalResolver
ExternalResolver-->>LocalResolver: IPs or error
deactivate ExternalResolver
end
end
LocalResolver->>LocalResolver: buildChainResult(chain records)
deactivate LocalResolver
else No CNAME
LocalResolver->>LocalResolver: Return matched records
end
LocalResolver->>LocalResolver: determineRcode(result)
LocalResolver->>ResponseWriter: Write DNS response
activate ResponseWriter
Note over ResponseWriter: Capture response<br/>Set metadata (requestID, result info)
ResponseWriter-->>LocalResolver: Success
deactivate ResponseWriter
LocalResolver-->>HandlerChain: Response
deactivate LocalResolver
HandlerChain->>HandlerChain: logResponse(rcode, answers, timing)
HandlerChain-->>Client: DNS Response
deactivate HandlerChain
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
client/internal/dns/local/local.go (2)
194-230: Consider documenting the magic number for max depth.The CNAME chain following is well-implemented with depth protection against infinite loops. The max depth of 8 aligns with common resolver implementations.
💡 Optional: Extract maxDepth as a package-level constant for documentation
+// maxCNAMEDepth limits CNAME chain following to prevent infinite loops. +// This aligns with common resolver implementations (BIND uses 8, Unbound uses 12). +const maxCNAMEDepth = 8 + func (d *Resolver) lookupCNAMEChain(logger *log.Entry, cnameQuestion dns.Question, targetType uint16) lookupResult { - const maxDepth = 8 var chain []dns.RR - for range maxDepth { + for range maxCNAMEDepth {
256-279: The sentinel value -1 for "keep following chain" is functional but unconventional.The logic is correct, but using
-1as a magic sentinel for "continue following" could be confusing for future maintainers. Consider using a dedicated field or constant.💡 Optional: Use a named constant for the sentinel
+// rcodeFollowChain is a sentinel value indicating the CNAME chain should continue. +// This is an internal-only value, never returned to clients. +const rcodeFollowChain = -1 + func (d *Resolver) resolveCNAMETarget(logger *log.Entry, targetName string, targetType uint16, qclass uint16) lookupResult { // ... // another CNAME, keep following if d.hasRecord(dns.Question{Name: targetName, Qtype: dns.TypeCNAME, Qclass: qclass}) { - return lookupResult{rcode: -1} + return lookupResult{rcode: rcodeFollowChain} }client/internal/dns/resutil/resolve.go (1)
27-57: Consider handling IPv4-mapped IPv6 addresses.If an IPv4-mapped IPv6 address (e.g.,
::ffff:192.0.2.1) is passed toIPsToRRs,Is6()returns true, creating an AAAA record with a 16-byte mapped address instead of an A record. WhileLookupIPdoes callUnmap()before returning, direct callers ofIPsToRRscould encounter unexpected behavior.🔧 Suggested defensive handling
func IPsToRRs(name string, ips []netip.Addr, ttl uint32) []dns.RR { var result []dns.RR for _, ip := range ips { + ip = ip.Unmap() // Normalize IPv4-mapped IPv6 to IPv4 if ip.Is6() { result = append(result, &dns.AAAA{
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
client/internal/dns/handler_chain.goclient/internal/dns/local/local.goclient/internal/dns/local/local_test.goclient/internal/dns/resutil/resolve.goclient/internal/dns/server.goclient/internal/dns/server_test.goclient/internal/dns/upstream.goclient/internal/dnsfwd/forwarder.goclient/internal/dnsfwd/forwarder_test.goclient/internal/routemanager/dnsinterceptor/handler.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-28T12:20:47.254Z
Learnt from: bcmmbaga
Repo: netbirdio/netbird PR: 4849
File: management/internals/modules/zones/manager/manager.go:55-86
Timestamp: 2025-11-28T12:20:47.254Z
Learning: In the NetBird management server, DNS zones without records are automatically filtered out in network map generation (filterPeerAppliedZones in management/internals/controllers/network_map/controller/controller.go checks `len(zone.Records) == 0`). Therefore, CreateZone operations don't need to call UpdateAccountPeers since empty zones don't affect the network map.
Applied to files:
client/internal/dns/server.goclient/internal/dns/server_test.go
🧬 Code graph analysis (5)
client/internal/dns/upstream.go (1)
client/internal/dns/resutil/resolve.go (2)
GetRequestID(84-91)SetMeta(94-98)
client/internal/dns/local/local.go (1)
dns/dns.go (1)
SimpleRecord(55-66)
client/internal/dns/server.go (2)
dns/dns.go (2)
CustomZone(43-52)SimpleRecord(55-66)management/server/networks/resources/types/resource.go (1)
Domain(25-25)
client/internal/dns/local/local_test.go (3)
dns/dns.go (2)
SimpleRecord(55-66)DefaultClass(21-21)client/internal/dns/local/local.go (1)
NewResolver(41-49)client/internal/dns/test/mock.go (1)
MockResponseWriter(9-11)
client/internal/routemanager/dnsinterceptor/handler.go (1)
client/internal/dns/resutil/resolve.go (2)
GetRequestID(84-91)SetMeta(94-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Client / Unit
- GitHub Check: release_ui_darwin
- GitHub Check: FreeBSD Port / Build & Test
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: Client / Unit
- GitHub Check: Build Cache
- GitHub Check: Android / Build
- GitHub Check: iOS / Build
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: JS / Lint
- GitHub Check: Darwin
- GitHub Check: Client / Unit
🔇 Additional comments (58)
client/internal/dns/server_test.go (4)
388-388: LGTM!The
Updatecall is correctly updated to include the newzonesparameter, passingnilsince zone-aware behavior isn't being tested here.
514-514: LGTM!Consistent with the updated
UpdateAPI signature.
2016-2016: LGTM!The test correctly handles the new 4-value return from
buildLocalHandlerUpdate, discarding the zones slice with_since it's not needed for this specific test assertion.
2077-2077: LGTM!Consistent handling of the updated return signature.
client/internal/dnsfwd/forwarder_test.go (10)
13-13: LGTM!Import added to support the new logger parameter requirement for
handleDNSQuery.
321-321: LGTM!Test correctly updated to pass a logger entry as the first parameter to
handleDNSQuery, matching the new signature.
469-469: LGTM!Consistent logger parameter usage across test cases.
531-531: LGTM!Consistent with the updated API.
608-608: LGTM!Consistent logger parameter usage.
678-688: LGTM!Both cache-related test calls correctly pass the logger parameter.
718-732: LGTM!Cache normalization test correctly uses the new signature for both query invocations.
787-787: LGTM!Multiple overlapping patterns test updated correctly.
908-908: LGTM!NODATA vs NXDOMAIN test uses the correct signature.
941-941: LGTM!Empty query test correctly passes logger to
handleDNSQuery.client/internal/dns/upstream.go (3)
23-23: LGTM!Import added for the new centralized DNS utilities package.
116-116: LGTM!Good refactoring to use centralized
resutil.GetRequestID(w)which extracts the request ID from theResponseWriterChainif available, ensuring consistent request tracing across the handler chain.
200-211: LGTM!The timing log has been appropriately replaced with
resutil.SetMetato attach upstream metadata to the response. This enables structured observability in the handler chain where logging now occurs centrally. The function correctly returnstrueeven on write errors since the upstream query itself succeeded.client/internal/dns/local/local_test.go (9)
22-32: LGTM!Well-designed mock resolver that implements the
resolverinterface, enabling clean dependency injection for testing external resolution paths.
132-132: LGTM!Update calls correctly adapted to the new signature with
nilzones.
618-676: LGTM!Excellent test coverage for CNAME chain resolution. Tests verify:
- Simple internal CNAME chains produce both CNAME and A records
- Multi-hop chains are correctly followed
- CNAME to non-existent internal target returns only the CNAME
679-751: LGTM!Good coverage for max depth protection:
- Chain at max depth (7 hops + A record = 8 total) resolves correctly
- Chain exceeding max depth stops appropriately
- Circular CNAME references are protected by the depth limit
753-852: LGTM!Comprehensive external CNAME resolution tests covering:
- IPv4 and IPv6 external resolution
- Concurrent external resolution safety
- Proper CNAME + address record combination in responses
854-886: LGTM!Zone management tests properly verify:
- Zones are set correctly via
UpdateisInManagedZoneis case-insensitive- Zones are cleared when passed nil
888-1055: LGTM!Excellent RFC-compliant zone-aware CNAME resolution tests:
- CNAME target in managed zone returns NXDOMAIN per RFC 6604
- External CNAME targets skip zone check and resolve externally
- NODATA vs NXDOMAIN distinction is properly handled
- Table-driven tests cover all external resolution outcomes (NXDOMAIN, SERVFAIL variants, success)
1057-1096: LGTM!Authoritative flag tests verify correct AA bit behavior:
- Direct local record lookups are authoritative
- External resolution results are not authoritative
1098-1169: LGTM!Thorough Stop/cleanup tests:
- Stop clears all state (records, domains, zones)
- Stop is idempotent (safe to call multiple times)
- Stop cancels in-flight external resolutions properly
The context cancellation test is particularly valuable for ensuring graceful shutdown behavior.
client/internal/dns/server.go (3)
488-488: LGTM!Call site correctly updated to handle the new 4-value return from
buildLocalHandlerUpdate.
502-502: LGTM!The local resolver now receives zone information, enabling zone-aware CNAME resolution. This is the key integration point for the musl compatibility fix.
662-691: LGTM!Well-implemented zone extraction from custom zones:
- New
zonesslice correctly collects all custom zone domains- Each custom zone domain is converted to
domain.Domaintype- Zones are returned alongside mux updates and local records
This enables the local resolver to know which zones it manages, preventing unnecessary external lookups for CNAME targets within managed zones.
client/internal/dns/local/local.go (9)
24-28: LGTM!Good addition of the timeout constant and resolver interface. The 4-second timeout for external resolution prevents blocking while giving sufficient time for DNS lookups.
30-49: LGTM!The Resolver struct is properly extended with:
zonesfor zone-aware resolutionresolverinterface for testability (allows mock injection)ctx/cancelfor graceful shutdown supportThe constructor correctly initializes the context.
60-71: LGTM!The Stop method properly:
- Cancels the context (stops in-flight external resolutions)
- Clears all state under lock
- Handles nil cancel gracefully
81-103: LGTM!ServeDNS properly integrates the new lookup flow:
- Uses
lookupRecordswhich returns the full result including rcode- Sets
Authoritativebased on whether external data was used- Delegates rcode determination to
determineRcodefor proper handling
105-121: LGTM!Correct implementation of RFC 6604 rcode determination:
- CNAME chain rcodes are propagated from final target resolution
- NODATA case (domain exists but no records of requested type) returns Success
- Only returns NXDOMAIN when domain truly doesn't exist
132-148: LGTM!Zone membership check is correctly implemented:
- Case-insensitive comparison
- Handles both exact match and subdomain cases
- Properly FQDNs the names for comparison
294-322: LGTM!External resolution is well-implemented:
- Falls back to
net.DefaultResolverwhen no mock is injected- Uses context with timeout for cancellation support
- Properly handles the lookup result and builds RRs from IPs
- Uses resutil utilities for consistent handling
324-344: LGTM!Good structured error logging with appropriate log levels:
- Trace for not-found (expected case)
- Debug for actual errors with server context when available
346-363: LGTM!The Update method correctly stores the zones for zone-aware resolution while maintaining the existing record update behavior.
client/internal/dnsfwd/forwarder.go (5)
21-21: LGTM!Import added for centralized DNS utilities.
193-237: LGTM!The
handleDNSQuerymethod is well-refactored:
- Logger parameter enables per-request tracing context
resutil.NetworkForQtypesimplifies type-to-network mappingresutil.LookupIPprovides a clean result object patternresutil.IPsToRRsstandardizes RR constructionThe error handling flow is cleaner with the result object.
239-270: LGTM!UDP handler correctly:
- Creates a structured logger with
request_idanddns_idfor request tracing- Measures response time
- Logs a concise response summary with domain, rcode, answers, and duration
272-291: LGTM!TCP handler follows the same consistent pattern as UDP, ensuring uniform logging across both protocols.
329-387: LGTM!The
handleDNSErrormethod is well-refactored to use the result object pattern:
- Uses
result.Rcodedirectly for response- Properly handles cache fallback on upstream failures
- Provides contextual logging with server info when available from
net.DNSError- Distinguishes between cacheable negative results (NXDOMAIN/NODATA) and transient failures
client/internal/dns/handler_chain.go (5)
46-67: LGTM!The
ResponseWriterChainextensions cleanly support per-request metadata propagation. The lazy initialization of themetamap inSetMetais appropriate since each DNS request gets its own writer instance.
69-77: LGTM!Response capture at line 75 correctly stores the message before the underlying write, enabling post-handler logging. The continue-signal path (lines 71-74) appropriately skips capture since the chain will try the next handler.
174-189: LGTM!The trace-level guard at line 176-178 prevents unnecessary work when trace logging is disabled. The use of
strings.Builderfor the outer structure is appropriate for this diagnostic output.
191-252: Well-structured request tracing implementation.The per-request logger with
request_idanddns_idfields enables effective cross-component correlation. The special-casing to suppress cache-priority continue logs (line 234) is a thoughtful noise reduction.
254-267: LGTM!The nil-response guard prevents panics in edge cases. The integration with
resutil.FormatAnswersmaintains consistency with the unified logging format introduced across DNS components.client/internal/dns/resutil/resolve.go (6)
17-25: LGTM!Using
crypto/randis appropriate for trace IDs. The empty-string fallback on error is acceptable since request IDs are for observability rather than security-critical operations.
59-70: LGTM!Clean mapping function. Returning an empty string for unsupported query types allows callers to handle appropriately.
100-139: LGTM!The
LookupResultstruct preserving the original error while providing a normalizedRcodeis a good pattern. The IPv4-mapped address normalization at lines 117-120 ensures consistent address handling.
141-173: Well-reasoned NXDOMAIN vs NODATA distinction.The approach of querying the alternative record type to distinguish NXDOMAIN from NODATA is sound for musl compatibility. The conservative fallback to
RcodeSuccesson non-DNS errors (line 168) prevents false NXDOMAINs.One consideration: the alternative query at line 161 reuses the original context, so if the initial query consumed most of the timeout budget, the alternative query might timeout. This is likely acceptable since it fails gracefully to
RcodeSuccess.
76-98: LGTM!The interface-based metadata propagation allows clean integration with existing response writers. The fallback to generate a new ID in
GetRequestIDensures tracing works even for non-chained writers.
175-197: LGTM!Compact and readable log formatting. The type-specific handling for common DNS record types provides useful output while the fallback to
dns.TypeToStringhandles edge cases gracefully.client/internal/routemanager/dnsinterceptor/handler.go (4)
222-290: LGTM!The structured logger integration with
request_idanddns_idaligns with the handler chain's tracing pattern. Usingresutil.GetRequestID(w)correctly reuses the ID from the chain rather than generating a new one, enabling end-to-end request correlation.
323-398: LGTM!Logger threading through
writeMsgandlogPrefixChangesensures trace context is preserved across the call chain. The pattern is applied consistently.
400-518: LGTM!Consistent logger propagation through the DNAT management methods maintains request tracing throughout the domain prefix update flow.
521-529: LGTM!Creating a fresh logger entry here is appropriate since
cleanupDNATMappingsis called fromRemoveRouteduring teardown, not in the context of a DNS request. The logs will still be useful for debugging cleanup operations.

Describe your changes
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.