-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
DNS-aware dial for HTTP reverse proxy — use NetBird nameservers for hostname resolution #5905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,91 @@ | ||||||||||||||||||||||
| package roundtrip | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| import ( | ||||||||||||||||||||||
| "context" | ||||||||||||||||||||||
| "fmt" | ||||||||||||||||||||||
| "net" | ||||||||||||||||||||||
| "net/netip" | ||||||||||||||||||||||
| ) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| // dialWithDNSResolution wraps a DialContext function so that target addresses | ||||||||||||||||||||||
| // containing hostnames (rather than IPs) are resolved through NetBird's own | ||||||||||||||||||||||
| // DNS infrastructure before the connection is dialed. | ||||||||||||||||||||||
| // | ||||||||||||||||||||||
| // getDNSAddr is called on every dial that requires hostname resolution; it | ||||||||||||||||||||||
| // should return the current NetBird DNS server address (IP + port) and true. | ||||||||||||||||||||||
| // When the DNS server is not yet available it should return false, in which | ||||||||||||||||||||||
| // case resolution falls back to the process-level default resolver. | ||||||||||||||||||||||
| // | ||||||||||||||||||||||
| // The resolver dials the DNS server using the same underlying dial function | ||||||||||||||||||||||
| // (i.e. through the WireGuard netstack), because in userspace / netstack mode | ||||||||||||||||||||||
| // the DNS server is reachable only via the virtual WireGuard interface. | ||||||||||||||||||||||
| func dialWithDNSResolution( | ||||||||||||||||||||||
| getDNSAddr func() (netip.AddrPort, bool), | ||||||||||||||||||||||
| dial func(ctx context.Context, network, addr string) (net.Conn, error), | ||||||||||||||||||||||
| ) func(ctx context.Context, network, addr string) (net.Conn, error) { | ||||||||||||||||||||||
| return func(ctx context.Context, network, addr string) (net.Conn, error) { | ||||||||||||||||||||||
| host, port, err := net.SplitHostPort(addr) | ||||||||||||||||||||||
| if err != nil { | ||||||||||||||||||||||
| // Malformed address — let the underlying dialer handle or fail it. | ||||||||||||||||||||||
| return dial(ctx, network, addr) | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| // If the host is already an IP literal, skip resolution entirely. | ||||||||||||||||||||||
| if _, err := netip.ParseAddr(host); err == nil { | ||||||||||||||||||||||
| return dial(ctx, network, addr) | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| resolved, err := resolveHost(ctx, host, getDNSAddr, dial) | ||||||||||||||||||||||
| if err != nil { | ||||||||||||||||||||||
| return nil, err | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| return dial(ctx, network, net.JoinHostPort(resolved, port)) | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
Comment on lines
+22
to
+44
|
||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| // resolveHost resolves a hostname to its first IPv4/IPv6 address using a | ||||||||||||||||||||||
| // custom net.Resolver backed by the NetBird DNS server (when available) or | ||||||||||||||||||||||
| // the process-level default resolver as a fallback. | ||||||||||||||||||||||
| func resolveHost( | ||||||||||||||||||||||
| ctx context.Context, | ||||||||||||||||||||||
| host string, | ||||||||||||||||||||||
| getDNSAddr func() (netip.AddrPort, bool), | ||||||||||||||||||||||
| dial func(ctx context.Context, network, addr string) (net.Conn, error), | ||||||||||||||||||||||
| ) (string, error) { | ||||||||||||||||||||||
| resolver := buildResolver(getDNSAddr, dial) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| addrs, err := resolver.LookupHost(ctx, host) | ||||||||||||||||||||||
| if err != nil { | ||||||||||||||||||||||
| return "", fmt.Errorf("dns: resolve %q: %w", host, err) | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
| if len(addrs) == 0 { | ||||||||||||||||||||||
| return "", fmt.Errorf("dns: no addresses returned for %q", host) | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
| return addrs[0], nil | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
Comment on lines
+58
to
+66
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🌐 Web query:
💡 Result: Yes, for Go HTTP dialers (via net/http.Transport using the default net.Dialer), when DNS returns multiple A/AAAA records, clients attempt connections to addresses sequentially in order until one succeeds or all fail, before the overall dial fails. This follows "try each IP address in order until one succeeds" behavior. The net/http package resolves the hostname to all IP addresses upfront (potentially waiting for both A and AAAA). Then, http.Transport's dialConnFor calls the DialContext (defaultTransportDialContext, which is net.Dialer.DialContext) with the specific "network:ip:port" (e.g., "tcp4:1.2.3.4:80"). The net.Dialer.DialContext handles multi-homed hosts by trying each resolved IP sequentially, spreading any timeout across attempts. This is the standard, recommended approach in Go's net/http—no configuration needed for basic failover. For custom behavior (e.g., parallel "Happy Eyeballs" for IPv4/IPv6), override Transport.DialContext. Citations:
🏁 Script executed: # Locate the dns.go file in the roundtrip directory
find . -path "*/proxy/internal/roundtrip/dns.go" -type fRepository: netbirdio/netbird Length of output: 94 🏁 Script executed: # If file found, inspect its full content to understand context
if [ -f "proxy/internal/roundtrip/dns.go" ]; then
wc -l proxy/internal/roundtrip/dns.go
fiRepository: netbirdio/netbird Length of output: 95 🏁 Script executed: # Search for resolveHost usage across the codebase
rg "resolveHost" --type go -B 2 -A 5Repository: netbirdio/netbird Length of output: 1236 🏁 Script executed: # Also search for dialWithDNSResolution to understand how resolver is used
rg "dialWithDNSResolution" --type go -B 2 -A 10Repository: netbirdio/netbird Length of output: 3287 🏁 Script executed: cat -n proxy/internal/roundtrip/dns.goRepository: netbirdio/netbird Length of output: 3879 Return all resolved addresses and implement retry logic in the caller to handle unreachable first addresses. Using only Proposed direction-func resolveHost(
+func resolveHost(
ctx context.Context,
host string,
getDNSAddr func() (netip.AddrPort, bool),
dial func(ctx context.Context, network, addr string) (net.Conn, error),
-) (string, error) {
+) ([]string, error) {
resolver := buildResolver(getDNSAddr, dial)
addrs, err := resolver.LookupHost(ctx, host)
if err != nil {
- return "", fmt.Errorf("dns: resolve %q: %w", host, err)
+ return nil, fmt.Errorf("dns: resolve %q: %w", host, err)
}
if len(addrs) == 0 {
- return "", fmt.Errorf("dns: no addresses returned for %q", host)
+ return nil, fmt.Errorf("dns: no addresses returned for %q", host)
}
- return addrs[0], nil
+ return addrs, nil
}And in 🤖 Prompt for AI Agents |
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| // buildResolver returns a *net.Resolver configured to query the NetBird DNS | ||||||||||||||||||||||
| // server via the provided dial function. If the DNS server address is not | ||||||||||||||||||||||
| // yet available, the default system resolver is returned so that the caller | ||||||||||||||||||||||
| // can still attempt resolution (useful during client startup). | ||||||||||||||||||||||
| func buildResolver( | ||||||||||||||||||||||
| getDNSAddr func() (netip.AddrPort, bool), | ||||||||||||||||||||||
| dial func(ctx context.Context, network, addr string) (net.Conn, error), | ||||||||||||||||||||||
| ) *net.Resolver { | ||||||||||||||||||||||
| dnsAddr, ok := getDNSAddr() | ||||||||||||||||||||||
| if !ok { | ||||||||||||||||||||||
| return net.DefaultResolver | ||||||||||||||||||||||
| } | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| addrStr := dnsAddr.String() | ||||||||||||||||||||||
| return &net.Resolver{ | ||||||||||||||||||||||
| PreferGo: true, | ||||||||||||||||||||||
| Dial: func(ctx context.Context, _, _ string) (net.Conn, error) { | ||||||||||||||||||||||
| // Always use UDP toward the DNS server. The network and address | ||||||||||||||||||||||
| // arguments passed by net.Resolver are intentionally ignored; | ||||||||||||||||||||||
| // we route through the WireGuard netstack instead. | ||||||||||||||||||||||
| return dial(ctx, "udp", addrStr) | ||||||||||||||||||||||
|
Comment on lines
+84
to
+88
|
||||||||||||||||||||||
| Dial: func(ctx context.Context, _, _ string) (net.Conn, error) { | |
| // Always use UDP toward the DNS server. The network and address | |
| // arguments passed by net.Resolver are intentionally ignored; | |
| // we route through the WireGuard netstack instead. | |
| return dial(ctx, "udp", addrStr) | |
| Dial: func(ctx context.Context, network, _ string) (net.Conn, error) { | |
| // Preserve the transport requested by net.Resolver (udp/tcp and | |
| // v4/v6 variants), but always route to the NetBird DNS server via | |
| // the WireGuard netstack. | |
| return dial(ctx, network, addrStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
In Go's net.Resolver, can Resolver.Dial be invoked with network "tcp" (for example after UDP truncation), and should custom Resolver.Dial implementations preserve the provided network?
💡 Result:
Yes, Go's net.Resolver.Dial can be invoked with network "tcp", for example when the resolver falls back to TCP after receiving a truncated UDP response (TC bit set), as per standard DNS behavior described in RFC 1035 and confirmed in Go issues discussing UDP truncation handling. Custom Resolver.Dial implementations must preserve the provided network parameter, returning a Conn of the exact type requested: a *TCPConn for "tcp" (or variants like "tcp4", "tcp6") or *UDPConn for UDP networks. This is explicitly stated in the official documentation and original CL adding the Dial field.
Citations:
- 1: https://groups.google.com/g/golang-codereviews/c/9JjM6ld8IsE
- 2: https://pkg.go.dev/net@go1.25.1
- 3: net: Resolver doesn't use provided Dial function in all cases golang/go#60712
- 4: https://github.com/golang/go/blob/master/src/net/lookup.go
- 5: https://github.com/golang/go/blob/master/src/net/dnsclient_unix.go
🏁 Script executed:
# Inspect the DNS dial code context
fd -n 'dns.go' proxy/internal/roundtrip | head -5Repository: netbirdio/netbird
Length of output: 291
🏁 Script executed:
# Read the full file to understand context
cat -n proxy/internal/roundtrip/dns.go | head -150Repository: netbirdio/netbird
Length of output: 3879
🏁 Script executed:
# Search for any comments or docs explaining DNS transport strategy
rg -i "udp|tcp|fallback|transport" proxy/internal/roundtrip/dns.go -A 2 -B 2Repository: netbirdio/netbird
Length of output: 625
🏁 Script executed:
# Check for any README or design docs in proxy directory
find proxy -type f \( -name "*.md" -o -name "*.txt" -o -name "README*" \) | head -10Repository: netbirdio/netbird
Length of output: 130
🏁 Script executed:
# Search for DNS over TCP or TCP DNS references
rg -i "dns.*tcp|tcp.*dns" --type go proxy/Repository: netbirdio/netbird
Length of output: 139
🏁 Script executed:
# Check for WireGuard netstack DNS documentation or implementation
rg -i "netstack|wireguard" proxy/internal/roundtrip/ -A 3 -B 3Repository: netbirdio/netbird
Length of output: 4035
🏁 Script executed:
# Search for issues or TODOs about DNS TCP support
rg -i "todo|fixme|issue.*dns.*tcp" --type goRepository: netbirdio/netbird
Length of output: 20953
🏁 Script executed:
# Look at the underlying dial function signature to understand constraints
rg -B 5 -A 10 "func dialWithDNSResolution" proxy/internal/roundtrip/dns.goRepository: netbirdio/netbird
Length of output: 825
🏁 Script executed:
# Find callers of dialWithDNSResolution
rg "dialWithDNSResolution" --type go -B 3 -A 3Repository: netbirdio/netbird
Length of output: 1844
🏁 Script executed:
# Check if there are other DNS dial implementations that use TCP
rg -i "tcp.*dial|dial.*tcp" proxy/internal/roundtrip/ --type goRepository: netbirdio/netbird
Length of output: 43
🏁 Script executed:
# Look for WireGuard netstack DNS implementation
fd -type f -name "*.go" proxy/internal/roundtrip | xargs rg -l "netstack|wireguard" | head -5Repository: netbirdio/netbird
Length of output: 396
🏁 Script executed:
# Check netbird.go for how the dial function is created and used
sed -n '1,100p' proxy/internal/roundtrip/netbird.goRepository: netbirdio/netbird
Length of output: 3292
🏁 Script executed:
# Find the client.DialContext implementation in embedded client
find . -path "*/client/embed*" -name "*.go" | head -5Repository: netbirdio/netbird
Length of output: 131
🏁 Script executed:
# Search for DialContext in embedded client
rg "func.*DialContext" client/embed --type go -A 5Repository: netbirdio/netbird
Length of output: 458
🏁 Script executed:
# Check if the WireGuard netstack interface supports TCP
rg "DialContext|Dial.*tcp" client/iface --type go | head -20Repository: netbirdio/netbird
Length of output: 185
Preserve Resolver.Dial network parameter to support TCP fallback after DNS truncation.
Go's net.Resolver invokes Resolver.Dial with network="tcp" when a truncated UDP response (TC bit) requires TCP fallback per RFC 1035. Forcing UDP ignores this contract and causes DNS failures on truncated responses. The underlying client.DialContext supports both TCP and UDP, so the fix is straightforward: accept the network parameter and default to UDP only if not specified.
Proposed fix
- Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
+ Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
// Always use UDP toward the DNS server. The network and address
// arguments passed by net.Resolver are intentionally ignored;
// we route through the WireGuard netstack instead.
- return dial(ctx, "udp", addrStr)
+ if network == "" {
+ network = "udp"
+ }
+ return dial(ctx, network, addrStr)
},🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@proxy/internal/roundtrip/dns.go` around lines 84 - 89, The Resolver.Dial
currently ignores the network parameter and always uses "udp", which breaks TCP
fallback on truncated DNS responses; update the anonymous Dial func in
proxy/internal/roundtrip/dns.go to accept and use the incoming network parameter
(e.g., name it "network") when calling dial(ctx, network, addrStr), falling back
to "udp" only if the provided network is empty, so net.Resolver's TCP fallback
behavior is preserved and dial(ctx, ...) can still handle both "tcp" and "udp".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetDNSAddrPort() takes
syncMsgMux, which is also used to serialize management message processing and is held across relatively long operations (e.g., Engine.Start). Because this getter is called on every dial that needs hostname resolution, it can introduce avoidable contention and request latency. Consider protecting just the DNS server address with a dedicated lightweight lock/atomic (or caching the addrport in the engine) so dialing isn’t blocked by management sync work.