Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions client/embed/embed.go
Original file line number Diff line number Diff line change
Expand Up @@ -507,3 +507,14 @@ func (c *Client) getNet() (*wgnetstack.Net, netip.Addr, error) {

return nsnet, addr, nil
}

// GetDNSAddrPort returns the address of the NetBird DNS resolver for this client.
// Returns the zero AddrPort and false if the client is not started or the DNS
// server is not yet initialized.
func (c *Client) GetDNSAddrPort() (netip.AddrPort, bool) {
engine, err := c.getEngine()
if err != nil {
return netip.AddrPort{}, false
}
return engine.GetDNSAddrPort()
}
4 changes: 4 additions & 0 deletions client/internal/dns/mock_server.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,10 @@ func (m *MockServer) DnsIP() netip.Addr {
return netip.MustParseAddr("100.10.254.255")
}

func (m *MockServer) DnsAddrPort() netip.AddrPort {
return netip.AddrPortFrom(netip.MustParseAddr("100.10.254.255"), 53)
}

func (m *MockServer) OnUpdatedHostDNSServer(addrs []netip.AddrPort) {
// TODO implement me
panic("implement me")
Expand Down
7 changes: 7 additions & 0 deletions client/internal/dns/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,8 @@ type Server interface {
Initialize() error
Stop()
DnsIP() netip.Addr
// DnsAddrPort returns the full address (IP + port) of the DNS resolver.
DnsAddrPort() netip.AddrPort
UpdateDNSServer(serial uint64, update nbdns.Config) error
OnUpdatedHostDNSServer(addrs []netip.AddrPort)
SearchDomains() []string
Expand Down Expand Up @@ -380,6 +382,11 @@ func (s *DefaultServer) DnsIP() netip.Addr {
return s.service.RuntimeIP()
}

// DnsAddrPort returns the full address (IP + port) of the DNS resolver.
func (s *DefaultServer) DnsAddrPort() netip.AddrPort {
return netip.AddrPortFrom(s.service.RuntimeIP(), uint16(s.service.RuntimePort()))
}

// SetFirewall sets the firewall used for DNS port DNAT rules.
// This must be called before Initialize when using the listener-based service,
// because the firewall is typically not available at construction time.
Expand Down
16 changes: 16 additions & 0 deletions client/internal/engine.go
Original file line number Diff line number Diff line change
Expand Up @@ -2070,6 +2070,22 @@ func (e *Engine) GetWgAddr() netip.Addr {
return e.wgInterface.Address().IP
}

// GetDNSAddrPort returns the DNS server address (IP + port) used by this engine.
// Returns the zero AddrPort and false if the DNS server is not yet initialized.
func (e *Engine) GetDNSAddrPort() (netip.AddrPort, bool) {
e.syncMsgMux.Lock()
dnsServer := e.dnsServer
e.syncMsgMux.Unlock()
if dnsServer == nil {
return netip.AddrPort{}, false
}
addr := dnsServer.DnsAddrPort()
if !addr.IsValid() {
return netip.AddrPort{}, false
}
return addr, true
}
Comment on lines +2075 to +2087
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetDNSAddrPort() takes syncMsgMux, which is also used to serialize management message processing and is held across relatively long operations (e.g., Engine.Start). Because this getter is called on every dial that needs hostname resolution, it can introduce avoidable contention and request latency. Consider protecting just the DNS server address with a dedicated lightweight lock/atomic (or caching the addrport in the engine) so dialing isn’t blocked by management sync work.

Copilot uses AI. Check for mistakes.

func (e *Engine) RenewTun(fd int) error {
e.syncMsgMux.Lock()
wgInterface := e.wgInterface
Expand Down
91 changes: 91 additions & 0 deletions proxy/internal/roundtrip/dns.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
package roundtrip

import (
"context"
"fmt"
"net"
"net/netip"
)

// dialWithDNSResolution wraps a DialContext function so that target addresses
// containing hostnames (rather than IPs) are resolved through NetBird's own
// DNS infrastructure before the connection is dialed.
//
// getDNSAddr is called on every dial that requires hostname resolution; it
// should return the current NetBird DNS server address (IP + port) and true.
// When the DNS server is not yet available it should return false, in which
// case resolution falls back to the process-level default resolver.
//
// The resolver dials the DNS server using the same underlying dial function
// (i.e. through the WireGuard netstack), because in userspace / netstack mode
// the DNS server is reachable only via the virtual WireGuard interface.
func dialWithDNSResolution(
getDNSAddr func() (netip.AddrPort, bool),
dial func(ctx context.Context, network, addr string) (net.Conn, error),
) func(ctx context.Context, network, addr string) (net.Conn, error) {
return func(ctx context.Context, network, addr string) (net.Conn, error) {
host, port, err := net.SplitHostPort(addr)
if err != nil {
// Malformed address — let the underlying dialer handle or fail it.
return dial(ctx, network, addr)
}

// If the host is already an IP literal, skip resolution entirely.
if _, err := netip.ParseAddr(host); err == nil {
return dial(ctx, network, addr)
}

resolved, err := resolveHost(ctx, host, getDNSAddr, dial)
if err != nil {
return nil, err
}

return dial(ctx, network, net.JoinHostPort(resolved, port))
}
Comment on lines +22 to +44
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new DNS-aware dialing behavior (IP-literal passthrough, hostname resolution via NetBird DNS when available, fallback to default resolver) is not covered by tests in the proxy roundtrip package. Adding unit tests for dialWithDNSResolution/buildResolver would help prevent regressions (e.g., ensuring hostnames trigger resolver usage and that IP literals don’t).

Copilot uses AI. Check for mistakes.
}

// resolveHost resolves a hostname to its first IPv4/IPv6 address using a
// custom net.Resolver backed by the NetBird DNS server (when available) or
// the process-level default resolver as a fallback.
func resolveHost(
ctx context.Context,
host string,
getDNSAddr func() (netip.AddrPort, bool),
dial func(ctx context.Context, network, addr string) (net.Conn, error),
) (string, error) {
resolver := buildResolver(getDNSAddr, dial)

addrs, err := resolver.LookupHost(ctx, host)
if err != nil {
return "", fmt.Errorf("dns: resolve %q: %w", host, err)
}
if len(addrs) == 0 {
return "", fmt.Errorf("dns: no addresses returned for %q", host)
}
return addrs[0], nil
}
Comment on lines +58 to +66
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For Go HTTP dialers, what is the recommended approach when DNS returns multiple A/AAAA records—should clients attempt all resolved addresses before failing?

💡 Result:

Yes, for Go HTTP dialers (via net/http.Transport using the default net.Dialer), when DNS returns multiple A/AAAA records, clients attempt connections to addresses sequentially in order until one succeeds or all fail, before the overall dial fails. This follows "try each IP address in order until one succeeds" behavior. The net/http package resolves the hostname to all IP addresses upfront (potentially waiting for both A and AAAA). Then, http.Transport's dialConnFor calls the DialContext (defaultTransportDialContext, which is net.Dialer.DialContext) with the specific "network:ip:port" (e.g., "tcp4:1.2.3.4:80"). The net.Dialer.DialContext handles multi-homed hosts by trying each resolved IP sequentially, spreading any timeout across attempts. This is the standard, recommended approach in Go's net/http—no configuration needed for basic failover. For custom behavior (e.g., parallel "Happy Eyeballs" for IPv4/IPv6), override Transport.DialContext.

Citations:


🏁 Script executed:

# Locate the dns.go file in the roundtrip directory
find . -path "*/proxy/internal/roundtrip/dns.go" -type f

Repository: netbirdio/netbird

Length of output: 94


🏁 Script executed:

# If file found, inspect its full content to understand context
if [ -f "proxy/internal/roundtrip/dns.go" ]; then
  wc -l proxy/internal/roundtrip/dns.go
fi

Repository: netbirdio/netbird

Length of output: 95


🏁 Script executed:

# Search for resolveHost usage across the codebase
rg "resolveHost" --type go -B 2 -A 5

Repository: netbirdio/netbird

Length of output: 1236


🏁 Script executed:

# Also search for dialWithDNSResolution to understand how resolver is used
rg "dialWithDNSResolution" --type go -B 2 -A 10

Repository: netbirdio/netbird

Length of output: 3287


🏁 Script executed:

cat -n proxy/internal/roundtrip/dns.go

Repository: netbirdio/netbird

Length of output: 3879


Return all resolved addresses and implement retry logic in the caller to handle unreachable first addresses.

Using only addrs[0] makes dialing brittle when the first DNS record is unreachable but later records are valid. This differs from Go's standard library behavior (net.Dialer tries all addresses sequentially). Update resolveHost to return []string and modify dialWithDNSResolution to iterate through all addresses, dialing each until one succeeds or all fail.

Proposed direction
-func resolveHost(
+func resolveHost(
 	ctx context.Context,
 	host string,
 	getDNSAddr func() (netip.AddrPort, bool),
 	dial func(ctx context.Context, network, addr string) (net.Conn, error),
-) (string, error) {
+) ([]string, error) {
 	resolver := buildResolver(getDNSAddr, dial)

 	addrs, err := resolver.LookupHost(ctx, host)
 	if err != nil {
-		return "", fmt.Errorf("dns: resolve %q: %w", host, err)
+		return nil, fmt.Errorf("dns: resolve %q: %w", host, err)
 	}
 	if len(addrs) == 0 {
-		return "", fmt.Errorf("dns: no addresses returned for %q", host)
+		return nil, fmt.Errorf("dns: no addresses returned for %q", host)
 	}
-	return addrs[0], nil
+	return addrs, nil
}

And in dialWithDNSResolution, try each resolved address until one dial succeeds, returning the last error if all fail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/internal/roundtrip/dns.go` around lines 58 - 66, Change resolveHost to
return ([]string, error) instead of a single string: keep the existing
resolver.LookupHost call, return the full addrs slice (or an error if zero or
lookup failed). Then update dialWithDNSResolution to call resolveHost, iterate
over each returned address and attempt to dial sequentially (using the same
DialContext/Dial method currently used), returning immediately on the first
successful connection; if all attempts fail return the last dial error (include
the target address in error context). Ensure error messages distinguish between
resolve errors and per-address dial errors and preserve existing context
wrapping.


// buildResolver returns a *net.Resolver configured to query the NetBird DNS
// server via the provided dial function. If the DNS server address is not
// yet available, the default system resolver is returned so that the caller
// can still attempt resolution (useful during client startup).
func buildResolver(
getDNSAddr func() (netip.AddrPort, bool),
dial func(ctx context.Context, network, addr string) (net.Conn, error),
) *net.Resolver {
dnsAddr, ok := getDNSAddr()
if !ok {
return net.DefaultResolver
}

addrStr := dnsAddr.String()
return &net.Resolver{
PreferGo: true,
Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
// Always use UDP toward the DNS server. The network and address
// arguments passed by net.Resolver are intentionally ignored;
// we route through the WireGuard netstack instead.
return dial(ctx, "udp", addrStr)
Comment on lines +84 to +88
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildResolver's net.Resolver.Dial ignores the network argument and always dials UDP. net.Resolver may request TCP (e.g., on truncated UDP responses), so forcing UDP can make hostname resolution fail for larger DNS replies. Use the provided network (udp/tcp and v4/v6 variants) when calling dial, while still overriding the address to the NetBird DNS server.

Suggested change
Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
// Always use UDP toward the DNS server. The network and address
// arguments passed by net.Resolver are intentionally ignored;
// we route through the WireGuard netstack instead.
return dial(ctx, "udp", addrStr)
Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
// Preserve the transport requested by net.Resolver (udp/tcp and
// v4/v6 variants), but always route to the NetBird DNS server via
// the WireGuard netstack.
return dial(ctx, network, addrStr)

Copilot uses AI. Check for mistakes.
},
Comment on lines +84 to +89
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Go's net.Resolver, can Resolver.Dial be invoked with network "tcp" (for example after UDP truncation), and should custom Resolver.Dial implementations preserve the provided network?

💡 Result:

Yes, Go's net.Resolver.Dial can be invoked with network "tcp", for example when the resolver falls back to TCP after receiving a truncated UDP response (TC bit set), as per standard DNS behavior described in RFC 1035 and confirmed in Go issues discussing UDP truncation handling. Custom Resolver.Dial implementations must preserve the provided network parameter, returning a Conn of the exact type requested: a *TCPConn for "tcp" (or variants like "tcp4", "tcp6") or *UDPConn for UDP networks. This is explicitly stated in the official documentation and original CL adding the Dial field.

Citations:


🏁 Script executed:

# Inspect the DNS dial code context
fd -n 'dns.go' proxy/internal/roundtrip | head -5

Repository: netbirdio/netbird

Length of output: 291


🏁 Script executed:

# Read the full file to understand context
cat -n proxy/internal/roundtrip/dns.go | head -150

Repository: netbirdio/netbird

Length of output: 3879


🏁 Script executed:

# Search for any comments or docs explaining DNS transport strategy
rg -i "udp|tcp|fallback|transport" proxy/internal/roundtrip/dns.go -A 2 -B 2

Repository: netbirdio/netbird

Length of output: 625


🏁 Script executed:

# Check for any README or design docs in proxy directory
find proxy -type f \( -name "*.md" -o -name "*.txt" -o -name "README*" \) | head -10

Repository: netbirdio/netbird

Length of output: 130


🏁 Script executed:

# Search for DNS over TCP or TCP DNS references
rg -i "dns.*tcp|tcp.*dns" --type go proxy/

Repository: netbirdio/netbird

Length of output: 139


🏁 Script executed:

# Check for WireGuard netstack DNS documentation or implementation
rg -i "netstack|wireguard" proxy/internal/roundtrip/ -A 3 -B 3

Repository: netbirdio/netbird

Length of output: 4035


🏁 Script executed:

# Search for issues or TODOs about DNS TCP support
rg -i "todo|fixme|issue.*dns.*tcp" --type go

Repository: netbirdio/netbird

Length of output: 20953


🏁 Script executed:

# Look at the underlying dial function signature to understand constraints
rg -B 5 -A 10 "func dialWithDNSResolution" proxy/internal/roundtrip/dns.go

Repository: netbirdio/netbird

Length of output: 825


🏁 Script executed:

# Find callers of dialWithDNSResolution
rg "dialWithDNSResolution" --type go -B 3 -A 3

Repository: netbirdio/netbird

Length of output: 1844


🏁 Script executed:

# Check if there are other DNS dial implementations that use TCP
rg -i "tcp.*dial|dial.*tcp" proxy/internal/roundtrip/ --type go

Repository: netbirdio/netbird

Length of output: 43


🏁 Script executed:

# Look for WireGuard netstack DNS implementation
fd -type f -name "*.go" proxy/internal/roundtrip | xargs rg -l "netstack|wireguard" | head -5

Repository: netbirdio/netbird

Length of output: 396


🏁 Script executed:

# Check netbird.go for how the dial function is created and used
sed -n '1,100p' proxy/internal/roundtrip/netbird.go

Repository: netbirdio/netbird

Length of output: 3292


🏁 Script executed:

# Find the client.DialContext implementation in embedded client
find . -path "*/client/embed*" -name "*.go" | head -5

Repository: netbirdio/netbird

Length of output: 131


🏁 Script executed:

# Search for DialContext in embedded client
rg "func.*DialContext" client/embed --type go -A 5

Repository: netbirdio/netbird

Length of output: 458


🏁 Script executed:

# Check if the WireGuard netstack interface supports TCP
rg "DialContext|Dial.*tcp" client/iface --type go | head -20

Repository: netbirdio/netbird

Length of output: 185


Preserve Resolver.Dial network parameter to support TCP fallback after DNS truncation.

Go's net.Resolver invokes Resolver.Dial with network="tcp" when a truncated UDP response (TC bit) requires TCP fallback per RFC 1035. Forcing UDP ignores this contract and causes DNS failures on truncated responses. The underlying client.DialContext supports both TCP and UDP, so the fix is straightforward: accept the network parameter and default to UDP only if not specified.

Proposed fix
-		Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
+		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
 			// Always use UDP toward the DNS server.  The network and address
 			// arguments passed by net.Resolver are intentionally ignored;
 			// we route through the WireGuard netstack instead.
-			return dial(ctx, "udp", addrStr)
+			if network == "" {
+				network = "udp"
+			}
+			return dial(ctx, network, addrStr)
 		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/internal/roundtrip/dns.go` around lines 84 - 89, The Resolver.Dial
currently ignores the network parameter and always uses "udp", which breaks TCP
fallback on truncated DNS responses; update the anonymous Dial func in
proxy/internal/roundtrip/dns.go to accept and use the incoming network parameter
(e.g., name it "network") when calling dial(ctx, network, addrStr), falling back
to "udp" only if the provided network is empty, so net.Resolver's TCP fallback
behavior is preserved and dial(ctx, ...) can still handle both "tcp" and "udp".

}
}
2 changes: 1 addition & 1 deletion proxy/internal/roundtrip/netbird.go
Original file line number Diff line number Diff line change
Expand Up @@ -276,7 +276,7 @@ func (n *NetBird) createClientEntry(ctx context.Context, accountID types.Account
// the client's HTTPClient to avoid issues with request validation that do
// not work with reverse proxied requests.
transport := &http.Transport{
DialContext: dialWithTimeout(client.DialContext),
DialContext: dialWithDNSResolution(client.GetDNSAddrPort, dialWithTimeout(client.DialContext)),
ForceAttemptHTTP2: true,
MaxIdleConns: n.transportCfg.maxIdleConns,
MaxIdleConnsPerHost: n.transportCfg.maxIdleConnsPerHost,
Expand Down
Loading