Skip to content

[client] Add TCP DNS support for local listener#5758

Merged
lixmal merged 8 commits intomainfrom
local-dns-tcp-listener
Apr 8, 2026
Merged

[client] Add TCP DNS support for local listener#5758
lixmal merged 8 commits intomainfrom
local-dns-tcp-listener

Conversation

@lixmal
Copy link
Copy Markdown
Collaborator

@lixmal lixmal commented Apr 1, 2026

Describe your changes

Add TCP DNS support for the local DNS listener in userspace mode. Previously, DNS queries over TCP were silently dropped because the internal DNS server only handled UDP. This matters for large DNS responses (e.g. TXT records, DNSSEC) that exceed the UDP message size limit.

TCP DNS handling:

  • In userspace mode, add a gvisor-based TCP stack that intercepts TCP packets via a firewall hook and serves DNS over TCP. The stack starts lazily on the first TCP DNS packet and shuts down after 30s of inactivity.
  • In listener mode, start a real TCP dns.Server alongside the existing UDP one, plus DNAT rules when eBPF handles UDP redirection.
  • When a truncated UDP response is about to be sent, pre-start the TCP stack so it's ready for the client's TCP retry.

Upstream protocol awareness:

  • Propagate the inbound protocol (UDP/TCP) through context so upstream exchanges use TCP directly when the request arrived over TCP.
  • When upstream returns a large TCP response but the client connected via UDP, truncate the response to fit the client's buffer size.

Firewall hook cleanup:

  • Replace AddUDPPacketHook/AddTCPPacketHook/RemovePacketHook with SetUDPPacketHook/SetTCPPacketHook (pass nil to remove). Only one hook per protocol is supported, so IDs were unnecessary overhead.
  • Switch from RLock + map lookup to lock-free atomic pointers for hook matching.

Observability:

  • Log client=IP:port in the handler chain and DNS forwarder for all queries (previously only available for the listener path).
  • Log protocol=udp|tcp and upstream_protocol=udp|tcp in response metadata.
  • service.Stop() now returns error with accumulated errors instead of logging inline.

Example log lines

TCP stack lifecycle (debug level):

TCP DNS stack started on 100.4.255.254:53 (triggered by 100.4.131.65:36191)
TCP DNS stack stopped

Handler chain now includes client address and protocol info (trace level):

[dns_id: f68b, client: 100.4.131.65:36191, request_id: 57cf2beb]
  question: domain=google.com. type=TXT class=IN -> handler=Upstream [9.9.9.9:53]

[client: 100.4.131.65:36191, request_id: 57cf2beb, dns_id: f68b]
  response: domain=google.com. rcode=NOERROR answers=[TXT x13] size=1090B
  protocol=tcp upstream=9.9.9.9:53 upstream_protocol=tcp took=22ms

Packet tracer shows hook interception for both protocols:

$ netbird debug trace out 100.4.131.65 100.4.255.254 -p tcp --dport 53 --syn
Completed: Packet dropped by outgoing hook
Final disposition: DENIED

$ netbird debug trace out 100.4.131.65 100.4.255.254 -p udp --dport 53
Completed: Packet dropped by outgoing hook
Final disposition: DENIED

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Internal implementation detail, no user-facing configuration changes.

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • DNS now serves over TCP as well as UDP, with on‑demand TCP handling and automatic TCP fallback for truncated UDP responses.
    • Firewall can manage OUTPUT DNAT for locally‑generated traffic.
    • Added outbound per-destination TCP packet hook and unified single‑slot UDP hook API (set/clear).
  • Improvements

    • DNS tracing/logging now includes client address, upstream protocol, and response size.
    • Outbound hook matching simplified to exact destination matches.
  • Tests

    • Added/updated tests covering TCP+UDP DNS, transport selection/fallback, and hook behavior.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

Replaces ID-based packet hook API with outbound-only atomic setters, adds on-demand gVisor-backed TCP DNS stack and TCP forwarding, propagates DNS transport context for upstream selection, introduces OUTPUT DNAT support across firewall backends, and updates tests, mocks, and logging accordingly.

Changes

Cohort / File(s) Summary
Packet hook system & filtering
client/firewall/uspfilter/filter.go, client/firewall/uspfilter/filter_test.go, client/firewall/uspfilter/rule.go, client/firewall/uspfilter/tracer_test.go, client/iface/device/device_filter.go, client/iface/mocks/filter.go
Removed ID-based hook API; added atomic single-slot outbound SetUDPPacketHook/SetTCPPacketHook, exact-match outbound hook evaluation, moved hook clearing into resetState, removed PeerRule.udpHook; tests and mocks updated to use new setters.
Removed generated mock
client/iface/mocks/iface/mocks/filter.go
Deleted autogenerated GoMock MockPacketFilter and recorder implementation.
DNS server & firewall wiring
client/internal/dns/server.go, client/internal/dns/server_test.go, client/internal/dns/mock_server.go
Added Server.SetFirewall(Firewall) and DefaultServer implementation; tests/mocks updated to call new setters; MockServer gained SetFirewall; service.Stop now returns error.
Listener-based DNS service (UDP+TCP)
client/internal/dns/service_listener.go, client/internal/dns/service_listener_test.go
Listener now serves UDP and TCP on same addr:port, Stop returns error and coordinates shutdown and optional TCP DNAT removal; binding probe requires both protocols; added integration test for UDP+TCP.
In-memory DNS service & response writer
client/internal/dns/service_memory.go, client/internal/dns/response_writer.go
ServiceViaMemory uses SetUDPPacketHook/SetTCPPacketHook to install/clear hooks, tracks tcpHookSet and tcpDNS; filterDNSTraffic now returns error only; added helper to derive remote UDP addr from packet.
TCP DNS stack
client/internal/dns/tcpstack.go
New gVisor-backed on-demand TCP DNS server with lazy startup, InjectPacket handling, framed DNS I/O, per-flow handling, idle shutdown timer, and packet-address helpers.
Upstream protocol selection & tests
client/internal/dns/upstream.go, client/internal/dns/upstream_test.go, client/internal/dns/upstream_android.go
Propagates inbound transport via context; ExchangeWithFallback short-circuits to TCP when inbound is TCP, caps EDNS0 UDP size by MTU, and truncates TCP responses to client UDP buffer when required; added tests and adapted Android path to use fallback wrapper.
DNS logging and tracing
client/internal/dns/handler_chain.go, client/internal/dnsfwd/forwarder.go
Added optional client log field from RemoteAddr(), include DNS response size (size=%dB) in trace logs, and build log fields from mutable maps.
Firewall OUTPUT DNAT across implementations
client/firewall/manager/firewall.go, client/firewall/iptables/manager_linux.go, client/firewall/iptables/router_linux.go, client/firewall/nftables/manager_linux.go, client/firewall/nftables/router_linux.go, client/firewall/uspfilter/nat.go
Added AddOutputDNAT/RemoveOutputDNAT to Manager interface and implementations; iptables/nftables add OUTPUT NAT chain and manage DNAT rules; uspfilter delegates to native firewall when available.
Engine integration
client/internal/engine.go
Wires created firewall into DNS server startup via e.dnsServer.SetFirewall(e.firewall).

Sequence Diagram(s)

sequenceDiagram
    participant Device as Device/Packet Pipeline
    participant Hook as PacketHook (atomic)
    participant DNSMux as DNS Mux
    participant TruncWriter as Truncation-Aware Writer
    participant TCPStack as TCP DNS Stack (gVisor)
    participant Upstream as Upstream Resolver

    Device->>Hook: Outbound packet arrives (IP:port, bytes)
    Hook->>Hook: Check exact ip:port against udpHookOut/tcpHookOut
    alt Hook matches and returns true (drop/intercept)
        Hook->>DNSMux: Decode DNS and call ServeDNS
        DNSMux->>TruncWriter: ServeDNS(request)
        TruncWriter->>TruncWriter: If msg.Truncated and tcpDNS != nil
        alt Truncated
            TruncWriter->>TCPStack: Ensure running / InjectPacket()
            TCPStack->>TruncWriter: Handle TCP DNS flows, reply
        end
        TruncWriter->>Upstream: ExchangeWithFallback(ctx)
        Upstream->>TruncWriter: Return response
        TruncWriter->>Device: Write response (UDP/TCP framed)
    else No hook match
        Hook->>Device: Let packet flow normally
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • pappz
  • crn4

Poem

🐰 I twitched my whiskers at hooks set just so,
TCP woke from slumber when truncated winds blow,
OUTPUT chains hop tunnels under the ground,
Logs count bytes and clients while I nibble a round,
Hooray for new paths—I'm thumping joyfully now.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 39.53% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title '[client] Add TCP DNS support for local listener' clearly and concisely summarizes the primary feature being added—TCP DNS support—and specifies the context (local listener), making it immediately clear to reviewers.
Description check ✅ Passed The PR description comprehensively covers all required sections from the template: detailed change description explaining TCP DNS support, firewall hook changes, upstream protocol awareness, and observability improvements; checkbox selection (feature enhancement + tests created); CLA confirmation; and documentation justification (internal implementation detail).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch local-dns-tcp-listener

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/internal/dns/service_memory.go (1)

120-141: ⚠️ Potential issue | 🟡 Minor

Potential panic if UDP layer assertion fails.

At line 124, udpLayer.(*layers.UDP) will panic if udpLayer is nil (which happens if the packet doesn't have a UDP layer). The hook is registered for UDP traffic, but defensive coding would add a nil check.

🛡️ Proposed defensive check
 hook := func(packetData []byte) bool {
 	packet := gopacket.NewPacket(packetData, firstLayerDecoder, gopacket.Default)

 	udpLayer := packet.Layer(layers.LayerTypeUDP)
+	if udpLayer == nil {
+		log.Tracef("packet has no UDP layer")
+		return true
+	}
 	udp := udpLayer.(*layers.UDP)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/service_memory.go` around lines 120 - 141, The hook's
direct type assertion udp := udpLayer.(*layers.UDP) can panic when
packet.Layer(layers.LayerTypeUDP) returns nil; update the hook (the closure
assigned to variable hook) to defensively check the UDP layer before asserting —
e.g., obtain udpLayer := packet.Layer(layers.LayerTypeUDP); if udpLayer == nil {
return true } and perform a safe type assertion (udp, ok :=
udpLayer.(*layers.UDP)); if !ok { return true } — then proceed to construct
truncationAwareWriter/responseWriter and call s.dnsMux.ServeDNS(writer, msg) as
before.
🧹 Nitpick comments (6)
client/firewall/uspfilter/hooks_bench_test.go (1)

88-90: Move address parsing out of the timed loop.

These benchmarks currently pay for netip.MustParseAddr(...) on every iteration, so the numbers include string parsing in addition to hook lookup. Precompute the netip.Addr before the timer starts if you want the hook fast path in isolation.

Suggested fix
+dst := netip.MustParseAddr("100.10.255.254")
+
 b.ResetTimer()
 b.ReportAllocs()
 for b.Loop() {
-	m.udpHooksDrop(53, netip.MustParseAddr("100.10.255.254"), pkt)
+	m.udpHooksDrop(53, dst, pkt)
 }

Also applies to: 103-105, 117-119, 131-133, 145-148

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/firewall/uspfilter/hooks_bench_test.go` around lines 88 - 90, The
benchmark is parsing addresses inside the timed loop, so move the
netip.MustParseAddr(...) calls out of the for b.Loop() body and precompute
netip.Addr variables (e.g., addr100 := netip.MustParseAddr("100.10.255.254"))
before the loop or before b.ResetTimer(), then call m.udpHooksDrop(53, addr100,
pkt) inside the loop; apply the same change for the other occurrences mentioned
(lines around 103–105, 117–119, 131–133, 145–148) so the benchmark measures only
the hook path and not address parsing.
client/internal/dns/tcpstack.go (3)

385-387: Add brief comments to empty TSIG stubs per static analysis.

These are intentional no-ops since TSIG is not supported.

📝 Proposed documentation
-func (w *tcpResponseWriter) TsigStatus() error   { return nil }
-func (w *tcpResponseWriter) TsigTimersOnly(bool) {}
-func (w *tcpResponseWriter) Hijack()             {}
+func (w *tcpResponseWriter) TsigStatus() error   { return nil } // TSIG not supported
+func (w *tcpResponseWriter) TsigTimersOnly(bool) {}             // TSIG not supported
+func (w *tcpResponseWriter) Hijack()             {}             // not applicable
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 385 - 387, Add brief
explanatory comments to the three empty TSIG stub methods on tcpResponseWriter
to satisfy static analysis: above TsigStatus(), TsigTimersOnly(bool), and
Hijack() add one-line comments stating these are intentional no-ops because TSIG
is not supported by the TCP stack (e.g., "TsigStatus is a no-op; TSIG not
supported in TCP response writer"). Keep comments short and directly reference
the method names so reviewers can see these are intentional stubs.

271-292: Consider adding a write deadline for TCP DNS responses.

The read timeout is set at line 272, but there's no write deadline. If the client stops reading, WriteMsg could block indefinitely. This is a minor concern since DNS responses are small, but a write deadline would improve robustness.

💡 Suggested improvement
 for {
-	if err := conn.SetDeadline(time.Now().Add(dnsTCPReadTimeout)); err != nil {
+	deadline := time.Now().Add(dnsTCPReadTimeout)
+	if err := conn.SetDeadline(deadline); err != nil {
 		log.Debugf("TCP DNS: set deadline for %s: %v", remoteAddr, err)
 		break
 	}

Note: SetDeadline already applies to both reads and writes, so the current code is actually correct. Disregard this if that was intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 271 - 292, The loop that reads
TCP DNS messages sets a read deadline with conn.SetDeadline but does not set an
explicit write deadline before sending responses; update tcpResponseWriter or
the serving path (e.g., right before calling t.mux.ServeDNS in the loop or
inside tcpResponseWriter.WriteMsg) to call
conn.SetWriteDeadline(time.Now().Add(dnsTCPWriteTimeout)) (or conn.SetDeadline
accordingly using a write-specific timeout constant) so that WriteMsg cannot
block indefinitely if the client stops reading; reference tcpResponseWriter,
readTCPDNSMessage, and the ServeDNS call to locate where to add the write
deadline.

307-314: Add brief comments to empty interface methods per static analysis.

SonarCloud flagged these empty implementations. Adding a single-line comment clarifies these are intentional no-ops required by the stack.LinkEndpoint interface.

📝 Proposed documentation
-func (e *dnsEndpoint) Wait()                                        {}
+func (e *dnsEndpoint) Wait()                                        {} // no-op: synchronous writes
 func (e *dnsEndpoint) ARPHardwareType() header.ARPHardwareType      { return header.ARPHardwareNone }
-func (e *dnsEndpoint) AddHeader(*stack.PacketBuffer)                {}
-func (e *dnsEndpoint) ParseHeader(*stack.PacketBuffer) bool         { return true }
-func (e *dnsEndpoint) Close()                                       {}
-func (e *dnsEndpoint) SetLinkAddress(tcpip.LinkAddress)             {}
+func (e *dnsEndpoint) AddHeader(*stack.PacketBuffer)                {} // no link-layer header
+func (e *dnsEndpoint) ParseHeader(*stack.PacketBuffer) bool         { return true } // no link-layer header to parse
+func (e *dnsEndpoint) Close()                                       {} // cleanup handled by stack.Close
+func (e *dnsEndpoint) SetLinkAddress(tcpip.LinkAddress)             {} // not applicable for TUN
 func (e *dnsEndpoint) SetMTU(mtu uint32)                            { e.mtu.Store(mtu) }
-func (e *dnsEndpoint) SetOnCloseAction(func())                      {}
+func (e *dnsEndpoint) SetOnCloseAction(func())                      {} // not needed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 307 - 314, The empty
implementations on dnsEndpoint (methods Wait, AddHeader, ParseHeader, Close,
SetLinkAddress, SetOnCloseAction) are flagged by static analysis; update each of
these methods to include a single-line comment inside the function body stating
they are intentional no-ops to satisfy stack.LinkEndpoint (e.g., "// no-op:
required by stack.LinkEndpoint" or similar), and for ParseHeader keep the
existing true return but precede it with a comment explaining it intentionally
accepts all packets; leave SetMTU and ARPHardwareType unchanged.
client/internal/dns/service_listener.go (1)

90-103: TCP server goroutine doesn't track listener status.

The UDP goroutine sets listenerIsRunning via setListenerStatus, but the TCP goroutine does not. If the TCP server fails to start while UDP succeeds, listenerIsRunning will still be true. This is minor since both share the same address, but worth noting.

Consider logging or tracking TCP server startup failure separately, or documenting that listenerIsRunning only tracks the UDP server lifecycle.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/service_listener.go` around lines 90 - 103, The TCP
goroutine currently doesn't update listenerIsRunning while the UDP one does;
modify the TCP goroutine that calls s.tcpServer.ListenAndServe() to mirror the
UDP goroutine by calling s.setListenerStatus(true) at start and defer
s.setListenerStatus(false) so listenerIsRunning reflects TCP lifecycle too, and
ensure the existing log.Errorf for the ListenAndServe error remains (or add a
distinct log message for TCP startup failure) to make failures explicit;
reference s.tcpServer, s.server, ListenAndServe, and setListenerStatus when
making the change.
client/internal/dns/upstream.go (1)

79-83: Consider documenting the naming convention for unexported function.

contextWithupstreamProtocolResult uses inconsistent casing (upstream vs typical Go camelCase Upstream). While this is internal, consistent naming improves readability.

📝 Naming suggestion
-// contextWithupstreamProtocolResult stores a mutable result holder in the context.
-func contextWithupstreamProtocolResult(ctx context.Context) (context.Context, *upstreamProtocolResult) {
+// contextWithUpstreamProtocolResult stores a mutable result holder in the context.
+func contextWithUpstreamProtocolResult(ctx context.Context) (context.Context, *upstreamProtocolResult) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/upstream.go` around lines 79 - 83, The function name
contextWithupstreamProtocolResult uses inconsistent camelCase; rename it to
contextWithUpstreamProtocolResult (preserving unexported visibility) and update
its declaration, any callers, tests, and associated comment to match; ensure the
related types upstreamProtocolResult and upstreamProtocolKey{} remain unchanged
and update any references to the old name so builds pass and godoc/comment
capitalization aligns with the new identifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/firewall/uspfilter/filter_test.go`:
- Around line 190-211: The new hook tests (e.g., TestSetUDPPacketHook) create a
manager with Create(...) but never shut it down; add a manager.Close(nil) at the
end of each new test that calls Create (including the other test at lines
213-234) to stop background timers/goroutines and prevent bleed into later
tests—place the call after the final assertions (after
manager.SetUDPPacketHook(..., nil) and nil-check).

In `@client/firewall/uspfilter/hooks_bench_test.go`:
- Around line 69-76: newBenchManager creates a Manager via Create but never
tears it down, leaving goroutines/timers running; fix by registering a b.Cleanup
that closes the manager (e.g. call m.Close() or Manager's shutdown method)
immediately after require.NoError(b, err) in newBenchManager so the Manager is
stopped between benchmark cases.

In `@client/internal/dns/service_listener_test.go`:
- Line 67: The deferred call defer svc.Stop() in service_listener_test.go
swallows any shutdown error; update the defer to call svc.Stop() and assert its
returned error is nil (e.g., defer func(){ if err := svc.Stop(); err != nil {
t.Fatalf("svc.Stop failed: %v", err) } }()), so the test fails immediately if
Stop() returns an error; reference the svc variable and the Stop() method in the
test.
- Around line 32-70: The test currently closes udpLn and tcpLn then starts
svc.server.ListenAndServe()/svc.tcpServer.ListenAndServe(), causing a TOCTOU
race; instead keep the acquired udpLn/tcpLn open, assign udpLn to the UDP
server's PacketConn field and tcpLn to the TCP server's Listener field
(references: variables udpLn, tcpLn and struct instances svc.server,
svc.tcpServer), then call the deterministic startup method ActivateAndServe() on
each server (instead of ListenAndServe()) so the already-bound sockets are used
directly; update svc.listenerIsRunning and defer svc.Stop() as before and remove
the artificial time.Sleep.

In `@client/internal/dns/service_memory.go`:
- Around line 132-140: The code dereferences s.wgInterface.GetDevice().Device
when constructing responseWriter which can panic if GetDevice() returns nil;
before using it in truncationAwareWriter, check the result of
s.wgInterface.GetDevice() (or its Device field) for nil and handle that case
(e.g., log and skip starting the goroutine or provide a safe zero value) so
truncationAwareWriter/responseWriter never receives a nil Device; update the
block creating truncationAwareWriter and the call to s.dnsMux.ServeDNS to guard
against a nil device.

In `@client/internal/dns/tcpstack.go`:
- Around line 373-379: The tcpResponseWriter.Write implementation returns the
number of bytes written by w.conn.Write(buf), which includes the 2-byte length
prefix; update tcpResponseWriter.Write to write the prefixed buffer to w.conn
but return the logical count of payload bytes (len(data)) per dns.ResponseWriter
semantics, and propagate any write error; if conn.Write writes fewer than
len(buf) bytes return io.ErrShortWrite (or the error) while still ensuring the
returned byte count is len(data) on success.

In `@client/internal/dns/upstream_test.go`:
- Around line 542-602: TestExchangeWithFallback_UDPFallbackToTCP currently only
exercises UDP success; change the UDP handler to force a truncated UDP response
so ExchangeWithFallback must retry over TCP and then assert the TCP result is
returned. Concretely, in the test's handler (used by udpServer and tcpServer)
return a truncated UDP reply (either set m.Truncated = true when w is a
dns.PacketConn/udp writer or construct an oversized UDP answer by appending many
A records for the UDP path) and ensure the tcpServer returns a full,
non-truncated answer; then call ExchangeWithFallback(ctx, client, r, addr) and
assert no error, rm is non-nil, rm.Truncated == false and the Answer contains
"10.0.0.3" to verify the TCP fallback path in ExchangeWithFallback is exercised.

In `@client/internal/engine.go`:
- Around line 1809-1811: The DNS server is being created with a nil firewall
because e.firewall is passed into DefaultServerConfig before createFirewall()
runs; update initialization so the firewall is available to newDnsServer():
either move the createFirewall() call to run before constructing
DefaultServerConfig/newDnsServer(), or allow post-construction injection by
adding a SetFirewall(fw Firewall) method on the DNS server type and call
dnsServer.SetFirewall(e.firewall) immediately after createFirewall(); adjust
usages of DefaultServerConfig and newDnsServer() accordingly to ensure the DNS
listener is always given a non-nil firewall reference.

---

Outside diff comments:
In `@client/internal/dns/service_memory.go`:
- Around line 120-141: The hook's direct type assertion udp :=
udpLayer.(*layers.UDP) can panic when packet.Layer(layers.LayerTypeUDP) returns
nil; update the hook (the closure assigned to variable hook) to defensively
check the UDP layer before asserting — e.g., obtain udpLayer :=
packet.Layer(layers.LayerTypeUDP); if udpLayer == nil { return true } and
perform a safe type assertion (udp, ok := udpLayer.(*layers.UDP)); if !ok {
return true } — then proceed to construct truncationAwareWriter/responseWriter
and call s.dnsMux.ServeDNS(writer, msg) as before.

---

Nitpick comments:
In `@client/firewall/uspfilter/hooks_bench_test.go`:
- Around line 88-90: The benchmark is parsing addresses inside the timed loop,
so move the netip.MustParseAddr(...) calls out of the for b.Loop() body and
precompute netip.Addr variables (e.g., addr100 :=
netip.MustParseAddr("100.10.255.254")) before the loop or before b.ResetTimer(),
then call m.udpHooksDrop(53, addr100, pkt) inside the loop; apply the same
change for the other occurrences mentioned (lines around 103–105, 117–119,
131–133, 145–148) so the benchmark measures only the hook path and not address
parsing.

In `@client/internal/dns/service_listener.go`:
- Around line 90-103: The TCP goroutine currently doesn't update
listenerIsRunning while the UDP one does; modify the TCP goroutine that calls
s.tcpServer.ListenAndServe() to mirror the UDP goroutine by calling
s.setListenerStatus(true) at start and defer s.setListenerStatus(false) so
listenerIsRunning reflects TCP lifecycle too, and ensure the existing log.Errorf
for the ListenAndServe error remains (or add a distinct log message for TCP
startup failure) to make failures explicit; reference s.tcpServer, s.server,
ListenAndServe, and setListenerStatus when making the change.

In `@client/internal/dns/tcpstack.go`:
- Around line 385-387: Add brief explanatory comments to the three empty TSIG
stub methods on tcpResponseWriter to satisfy static analysis: above
TsigStatus(), TsigTimersOnly(bool), and Hijack() add one-line comments stating
these are intentional no-ops because TSIG is not supported by the TCP stack
(e.g., "TsigStatus is a no-op; TSIG not supported in TCP response writer"). Keep
comments short and directly reference the method names so reviewers can see
these are intentional stubs.
- Around line 271-292: The loop that reads TCP DNS messages sets a read deadline
with conn.SetDeadline but does not set an explicit write deadline before sending
responses; update tcpResponseWriter or the serving path (e.g., right before
calling t.mux.ServeDNS in the loop or inside tcpResponseWriter.WriteMsg) to call
conn.SetWriteDeadline(time.Now().Add(dnsTCPWriteTimeout)) (or conn.SetDeadline
accordingly using a write-specific timeout constant) so that WriteMsg cannot
block indefinitely if the client stops reading; reference tcpResponseWriter,
readTCPDNSMessage, and the ServeDNS call to locate where to add the write
deadline.
- Around line 307-314: The empty implementations on dnsEndpoint (methods Wait,
AddHeader, ParseHeader, Close, SetLinkAddress, SetOnCloseAction) are flagged by
static analysis; update each of these methods to include a single-line comment
inside the function body stating they are intentional no-ops to satisfy
stack.LinkEndpoint (e.g., "// no-op: required by stack.LinkEndpoint" or
similar), and for ParseHeader keep the existing true return but precede it with
a comment explaining it intentionally accepts all packets; leave SetMTU and
ARPHardwareType unchanged.

In `@client/internal/dns/upstream.go`:
- Around line 79-83: The function name contextWithupstreamProtocolResult uses
inconsistent camelCase; rename it to contextWithUpstreamProtocolResult
(preserving unexported visibility) and update its declaration, any callers,
tests, and associated comment to match; ensure the related types
upstreamProtocolResult and upstreamProtocolKey{} remain unchanged and update any
references to the old name so builds pass and godoc/comment capitalization
aligns with the new identifier.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3bd686d-771b-49bc-a8e5-ebbd0f6b0270

📥 Commits

Reviewing files that changed from the base of the PR and between e5914e4 and db02d7c.

📒 Files selected for processing (22)
  • client/firewall/uspfilter/filter.go
  • client/firewall/uspfilter/filter_test.go
  • client/firewall/uspfilter/hooks_bench_test.go
  • client/firewall/uspfilter/rule.go
  • client/firewall/uspfilter/tracer_test.go
  • client/iface/device/device_filter.go
  • client/iface/mocks/filter.go
  • client/iface/mocks/iface/mocks/filter.go
  • client/internal/dns/handler_chain.go
  • client/internal/dns/response_writer.go
  • client/internal/dns/server.go
  • client/internal/dns/server_test.go
  • client/internal/dns/service.go
  • client/internal/dns/service_listener.go
  • client/internal/dns/service_listener_test.go
  • client/internal/dns/service_memory.go
  • client/internal/dns/tcpstack.go
  • client/internal/dns/upstream.go
  • client/internal/dns/upstream_android.go
  • client/internal/dns/upstream_test.go
  • client/internal/dnsfwd/forwarder.go
  • client/internal/engine.go
💤 Files with no reviewable changes (1)
  • client/iface/mocks/iface/mocks/filter.go

Comment thread client/firewall/uspfilter/filter_test.go
Comment thread client/firewall/uspfilter/hooks_bench_test.go Outdated
Comment thread client/internal/dns/service_listener_test.go Outdated
Comment thread client/internal/dns/service_listener_test.go Outdated
Comment thread client/internal/dns/service_memory.go Outdated
Comment thread client/internal/dns/tcpstack.go
Comment thread client/internal/dns/upstream_test.go
Comment thread client/internal/engine.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
client/internal/dns/tcpstack.go (2)

316-338: Document the TUN packet offset constant.

The tunPacketOffset = 40 is unexplained. Adding a brief comment would help future maintainers understand its purpose (e.g., TUN header space, alignment requirement, or buffer padding for wireguard-go).

+// tunPacketOffset provides padding at the start of the buffer for TUN device writes.
+// This matches the offset used by wireguard-go for packet alignment.
 const tunPacketOffset = 40
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 316 - 338, Add a brief doc
comment above the tunPacketOffset constant explaining why it is 40, e.g., that
it reserves 40 bytes of TUN header/padding required by the underlying TUN device
or for alignment/compatibility with wireguard-go when calling
dnsEndpoint.WritePackets (which uses tunPacketOffset as the write offset into
the buffer passed to dnsEndpoint.tunDev.Write). Mention the unit (bytes) and the
intent (header space / buffer alignment) so future maintainers can understand
and safely change it.

130-196: Consider using defer for cleanup to reduce repetition.

The cleanup pattern (s.Close(); s.Wait()) is repeated on every error path. A helper or defer-based approach would reduce duplication.

♻️ Optional refactor using defer
 func (t *tcpDNSServer) startLocked() error {
 	s := stack.New(stack.Options{
 		NetworkProtocols:   []stack.NetworkProtocolFactory{ipv4.NewProtocol},
 		TransportProtocols: []stack.TransportProtocolFactory{tcp.NewProtocol},
 		HandleLocal:        false,
 	})
+	success := false
+	defer func() {
+		if !success {
+			s.Close()
+			s.Wait()
+		}
+	}()

 	nicID := tcpip.NICID(1)
 	ep := &dnsEndpoint{
 		tunDev: t.tunDev,
 	}
 	ep.mtu.Store(uint32(t.mtu))

 	if err := s.CreateNIC(nicID, ep); err != nil {
-		s.Close()
-		s.Wait()
 		return fmt.Errorf("create NIC: %v", err)
 	}
 	// ... similar for other error paths ...
 
+	success = true
 	t.s = s
 	t.ep = ep
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 130 - 196, The startLocked
method repeats s.Close(); s.Wait() on each error; define a deferred cleanup to
eliminate duplication: after s := stack.New(...) create a local var closeStack
:= func(){ s.Close(); s.Wait() } and defer closeStack(), replace all explicit
s.Close()/s.Wait() error returns to just return the fmt.Errorf(...) (remove
those calls), and when startup succeeds (before returning nil) set closeStack =
func(){} to cancel the deferred cleanup; reference tcpDNSServer.startLocked,
variable s, and the existing s.Close()/s.Wait() calls when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@client/internal/dns/tcpstack.go`:
- Around line 316-338: Add a brief doc comment above the tunPacketOffset
constant explaining why it is 40, e.g., that it reserves 40 bytes of TUN
header/padding required by the underlying TUN device or for
alignment/compatibility with wireguard-go when calling dnsEndpoint.WritePackets
(which uses tunPacketOffset as the write offset into the buffer passed to
dnsEndpoint.tunDev.Write). Mention the unit (bytes) and the intent (header space
/ buffer alignment) so future maintainers can understand and safely change it.
- Around line 130-196: The startLocked method repeats s.Close(); s.Wait() on
each error; define a deferred cleanup to eliminate duplication: after s :=
stack.New(...) create a local var closeStack := func(){ s.Close(); s.Wait() }
and defer closeStack(), replace all explicit s.Close()/s.Wait() error returns to
just return the fmt.Errorf(...) (remove those calls), and when startup succeeds
(before returning nil) set closeStack = func(){} to cancel the deferred cleanup;
reference tcpDNSServer.startLocked, variable s, and the existing
s.Close()/s.Wait() calls when making this change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 99303097-83f9-416d-8ec0-5238e12e9e2d

📥 Commits

Reviewing files that changed from the base of the PR and between db02d7c and 8ecb7a9.

📒 Files selected for processing (1)
  • client/internal/dns/tcpstack.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/dns/service_listener.go`:
- Around line 89-103: The current startup spawns two goroutines that call
s.server.ListenAndServe() and s.tcpServer.ListenAndServe() and only log TCP
errors, which lets Listen() return successfully before TCP has actually bound;
change startup to pre-bind or synchronously verify both transports before
setting listener status: call the underlying Listen/ListenTCP (or ListenAndServe
variants that return binding errors) for both s.server and s.tcpServer (or use
net.Listen to obtain UDP and TCP listeners) and return or propagate any bind
errors upstream instead of only logging them; only call
s.setListenerStatus(true) after both binds succeed and ensure deferred
s.setListenerStatus(false) still runs on failure.
- Around line 105-114: The current branch that configures TCP DNAT when
s.ebpfService != nil and s.listenPort != DefaultPort only logs failures and
leaves TCP DNS unreachable; change it to refuse the eBPF fallback when TCP/53
can't be forwarded: inside the branch that checks s.ebpfService, s.listenPort
and DefaultPort, first guard against s.firewall == nil by either returning an
error from the enclosing startup/configure function or disabling s.ebpfService
(so the eBPF path is skipped), and if s.firewall != nil call
s.firewall.AddInboundDNAT(...) and if that call returns an error do not silently
log—propagate the error (return it) or disable s.ebpfService and set
tcpDNATConfigured = false; use the existing symbols (s.ebpfService, s.firewall,
s.listenPort, DefaultPort, AddInboundDNAT, tcpDNATConfigured) so startup fails
or the eBPF fallback is skipped rather than leaving TCP broken.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b58e9f0e-5cef-414c-ae69-5e79f040d067

📥 Commits

Reviewing files that changed from the base of the PR and between 8ecb7a9 and 37f60ac.

📒 Files selected for processing (10)
  • client/firewall/uspfilter/filter_test.go
  • client/firewall/uspfilter/hooks_bench_test.go
  • client/internal/dns/mock_server.go
  • client/internal/dns/server.go
  • client/internal/dns/service.go
  • client/internal/dns/service_listener.go
  • client/internal/dns/service_listener_test.go
  • client/internal/dns/service_memory.go
  • client/internal/dns/tcpstack.go
  • client/internal/engine.go
✅ Files skipped from review due to trivial changes (2)
  • client/firewall/uspfilter/hooks_bench_test.go
  • client/internal/dns/tcpstack.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • client/internal/dns/service.go
  • client/firewall/uspfilter/filter_test.go
  • client/internal/dns/service_listener_test.go

Comment thread client/internal/dns/service_listener.go
Comment thread client/internal/dns/service_listener.go
@lixmal lixmal force-pushed the local-dns-tcp-listener branch 4 times, most recently from 6b259a8 to d72fa43 Compare April 1, 2026 11:10
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
client/internal/dns/service_listener.go (2)

105-114: ⚠️ Potential issue | 🟠 Major

Don't continue with the eBPF fallback when TCP/53 forwarding is missing.

On the non-53 eBPF path, local resolvers still retry over TCP/53. If firewall is nil or AddOutputDNAT fails, Listen() still succeeds and silently leaves TCP DNS unreachable while UDP keeps working. This branch should fail startup (and tear down anything already started) or skip the eBPF fallback entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/service_listener.go` around lines 105 - 114, The current
eBPF TCP DNAT branch can leave TCP/53 unreachable without failing startup;
change Listen() so that when s.ebpfService != nil and s.listenPort !=
DefaultPort you require s.firewall != nil (otherwise disable/skip the eBPF
fallback path) and if AddOutputDNAT(...) returns an error you propagate that
error out of Listen() and perform teardown (call the service cleanup used on
failure, e.g. s.Close() or the existing shutdown sequence) instead of logging
and continuing; keep s.tcpDNATConfigured set true only on successful
AddOutputDNAT and ensure no half-started eBPF path is left running when DNAT
cannot be installed.

70-76: ⚠️ Potential issue | 🟠 Major

Record listener startup state synchronously after both transports are known-good.

listenerIsRunning is only updated from the UDP goroutine, after Listen() has already returned. That leaves a race where a second Listen() can start another server pair, or Stop() can short-circuit while TCP, DNAT, and eBPF cleanup are still outstanding. It also lets UDP startup failures be reduced to logs even if callers were told startup succeeded.

Also applies to: 90-103, 123-125

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/service_listener.go` around lines 70 - 76, The Listen()
path sets listenerIsRunning asynchronously in the UDP goroutine, causing races
where Listen() returns success before transports are actually ready; update the
implementation so Listen() only returns after both transports (TCP and UDP) have
been started and verified good, and set listenerIsRunning synchronously under
the same listenerFlagLock before returning. Concretely: have the UDP startup
routine surface success/failure to Listen() (e.g., via channel or error return),
perform TCP/DNAT/eBPF startup and validation inline or wait for their ready
signals, and only when all start steps succeed acquire listenerFlagLock and set
listenerIsRunning = true; if any startup fails, propagate the error instead of
logging and leaving the flag unset, and ensure Stop() checks the flag under
listenerFlagLock so it won’t short-circuit cleanup.
🧹 Nitpick comments (1)
client/firewall/uspfilter/filter_test.go (1)

214-235: Add a packet-path test for TCP hooks.

This only validates the stored hook metadata. The new TCP DNS path depends on filterOutbound matching a real TCP packet and invoking that hook, so a regression in the TCP branch would still pass. Please mirror TestProcessOutgoingHooks with SetTCPPacketHook and a serialized TCP packet.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/firewall/uspfilter/filter_test.go` around lines 214 - 235,
TestSetTCPPacketHook currently only checks stored hook metadata but doesn't
exercise the TCP packet path; add a parallel test to TestProcessOutgoingHooks
that constructs/serializes a real TCP packet and passes it through
manager.filterOutbound so the SetTCPPacketHook hook on manager (verify via
manager.tcpHookOut and the fn) is actually invoked; specifically, in
TestSetTCPPacketHook use the same packet-building/serialization approach as
TestProcessOutgoingHooks, call manager.filterOutbound with that packet, and
assert the hook's fn was called and returned the expected result, ensuring you
reference SetTCPPacketHook, manager.tcpHookOut, filterOutbound and mirror
TestProcessOutgoingHooks' assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/firewall/uspfilter/nat.go`:
- Around line 470-484: The AddOutputDNAT and RemoveOutputDNAT methods currently
return nil when m.nativeFirewall == nil, which signals success to callers;
change both to return errNatNotSupported instead so behavior matches
AddDNATRule/DeleteDNATRule and keeps client/internal/dns/service_listener.go
from marking tcpDNATConfigured true when no native DNAT was created; locate
Manager.AddOutputDNAT and Manager.RemoveOutputDNAT and replace the nil-return
branch with returning errNatNotSupported.

---

Duplicate comments:
In `@client/internal/dns/service_listener.go`:
- Around line 105-114: The current eBPF TCP DNAT branch can leave TCP/53
unreachable without failing startup; change Listen() so that when s.ebpfService
!= nil and s.listenPort != DefaultPort you require s.firewall != nil (otherwise
disable/skip the eBPF fallback path) and if AddOutputDNAT(...) returns an error
you propagate that error out of Listen() and perform teardown (call the service
cleanup used on failure, e.g. s.Close() or the existing shutdown sequence)
instead of logging and continuing; keep s.tcpDNATConfigured set true only on
successful AddOutputDNAT and ensure no half-started eBPF path is left running
when DNAT cannot be installed.
- Around line 70-76: The Listen() path sets listenerIsRunning asynchronously in
the UDP goroutine, causing races where Listen() returns success before
transports are actually ready; update the implementation so Listen() only
returns after both transports (TCP and UDP) have been started and verified good,
and set listenerIsRunning synchronously under the same listenerFlagLock before
returning. Concretely: have the UDP startup routine surface success/failure to
Listen() (e.g., via channel or error return), perform TCP/DNAT/eBPF startup and
validation inline or wait for their ready signals, and only when all start steps
succeed acquire listenerFlagLock and set listenerIsRunning = true; if any
startup fails, propagate the error instead of logging and leaving the flag
unset, and ensure Stop() checks the flag under listenerFlagLock so it won’t
short-circuit cleanup.

---

Nitpick comments:
In `@client/firewall/uspfilter/filter_test.go`:
- Around line 214-235: TestSetTCPPacketHook currently only checks stored hook
metadata but doesn't exercise the TCP packet path; add a parallel test to
TestProcessOutgoingHooks that constructs/serializes a real TCP packet and passes
it through manager.filterOutbound so the SetTCPPacketHook hook on manager
(verify via manager.tcpHookOut and the fn) is actually invoked; specifically, in
TestSetTCPPacketHook use the same packet-building/serialization approach as
TestProcessOutgoingHooks, call manager.filterOutbound with that packet, and
assert the hook's fn was called and returned the expected result, ensuring you
reference SetTCPPacketHook, manager.tcpHookOut, filterOutbound and mirror
TestProcessOutgoingHooks' assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d77d5b8f-3639-41ab-9284-761d382005b7

📥 Commits

Reviewing files that changed from the base of the PR and between 37f60ac and e2dad24.

📒 Files selected for processing (16)
  • client/firewall/iptables/manager_linux.go
  • client/firewall/iptables/router_linux.go
  • client/firewall/manager/firewall.go
  • client/firewall/nftables/manager_linux.go
  • client/firewall/nftables/router_linux.go
  • client/firewall/uspfilter/filter_test.go
  • client/firewall/uspfilter/hooks_bench_test.go
  • client/firewall/uspfilter/nat.go
  • client/internal/dns/mock_server.go
  • client/internal/dns/server.go
  • client/internal/dns/service.go
  • client/internal/dns/service_listener.go
  • client/internal/dns/service_listener_test.go
  • client/internal/dns/service_memory.go
  • client/internal/dns/tcpstack.go
  • client/internal/engine.go
✅ Files skipped from review due to trivial changes (3)
  • client/internal/dns/mock_server.go
  • client/firewall/uspfilter/hooks_bench_test.go
  • client/internal/dns/tcpstack.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • client/internal/dns/service.go
  • client/internal/dns/service_listener_test.go

Comment thread client/firewall/uspfilter/nat.go
@lixmal lixmal force-pushed the local-dns-tcp-listener branch 2 times, most recently from f6d360f to 3db6e15 Compare April 1, 2026 11:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/internal/dns/service_memory.go (1)

101-166: ⚠️ Potential issue | 🟠 Major

Don't make TCP DNS setup a one-shot check during Listen().

tcpDNS and the TCP hook are only created in filterDNSTraffic(). If GetDevice() is nil at that moment, the UDP path can still recover later because the hook re-checks GetDevice() per packet, but TCP never does—there is no later path that installs tcpDNS/SetTCPPacketHook(...). In that startup sequence, truncated UDP replies will keep sending clients into TCP retries that this service still drops until a full restart.

Introduce a small ensureTCPDNS() path that can run once the device becomes available, and call it from the packet-handling path before building the writer / serving a truncated response.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/service_memory.go` around lines 101 - 166,
filterDNSTraffic() currently only creates s.tcpDNS and the TCP packet hook once
at startup, so if GetDevice() was nil then the TCP path is never installed
later; add an ensureTCPDNS() helper (method on ServiceViaMemory) that checks
wgInterface.GetDevice(), initializes s.tcpDNS via newTCPDNSServer(...) and
installs the TCP hook with filter.SetTCPPacketHook(s.runtimeIP,
uint16(s.runtimePort), ...) and sets s.tcpHookSet atomically/ idempotently; then
call ensureTCPDNS() from inside the UDP packet hook (before constructing
truncationAwareWriter / calling dnsMux.ServeDNS) so truncated UDP responses can
spawn a TCP server when the device becomes available.
🧹 Nitpick comments (1)
client/firewall/uspfilter/filter_test.go (1)

214-235: Exercise the TCP dispatch path here too.

This test only loads tcpHookOut and calls the stored callback directly, so it would still pass if outbound TCP packets never reached the hook. A filterOutbound() assertion like TestProcessOutgoingHooks would cover the path this PR actually depends on.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/firewall/uspfilter/filter_test.go` around lines 214 - 235,
TestSetTCPPacketHook currently only inspects tcpHookOut and invokes the stored
callback directly, so it doesn't validate the actual dispatch path; update the
test (TestSetTCPPacketHook) to also exercise the outbound dispatch by invoking
the manager's outbound processing (the same path used in
TestProcessOutgoingHooks) with a crafted outbound TCP packet that matches
netip.MustParseAddr("10.168.0.1") and port 53 so the stored hook is executed;
ensure you call manager.SetTCPPacketHook(...) with a sentinel side-effect, run
the outbound filter/dispatch method used for outgoing packets (the
filterOutbound/process outgoing hook path), and assert the hook was called and
tcpHookOut is cleared after setting nil.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/firewall/iptables/router_linux.go`:
- Around line 985-1004: ensureNATOutputChain can leave a dangling chain when
NewChain succeeded but Insert and ClearAndDeleteChain both fail; make the check
resilient by first querying the kernel for the chain and treating an
already-existing chain as okay: call the iptables client chain-existence helper
(e.g. r.iptablesClient.ChainExists or ExistsChain on tableNat, chainNATOutput)
before calling NewChain and if the chain already exists, skip NewChain and
proceed to ensure the OUTPUT jump rule is present (insert it if missing) and
then set r.rules[jumpNatOutput] and call r.updateState(); also, if NewChain
returns an error indicating “already exists”, handle that the same way instead
of failing. Ensure you reference and update r.rules[jumpNatOutput], use
r.iptablesClient.NewChain, r.iptablesClient.Insert and
r.iptablesClient.ClearAndDeleteChain, and keep r.updateState() behavior
unchanged.

---

Outside diff comments:
In `@client/internal/dns/service_memory.go`:
- Around line 101-166: filterDNSTraffic() currently only creates s.tcpDNS and
the TCP packet hook once at startup, so if GetDevice() was nil then the TCP path
is never installed later; add an ensureTCPDNS() helper (method on
ServiceViaMemory) that checks wgInterface.GetDevice(), initializes s.tcpDNS via
newTCPDNSServer(...) and installs the TCP hook with
filter.SetTCPPacketHook(s.runtimeIP, uint16(s.runtimePort), ...) and sets
s.tcpHookSet atomically/ idempotently; then call ensureTCPDNS() from inside the
UDP packet hook (before constructing truncationAwareWriter / calling
dnsMux.ServeDNS) so truncated UDP responses can spawn a TCP server when the
device becomes available.

---

Nitpick comments:
In `@client/firewall/uspfilter/filter_test.go`:
- Around line 214-235: TestSetTCPPacketHook currently only inspects tcpHookOut
and invokes the stored callback directly, so it doesn't validate the actual
dispatch path; update the test (TestSetTCPPacketHook) to also exercise the
outbound dispatch by invoking the manager's outbound processing (the same path
used in TestProcessOutgoingHooks) with a crafted outbound TCP packet that
matches netip.MustParseAddr("10.168.0.1") and port 53 so the stored hook is
executed; ensure you call manager.SetTCPPacketHook(...) with a sentinel
side-effect, run the outbound filter/dispatch method used for outgoing packets
(the filterOutbound/process outgoing hook path), and assert the hook was called
and tcpHookOut is cleared after setting nil.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3635d239-6acf-444c-a734-861fdda08286

📥 Commits

Reviewing files that changed from the base of the PR and between e2dad24 and d72fa43.

📒 Files selected for processing (15)
  • client/firewall/iptables/manager_linux.go
  • client/firewall/iptables/router_linux.go
  • client/firewall/manager/firewall.go
  • client/firewall/nftables/manager_linux.go
  • client/firewall/nftables/router_linux.go
  • client/firewall/uspfilter/filter_test.go
  • client/firewall/uspfilter/nat.go
  • client/internal/dns/mock_server.go
  • client/internal/dns/server.go
  • client/internal/dns/service.go
  • client/internal/dns/service_listener.go
  • client/internal/dns/service_listener_test.go
  • client/internal/dns/service_memory.go
  • client/internal/dns/tcpstack.go
  • client/internal/engine.go
✅ Files skipped from review due to trivial changes (3)
  • client/internal/engine.go
  • client/firewall/nftables/manager_linux.go
  • client/internal/dns/tcpstack.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • client/internal/dns/mock_server.go
  • client/firewall/iptables/manager_linux.go
  • client/internal/dns/service.go
  • client/internal/dns/service_listener_test.go
  • client/firewall/nftables/router_linux.go
  • client/internal/dns/service_listener.go

Comment thread client/firewall/iptables/router_linux.go
lixmal added 4 commits April 1, 2026 14:41
- Rename DNSFirewall to Firewall (revive: stuttering name in dns package)
- Check Stop() error in service_listener_test (errcheck)
- Fix firewall nil at DNS construction: add SetFirewall to Server
  interface, inject from engine after createFirewall completes
- Close managers in hook tests and benchmarks to prevent goroutine leaks
- Fix tcpResponseWriter.Write returning wire byte count instead of data
- Add nil checks for UDP layer and GetDevice in memory service hook
@lixmal lixmal force-pushed the local-dns-tcp-listener branch from 3db6e15 to 63cf286 Compare April 1, 2026 12:44
- Make ensureNATOutputChain retry-safe by checking ChainExists before
  NewChain, handling orphaned chains from partial failures
- Fix TestExchangeWithFallback_UDPFallbackToTCP to actually test the
  truncation-to-TCP retry path
@pappz pappz self-requested a review April 2, 2026 09:12
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
client/internal/dns/tcpstack.go (2)

457-468: IPv6 header parsing doesn't handle extension headers.

srcIPv6 returns a fixed offset of header.IPv6MinimumSize (40 bytes) for the transport header. If IPv6 extension headers are present, this offset would be incorrect. However, for the current use case (logging source address), this is acceptable since the source IP extraction is correct.

For future-proofing, consider adding a comment noting this limitation:

 func srcIPv6(pkt []byte) (netip.Addr, int) {
 	if len(pkt) < header.IPv6MinimumSize {
 		return netip.Addr{}, 0
 	}
 	hdr := header.IPv6(pkt)
 	src := hdr.SourceAddress()
 	ip, ok := netip.AddrFromSlice(src.AsSlice())
 	if !ok {
 		return netip.Addr{}, 0
 	}
+	// Note: This assumes no extension headers. For logging purposes only.
 	return ip, header.IPv6MinimumSize
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 457 - 468, The srcIPv6 function
extracts the IPv6 source correctly but always returns a fixed transport offset
of header.IPv6MinimumSize (40 bytes) and does not parse IPv6 extension headers;
update the code by adding a concise comment above srcIPv6 stating this
limitation (that extension headers are not handled so the returned offset may be
incorrect if extensions exist) and note that this is acceptable for the current
logging-only use case, so leave the implementation unchanged for now; reference
the srcIPv6 function and header.IPv6MinimumSize in the comment for clarity.

400-403: Message length validation could be tighter.

The upper bound check msgLen > 65535 is redundant since msgLen is computed from two bytes and can never exceed 65535. The lower bound msgLen == 0 is correct. Consider also validating against a reasonable minimum DNS message size (12 bytes for header).

♻️ Suggested tighter validation
 	msgLen := int(lenBuf[0])<<8 | int(lenBuf[1])
-	if msgLen == 0 || msgLen > 65535 {
+	if msgLen < 12 {
+		return nil, fmt.Errorf("message too short: %d bytes", msgLen)
+	}
+	if msgLen > 65535 {
 		return nil, fmt.Errorf("invalid message length: %d", msgLen)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 400 - 403, The length check on
msgLen should reject messages smaller than a valid DNS header; replace the
current condition on msgLen with a tighter validation such as checking if msgLen
< 12 (and optionally also guard msgLen > 65535 for clarity) instead of only
msgLen == 0, and update the error message to reflect the minimum required size;
locate the check that computes msgLen from lenBuf (the msgLen variable in tcp
read logic in tcpstack.go) and change the if to e.g. if msgLen < 12 { return
nil, fmt.Errorf("invalid message length: %d (must be >= 12)", msgLen) }.
client/internal/dns/upstream.go (1)

587-605: Duplicate truncation logic in ExchangeWithNetstack.

The truncation logic (lines 597-603) is identical to ExchangeWithFallback (lines 554-560). Consider extracting this into a helper function to reduce duplication.

♻️ Suggested refactor to reduce duplication
+// truncateForUDPClient truncates a TCP-fetched response to fit the client's UDP buffer.
+func truncateForUDPClient(rm *dns.Msg, r *dns.Msg) {
+	maxSize := dns.MinMsgSize
+	if opt := r.IsEdns0(); opt != nil {
+		maxSize = int(opt.UDPSize())
+	}
+	if rm.Len() > maxSize {
+		rm.Truncate(maxSize)
+	}
+}

Then use truncateForUDPClient(rm, r) in both ExchangeWithFallback and ExchangeWithNetstack.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/upstream.go` around lines 587 - 605, Extract the
duplicated UDP truncation logic into a helper (e.g., truncateForUDPClient(resp
*dns.Msg, req *dns.Msg)) and call it from both ExchangeWithNetstack and
ExchangeWithFallback; the helper should compute maxSize = dns.MinMsgSize or use
req.IsEdns0().UDPSize() when present and call resp.Truncate(maxSize) if
resp.Len() > maxSize, preserving existing behavior when the request came in over
UDP but was fetched via TCP.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@client/internal/dns/tcpstack.go`:
- Around line 457-468: The srcIPv6 function extracts the IPv6 source correctly
but always returns a fixed transport offset of header.IPv6MinimumSize (40 bytes)
and does not parse IPv6 extension headers; update the code by adding a concise
comment above srcIPv6 stating this limitation (that extension headers are not
handled so the returned offset may be incorrect if extensions exist) and note
that this is acceptable for the current logging-only use case, so leave the
implementation unchanged for now; reference the srcIPv6 function and
header.IPv6MinimumSize in the comment for clarity.
- Around line 400-403: The length check on msgLen should reject messages smaller
than a valid DNS header; replace the current condition on msgLen with a tighter
validation such as checking if msgLen < 12 (and optionally also guard msgLen >
65535 for clarity) instead of only msgLen == 0, and update the error message to
reflect the minimum required size; locate the check that computes msgLen from
lenBuf (the msgLen variable in tcp read logic in tcpstack.go) and change the if
to e.g. if msgLen < 12 { return nil, fmt.Errorf("invalid message length: %d
(must be >= 12)", msgLen) }.

In `@client/internal/dns/upstream.go`:
- Around line 587-605: Extract the duplicated UDP truncation logic into a helper
(e.g., truncateForUDPClient(resp *dns.Msg, req *dns.Msg)) and call it from both
ExchangeWithNetstack and ExchangeWithFallback; the helper should compute maxSize
= dns.MinMsgSize or use req.IsEdns0().UDPSize() when present and call
resp.Truncate(maxSize) if resp.Len() > maxSize, preserving existing behavior
when the request came in over UDP but was fetched via TCP.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 878fa4d9-153c-4eb4-b832-41ecba9ec690

📥 Commits

Reviewing files that changed from the base of the PR and between e2dad24 and 0d8074d.

📒 Files selected for processing (28)
  • client/firewall/iptables/manager_linux.go
  • client/firewall/iptables/router_linux.go
  • client/firewall/manager/firewall.go
  • client/firewall/nftables/manager_linux.go
  • client/firewall/nftables/router_linux.go
  • client/firewall/uspfilter/filter.go
  • client/firewall/uspfilter/filter_test.go
  • client/firewall/uspfilter/nat.go
  • client/firewall/uspfilter/rule.go
  • client/firewall/uspfilter/tracer_test.go
  • client/iface/device/device_filter.go
  • client/iface/mocks/filter.go
  • client/iface/mocks/iface/mocks/filter.go
  • client/internal/dns/handler_chain.go
  • client/internal/dns/mock_server.go
  • client/internal/dns/response_writer.go
  • client/internal/dns/server.go
  • client/internal/dns/server_test.go
  • client/internal/dns/service.go
  • client/internal/dns/service_listener.go
  • client/internal/dns/service_listener_test.go
  • client/internal/dns/service_memory.go
  • client/internal/dns/tcpstack.go
  • client/internal/dns/upstream.go
  • client/internal/dns/upstream_android.go
  • client/internal/dns/upstream_test.go
  • client/internal/dnsfwd/forwarder.go
  • client/internal/engine.go
💤 Files with no reviewable changes (1)
  • client/iface/mocks/iface/mocks/filter.go
✅ Files skipped from review due to trivial changes (4)
  • client/internal/dns/upstream_android.go
  • client/firewall/iptables/manager_linux.go
  • client/internal/engine.go
  • client/firewall/iptables/router_linux.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • client/internal/dns/mock_server.go
  • client/firewall/uspfilter/rule.go
  • client/internal/dnsfwd/forwarder.go
  • client/internal/dns/handler_chain.go
  • client/firewall/nftables/manager_linux.go
  • client/internal/dns/upstream_test.go
  • client/internal/dns/server_test.go
  • client/internal/dns/service_listener_test.go
  • client/firewall/uspfilter/filter_test.go
  • client/firewall/uspfilter/nat.go

Comment thread client/internal/dns/service_listener.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
client/firewall/nftables/router_linux.go (1)

39-39: Minor: Consider cleanup of empty NAT output chain.

The chainNameNATOutput chain is created lazily but not cleaned up when empty (e.g., after all OUTPUT DNAT rules are removed). This is minor—empty chains don't cause functional issues—but you may want to track this for future cleanup if the firewall lifecycle becomes more dynamic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/firewall/nftables/router_linux.go` at line 39, The NAT output chain
netbird-nat-output (chainNameNATOutput) is created lazily but never removed;
update the code paths that remove OUTPUT DNAT rules (e.g., the function(s) that
delete DNAT rules or the routine that tears down NAT rules) to check the chain's
rule count and, if zero, delete the chain and its hook from nftables.
Concretely: after removing an OUTPUT DNAT rule, call into the nftables client to
list rules in chainNameNATOutput, and if none remain call the existing
chain/table delete functions (or add a cleanup function like
cleanupNATOutputChain) to remove the chain and its associated hook/chain entry
to avoid leaving empty chains behind.
client/internal/dns/server.go (1)

155-156: Verify late SetFirewall() calls cannot miss TCP DNAT setup.

newServiceViaListener(..., nil) now creates the listener with no firewall, and SetFirewall() only swaps svc.firewall. client/internal/dns/service_listener.go adds the TCP DNAT rule during Listen(), so if any caller wires the firewall after the first Listen(), custom-port TCP/53 redirection never gets backfilled. Please either make SetFirewall() program/remove DNAT on a running listener or confirm every call site sets it before the first Listen().

Also applies to: 378-385

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/server.go` around lines 155 - 156, The current call site
creates the service with newServiceViaListener(..., nil) so the listener is
created without firewall rules and later SetFirewall only swaps svc.firewall,
which misses the TCP DNAT added during Listen(); update SetFirewall on the
service (method SetFirewall) to apply or remove the TCP DNAT immediately on any
already-running listener (i.e., invoke the same DNAT programming/removal logic
that Listen() uses when a firewall is present) so late wiring backfills
redirection, or alternatively ensure every caller of newServiceViaListener
passes a non-nil firewall before Listen() is ever called; specifically modify
SetFirewall to detect svc.listener (from service_listener.go) and call the
listener’s DNAT install/remove routine (the same codepath Listen uses) so
custom-port TCP/53 redirection is consistent.
client/firewall/uspfilter/filter_test.go (1)

214-236: Add a packet-level TCP hook test.

This verifies the setter/clear semantics, but the new TCP branch in client/firewall/uspfilter/filter.go at Line 727 still has no end-to-end coverage for matching dstIP:dport and short-circuiting filterOutbound(). A TCP test mirroring TestProcessOutgoingHooks would better protect the DNS-over-TCP path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/firewall/uspfilter/filter_test.go` around lines 214 - 236, Add an
end-to-end test that mirrors TestProcessOutgoingHooks but exercises the TCP
branch added in filter.go (around the new TCP handling at the dstIP:dport
match), using SetTCPPacketHook to register a hook for a target dstIP and port
and asserting the hook is invoked and that filterOutbound short-circuits (i.e.,
the packet is accepted/blocked per the hook) for DNS-over-TCP; specifically
create a test that constructs the manager (as in TestSetTCPPacketHook),
registers a TCP packet hook via SetTCPPacketHook, synthesizes an outgoing TCP
packet matching dstIP:dport, calls the same processing path used in production
(the outgoing packet processing function that contains the TCP branch /
filterOutbound), and assert the hook was called and the processing returned the
short-circuited result, then clear the hook and assert normal processing
resumes.
client/internal/dns/upstream.go (1)

552-560: Extract the UDP-client truncation block into one helper.

The same EDNS0 → maxSize → Truncate sequence now lives in both exchange paths. Centralizing it will keep the standard client path and the netstack path from drifting.

♻️ Suggested extraction
-	// Request came in over UDP but response was fetched via TCP.
-	// Truncate to fit the client's UDP buffer.
-	maxSize := dns.MinMsgSize
-	if opt := r.IsEdns0(); opt != nil {
-		maxSize = int(opt.UDPSize())
-	}
-	if rm.Len() > maxSize {
-		rm.Truncate(maxSize)
-	}
+	truncateToClientUDPSize(r, rm)
...
-		// Request came in over UDP but response was fetched via TCP.
-		// Truncate to fit the client's UDP buffer.
-		maxSize := dns.MinMsgSize
-		if opt := r.IsEdns0(); opt != nil {
-			maxSize = int(opt.UDPSize())
-		}
-		if rm.Len() > maxSize {
-			rm.Truncate(maxSize)
-		}
+		truncateToClientUDPSize(r, rm)
func truncateToClientUDPSize(req, resp *dns.Msg) {
	maxSize := dns.MinMsgSize
	if opt := req.IsEdns0(); opt != nil {
		maxSize = int(opt.UDPSize())
	}
	if resp.Len() > maxSize {
		resp.Truncate(maxSize)
	}
}

Also applies to: 595-603

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/upstream.go` around lines 552 - 560, Extract the
EDNS0→maxSize→Truncate logic into a helper (e.g., truncateToClientUDPSize(req
*dns.Msg, resp *dns.Msg)) and replace the duplicated blocks in both exchange
paths: the standard client path that currently uses variables r and rm and the
netstack path (the second block at lines ~595-603). The helper should compute
maxSize starting from dns.MinMsgSize, override from req.IsEdns0().UDPSize() when
present, and call resp.Truncate(maxSize) only if resp.Len() > maxSize; then call
this helper in place of the inline truncation logic in both locations.
client/internal/dns/tcpstack.go (1)

355-381: Deduplicate the TCP framing logic.

WriteMsg and Write both rebuild the same 2-byte length prefix. A small shared helper would keep framing, error handling, and future changes from drifting between the two paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 355 - 381, Both WriteMsg and
Write duplicate the DNS-over-TCP 2-byte length framing and write logic; extract
that into a single helper on tcpResponseWriter (e.g., writeTCPFrame or
frameAndWrite) that accepts a []byte, builds the 2-byte length-prefixed buffer,
writes to w.conn, and returns appropriate results/errors; then have WriteMsg
call msg.Pack(), check the pack error as before, and pass the packed data into
the helper, and have Write call the same helper and return len(data) on success
to preserve existing semantics. Ensure the helper is used by both
tcpResponseWriter.WriteMsg and tcpResponseWriter.Write so framing and error
handling are centralized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/dns/tcpstack.go`:
- Around line 272-275: The code uses
conn.SetDeadline(time.Now().Add(dnsTCPReadTimeout)) which applies the timeout to
both reads and writes; change this to
conn.SetReadDeadline(time.Now().Add(dnsTCPReadTimeout)) so only the initial read
is time-bounded (reference conn and dnsTCPReadTimeout in tcpstack.go).
Additionally, add explicit write deadlines (e.g., conn.SetWriteDeadline(...)
using a separate write timeout) immediately before the response write operations
that currently occur later in the handler (the writes around the response send
points) so writes aren’t constrained by the read deadline.

---

Nitpick comments:
In `@client/firewall/nftables/router_linux.go`:
- Line 39: The NAT output chain netbird-nat-output (chainNameNATOutput) is
created lazily but never removed; update the code paths that remove OUTPUT DNAT
rules (e.g., the function(s) that delete DNAT rules or the routine that tears
down NAT rules) to check the chain's rule count and, if zero, delete the chain
and its hook from nftables. Concretely: after removing an OUTPUT DNAT rule, call
into the nftables client to list rules in chainNameNATOutput, and if none remain
call the existing chain/table delete functions (or add a cleanup function like
cleanupNATOutputChain) to remove the chain and its associated hook/chain entry
to avoid leaving empty chains behind.

In `@client/firewall/uspfilter/filter_test.go`:
- Around line 214-236: Add an end-to-end test that mirrors
TestProcessOutgoingHooks but exercises the TCP branch added in filter.go (around
the new TCP handling at the dstIP:dport match), using SetTCPPacketHook to
register a hook for a target dstIP and port and asserting the hook is invoked
and that filterOutbound short-circuits (i.e., the packet is accepted/blocked per
the hook) for DNS-over-TCP; specifically create a test that constructs the
manager (as in TestSetTCPPacketHook), registers a TCP packet hook via
SetTCPPacketHook, synthesizes an outgoing TCP packet matching dstIP:dport, calls
the same processing path used in production (the outgoing packet processing
function that contains the TCP branch / filterOutbound), and assert the hook was
called and the processing returned the short-circuited result, then clear the
hook and assert normal processing resumes.

In `@client/internal/dns/server.go`:
- Around line 155-156: The current call site creates the service with
newServiceViaListener(..., nil) so the listener is created without firewall
rules and later SetFirewall only swaps svc.firewall, which misses the TCP DNAT
added during Listen(); update SetFirewall on the service (method SetFirewall) to
apply or remove the TCP DNAT immediately on any already-running listener (i.e.,
invoke the same DNAT programming/removal logic that Listen() uses when a
firewall is present) so late wiring backfills redirection, or alternatively
ensure every caller of newServiceViaListener passes a non-nil firewall before
Listen() is ever called; specifically modify SetFirewall to detect svc.listener
(from service_listener.go) and call the listener’s DNAT install/remove routine
(the same codepath Listen uses) so custom-port TCP/53 redirection is consistent.

In `@client/internal/dns/tcpstack.go`:
- Around line 355-381: Both WriteMsg and Write duplicate the DNS-over-TCP 2-byte
length framing and write logic; extract that into a single helper on
tcpResponseWriter (e.g., writeTCPFrame or frameAndWrite) that accepts a []byte,
builds the 2-byte length-prefixed buffer, writes to w.conn, and returns
appropriate results/errors; then have WriteMsg call msg.Pack(), check the pack
error as before, and pass the packed data into the helper, and have Write call
the same helper and return len(data) on success to preserve existing semantics.
Ensure the helper is used by both tcpResponseWriter.WriteMsg and
tcpResponseWriter.Write so framing and error handling are centralized.

In `@client/internal/dns/upstream.go`:
- Around line 552-560: Extract the EDNS0→maxSize→Truncate logic into a helper
(e.g., truncateToClientUDPSize(req *dns.Msg, resp *dns.Msg)) and replace the
duplicated blocks in both exchange paths: the standard client path that
currently uses variables r and rm and the netstack path (the second block at
lines ~595-603). The helper should compute maxSize starting from dns.MinMsgSize,
override from req.IsEdns0().UDPSize() when present, and call
resp.Truncate(maxSize) only if resp.Len() > maxSize; then call this helper in
place of the inline truncation logic in both locations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be315438-5e02-4c2d-a5d7-d46b197b6d3f

📥 Commits

Reviewing files that changed from the base of the PR and between e2dad24 and 0d8074d.

📒 Files selected for processing (28)
  • client/firewall/iptables/manager_linux.go
  • client/firewall/iptables/router_linux.go
  • client/firewall/manager/firewall.go
  • client/firewall/nftables/manager_linux.go
  • client/firewall/nftables/router_linux.go
  • client/firewall/uspfilter/filter.go
  • client/firewall/uspfilter/filter_test.go
  • client/firewall/uspfilter/nat.go
  • client/firewall/uspfilter/rule.go
  • client/firewall/uspfilter/tracer_test.go
  • client/iface/device/device_filter.go
  • client/iface/mocks/filter.go
  • client/iface/mocks/iface/mocks/filter.go
  • client/internal/dns/handler_chain.go
  • client/internal/dns/mock_server.go
  • client/internal/dns/response_writer.go
  • client/internal/dns/server.go
  • client/internal/dns/server_test.go
  • client/internal/dns/service.go
  • client/internal/dns/service_listener.go
  • client/internal/dns/service_listener_test.go
  • client/internal/dns/service_memory.go
  • client/internal/dns/tcpstack.go
  • client/internal/dns/upstream.go
  • client/internal/dns/upstream_android.go
  • client/internal/dns/upstream_test.go
  • client/internal/dnsfwd/forwarder.go
  • client/internal/engine.go
💤 Files with no reviewable changes (1)
  • client/iface/mocks/iface/mocks/filter.go
✅ Files skipped from review due to trivial changes (4)
  • client/internal/dns/upstream_android.go
  • client/internal/engine.go
  • client/internal/dns/upstream_test.go
  • client/internal/dns/service_listener_test.go
🚧 Files skipped from review as they are similar to previous changes (12)
  • client/internal/dns/mock_server.go
  • client/internal/dnsfwd/forwarder.go
  • client/firewall/uspfilter/rule.go
  • client/internal/dns/handler_chain.go
  • client/internal/dns/response_writer.go
  • client/firewall/nftables/manager_linux.go
  • client/firewall/iptables/manager_linux.go
  • client/internal/dns/service.go
  • client/iface/device/device_filter.go
  • client/firewall/iptables/router_linux.go
  • client/internal/dns/service_listener.go
  • client/internal/dns/service_memory.go

Comment thread client/internal/dns/tcpstack.go Outdated
Comment thread client/internal/dns/server.go
@lixmal lixmal force-pushed the local-dns-tcp-listener branch from 4f31445 to 45a7839 Compare April 2, 2026 12:45
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
client/internal/dns/service_listener.go (2)

110-113: ⚠️ Potential issue | 🟠 Major

Do not publish the eBPF path after TCP DNAT setup fails.

Once this branch is active, a failed AddOutputDNAT means TCP retries to port 53 will never reach the local listener, but Listen() still returns success and the service continues as if TCP DNS were available. Make this a startup error and unwind the already-started servers instead of warning and continuing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/service_listener.go` around lines 110 - 113, The current
branch swallows AddOutputDNAT failures (in the s.ebpfService && s.firewall &&
s.listenPort != DefaultPort path) and only logs a warning, leaving Listen() to
return success; change this to treat AddOutputDNAT errors as fatal startup
errors: when firewall.AddOutputDNAT(...) returns an error, stop publishing the
eBPF path, call the server shutdown/unwind logic used elsewhere in the listener
(reverse any started servers/listeners started before this point), and return
the error from Listen() (or return a wrapped startup error) instead of logging
and continuing; update the code paths around s.ebpfService, s.firewall,
AddOutputDNAT, and Listen() to perform cleanup on failure so the service does
not advertise TCP DNS when DNAT setup failed.

92-106: ⚠️ Potential issue | 🟠 Major

Use one fatal-exit path for both DNS servers.

These goroutines diverge on failure: the UDP path only flips the running flag and shuts TCP down, while the TCP path only logs. If either server exits outside Stop(), the listener can stay partially up and any DNAT/eBPF state installed later in this method is never unwound. Route both serve loops through a shared teardown path so an unexpected exit fully closes the listener once.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/firewall/manager/firewall.go`:
- Around line 172-176: The AddOutputDNAT and RemoveOutputDNAT APIs accept
netip.Addr but underlying implementations (iptables, nftables using
TableFamilyIPv4) are IPv4-only; add an explicit IPv4 guard: validate
localAddr.Is4() before dispatching (e.g., where s.listenIP is passed) and return
a clear error if not IPv4, or alternatively add a contract comment on the
methods stating they only accept IPv4 addresses; reference the
AddOutputDNAT/RemoveOutputDNAT signatures, the call site passing s.listenIP, and
the nftables use of TableFamilyIPv4/iptables string conversion when making the
change.

In `@client/firewall/uspfilter/filter.go`:
- Around line 143-146: resetState() currently only clears the old rule/route
state and does not clear the new per-protocol hook pointers, leaving stale hooks
(udpHookOut and tcpHookOut) active after shutdown; update resetState() to
explicitly clear/reset both atomic.Pointer[packetHook] fields (udpHookOut and
tcpHookOut) so that any installed outbound hooks are removed on firewall
reset/reuse, ensuring outgoingRules cleanup is complemented by clearing these
hook pointers.

---

Duplicate comments:
In `@client/internal/dns/service_listener.go`:
- Around line 110-113: The current branch swallows AddOutputDNAT failures (in
the s.ebpfService && s.firewall && s.listenPort != DefaultPort path) and only
logs a warning, leaving Listen() to return success; change this to treat
AddOutputDNAT errors as fatal startup errors: when firewall.AddOutputDNAT(...)
returns an error, stop publishing the eBPF path, call the server shutdown/unwind
logic used elsewhere in the listener (reverse any started servers/listeners
started before this point), and return the error from Listen() (or return a
wrapped startup error) instead of logging and continuing; update the code paths
around s.ebpfService, s.firewall, AddOutputDNAT, and Listen() to perform cleanup
on failure so the service does not advertise TCP DNS when DNAT setup failed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9fa6831-322d-44f8-a652-c10f5f3cbb03

📥 Commits

Reviewing files that changed from the base of the PR and between e2dad24 and 4f31445.

📒 Files selected for processing (28)
  • client/firewall/iptables/manager_linux.go
  • client/firewall/iptables/router_linux.go
  • client/firewall/manager/firewall.go
  • client/firewall/nftables/manager_linux.go
  • client/firewall/nftables/router_linux.go
  • client/firewall/uspfilter/filter.go
  • client/firewall/uspfilter/filter_test.go
  • client/firewall/uspfilter/nat.go
  • client/firewall/uspfilter/rule.go
  • client/firewall/uspfilter/tracer_test.go
  • client/iface/device/device_filter.go
  • client/iface/mocks/filter.go
  • client/iface/mocks/iface/mocks/filter.go
  • client/internal/dns/handler_chain.go
  • client/internal/dns/mock_server.go
  • client/internal/dns/response_writer.go
  • client/internal/dns/server.go
  • client/internal/dns/server_test.go
  • client/internal/dns/service.go
  • client/internal/dns/service_listener.go
  • client/internal/dns/service_listener_test.go
  • client/internal/dns/service_memory.go
  • client/internal/dns/tcpstack.go
  • client/internal/dns/upstream.go
  • client/internal/dns/upstream_android.go
  • client/internal/dns/upstream_test.go
  • client/internal/dnsfwd/forwarder.go
  • client/internal/engine.go
💤 Files with no reviewable changes (1)
  • client/iface/mocks/iface/mocks/filter.go
✅ Files skipped from review due to trivial changes (2)
  • client/internal/dnsfwd/forwarder.go
  • client/internal/dns/tcpstack.go
🚧 Files skipped from review as they are similar to previous changes (12)
  • client/internal/dns/mock_server.go
  • client/firewall/uspfilter/rule.go
  • client/internal/dns/upstream_android.go
  • client/internal/dns/handler_chain.go
  • client/internal/dns/server_test.go
  • client/firewall/nftables/manager_linux.go
  • client/internal/dns/service.go
  • client/internal/dns/response_writer.go
  • client/iface/device/device_filter.go
  • client/internal/dns/upstream_test.go
  • client/internal/dns/service_memory.go
  • client/iface/mocks/filter.go

Comment thread client/firewall/manager/firewall.go
Comment thread client/firewall/uspfilter/filter.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/dns/server.go`:
- Around line 411-415: The defer currently swallows s.service.Stop() errors by
only logging them; change disableDNS() so the deferred stop call returns its
error and is joined into the function's returned error. Replace the logging-only
defer that calls s.service.Stop() with a deferred closure that captures the
named return error (e.g., err), calls stopErr := s.service.Stop(), and if
stopErr != nil merges it into the returned error (use errors.Join(err, stopErr)
if available or set err = fmt.Errorf("%w; %v", err, stopErr) / fmt.Errorf("%v:
%v", err, stopErr) as appropriate). Keep the reference to s.service.Stop() and
ensure disableDNS() has a named error return so the defer can modify it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6370a19d-2abc-44c0-944c-b7bcdf390e78

📥 Commits

Reviewing files that changed from the base of the PR and between 4f31445 and 45a7839.

📒 Files selected for processing (3)
  • client/internal/dns/server.go
  • client/internal/dns/service_listener.go
  • client/internal/dns/tcpstack.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • client/internal/dns/service_listener.go
  • client/internal/dns/tcpstack.go

Comment thread client/internal/dns/server.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
client/internal/dns/server.go (1)

410-430: ⚠️ Potential issue | 🟠 Major

disableDNS() still drops some teardown failures.

The named return now captures s.service.Stop(), but restoreHostDNS() and DeleteState() are still logged and discarded. Callers can still observe shutdown as successful after partial cleanup.

💡 Proposed fix
 func (s *DefaultServer) disableDNS() (retErr error) {
 	defer func() {
 		if err := s.service.Stop(); err != nil {
 			retErr = errors.Join(retErr, fmt.Errorf("stop DNS service: %w", err))
 		}
 	}()
@@
-	if err := s.hostManager.restoreHostDNS(); err != nil {
-		log.Errorf("failed to restore host DNS settings: %v", err)
-	} else if err := s.stateManager.DeleteState(&ShutdownState{}); err != nil {
-		log.Errorf("failed to delete shutdown dns state: %v", err)
+	if err := s.hostManager.restoreHostDNS(); err != nil {
+		retErr = errors.Join(retErr, fmt.Errorf("restore host DNS settings: %w", err))
+	} else if err := s.stateManager.DeleteState(&ShutdownState{}); err != nil {
+		retErr = errors.Join(retErr, fmt.Errorf("delete shutdown dns state: %w", err))
 	}
@@
-	return nil
+	return retErr
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/server.go` around lines 410 - 430, The disableDNS
function currently only captures s.service.Stop() into the named return but logs
and discards errors from s.hostManager.restoreHostDNS() and
s.stateManager.DeleteState(&ShutdownState{}); update disableDNS so both
restoreHostDNS() and DeleteState(...) errors are appended to the named return
(retErr) using errors.Join (or fmt.Errorf wrapping) instead of just log.Errorf,
preserving the existing log messages if desired (e.g., log.Errorf(...); retErr =
errors.Join(retErr, fmt.Errorf("restore host DNS: %w", err))). Ensure the defer
that joins s.service.Stop() remains and that all three error sources
(restoreHostDNS, DeleteState, and service.Stop) are combined into retErr before
returning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@client/internal/dns/server.go`:
- Around line 410-430: The disableDNS function currently only captures
s.service.Stop() into the named return but logs and discards errors from
s.hostManager.restoreHostDNS() and s.stateManager.DeleteState(&ShutdownState{});
update disableDNS so both restoreHostDNS() and DeleteState(...) errors are
appended to the named return (retErr) using errors.Join (or fmt.Errorf wrapping)
instead of just log.Errorf, preserving the existing log messages if desired
(e.g., log.Errorf(...); retErr = errors.Join(retErr, fmt.Errorf("restore host
DNS: %w", err))). Ensure the defer that joins s.service.Stop() remains and that
all three error sources (restoreHostDNS, DeleteState, and service.Stop) are
combined into retErr before returning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca089fcf-9624-4759-ad09-dbee6cc0aaf6

📥 Commits

Reviewing files that changed from the base of the PR and between 45a7839 and 36024a1.

📒 Files selected for processing (3)
  • client/firewall/manager/firewall.go
  • client/firewall/uspfilter/filter.go
  • client/internal/dns/server.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • client/firewall/manager/firewall.go

pappz
pappz previously approved these changes Apr 2, 2026
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
client/internal/dns/tcpstack.go (2)

331-358: Consider validating message size before encoding length prefix.

If len(data) exceeds 65535, the 2-byte big-endian encoding will silently overflow/wrap, producing an incorrect length prefix and corrupting the response. While DNS responses rarely approach this size, a defensive check would improve robustness.

🛡️ Optional: add bounds check
 func (w *tcpResponseWriter) WriteMsg(msg *dns.Msg) error {
 	data, err := msg.Pack()
 	if err != nil {
 		return fmt.Errorf("pack: %w", err)
 	}
+	if len(data) > 65535 {
+		return fmt.Errorf("message too large for TCP DNS: %d bytes", len(data))
+	}
 
 	// DNS TCP: 2-byte length prefix + message
 	buf := make([]byte, 2+len(data))
 func (w *tcpResponseWriter) Write(data []byte) (int, error) {
+	if len(data) > 65535 {
+		return 0, fmt.Errorf("message too large for TCP DNS: %d bytes", len(data))
+	}
 	buf := make([]byte, 2+len(data))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 331 - 358, The WriteMsg and
Write methods of tcpResponseWriter must validate that len(data) fits into the
2-byte length prefix (<= 65535) before creating the buffer; add a pre-check in
tcpResponseWriter.WriteMsg and tcpResponseWriter.Write that returns a clear
error (e.g., "message too large for TCP DNS length prefix") when len(data) >
65535 to avoid silent overflow/corruption, and only proceed to allocate the
2+len(data) buffer and write when the size is valid.

433-444: IPv6 transport offset ignores extension headers (acceptable for logging).

srcIPv6 returns a fixed offset of header.IPv6MinimumSize (40), which is incorrect when extension headers are present. Since this is only used for debug logging and IPv6 support isn't complete yet (per the TODO at line 107), this is acceptable for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/dns/tcpstack.go` around lines 433 - 444, srcIPv6 currently
returns a fixed transport offset of header.IPv6MinimumSize (40) which ignores
IPv6 extension headers; either implement correct parsing of NextHeader/extension
headers to compute the true transport offset or explicitly document the
limitation so future maintainers know it’s intentional. Update the srcIPv6
function to either (A) parse extension headers (walk NextHeader, read each
extension header length and advance the offset before returning) so the returned
int is the real transport header offset, or (B) keep the current return value
but add a clear comment/TODO in srcIPv6 explaining that extension headers are
ignored for now and reference header.IPv6MinimumSize and the TODO at the top of
the file so this is revisited when IPv6 support is completed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@client/internal/dns/tcpstack.go`:
- Around line 331-358: The WriteMsg and Write methods of tcpResponseWriter must
validate that len(data) fits into the 2-byte length prefix (<= 65535) before
creating the buffer; add a pre-check in tcpResponseWriter.WriteMsg and
tcpResponseWriter.Write that returns a clear error (e.g., "message too large for
TCP DNS length prefix") when len(data) > 65535 to avoid silent
overflow/corruption, and only proceed to allocate the 2+len(data) buffer and
write when the size is valid.
- Around line 433-444: srcIPv6 currently returns a fixed transport offset of
header.IPv6MinimumSize (40) which ignores IPv6 extension headers; either
implement correct parsing of NextHeader/extension headers to compute the true
transport offset or explicitly document the limitation so future maintainers
know it’s intentional. Update the srcIPv6 function to either (A) parse extension
headers (walk NextHeader, read each extension header length and advance the
offset before returning) so the returned int is the real transport header
offset, or (B) keep the current return value but add a clear comment/TODO in
srcIPv6 explaining that extension headers are ignored for now and reference
header.IPv6MinimumSize and the TODO at the top of the file so this is revisited
when IPv6 support is completed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1aaf5751-388d-4472-8961-f65cf197e1c6

📥 Commits

Reviewing files that changed from the base of the PR and between 36024a1 and c5932ef.

📒 Files selected for processing (6)
  • client/internal/dns/handler_chain.go
  • client/internal/dns/response_writer.go
  • client/internal/dns/service_memory.go
  • client/internal/dns/tcpstack.go
  • client/internal/dns/upstream.go
  • client/internal/dns/upstream_test.go
✅ Files skipped from review due to trivial changes (2)
  • client/internal/dns/response_writer.go
  • client/internal/dns/upstream_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • client/internal/dns/handler_chain.go
  • client/internal/dns/upstream.go
  • client/internal/dns/service_memory.go

@lixmal lixmal merged commit cb73b94 into main Apr 8, 2026
42 checks passed
@lixmal lixmal deleted the local-dns-tcp-listener branch April 8, 2026 05:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants