Skip to content

[relay] Replace net.Conn with context-aware Conn interface#5770

Merged
pappz merged 2 commits intomainfrom
fix/relay-read-timeouts
Apr 8, 2026
Merged

[relay] Replace net.Conn with context-aware Conn interface#5770
pappz merged 2 commits intomainfrom
fix/relay-read-timeouts

Conversation

@pappz
Copy link
Copy Markdown
Collaborator

@pappz pappz commented Apr 1, 2026

Describe your changes

Replace net.Conn with context-aware Conn interface

Introduce a listener.Conn interface with context-based Read/Write methods, replacing net.Conn throughout the relay server. This enables proper timeout propagation (e.g. handshake timeout) without goroutine-based workarounds and removes unused LocalAddr/SetDeadline methods from WS and QUIC conns.

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • Refactor

    • Unified connection abstraction across transports and made I/O context-aware for better cancellation and resource management.
    • Simplified QUIC and WebSocket implementations to reduce internal deadlines and local-address handling.
  • Bug Fixes

    • Added a 10s handshake timeout to prevent hung relay connections and improve reliability.

…transports

Introduce a listener.Conn interface with context-based Read/Write methods,
replacing net.Conn throughout the relay server. This enables proper timeout
propagation (e.g. handshake timeout) without goroutine-based workarounds
and removes unused LocalAddr/SetDeadline methods from WS and QUIC conns.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6abf7b41-00ec-44a5-a8db-c8f1a7105d82

📥 Commits

Reviewing files that changed from the base of the PR and between 4eb4c04 and 02b54f4.

📒 Files selected for processing (1)
  • relay/server/peer.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • relay/server/peer.go

📝 Walkthrough

Walkthrough

This PR replaces net.Conn usage with a new context-aware listener.Conn across relay, listener implementations (QUIC, WebSocket), handshake and peer code, threads context/timeouts through handshake and peer I/O, and adjusts listener APIs and server wiring accordingly. (47 words)

Changes

Cohort / File(s) Summary
Listener Abstraction
relay/server/listener/conn.go, relay/server/listener/listener.go
Adds exported listener.Conn interface (context-aware Read/Write, RemoteAddr, Close) and removes the previous public listener.Listener interface definition.
QUIC Transport
relay/server/listener/quic/conn.go, relay/server/listener/quic/listener.go
QUIC Conn Read/Write signatures now accept context.Context; removed internal context management, LocalAddr and deadline methods; Listen callback now uses relaylistener.Conn.
WebSocket Transport
relay/server/listener/ws/conn.go, relay/server/listener/ws/listener.go
NewConn simplified to accept only remote addr; Read/Write become context-aware; removed LocalAddr and deadline methods; listener acceptFn now uses relaylistener.Conn and removed local listener-address resolution.
Handshake & Relay Core
relay/server/handshake.go, relay/server/relay.go
Handshake I/O switched to listener.Conn with context-aware Read/Write, handshake operations use a per-accept timeout context (handshakeTimeout) and calls pass that context through receive/response. Relay exposes a local Listener interface and Accept now takes listener.Conn.
Peer & Server Integration
relay/server/peer.go, relay/server/server.go, combined/cmd/root.go
Peer stores listener.Conn, peer methods use context-aware I/O and a peer-scoped context; server stores local Listener type and RelayAccept() returns func(conn listener.Conn); root command updated to accept new conn type.

Sequence Diagram(s)

sequenceDiagram
    participant Client as External Client
    participant Listener as Listener (WS / QUIC)
    participant Relay as Relay
    participant HS as Handshake
    participant Peer as Peer

    Client->>Listener: Establish connection
    Listener->>Relay: Accept(conn listener.Conn)
    Relay->>HS: Start handshake (with ctx timeout)
    HS->>Listener: Read(ctx, buf)
    Listener-->>HS: Handshake request
    HS->>HS: Validate request
    HS->>Listener: Write(ctx, response)
    Listener-->>HS: Ack
    Relay->>Peer: NewPeer(conn listener.Conn)
    Peer->>Listener: Read(ctx, loop)
    Listener-->>Peer: Message bytes
    Peer->>Listener: Write(ctx, response)
    Listener-->>Peer: Ack
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • lixmal

Poem

🐰
I hopped through sockets, soft and spry,
Context in paw, timeouts held high.
QUIC and WS now share one tongue,
Listeners sing — the relay's sprung!
Hooray — abstraction's work is done.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and clearly describes the main change: replacing net.Conn with a context-aware Conn interface throughout the relay server implementation.
Description check ✅ Passed The PR description covers the main change, provides rationale, includes checklist items, and selects a documentation option, though the issue ticket link is empty.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/relay-read-timeouts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pappz pappz changed the title [relay-server] Replace net.Conn with context-aware Conn interface [relay] Replace net.Conn with context-aware Conn interface Apr 1, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
relay/server/peer.go (1)

35-36: ctx and ctxCancel are nil until Work() is called.

The ctx and ctxCancel fields are not initialized in NewPeer but only later in Work() (line 62). If any code path accesses p.ctx before Work() is called, it will be nil, potentially causing a panic.

Looking at the call sites in relay.go, the flow is:

  1. NewPeer() creates the peer
  2. r.store.AddPeer(peer) adds it to the store
  3. r.notifier.PeerCameOnline(peer.ID()) notifies
  4. peer.Work() is started in a goroutine
  5. h.handshakeResponse(hsCtx) may call peer.Close() on failure

If Close() were to use p.ctx, it would panic. Currently Close() doesn't use p.ctx, so this is safe. However, this is a subtle invariant that could break with future changes.

Consider initializing context in NewPeer for defensive coding
 func NewPeer(metrics *metrics.Metrics, id messages.PeerID, conn listener.Conn, store *store.Store, notifier *store.PeerNotifier) *Peer {
+	ctx, ctxCancel := context.WithCancel(context.Background())
 	p := &Peer{
 		metrics:  metrics,
 		log:      log.WithField("peer_id", id.String()),
 		id:       id,
 		conn:     conn,
 		store:    store,
 		notifier: notifier,
+		ctx:      ctx,
+		ctxCancel: ctxCancel,
 	}
 
 	return p
 }

Then in Work(), you could either skip re-initialization or replace the context if needed.

Also applies to: 45-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@relay/server/peer.go` around lines 35 - 36, NewPeer currently leaves the peer
fields ctx and ctxCancel nil which is fragile; initialize them in NewPeer by
creating a background context and cancel func (e.g., ctx, cancel :=
context.WithCancel(context.Background()) and assign to p.ctx and p.ctxCancel) so
any early callers (including Close) can safely reference them, then update Work
to not assume ctx is nil (either skip reinitialization or derive a child context
from the existing p.ctx) and make Close defensively call p.ctxCancel only after
a nil-check to avoid double cancels.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@relay/server/listener/quic/conn.go`:
- Around line 34-38: Conn.Write currently ignores the provided context (breaking
the listener.Conn contract) and may send QUIC datagrams after cancellation;
modify Conn.Write to honor ctx by checking ctx.Done()/ctx.Err() before
attempting SendDatagram and by applying the same timeout/cancellation semantics
as the WS implementation (e.g., respect write timeout or use select on
ctx.Done() with a send deadline), then call c.session.SendDatagram only if the
context is still active and translate errors via c.remoteCloseErrHandling as
before.

In `@relay/server/listener/ws/conn.go`:
- Around line 55-60: The current Conn.Write implementation returns len(b) even
when c.Conn.Write fails; modify Conn.Write (the method on Conn that uses
writeTimeout and calls c.Conn.Write with websocket.MessageBinary) to check the
error from c.Conn.Write and return 0, err on failure, otherwise return len(b),
nil, keeping the existing context timeout/cancel logic intact so behavior
matches relay/server/listener/quic/conn.go.

In `@relay/server/peer.go`:
- Around line 226-231: The crash occurs because dp.ctx can be nil when another
peer calls dp.Write (which calls context.WithTimeout) before peer.Work runs; fix
by ensuring the peer's context is created before the peer is added to the store
— initialize the context in the Peer constructor (so dp.ctx is non-nil) or,
alternatively, stop starting peer.Work in a goroutine and call peer.Work
synchronously immediately after creating the Peer so dp.ctx is set before
Add/Store; update the code paths that create Peers (where you currently add the
peer to the store) to either (a) set dp.ctx inside NewPeer or (b) call
peer.Work() inline instead of go peer.Work().

---

Nitpick comments:
In `@relay/server/peer.go`:
- Around line 35-36: NewPeer currently leaves the peer fields ctx and ctxCancel
nil which is fragile; initialize them in NewPeer by creating a background
context and cancel func (e.g., ctx, cancel :=
context.WithCancel(context.Background()) and assign to p.ctx and p.ctxCancel) so
any early callers (including Close) can safely reference them, then update Work
to not assume ctx is nil (either skip reinitialization or derive a child context
from the existing p.ctx) and make Close defensively call p.ctxCancel only after
a nil-check to avoid double cancels.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a719b5b-d5a3-4140-81b0-1df1df289b49

📥 Commits

Reviewing files that changed from the base of the PR and between aaf813f and 4eb4c04.

📒 Files selected for processing (11)
  • combined/cmd/root.go
  • relay/server/handshake.go
  • relay/server/listener/conn.go
  • relay/server/listener/listener.go
  • relay/server/listener/quic/conn.go
  • relay/server/listener/quic/listener.go
  • relay/server/listener/ws/conn.go
  • relay/server/listener/ws/listener.go
  • relay/server/peer.go
  • relay/server/relay.go
  • relay/server/server.go
💤 Files with no reviewable changes (1)
  • relay/server/listener/listener.go

Comment thread relay/server/listener/quic/conn.go
Comment thread relay/server/listener/ws/conn.go
Comment thread relay/server/peer.go
Integrate context creation (`context.WithCancel`) directly in `NewPeer` and remove redundant initialization in `Work`. Add `ctxCancel` calls to ensure context is properly canceled during `Close` operations.
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 1, 2026

@pappz pappz merged commit 96806bf into main Apr 8, 2026
41 of 42 checks passed
@pappz pappz deleted the fix/relay-read-timeouts branch April 8, 2026 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants