http: enable TCP keepalive on fetch sockets #30627
Bun's fetch() HTTP client does not set SO_KEEPALIVE on its sockets. When a connection becomes half-open, meaning the peer closed but the FIN/RST never reached us (NAT timeout, wifi/cellular handoff, middlebox state eviction, VPN disconnect), the kernel has no way to discover it without keepalive probes. A streaming reader.read() on such a socket blocks until an application-level timeout, if any.
Node's fetch (undici) sets SO_KEEPALIVE with TCP_KEEPIDLE=60s, so a half-open connection is detected at ~70s (60s idle + 10 probes × 1s). This change makes Bun's fetch do the same, via the existing socket.setKeepAlive() → bsd_socket_keepalive() plumbing that already sets TCP_KEEPINTVL=1 and TCP_KEEPCNT=10.
Placed in onOpen() alongside the existing client.setTimeout(socket) (#30376): socket-level, fires once per connection, inherited by keep-alive-reused requests.
The test reads /proc/self/net/tcp to verify the kernel's keepalive timer is armed on the client socket (timer_active=02). Linux-only.
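The ~70s figure is the sum of the three keepalive settings named in the description: the kernel waits TCP_KEEPIDLE seconds of silence, then sends up to TCP_KEEPCNT probes spaced TCP_KEEPINTVL seconds apart before giving up on the connection. A small TypeScript sketch of that arithmetic follows; the `KeepaliveParams` type and `detectionSeconds` helper are illustrative names and are not part of Bun or undici.

```typescript
// Illustrative only: approximate worst-case time for the kernel to notice a
// half-open connection, given the three TCP keepalive settings.
interface KeepaliveParams {
  idleSeconds: number;     // TCP_KEEPIDLE: silence before the first probe
  intervalSeconds: number; // TCP_KEEPINTVL: gap between unanswered probes
  probeCount: number;      // TCP_KEEPCNT: unanswered probes before giving up
}

function detectionSeconds(p: KeepaliveParams): number {
  return p.idleSeconds + p.probeCount * p.intervalSeconds;
}

// Values quoted in the PR description.
const prValues: KeepaliveParams = { idleSeconds: 60, intervalSeconds: 1, probeCount: 10 };
console.log(detectionSeconds(prValues)); // 70
```

Shortening the idle period lowers the detection time but also sends probes on healthy, quiet connections more often.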
Updated 9:57 AM PT - May 13th, 2026
❌ @autofix-ci[bot], your commit d4fb0d0 has 2 failures in
🧪 To try this PR locally: bunx bun-pr 30627
That installs a local version of the PR into your bun-30627 --bun
Walkthrough
TCP keepalive is enabled on fetch client sockets with a 60-second idle timeout to detect stalled connections.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/web/fetch/fetch-tcp-keepalive.test.ts`:
- Around line 63-69: The assertion is too broad: instead of checking timerActive
is not "00", update the test to assert that timerActive specifically indicates
the keepalive timer code "02" (the PR notes "02:<jiffies>"). Locate the
timerActive check in fetch-tcp-keepalive.test.ts and replace the negative check
using timerActive with a positive check that timerActive begins with or matches
the "02" timer code (e.g., startsWith "02" or a regex matching the "02" prefix)
so the test only passes for the keepalive timer.
📒 Files selected for processing (2)
src/http/http.zig
test/js/web/fetch/fetch-tcp-keepalive.test.ts
Found 1 issue this PR may fix:
@robobun adopt
✅ Adopted. Built locally; the test passes with the fix and fails without it. Pushed 6d5e33c:
CI status: the diff is green. The remaining red on build 54044 is pre-existing and unrelated:
Skipping a retrigger since the Windows 2019 timeout is systematic across all PR builds and would recur. Ready for review/merge.
- Tighten assertion from `not.toBe("00")` to `toBe("02")` so only the
keepalive timer (sk_timer) satisfies it, not a retransmit (01) or
zero-window probe (04).
- Await response headers + first chunk instead of `sleep(50)` so the
socket is guaranteed ESTABLISHED and the outbound GET is ACKed before
reading /proc/self/net/tcp — avoids a race where a retransmit timer
would mask the keepalive timer.
- Cancel the reader instead of draining the full body.
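The second and third bullets describe a sequencing pattern: wait for the first body chunk, inspect the socket, then cancel the stream instead of draining it. Below is a minimal sketch of that pattern using only the standard Response and ReadableStream APIs. `inspectAfterFirstChunk` and its `inspect` callback are hypothetical names, and the actual test may be structured differently.

```typescript
// Hypothetical helper: runs `inspect` only after the first body chunk has
// arrived, then cancels the stream rather than reading it to the end.
async function inspectAfterFirstChunk<T>(
  res: Response,
  inspect: () => T | Promise<T>,
): Promise<T> {
  if (!res.body) throw new Error("response has no body stream");
  const reader = res.body.getReader();
  try {
    // A received chunk implies the request was sent and the server replied,
    // so the connection is established by the time `inspect` runs.
    const first = await reader.read();
    if (first.done) throw new Error("stream ended before the first chunk");
    return await inspect();
  } finally {
    // Cancel instead of draining; ignore errors from an already-failed stream.
    await reader.cancel().catch(() => {});
  }
}
```

In a test, `inspect` would be the step that reads /proc/self/net/tcp; unlike a fixed `sleep(50)`, the wait here ends as soon as data has actually arrived.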
### What does this PR do?
Ports the TCP keepalive call from `src/http/http.zig` `onOpen` (added in #30627, gated in #30640) into the Rust `on_open` in `src/http/lib.rs`.
The Rust HTTP client landed in #30412 from a branch cut before #30627 merged, so the `set_keepalive` call was never carried over. This is the cause of `test/js/web/fetch/fetch-tcp-keepalive.test.ts` failing on every Linux test job since #30412 merged. The test reads `/proc/self/net/tcp` and expects `timer_active=02` (keepalive armed) but gets `00` because the Rust client never calls `set_keepalive`.
- Build #54331 (and every Linux job since): 2 pass / 2 fail
- Build #54099 (#30640's PR, Zig build): 4 pass / 0 fail
- Build #54202 (#30412's PR): test file not in tree, so it never ran
The change is the same shape as the Zig reference: after `self.set_timeout(socket)`, guarded by `!self.flags.disable_keepalive` so `fetch(url, { keepalive: false })` and `node:http` non-keepalive agents skip it (matching undici's `buildConnector`). `set_keep_alive` already exists on `NewSocketHandler` (`src/uws_sys/socket.rs:466`) and the `disable_keepalive` flag is already wired through.
### How did you verify your code works?
I have not built or run this; opening it directly per request so CI validates. The covering test is `test/js/web/fetch/fetch-tcp-keepalive.test.ts` (Linux-only), which is the test currently failing on every PR.
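The guard this PR describes (keepalive set right after the timeout, skipped when `disable_keepalive` is set) is small enough to model directly. The TypeScript sketch below is not the Rust or Zig code. The `ClientSocket` interface, the `onOpen` function, and the `setKeepAlive(enable, idleSeconds)` signature are assumptions made for illustration. It shows only the ordering and the guard.

```typescript
// Illustrative model only; names and signatures are assumptions, not Bun's API.
interface ClientSocket {
  setTimeout(): void;
  setKeepAlive(enable: boolean, idleSeconds: number): void;
}

interface ClientFlags {
  disableKeepalive: boolean;
}

const KEEPALIVE_IDLE_SECONDS = 60; // idle period quoted in the PR description

function onOpen(socket: ClientSocket, flags: ClientFlags): void {
  socket.setTimeout(); // existing behaviour: arm the socket timeout first
  if (!flags.disableKeepalive) {
    socket.setKeepAlive(true, KEEPALIVE_IDLE_SECONDS);
  }
}

// A recording fake socket, used to observe which calls are made.
function recordingSocket(): { socket: ClientSocket; calls: string[] } {
  const calls: string[] = [];
  const socket: ClientSocket = {
    setTimeout: () => { calls.push("setTimeout"); },
    setKeepAlive: (enable, idle) => { calls.push(`setKeepAlive(${enable},${idle})`); },
  };
  return { socket, calls };
}

const withKeepalive = recordingSocket();
onOpen(withKeepalive.socket, { disableKeepalive: false });
console.log(withKeepalive.calls.join(" -> "));
```

Because the call happens once when the connection opens, requests that reuse the same connection inherit the setting without further work.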
What does this PR do?
Enables TCP keepalive (SO_KEEPALIVE + TCP_KEEPIDLE=60s) on fetch() client sockets.
Without this, when a connection becomes half-open, meaning the peer is gone but the FIN/RST never reached us (NAT timeout, wifi/cellular handoff, middlebox state eviction, VPN disconnect), the kernel never discovers it. A streaming reader.read() on such a socket blocks forever (or until an application-level timeout).
Node's fetch (undici) sets SO_KEEPALIVE with TCP_KEEPIDLE=60s, so a half-open connection is detected at ~70s (60s idle + 10 probes × 1s). This makes Bun match that behavior via the existing socket.setKeepAlive() → bsd_socket_keepalive() path, which already hardcodes TCP_KEEPINTVL=1 and TCP_KEEPCNT=10.
The call is placed in onOpen() next to client.setTimeout(socket) (#30376): socket-level, fires once per connection, inherited by keep-alive-reused requests.
How did you verify your code works?
Added test/js/web/fetch/fetch-tcp-keepalive.test.ts (Linux-only) that:
- starts a local server and makes a fetch() to it
- reads /proc/self/net/tcp and finds the client socket (ESTABLISHED, remote port = server port)
- asserts the timer field is not 00:00000000, i.e. the kernel's sk_timer (keepalive) is armed (timer_active=02)
Without this patch the timer field is 00; with it, 02:<jiffies>.
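To make the /proc check concrete, here is a hedged sketch of how a row of /proc/net/tcp can be parsed to read the timer code. Field positions follow the kernel's documented layout (slot, local address, remote address, state, queues, then `timer_active:jiffies`). The `TcpRow` type, the `parseProcNetTcp` helper, and the sample row are made up for illustration and are not taken from the test file.

```typescript
// Illustrative parser for the text format of /proc/net/tcp (IPv4 table).
interface TcpRow {
  localPort: number;
  remotePort: number;
  state: string;       // hex state code; "01" is ESTABLISHED
  timerActive: string; // "00" none, "01" retransmit, "02" keepalive (sk_timer), "04" zero-window probe
}

function parseProcNetTcp(text: string): TcpRow[] {
  return text
    .split("\n")
    .slice(1) // skip the column header line
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      // Fields: sl, local_address, rem_address, st, tx:rx queues, tr:tm->when, ...
      const [, local = "", remote = "", state = "", , timer = ""] = line.split(/\s+/);
      return {
        localPort: parseInt(local.split(":")[1] ?? "", 16),
        remotePort: parseInt(remote.split(":")[1] ?? "", 16),
        state,
        timerActive: timer.split(":")[0] ?? "",
      };
    });
}

// Made-up sample: a client socket connected to 127.0.0.1:8080 (0x1F90).
const sample = [
  "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode",
  "   0: 0100007F:C350 0100007F:1F90 01 00000000:00000000 02:00000E10 00000000  1000        0 12345",
].join("\n");

const client = parseProcNetTcp(sample).find((r) => r.state === "01" && r.remotePort === 8080);
console.log(client?.timerActive); // "02"
```

Matching on the exact code "02" rather than "anything but 00" keeps a pending retransmit or zero-window probe timer from satisfying the check.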