http: support custom socket-producing agents (tunnel) in node:http client#31798
http: support custom socket-producing agents (tunnel) in node:http client#31798robobun wants to merge 2 commits into
Conversation
…ient node:http / node:https dispatched every request through native fetch and never invoked a user agent's addRequest/createSocket/onSocket hooks. Agents like the `tunnel` package — which establish an HTTP CONNECT tunnel by overriding addRequest and handing back the tunneled socket via req.onSocket() — were silently ignored, so Bun connected directly to the target and bypassed the proxy entirely. Detect when an agent overrides addRequest (and exposes no `.proxy` href, so https-proxy-agent / http-proxy-agent keep using the native fetch proxy path) and drive the request over the socket the agent produces, speaking HTTP/1.x with the existing HTTPParser the way node:http does (tickOnSocket / socketOnData / parserOnIncomingClient). Adds ClientRequest.onSocket() and IncomingMessage._addHeaderLines/_addHeaderLine for the socket-driven path. Builds on the CONNECT-method support from #31574, which the tunnel package's inner CONNECT request relies on.
|
Updated 2:20 AM PT - Jun 4th, 2026
❌ @autofix-ci[bot], your commit 2ffde14 has 2 failures in
🧪 To try this PR locally: bunx bun-pr 31798That installs a local version of the PR into your bun-31798 --bun |
|
Caution Review failedPull request was closed or merged during review No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (6)
WalkthroughThis PR adds HTTP CONNECT proxy support to Bun's Node.js HTTP client. The implementation enables custom agents (like tunnel) to supply their own sockets for proxied connections, includes full CONNECT request/response negotiation, and is validated with comprehensive tests using the tunnel library. ChangesHTTP CONNECT Proxy Support
Possibly related issues
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Comment |
|
This PR may be a duplicate of:
🤖 Generated with Claude Code |
|
Closing in favor of #31587, which rewrites the whole This PR was a narrower, targeted layer on top of #31574's CONNECT support; it's redundant next to the full port. I've offered the |
| if (!hasContentLength && !hasTransferEncoding) { | ||
| const chunks = req[kBodyChunks]; | ||
| let bodyLength = 0; | ||
| if (chunks) for (let i = 0; i < chunks.length; i++) bodyLength += chunks[i].length; | ||
| if (bodyLength > 0) head += `Content-Length: ${bodyLength}${CRLF}`; |
There was a problem hiding this comment.
🔴 The auto-computed Content-Length sums chunks[i].length, but body chunks buffered here can be JS strings (write_() keeps UTF-8 strings as-is), and String.length counts UTF-16 code units rather than the UTF-8 bytes that socket.write(chunk) will emit. A POST through a tunnel agent with a non-ASCII body (e.g. req.write('héllo')) therefore sends Content-Length: 5 followed by 6 bytes, producing a malformed request — use Buffer.byteLength(chunks[i]) instead.
Extended reasoning...
What the bug is
In the new socket-driven custom-agent path, buildRequestHead() auto-fills Content-Length when the user hasn't set one and the body has been buffered before the socket arrived. It computes the length as:
if (chunks) for (let i = 0; i < chunks.length; i++) bodyLength += chunks[i].length;But entries in req[kBodyChunks] can be JavaScript strings, not just Buffers. For a string, .length is the number of UTF-16 code units, which for any non-ASCII content is smaller than the number of bytes socket.write(chunk) will actually emit (since socket.write defaults to UTF-8 encoding when no encoding is passed).
How a string ends up in kBodyChunks
In write_():
const canSkipReEncodingData =
(typeof chunk === "string" && (encoding === "utf-8" || encoding === "utf8" || !encoding)) ||
($isTypedArrayView(chunk) && (...));
if (!canSkipReEncodingData) {
chunk = Buffer.from(chunk, encoding);
}So a UTF-8 string chunk is left as a string. Then pushChunk() for the custom-socket path does:
if (this[kCustomSocketPath]) {
if (this[kHandle]) { ... } else {
this[kBodyChunks].push(chunk); // raw string buffered
}
return;
}Since agent.addRequest() initiates an async CONNECT and onSocket() fires on a later tick, this[kHandle] is not yet set when the user calls req.write(...) / req.end(...) synchronously, so the string is buffered. When the socket arrives, _flushToSocket() calls buildRequestHead(this) (which sums .length) and then socket.write(chunks[i]) with no encoding argument — i.e. UTF-8.
Step-by-step proof
const agent = tunnel.httpOverHttp({ proxy: ... })const req = http.request({ method: 'POST', host, port, path: '/', agent })→kCustomSocketPath = true,agent.addRequest()kicks off an async CONNECT.req.write('héllo')→canSkipReEncodingDatais true (string, default encoding), chunk stays'héllo';kHandleis unset, sokBodyChunks = ['héllo'].req.end()→finished = true;kHandlestill unset, so nothing is flushed yet.- Tunnel completes, agent calls
req.onSocket(socket)→onSocketNT→tickOnSocket→req._flushToSocket(). buildRequestHead()sees no userContent-Length/Transfer-Encoding, computesbodyLength = 'héllo'.length === 5, and writesContent-Length: 5\r\n._flushToSocket()then callssocket.write('héllo')which emits the UTF-8 bytes68 c3 a9 6c 6c 6f— 6 bytes.- The server receives a request whose declared body length is 5 but whose body is 6 bytes.
Why nothing prevents it
The only re-encoding happens in write_(), and it deliberately skips UTF-8 strings. buildRequestHead() has no special-casing for string chunks, and _flushToSocket() writes them with the default (UTF-8) encoding. Node.js's OutgoingMessage uses Buffer.byteLength() for exactly this reason.
Impact
The HTTP request is malformed: the server reads 5 bytes of body and then sees a stray byte. Depending on the server it will truncate the body (data corruption), reject the request with a 400, or — on a keep-alive connection — try to parse the trailing byte as the start of the next request line (request smuggling / desync risk). This affects any POST/PUT through a custom socket-producing agent (e.g. tunnel) whose body contains non-ASCII characters and doesn't set an explicit Content-Length or Transfer-Encoding.
Fix
Use byte length instead of .length:
if (chunks)
for (let i = 0; i < chunks.length; i++)
bodyLength += typeof chunks[i] === "string" ? Buffer.byteLength(chunks[i]) : chunks[i].length;(or unconditionally Buffer.byteLength(chunks[i]), which handles both strings and Buffers).
| _addHeaderLine(field, value, dest) { | ||
| const lower = field.toLowerCase(); | ||
| const existing = dest[lower]; | ||
|
|
||
| if (lower === "set-cookie") { | ||
| if (existing !== undefined) { | ||
| existing.push(value); | ||
| } else { | ||
| dest[lower] = [value]; | ||
| } | ||
| return; | ||
| } | ||
|
|
||
| if (existing === undefined) { | ||
| dest[lower] = value; |
There was a problem hiding this comment.
🟡 When the parser delivers a header whose lowercased name collides with an Object.prototype key (e.g. Constructor, toString, valueOf), dest[lower] reads the inherited function instead of undefined, so the first occurrence falls into the comma-join branch and res.headers.constructor becomes 'function Object() { [native code] }, value'. Node's _addHeaderLine avoids this by checking typeof dest[field] === 'string' before joining — mirroring that check (or using Object.hasOwn) fixes it without changing the headers object's prototype.
Extended reasoning...
What the bug is
IncomingMessage.prototype._addHeaderLine (added in this PR for the socket-driven client path) folds parser-delivered headers into this.headers, which is a plain {} (matching Node — Bun's own test at node-http.test.ts asserts req.headers.__proto__ === {}.__proto__). The function reads:
const existing = dest[lower];
...
if (existing === undefined) {
dest[lower] = value;
return;
}
...
default:
dest[lower] = existing + ", " + value;Because dest inherits from Object.prototype, dest['constructor'] is the Object function, dest['tostring'] is undefined but dest['toString'] would be a function — and crucially, after lowercasing, 'constructor', 'valueof', 'hasownproperty', 'isprototypeof', 'propertyisenumerable', 'tolocalestring' all resolve to inherited members rather than undefined. (Of those, only 'constructor' survives lowercasing exactly — but 'constructor' is the realistic one and is even covered by this PR's own CONNECT-path test.)
Code path
- A request goes through a custom socket-producing agent (e.g.
tunnel), sothis[kCustomSocketPath] = trueandtickOnSocketwires up anHTTPParserin RESPONSE mode. parserOnHeadersComplete(in_http_common.ts) constructsnew IncomingMessage(socket), which takes the new branch in_http_incoming.tsthat setsthis.headers = {}.- The parser calls
incoming._addHeaderLines(headers, n)→_addHeaderLine('Constructor', 'own', this.headers). lower = 'constructor';existing = dest['constructor']isObject(the inherited constructor), notundefined.'constructor'is not'set-cookie'and is not in the singleton switch list, so the default branch runs:dest['constructor'] = Object + ', ' + 'own'→'function Object() { [native code] }, own'.
Why existing code doesn't prevent it
The function checks existing === undefined to decide "first occurrence". On a regular-prototype object that test is false for any key that shadows an Object.prototype member. The CONNECT fast path in _http_client.ts (also added in this PR) explicitly avoids this by using { __proto__: null } for parsedHeaders and even has a test asserting headers['constructor'] === 'own' — but _addHeaderLine operates on this.headers, which intentionally keeps Object.prototype for Node compatibility, so the same === undefined guard is insufficient there.
Step-by-step proof
With a tunneled response of:
HTTP/1.1 200 OK\r\nConstructor: own\r\n\r\n
field = 'Constructor',value = 'own',dest = {}.lower = 'constructor'.existing = dest['constructor']→[Function: Object](inherited).lower !== 'set-cookie';existing !== undefined;switchhas no'constructor'case → default:dest['constructor'] = String(Object) + ', own'.- Bun result:
res.headers.constructor === 'function Object() { [native code] }, own'. - Node result: Node's
_addHeaderLinechecksif (typeof dest[field] === 'string')before joining;typeof Object === 'function', so it takes the "first occurrence" branch and assignsdest['constructor'] = 'own'.
Note that two parts of the original report were inaccurate and are not divergences: (a) Node's IncomingMessage.headers is a regular object, not null-prototype; (b) a __proto__: x header is silently dropped in both Node and Bun (assigning a string via the __proto__ setter no-ops). Only the constructor-style collision actually differs.
Impact
A response header literally named Constructor (or another Object.prototype key that survives lowercasing) is corrupted on the custom-agent / tunnel path, diverging from Node. This is an extreme edge case — such header names essentially never appear in real HTTP — hence nit.
Fix
Match Node's guard — replace the existing === undefined test with a string-type check (or own-property check), e.g.:
if (typeof existing !== 'string') {
dest[lower] = lower === 'set-cookie' ? [value] : value;
return;
}(or keep the current structure and just swap existing === undefined → typeof existing !== 'string' in the non-set-cookie branch). Do not make this.headers null-prototype — that would break the existing req.headers.__proto__ === {}.__proto__ test and diverge from Node.
| if (!hasHost) { | ||
| const host = req[kHost]; | ||
| head += `Host: ${req[kUseDefaultPort] ? host : `${host}:${req[kPort]}`}${CRLF}`; | ||
| } |
There was a problem hiding this comment.
🟡 When no Host header is set, buildRequestHead() emits Host: ${host}:${port} without bracketing IPv6 literals, so a request like http.request({ host: '::1', port: 8080, agent: tunnelAgent }) sends Host: ::1:8080 instead of Host: [::1]:8080. Node brackets IPv6 in the default Host header (see the commented-out reference block in this file), and this PR's own startConnect() path does it via isIPv6(hostHeader) — the same check should be applied here.
Extended reasoning...
What the bug is
buildRequestHead() serializes the request line and headers that are written over a custom agent's socket (the new socket-driven path). When the caller has not set a Host header, it synthesizes one:
if (!hasHost) {
const host = req[kHost];
head += `Host: ${req[kUseDefaultPort] ? host : `${host}:${req[kPort]}`}${CRLF}`;
}req[kHost] is the raw options.host / options.hostname value. When that is an IPv6 literal such as '::1' or 'fe80::1', this produces Host: ::1:8080. RFC 7230 §5.4 / RFC 3986 §3.2.2 require IPv6 literals in an authority to be enclosed in square brackets so the port colon is unambiguous: the correct header is Host: [::1]:8080 (or Host: [::1] for the default port).
Code path that triggers it
This is reachable on the new custom-agent path introduced by this PR. The constructor sets this[kCustomSocketPath] = true when the agent overrides addRequest and exposes no .proxy href, and stores the unmodified host in this[kHost]. When the agent later calls req.onSocket(socket), onSocketNT → tickOnSocket → req._flushToSocket() runs socket.write(buildRequestHead(this), 'latin1'), hitting the !hasHost branch above whenever the user did not set an explicit Host header.
Why nothing else prevents it
Bun's normal client path goes through native fetch, which constructs the URL itself (and brackets IPv6 in getURL). The startConnect() path added in this same PR explicitly handles it:
let hostHeader = this[kHost];
if (isIPv6(hostHeader)) {
hostHeader = `[${hostHeader}]`;
}But buildRequestHead() — the third place that emits a default Host header — has no equivalent check. The commented-out Node-derived block in this file ("For the Host header, ensure that IPv6 addresses are enclosed in square brackets…") confirms Node's _storeHeader does bracket IPv6, so this is also a Node-compat divergence.
Step-by-step proof
- User calls
http.request({ host: '::1', port: 8080, agent: tunnel.httpOverHttp({ proxy: {...} }) })with noheaders.host. - The constructor:
this[kHost] = '::1',this[kPort] = 8080,this[kUseDefaultPort] = false(8080 ≠ 80).tunnel's agent overridesaddRequestand has no.proxyhref, sothis[kCustomSocketPath] = trueandagent.addRequest(this, opts)is invoked. tunnelestablishes a CONNECT tunnel to the proxy and callsreq.onSocket(tunneledSocket).onSocketNT → _flushToSocket → buildRequestHead(this)runs.req.getHeaders()has nohost, sohasHoststaysfalse.- The default-Host branch executes:
host = '::1',req[kUseDefaultPort]is false, so it emitsHost: ::1:8080\r\n. - Wire bytes:
GET / HTTP/1.1\r\nHost: ::1:8080\r\nConnection: close\r\n\r\n. A strict origin server (or anything that parses the authority per RFC 3986) may reject this with 400, and even with the default port the unbracketedHost: ::1diverges from Node's output.
Impact
Limited to the new custom-agent socket path with an IPv6-literal target host and no explicit Host header — uncommon, but a real spec violation and an internal inconsistency with both Node and startConnect() in this same PR.
Fix
Mirror startConnect():
if (!hasHost) {
let host = req[kHost];
if (isIPv6(host)) host = `[${host}]`;
head += `Host: ${req[kUseDefaultPort] ? host : `${host}:${req[kPort]}`}${CRLF}`;
}
What
Makes
node:http/node:httpshonor custom socket-producing agents — in particular thetunnelpackage, which establishes an HTTPCONNECTtunnel through a proxy.Fixes #31795.
Repro
proxy.mjs+client.mjsfrom the issue (atunnel.httpsOverHttp()agent pointed at a local proxy):CONNECT; Bun connected directly toexample.com:443and returned200/403, ignoring the agent.CONNECT example.com:443 HTTP/1.1to the proxy, and when the proxy replies502the request errors withtunneling socket could not be established, statusCode=502.Verified byte-for-byte against Node v24 (the
CONNECTline, headers, and the error all match).Cause
ClientRequestdispatched every request through the nativefetch(nodeHttpClient) and only ever readagent.proxyas an href (forhttps-proxy-agent/proxy-agents). A user agent'saddRequest/createSocket/onSockethooks were never invoked, sotunnel— which performs aCONNECTby overridingaddRequestand callingreq.onSocket(socket)with the tunneled socket — was bypassed entirely.Fix
In
src/js/node/_http_client.ts, detect when an agent overridesaddRequestand exposes no.proxyhref (sohttps-proxy-agent/http-proxy-agentkeep using the native fetch proxy path), and drive the request over the socket the agent produces instead offetch. This mirrors node:http's socket path:agent.addRequest(this, options)at the end of the constructor;ClientRequest.onSocket(socket, err)→tickOnSocket/socketOnData/parserOnIncomingClient, wiring anHTTPParser(RESPONSE mode) and the socket listeners, then writing the request line + headers + body over the socket;IncomingMessage._addHeaderLines/_addHeaderLineso the parser can build the response headers for the socket-driven path.tunnel's innerCONNECTrequest (to the proxy) relies on the HTTPCONNECTmethod support added in #31574, whose commits this branch includes. If #31574 lands first I'll rebase this down to just the custom-agent layer.Tests
test/js/node/http/node-http.test.ts— added to the existingnode:httpsuite (runs in both Bun and Node):tunnel.httpsOverHttp()sends aCONNECTto the proxy and surfaces the proxy's502(the exact issue scenario);tunnel.httpOverHttp()tunnels a request end-to-end through a realCONNECTproxy and receives the target's response, asserting the proxy'sconnecthandler actually ran (so a direct connection can't pass the test).Both fail on the released binary and pass with this change; behavior was compared directly against Node v24.