Speed up FormData multipart serialization#31379
Conversation
Blob::from_dom_form_data copied every field name, value, filename,
content type and in-memory blob body into the StringJoiner, and built
the boundary and content-type strings through core::fmt. Now:
- boundary: single-pass hex encode (bun_core::fmt::bytes_to_hex_lower)
instead of 16 write!("{:02x}") calls through core::fmt
- names/values/filenames: borrowed into the joiner when already UTF-8
(the DOMFormData outlives the join), ownership transferred when a
UTF-16/Latin-1 conversion already allocated, copied only when
escaping is required
- file part content types and in-memory blob bytes: borrowed instead of
cloned (removes an O(size) copy when serializing large blobs)
- "multipart/form-data; boundary=...": one exact-size allocation
- joiner node list reserved up front via the new StringJoiner::reserve
Serialized output is byte-for-byte unchanged.
|
Updated 5:03 AM PT - May 25th, 2026
❌ @robobun, your commit 75a3ba0 has 1 failures in
🧪 To try this PR locally: bunx bun-pr 31379That installs a local version of the PR into your bun-31379 --bun |
|
Warning Review limit reached
Your plan includes 5 reviews of capacity. Refill in 15 minutes and 47 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThis PR reduces allocations and tightens ownership in FormData multipart serialization: StringJoiner gains reserve(), multipart boundary and Content-Type are built without formatting, escaping is centralized in a push helper, blob bodies are emitted as borrowed slices, and tests verify wire-level format and round-trips. ChangesFormData multipart serialization optimization
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/runtime/webcore/Blob.rs`:
- Around line 3958-3974: push_string_slice currently uses matches!(slice, ...)
which moves the ZigStringSlice value and makes subsequent calls like
slice.into_vec() or slice.slice() invalid; change the branch tests to match on a
reference (e.g., if let ZigStringSlice::Owned(_) = &slice { ... } else if let
ZigStringSlice::Static(..) = &slice { ... } else { ... }) so you only borrow for
the pattern checks and still can consume slice with slice.into_vec() in the
Owned branch; keep the same calls to
joiner.push_owned(slice.into_vec().into_boxed_slice()),
joiner.push_static(slice.slice()), and joiner.push_cloned(slice.slice()) and
preserve the comment about WTF-backed slices.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f55100ee-8085-4bfe-9167-b0b7f6c2623d
📒 Files selected for processing (3)
src/bun_core/string/StringJoiner.rssrc/runtime/webcore/Blob.rstest/js/web/html/FormData-multipart-serialization.test.ts
There was a problem hiding this comment.
I didn't find any bugs and the lifetime reasoning checks out against DOMFormData__forEach on the C++ side, but since this swaps owned copies for raw-pointer borrows held across the FFI iteration (names/filenames/content-type/shared_view()) it'd be good to have a human who knows the WTF::String / Blob ownership model confirm the borrow contract.
Extended reasoning...
Overview
This PR optimizes Blob::from_dom_form_data (the FormData → multipart serializer) by eliminating per-piece heap copies: field names/values/filenames, blob content-types, and in-memory blob bytes are now pushed into the StringJoiner as borrowed push_static slices instead of push_cloned/push_owned copies, the boundary is hex-encoded in one pass via bytes_to_hex_lower, the content-type header is built with one exact-size allocation, and a new trivial StringJoiner::reserve presizes the node list. escape_form_data_name now returns Option so the no-escape fast path avoids allocation. A new test file locks the wire format and round-trips a 64 KB binary blob.
Security risks
No new attack surface — output is byte-for-byte identical and the WHATWG %22/%0D/%0A name/filename escaping is preserved (and now covered by an exact-bytes test). The relevant risk is memory safety: push_static stores a RawSlice<u8> raw pointer with no compile-time lifetime, so correctness depends entirely on the borrowed bytes outliving joiner.done(). I traced this through DOMFormData__forEach in src/jsc/bindings/DOMFormData.cpp: toZigString(item.name) / toZigString(*value) / toZigString(blob->fileName()) produce views into WTF::StringImpl buffers ref-held by form->m_items, and blob->impl()'s content-type/store are held by the same RefPtr<Blob> entry. form_data is &mut DOMFormData for the whole function and isn't mutated between DOMFormData__forEach and joiner.done() at line ~958, so the contract holds — and it's the same contract the Zig implementation used. ASAN reportedly passes.
Level of scrutiny
Medium-high. The mechanical pieces (reserve, hex encoding, content-type Vec build, Option-returning escape) are straightforward and obviously correct. But the core change replaces defensive copies with raw-pointer borrows across an FFI callback boundary in a hot, user-facing path (new Response(formData)). The ZigStringSlice::Static variant name is a bit of a misnomer here (it's "borrowed, caller-guarantees-lifetime", not 'static), and the safety hinges on C++-side invariants (WTF::String refcounting, fileName() sharing the blob's StringImpl) that aren't visible from the Rust diff. A maintainer familiar with the WTF/Blob ownership model should sign off on that specific assumption rather than have it auto-approved.
Other factors
The PR is exceptionally well-documented (every borrow site has a comment citing the Zig precedent), adds targeted tests including a 64 KB binary round-trip that would trip ASAN on a UAF, and the existing FormData/body/blob suites pass. The bug-hunting pass found nothing. I'm deferring purely on the "raw-pointer lifetime change in production runtime code" axis, not because I see a problem.
There was a problem hiding this comment.
I didn't find any bugs and the lifetime reasoning checks out against DOMFormData__forEach on the C++ side, but since this swaps owned copies for raw-pointer borrows held across the FFI iteration (names/filenames/content-type/shared_view()) it'd be good to have a human who knows the WTF::String / Blob ownership model confirm the borrow contract.
Extended reasoning...
Overview
This PR optimizes Blob::from_dom_form_data (the FormData → multipart serializer) by eliminating per-piece heap copies: field names/values/filenames, blob content-types, and in-memory blob bytes are now pushed into the StringJoiner as borrowed push_static slices instead of push_cloned/push_owned copies, the boundary is hex-encoded in one pass via bytes_to_hex_lower, the content-type header is built with one exact-size allocation, and a new trivial StringJoiner::reserve presizes the node list. escape_form_data_name now returns Option so the no-escape fast path avoids allocation. A new test file locks the wire format and round-trips a 64 KB binary blob.
Security risks
No new attack surface — output is byte-for-byte identical and the WHATWG %22/%0D/%0A name/filename escaping is preserved (and now covered by an exact-bytes test). The relevant risk is memory safety: push_static stores a RawSlice<u8> raw pointer with no compile-time lifetime, so correctness depends entirely on the borrowed bytes outliving joiner.done(). I traced this through DOMFormData__forEach in src/jsc/bindings/DOMFormData.cpp: toZigString(item.name) / toZigString(*value) / toZigString(blob->fileName()) produce views into WTF::StringImpl buffers ref-held by form->m_items, and blob->impl()'s content-type/store are held by the same RefPtr<Blob> entry. form_data is &mut DOMFormData for the whole function and isn't mutated between DOMFormData__forEach and joiner.done() at line ~958, so the contract holds — and it's the same contract the Zig implementation used. ASAN reportedly passes.
Level of scrutiny
Medium-high. The mechanical pieces (reserve, hex encoding, content-type Vec build, Option-returning escape) are straightforward and obviously correct. But the core change replaces defensive copies with raw-pointer borrows across an FFI callback boundary in a hot, user-facing path (new Response(formData)). The ZigStringSlice::Static variant name is a bit of a misnomer here (it's "borrowed, caller-guarantees-lifetime", not 'static), and the safety hinges on C++-side invariants (WTF::String refcounting, fileName() sharing the blob's StringImpl) that aren't visible from the Rust diff. A maintainer familiar with the WTF/Blob ownership model should sign off on that specific assumption rather than have it auto-approved.
Other factors
The PR is exceptionally well-documented (every borrow site has a comment citing the Zig precedent), adds targeted tests including a 64 KB binary round-trip that would trip ASAN on a UAF, and the existing FormData/body/blob suites pass. The bug-hunting pass found nothing. I'm deferring purely on the "raw-pointer lifetime change in production runtime code" axis, not because I see a problem.
Serializing an in-memory blob entry now borrows its bytes, so the only blob-sized allocation in new Response(formData) is the joined multipart body (~1x the blob size). The test measures peak RSS (VmHWM) across the Response constructor in a child process and fails if it reaches ~2x, i.e. if a full copy of the blob bytes is ever reintroduced on this path. Linux-only; the existing format and round-trip tests are unchanged.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/web/html/FormData-multipart-serialization.test.ts`:
- Around line 145-147: Move the exitCode assertion to the end of the test: first
assert stdout/stderr contents (use expect(stderr).toBe('...') or
expect(stderr).toContain(...) as appropriate) and only after all output checks
call expect(exitCode).toBe(0); remove the console.error(stderr) call and surface
stderr via assertions so test failures show output before the exit-code
assertion; locate and update the lines that reference exitCode and stderr in the
test around the current assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 827526b8-9591-4f56-a487-de6fd8d9b313
📒 Files selected for processing (1)
test/js/web/html/FormData-multipart-serialization.test.ts
There was a problem hiding this comment.
No bugs found, but this swaps copies for raw-pointer borrows (push_static on ZigStringSlice::Static name/value bytes, blob.shared_view(), blob.content_type_slice()) whose validity depends on DOMFormData/blob-store lifetimes that the type system doesn't check — worth a human pass on the lifetime reasoning, and the macOS build-rust/build-bun lanes are currently red.
Extended reasoning...
Overview
Touches three files: a trivial StringJoiner::reserve helper in src/bun_core/string/StringJoiner.rs, ~100 lines in src/runtime/webcore/Blob.rs (from_dom_form_data, escape_form_data_name, new FormDataContext::push_string_slice, on_entry), and a new 158-line test file. The Blob.rs changes replace per-piece heap copies with borrowed views: field names/values/filenames via ZigStringSlice::Static → push_static, blob content-type and shared_view() bytes via push_static instead of push_cloned, plus boundary/content-type built without core::fmt.
Security risks
No new attack surface (input is still WHATWG-escaped for "/CR/LF; output is asserted byte-identical). The risk is memory safety: StringJoiner::push_static stores a RawSlice<u8> raw pointer with the documented contract "data is expected to live until .done is called." The new borrows rely on (a) ZigString::to_slice() returning Static pointers into WTF storage that the &mut DOMFormData keeps alive across the for_each → joiner.done() span, and (b) each entry's blob.store / content_type outliving done(). The reasoning matches the Zig original and is well-commented, and the PR reports ASAN-debug passes — but these invariants are not enforced by Rust lifetimes, so a use-after-free here would be silent in release builds.
Level of scrutiny
Medium-high. This is a hot user-facing serialization path (new Response(formData) / fetch bodies), and the change crosses an unsafe-adjacent boundary (raw fat pointers in StringJoiner) by design. It is not a mechanical/config change; the correctness hinges on cross-module ownership contracts (DOMFormData ↔ WTF string impl ↔ blob store) that a reviewer familiar with the JSC/WTF lifetime model should confirm.
Other factors
- Buildkite shows macOS x64 build-rust and aarch64 build-bun failing on the latest commit; the other red items are pre-existing
-no-pielinker warnings, but the two macOS failures look real and should be green before merge. - All prior review threads (CodeRabbit's
matches!false positive, my two test-convention nits) are resolved; the author adopted the stderr-assertion ordering and gave a reasonable rationale for keeping the separate test file. - Test coverage is good (exact wire format, round-trip, 64 KB binary integrity, Linux peak-RSS regression guard), and the bug-hunting system found nothing.
Given the unchecked-lifetime borrows and the red macOS lanes, this warrants a human look rather than auto-approval.
|
CI note: the only red on this PR is Diff-wise this is ready for review: serialization is ~17% faster than the Zig implementation on |
* oven/main (10 new commits): Optimize TextEncoder.encode: restore SIMD ASCII fast paths lost in the Rust port (oven-sh#31385) js_parser: sanitize auto-generated default export name for digit-named modules (oven-sh#31403) fetch: run checkServerIdentity before writing the request (oven-sh#31325) ffi: avoid copying the threadsafe callback wrapper on the calling thread (oven-sh#31332) install: gate the exit-callback cache teardown to the main thread (oven-sh#31376) fix(node:module): don't register native helpers as their own constructors (oven-sh#31393) css: escape custom pseudo-class/element names when printing (oven-sh#31404) Deepen the lots-of-for-loop fixture so the transpiler stack-overflow tests throw on Windows (oven-sh#31382) Hardening: input validation and bounds tightening across 36 subsystems (round 4) (oven-sh#31339) Speed up FormData multipart serialization (oven-sh#31379) Auto-merged: src/install/PackageManager.rs, src/runtime/cli/upgrade_command.rs, src/runtime/webcore/Blob.rs, src/sys/lib.rs
What / why
bench/snippets/form-data.mjsregressed vs the Zig implementation:new Response(formData).formData()−52%,.text()−28% (whileFormData.appenditself got faster), i.e. the multipart serialization path is what slowed down.The port of
Blob::from_dom_form_data(the FormData → multipart body serializer) did a heap allocation + copy for every piece it joined, and went throughcore::fmtfor the boundary and content-type strings:StringJoiner(ZigStringSlice::into_vec()copies when the slice is borrowed; the Zig code borrowed ASCII strings and only allocated for UTF-16/non-ASCII)push_cloned(Zig borrowed both — for blob parts this is an extra O(size) copy)write!("{:02x}")calls throughcore::fmtmultipart/form-data; boundary=…string was grown from an emptyVecthroughcore::fmt/bstrFix
src/runtime/webcore/Blob.rs:bun_core::fmt::bytes_to_hex_lowerFormDataContext::push_string_slice: borrow name/value/filename bytes into the joiner when they're already UTF-8 (theDOMFormDataoutlivesjoiner.done(), same contract the Zig code used), transfer ownership when a UTF-16/Latin-1→UTF-8 conversion already allocated, and only copy when"/CR/LF escaping is actually needed (escape_form_data_namenow returnsOption)push_static) instead of cloningStringJoiner::reserve, new small helper inbun_core)Serialized output is byte-for-byte unchanged (verified against current main and against bun v1.3.14; the only intentional difference vs 1.3.14 remains the already-shipped WHATWG
%22/%0D/%0Aname escaping).Benchmarks
bench/snippets/form-data.mjs, x64 Linux (Xeon 8375C), comparing the official bun v1.3.14 release (last Zig build), current main, and this PR (both local release builds):new Response(formData).text()(serialize)new Response(formData).formData()(round-trip)response.formData()(parse only)FormData.appendSerialization is now ~17% faster than the Zig implementation (was 34% slower) and ~38% faster than main. The round-trip improves ~20% but is still ~11% behind Zig: the remainder is a fixed cost in the
.formData()parse/consume path (it reproduces with URL-encoded bodies too, so it isn't the multipart parser; per-field parse cost is within ~6% of Zig). That pre-existing parse-side gap (the −12%response.formData()line in the original report) isn't addressed here.Tests
test/js/web/html/FormData-multipart-serialization.test.ts(new, focused file next to the existingFormData-file-error-leak.test.ts):%22/%0D/%0Aescaping in names & filenames,application/octet-streamfallback, blob/File content typesAll three pass under
bun bd test(debug+ASAN) and the release build.test/js/web/html/FormData.test.ts,form-data-boundary-crash.test.ts,body.test.ts,blob.test.tsandFormData-file-error-leak.test.tspass with this change (the leak fixture reports 0 MB growth before and after — ownership only got narrower).This is a performance-only change: the new tests document/lock the serialization format but necessarily pass on main too.