Skip to content

Speed up FormData multipart serialization#31379

Merged
Jarred-Sumner merged 4 commits into
mainfrom
farm/8abb3918/formdata-serialization-perf
May 25, 2026
Merged

Speed up FormData multipart serialization#31379
Jarred-Sumner merged 4 commits into
mainfrom
farm/8abb3918/formdata-serialization-perf

Conversation

@robobun

@robobun robobun commented May 25, 2026

Copy link
Copy Markdown
Collaborator

What / why

bench/snippets/form-data.mjs regressed vs the Zig implementation: new Response(formData).formData() −52%, .text() −28% (while FormData.append itself got faster), i.e. the multipart serialization path is what slowed down.

The port of Blob::from_dom_form_data (the FormData → multipart body serializer) did a heap allocation + copy for every piece it joined, and went through core::fmt for the boundary and content-type strings:

  • every field name / value / filename was copied into the StringJoiner (ZigStringSlice::into_vec() copies when the slice is borrowed; the Zig code borrowed ASCII strings and only allocated for UTF-16/non-ASCII)
  • the file part content type and the in-memory blob bytes were push_cloned (Zig borrowed both — for blob parts this is an extra O(size) copy)
  • the boundary was formatted with 16 separate write!("{:02x}") calls through core::fmt
  • the multipart/form-data; boundary=… string was grown from an empty Vec through core::fmt/bstr

Fix

src/runtime/webcore/Blob.rs:

  • hex-encode the boundary in one pass with bun_core::fmt::bytes_to_hex_lower
  • FormDataContext::push_string_slice: borrow name/value/filename bytes into the joiner when they're already UTF-8 (the DOMFormData outlives joiner.done(), same contract the Zig code used), transfer ownership when a UTF-16/Latin-1→UTF-8 conversion already allocated, and only copy when "/CR/LF escaping is actually needed (escape_form_data_name now returns Option)
  • borrow file-part content types and in-memory blob bytes (push_static) instead of cloning
  • build the content-type string with one exact-size allocation
  • reserve the joiner's node list up front (StringJoiner::reserve, new small helper in bun_core)

Serialized output is byte-for-byte unchanged (verified against current main and against bun v1.3.14; the only intentional difference vs 1.3.14 remains the already-shipped WHATWG %22/%0D/%0A name escaping).

Benchmarks

bench/snippets/form-data.mjs, x64 Linux (Xeon 8375C), comparing the official bun v1.3.14 release (last Zig build), current main, and this PR (both local release builds):

benchmark Zig 1.3.14 main this PR
new Response(formData).text() (serialize) 1.32 µs 1.77 µs 1.10 µs
new Response(formData).formData() (round-trip) 2.45 µs 3.43 µs 2.74 µs
response.formData() (parse only) 1.93 µs 2.23 µs 2.21 µs (unchanged)
FormData.append 0.58 µs 0.68 µs 0.68 µs (unchanged)

Serialization is now ~17% faster than the Zig implementation (was 34% slower) and ~38% faster than main. The round-trip improves ~20% but is still ~11% behind Zig: the remainder is a fixed cost in the .formData() parse/consume path (it reproduces with URL-encoded bodies too, so it isn't the multipart parser; per-field parse cost is within ~6% of Zig). That pre-existing parse-side gap (the −12% response.formData() line in the original report) isn't addressed here.

Tests

test/js/web/html/FormData-multipart-serialization.test.ts (new, focused file next to the existing FormData-file-error-leak.test.ts):

  • exact wire format: boundary shape, part layout/CRLFs, %22/%0D/%0A escaping in names & filenames, application/octet-stream fallback, blob/File content types
  • full serialize → parse round-trip for string fields, duplicate names, non-ASCII names/values/filenames, blobs and typed Files
  • 64 KB binary blob round-trips byte-for-byte (exercises the borrowed blob-bytes path; under the ASAN debug build this would catch any lifetime mistake in the new borrow-based joining)
  • a peak-memory check (Linux, child process, VmHWM) that serializing a 128 MB in-memory blob stays at ~1x the blob size — it fails at the ~2x that the extra per-entry copy on main produces, so the no-copy behavior is locked in and can't silently regress

All three pass under bun bd test (debug+ASAN) and the release build. test/js/web/html/FormData.test.ts, form-data-boundary-crash.test.ts, body.test.ts, blob.test.ts and FormData-file-error-leak.test.ts pass with this change (the leak fixture reports 0 MB growth before and after — ownership only got narrower).

This is a performance-only change: the new tests document/lock the serialization format but necessarily pass on main too.

Blob::from_dom_form_data copied every field name, value, filename,
content type and in-memory blob body into the StringJoiner, and built
the boundary and content-type strings through core::fmt. Now:

- boundary: single-pass hex encode (bun_core::fmt::bytes_to_hex_lower)
  instead of 16 write!("{:02x}") calls through core::fmt
- names/values/filenames: borrowed into the joiner when already UTF-8
  (the DOMFormData outlives the join), ownership transferred when a
  UTF-16/Latin-1 conversion already allocated, copied only when
  escaping is required
- file part content types and in-memory blob bytes: borrowed instead of
  cloned (removes an O(size) copy when serializing large blobs)
- "multipart/form-data; boundary=...": one exact-size allocation
- joiner node list reserved up front via the new StringJoiner::reserve

Serialized output is byte-for-byte unchanged.
@robobun

robobun commented May 25, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 5:03 AM PT - May 25th, 2026

@robobun, your commit 75a3ba0 has 1 failures in Build #57897 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 31379

That installs a local version of the PR into your bun-31379 executable, so you can run:

bun-31379 --bun

@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@robobun, we couldn't start this review because you've used your available PR reviews for now.

Your plan includes 5 reviews of capacity. Refill in 15 minutes and 47 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b1c01a3a-ed74-475a-9560-e13d7ee25a81

📥 Commits

Reviewing files that changed from the base of the PR and between 4ab58f9 and 75a3ba0.

📒 Files selected for processing (1)
  • test/js/web/html/FormData-multipart-serialization.test.ts

Walkthrough

This PR reduces allocations and tightens ownership in FormData multipart serialization: StringJoiner gains reserve(), multipart boundary and Content-Type are built without formatting, escaping is centralized in a push helper, blob bodies are emitted as borrowed slices, and tests verify wire-level format and round-trips.

Changes

FormData multipart serialization optimization

Layer / File(s) Summary
StringJoiner capacity reservation
src/bun_core/string/StringJoiner.rs
StringJoiner::reserve() method forwards capacity reservation to the internal nodes vector, enabling pre-allocation before pushing multiple slices.
Multipart boundary and Content-Type generation
src/runtime/webcore/Blob.rs
Boundary construction now uses a fixed stack buffer with constant prefix and UUID hex-encoding; Content-Type header is built by appending constant prefix and computed boundary bytes instead of formatting.
StringJoiner pre-sizing based on form entry count
src/runtime/webcore/Blob.rs
StringJoiner capacity is reserved before iterating form entries using a computed upper bound for slices-per-entry plus closing boundary slices.
Form-data escaping refactor and push_string_slice helper
src/runtime/webcore/Blob.rs
escape_form_data_name now borrows input and conditionally allocates only when quoting/control characters are present; new FormDataContext::push_string_slice centralizes how ZigStringSlice variants (owned/static/WTF-backed) are appended with optional percent-encoding for "/CR/LF and correct ownership semantics.
Blob view borrowing and metadata optimization
src/runtime/webcore/Blob.rs
Content-Type header and blob.shared_view() emission switched from cloned to borrowed/static slices, with surrounding comments asserting DOMFormData/blob lifetimes cover joiner.done().
Error handling documentation update
src/runtime/webcore/Blob.rs
Failure-path comment clarified to document that joiner drop frees all heap-owned slices already pushed.
Multipart serialization test suite
test/js/web/html/FormData-multipart-serialization.test.ts
New test file verifies wire-level multipart format including boundary format, plain/empty/Unicode/escaped field names, Blob/File metadata handling (filename escaping, Content-Type), and round-trip serialization via new Response(formData).formData() preserving duplicate fields, Unicode values, blob metadata, and binary payload integrity.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely captures the main optimization objective—speeding up FormData multipart serialization—which aligns with the primary change across all modified files.
Description check ✅ Passed The description comprehensively covers the What/Why and How sections with detailed context, changes, benchmarks, and verification; both template sections are thoroughly addressed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/runtime/webcore/Blob.rs`:
- Around line 3958-3974: push_string_slice currently uses matches!(slice, ...)
which moves the ZigStringSlice value and makes subsequent calls like
slice.into_vec() or slice.slice() invalid; change the branch tests to match on a
reference (e.g., if let ZigStringSlice::Owned(_) = &slice { ... } else if let
ZigStringSlice::Static(..) = &slice { ... } else { ... }) so you only borrow for
the pattern checks and still can consume slice with slice.into_vec() in the
Owned branch; keep the same calls to
joiner.push_owned(slice.into_vec().into_boxed_slice()),
joiner.push_static(slice.slice()), and joiner.push_cloned(slice.slice()) and
preserve the comment about WTF-backed slices.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f55100ee-8085-4bfe-9167-b0b7f6c2623d

📥 Commits

Reviewing files that changed from the base of the PR and between a49b308 and 469993f.

📒 Files selected for processing (3)
  • src/bun_core/string/StringJoiner.rs
  • src/runtime/webcore/Blob.rs
  • test/js/web/html/FormData-multipart-serialization.test.ts

Comment thread src/runtime/webcore/Blob.rs

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs and the lifetime reasoning checks out against DOMFormData__forEach on the C++ side, but since this swaps owned copies for raw-pointer borrows held across the FFI iteration (names/filenames/content-type/shared_view()) it'd be good to have a human who knows the WTF::String / Blob ownership model confirm the borrow contract.

Extended reasoning...

Overview

This PR optimizes Blob::from_dom_form_data (the FormData → multipart serializer) by eliminating per-piece heap copies: field names/values/filenames, blob content-types, and in-memory blob bytes are now pushed into the StringJoiner as borrowed push_static slices instead of push_cloned/push_owned copies, the boundary is hex-encoded in one pass via bytes_to_hex_lower, the content-type header is built with one exact-size allocation, and a new trivial StringJoiner::reserve presizes the node list. escape_form_data_name now returns Option so the no-escape fast path avoids allocation. A new test file locks the wire format and round-trips a 64 KB binary blob.

Security risks

No new attack surface — output is byte-for-byte identical and the WHATWG %22/%0D/%0A name/filename escaping is preserved (and now covered by an exact-bytes test). The relevant risk is memory safety: push_static stores a RawSlice<u8> raw pointer with no compile-time lifetime, so correctness depends entirely on the borrowed bytes outliving joiner.done(). I traced this through DOMFormData__forEach in src/jsc/bindings/DOMFormData.cpp: toZigString(item.name) / toZigString(*value) / toZigString(blob->fileName()) produce views into WTF::StringImpl buffers ref-held by form->m_items, and blob->impl()'s content-type/store are held by the same RefPtr<Blob> entry. form_data is &mut DOMFormData for the whole function and isn't mutated between DOMFormData__forEach and joiner.done() at line ~958, so the contract holds — and it's the same contract the Zig implementation used. ASAN reportedly passes.

Level of scrutiny

Medium-high. The mechanical pieces (reserve, hex encoding, content-type Vec build, Option-returning escape) are straightforward and obviously correct. But the core change replaces defensive copies with raw-pointer borrows across an FFI callback boundary in a hot, user-facing path (new Response(formData)). The ZigStringSlice::Static variant name is a bit of a misnomer here (it's "borrowed, caller-guarantees-lifetime", not 'static), and the safety hinges on C++-side invariants (WTF::String refcounting, fileName() sharing the blob's StringImpl) that aren't visible from the Rust diff. A maintainer familiar with the WTF/Blob ownership model should sign off on that specific assumption rather than have it auto-approved.

Other factors

The PR is exceptionally well-documented (every borrow site has a comment citing the Zig precedent), adds targeted tests including a 64 KB binary round-trip that would trip ASAN on a UAF, and the existing FormData/body/blob suites pass. The bug-hunting pass found nothing. I'm deferring purely on the "raw-pointer lifetime change in production runtime code" axis, not because I see a problem.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs and the lifetime reasoning checks out against DOMFormData__forEach on the C++ side, but since this swaps owned copies for raw-pointer borrows held across the FFI iteration (names/filenames/content-type/shared_view()) it'd be good to have a human who knows the WTF::String / Blob ownership model confirm the borrow contract.

Extended reasoning...

Overview

This PR optimizes Blob::from_dom_form_data (the FormData → multipart serializer) by eliminating per-piece heap copies: field names/values/filenames, blob content-types, and in-memory blob bytes are now pushed into the StringJoiner as borrowed push_static slices instead of push_cloned/push_owned copies, the boundary is hex-encoded in one pass via bytes_to_hex_lower, the content-type header is built with one exact-size allocation, and a new trivial StringJoiner::reserve presizes the node list. escape_form_data_name now returns Option so the no-escape fast path avoids allocation. A new test file locks the wire format and round-trips a 64 KB binary blob.

Security risks

No new attack surface — output is byte-for-byte identical and the WHATWG %22/%0D/%0A name/filename escaping is preserved (and now covered by an exact-bytes test). The relevant risk is memory safety: push_static stores a RawSlice<u8> raw pointer with no compile-time lifetime, so correctness depends entirely on the borrowed bytes outliving joiner.done(). I traced this through DOMFormData__forEach in src/jsc/bindings/DOMFormData.cpp: toZigString(item.name) / toZigString(*value) / toZigString(blob->fileName()) produce views into WTF::StringImpl buffers ref-held by form->m_items, and blob->impl()'s content-type/store are held by the same RefPtr<Blob> entry. form_data is &mut DOMFormData for the whole function and isn't mutated between DOMFormData__forEach and joiner.done() at line ~958, so the contract holds — and it's the same contract the Zig implementation used. ASAN reportedly passes.

Level of scrutiny

Medium-high. The mechanical pieces (reserve, hex encoding, content-type Vec build, Option-returning escape) are straightforward and obviously correct. But the core change replaces defensive copies with raw-pointer borrows across an FFI callback boundary in a hot, user-facing path (new Response(formData)). The ZigStringSlice::Static variant name is a bit of a misnomer here (it's "borrowed, caller-guarantees-lifetime", not 'static), and the safety hinges on C++-side invariants (WTF::String refcounting, fileName() sharing the blob's StringImpl) that aren't visible from the Rust diff. A maintainer familiar with the WTF/Blob ownership model should sign off on that specific assumption rather than have it auto-approved.

Other factors

The PR is exceptionally well-documented (every borrow site has a comment citing the Zig precedent), adds targeted tests including a 64 KB binary round-trip that would trip ASAN on a UAF, and the existing FormData/body/blob suites pass. The bug-hunting pass found nothing. I'm deferring purely on the "raw-pointer lifetime change in production runtime code" axis, not because I see a problem.

Serializing an in-memory blob entry now borrows its bytes, so the only
blob-sized allocation in new Response(formData) is the joined multipart
body (~1x the blob size). The test measures peak RSS (VmHWM) across the
Response constructor in a child process and fails if it reaches ~2x,
i.e. if a full copy of the blob bytes is ever reintroduced on this path.
Linux-only; the existing format and round-trip tests are unchanged.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/web/html/FormData-multipart-serialization.test.ts`:
- Around line 145-147: Move the exitCode assertion to the end of the test: first
assert stdout/stderr contents (use expect(stderr).toBe('...') or
expect(stderr).toContain(...) as appropriate) and only after all output checks
call expect(exitCode).toBe(0); remove the console.error(stderr) call and surface
stderr via assertions so test failures show output before the exit-code
assertion; locate and update the lines that reference exitCode and stderr in the
test around the current assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 827526b8-9591-4f56-a487-de6fd8d9b313

📥 Commits

Reviewing files that changed from the base of the PR and between 469993f and 4ab58f9.

📒 Files selected for processing (1)
  • test/js/web/html/FormData-multipart-serialization.test.ts

Comment thread test/js/web/html/FormData-multipart-serialization.test.ts Outdated
Comment thread test/js/web/html/FormData-multipart-serialization.test.ts Outdated
Comment thread test/js/web/html/FormData-multipart-serialization.test.ts

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No bugs found, but this swaps copies for raw-pointer borrows (push_static on ZigStringSlice::Static name/value bytes, blob.shared_view(), blob.content_type_slice()) whose validity depends on DOMFormData/blob-store lifetimes that the type system doesn't check — worth a human pass on the lifetime reasoning, and the macOS build-rust/build-bun lanes are currently red.

Extended reasoning...

Overview

Touches three files: a trivial StringJoiner::reserve helper in src/bun_core/string/StringJoiner.rs, ~100 lines in src/runtime/webcore/Blob.rs (from_dom_form_data, escape_form_data_name, new FormDataContext::push_string_slice, on_entry), and a new 158-line test file. The Blob.rs changes replace per-piece heap copies with borrowed views: field names/values/filenames via ZigStringSlice::Staticpush_static, blob content-type and shared_view() bytes via push_static instead of push_cloned, plus boundary/content-type built without core::fmt.

Security risks

No new attack surface (input is still WHATWG-escaped for "/CR/LF; output is asserted byte-identical). The risk is memory safety: StringJoiner::push_static stores a RawSlice<u8> raw pointer with the documented contract "data is expected to live until .done is called." The new borrows rely on (a) ZigString::to_slice() returning Static pointers into WTF storage that the &mut DOMFormData keeps alive across the for_eachjoiner.done() span, and (b) each entry's blob.store / content_type outliving done(). The reasoning matches the Zig original and is well-commented, and the PR reports ASAN-debug passes — but these invariants are not enforced by Rust lifetimes, so a use-after-free here would be silent in release builds.

Level of scrutiny

Medium-high. This is a hot user-facing serialization path (new Response(formData) / fetch bodies), and the change crosses an unsafe-adjacent boundary (raw fat pointers in StringJoiner) by design. It is not a mechanical/config change; the correctness hinges on cross-module ownership contracts (DOMFormData ↔ WTF string impl ↔ blob store) that a reviewer familiar with the JSC/WTF lifetime model should confirm.

Other factors

  • Buildkite shows macOS x64 build-rust and aarch64 build-bun failing on the latest commit; the other red items are pre-existing -no-pie linker warnings, but the two macOS failures look real and should be green before merge.
  • All prior review threads (CodeRabbit's matches! false positive, my two test-convention nits) are resolved; the author adopted the stderr-assertion ordering and gave a reasonable rationale for keeping the separate test file.
  • Test coverage is good (exact wire format, round-trip, 64 KB binary integrity, Linux peak-RSS regression guard), and the bug-hunting system found nothing.

Given the unchecked-lifetime borrows and the red macOS lanes, this warrants a human look rather than auto-approval.

@robobun

robobun commented May 25, 2026

Copy link
Copy Markdown
Collaborator Author

CI note: the only red on this PR is buildkite/bun/windows-2019-x64-test-bun, failing in test/bundler/transpiler/transpiler.test.js ("Bun.Transpiler.transformSync/transform stack overflows" no longer throw "Maximum call stack size exceeded" on Windows 2019). That's unrelated to this change — nothing here touches the transpiler — and it's a pre-existing issue on current main with its own fix open in #31382; once that lands, a re-run of this build should be fully green. Everything this PR touches is green: all build lanes on every platform, and the new FormData-multipart-serialization tests pass on every test lane that has completed (the other annotations are retried flakes in update_interactive_install and serve-body-leak, also unrelated).

Diff-wise this is ready for review: serialization is ~17% faster than the Zig implementation on bench/snippets/form-data.mjs (was 34% slower), output is byte-identical, and the test suite locks the wire format, round-trip, and the no-extra-copy memory behavior.

@Jarred-Sumner Jarred-Sumner merged commit fc26825 into main May 25, 2026
76 of 78 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/8abb3918/formdata-serialization-perf branch May 25, 2026 20:11
springmin pushed a commit to springmin/bun that referenced this pull request May 26, 2026
* oven/main (10 new commits):
  Optimize TextEncoder.encode: restore SIMD ASCII fast paths lost in the Rust port (oven-sh#31385)
  js_parser: sanitize auto-generated default export name for digit-named modules (oven-sh#31403)
  fetch: run checkServerIdentity before writing the request (oven-sh#31325)
  ffi: avoid copying the threadsafe callback wrapper on the calling thread (oven-sh#31332)
  install: gate the exit-callback cache teardown to the main thread (oven-sh#31376)
  fix(node:module): don't register native helpers as their own constructors (oven-sh#31393)
  css: escape custom pseudo-class/element names when printing (oven-sh#31404)
  Deepen the lots-of-for-loop fixture so the transpiler stack-overflow tests throw on Windows (oven-sh#31382)
  Hardening: input validation and bounds tightening across 36 subsystems (round 4) (oven-sh#31339)
  Speed up FormData multipart serialization (oven-sh#31379)

Auto-merged: src/install/PackageManager.rs, src/runtime/cli/upgrade_command.rs, src/runtime/webcore/Blob.rs, src/sys/lib.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants