fix(formdata): preserve binary data with null bytes in multipart parsing #27483
`Field.value` used `bun.Semver.String` which treats inline data (<=8 bytes) as null-terminated strings, truncating binary file content at the first 0x00 byte. Replace with a raw `[]const u8` slice into the input buffer. Closes #27478 Co-Authored-By: Claude <noreply@anthropic.com>
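The truncation described above can be modeled in a few lines. This TypeScript sketch is an illustration only, not Bun's Zig implementation: `inlineLength` and `rawSliceLength` are hypothetical helpers. The only behavior assumed from the description is that inline data of 8 bytes or fewer is measured up to the first `0x00` byte.

```typescript
// Hypothetical model of the bug; this is not Bun's actual Semver.String code.
const INLINE_CAPACITY = 8; // from the PR description: inline data is <= 8 bytes

// Length as a null-terminated inline string would report it.
// Only meaningful for inputs that fit in the inline capacity.
function inlineLength(bytes: Uint8Array): number {
  const limit = Math.min(bytes.length, INLINE_CAPACITY);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0x00) return i; // stops at the first null byte
  }
  return limit;
}

// Length of a raw slice: carried explicitly, independent of the content.
function rawSliceLength(bytes: Uint8Array): number {
  return bytes.length;
}

const gzipHeader = new Uint8Array([0x1f, 0x8b, 0x08, 0x00]);
console.log(inlineLength(gzipHeader)); // 3: the trailing 0x00 is lost
console.log(rawSliceLength(gzipHeader)); // 4: all bytes kept
```

A slice carries its length alongside the pointer, so no byte value can act as a terminator.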
Updated 1:47 PM PT - Feb 26th, 2026
❌ @autofix-ci[bot], your commit 7d19568 has 4 failures in
🧪 To try this PR locally: `bunx bun-pr 27483`
That installs a local version of the PR into your bun-27483 --bun
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
📒 Files selected for processing (3)
Walkthrough
Changes to enable proper null-byte preservation in multipart form-data handling by modifying the FormData field value type from a string type to a raw binary slice, along with documentation formatting updates and regression test coverage for binary file parsing.
🚥 Pre-merge checks: ✅ 4 passed
LGTM — targeted fix replacing Semver.String (which truncates at null bytes) with a raw slice for binary field values, well-contained and well-tested.
Extended reasoning...
Overview
This PR fixes a bug where Request.formData() truncated small binary files at the first 0x00 byte. The change touches two files: src/url.zig (the fix) and a new regression test. The fix replaces FormData.Field.value from bun.Semver.String to []const u8, and updates the two sites where it is assigned and read.
Security risks
No security concerns. This fix actually improves correctness by preserving binary data integrity — the previous behavior silently truncated data, which could be considered a data integrity issue.
Level of scrutiny
Production-critical code (multipart form data parsing), but the change is minimal and well-scoped. I verified that FormData.Field is only used within url.zig itself — forEachMultipartEntry is called only from toJSFromMultipartData in the same file. The body slice points into the input buffer passed to the parser, which has sufficient lifetime throughout the callback execution.
Other factors
- The `filename` and `content_type` fields correctly remain as `bun.Semver.String` since they contain text, not arbitrary binary data.
- Four regression test cases cover the original issue, all-null bytes, single null byte, and 8-byte files with interleaved nulls (the Semver.String inline storage boundary).
- Existing FormData test suites reportedly pass (342 + 110 + 2 tests).
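A round-trip check in the spirit of those regression cases might look like the sketch below. It is not the contents of `test/regression/issue/27478.test.ts`. It uses only the standard `FormData`, `Blob`, and `Request` Web APIs, and the URL, the field name `file`, the filename `data.bin`, and the 8-byte payload are assumptions chosen for illustration.

```typescript
// Sketch of a multipart round trip: serialize a small binary file into a
// multipart/form-data request body, parse it back, and return the bytes.
async function roundTrip(values: number[]): Promise<number[]> {
  const form = new FormData();
  form.append("file", new Blob([new Uint8Array(values)]), "data.bin");

  // Hypothetical URL; the request is never sent, only parsed locally.
  const request = new Request("http://localhost/upload", {
    method: "POST",
    body: form,
  });

  const parsed = await request.formData();
  const entry = parsed.get("file");
  if (entry === null || typeof entry === "string") {
    throw new Error("expected a file entry");
  }
  return Array.from(new Uint8Array(await entry.arrayBuffer()));
}

const payloads: number[][] = [
  [0x1f, 0x8b, 0x08, 0x00], // gzip header bytes (original repro)
  [0x00, 0x00, 0x00, 0x00], // all null bytes
  [0x00], // single null byte
  [0x01, 0x00, 0x02, 0x00, 0x03, 0x00, 0x04, 0x00], // illustrative 8-byte payload
];

for (const payload of payloads) {
  roundTrip(payload).then((out) => {
    const ok = out.length === payload.length && out.every((b, i) => b === payload[i]);
    console.log(`${payload.length} bytes in, ${out.length} bytes out, intact: ${ok}`);
  });
}
```

Comparing full byte arrays rather than only lengths also catches corruption that leaves the size unchanged.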
LGTM — targeted fix replacing Semver.String (C-string semantics) with a raw byte slice for FormData field values, with good regression tests.
Extended reasoning...
Overview
This PR fixes a bug where Request.formData() truncated small binary files at the first null byte. The core change is in src/url.zig: FormData.Field.value is changed from bun.Semver.String (which uses C-string null-termination for inline storage) to []const u8 (a raw byte slice). Two usage sites are updated accordingly. The PR also includes a benign markdown table formatting fix in docs/bundler/executables.mdx and a new regression test file.
Security risks
No security risks. The change actually fixes a data integrity issue — binary data was being silently truncated. The new code uses a direct slice into the input buffer, which has correct lifetime semantics (the input buffer outlives the callback).
Level of scrutiny
This is production-critical code (HTTP body parsing), so high scrutiny is warranted. However, the change is minimal (3 lines of Zig), the Field type is only used internally within url.zig (verified via grep — only two usage sites, both updated), and the fix is straightforward: replacing a string type that has null-termination semantics with a raw byte slice for binary data.
Other factors
- No external callers access `Field.value` directly; the type is internal to the multipart parsing logic.
- Regression tests cover the original issue (gzip header bytes), all-null files, single null byte, and 8-byte files with interleaved nulls.
- The PR description confirms existing FormData tests continue to pass (342 + 110 + 2 tests).
- CodeRabbit found no actionable issues.
…ing (oven-sh#27483)

## Summary
- `Request.formData()` truncated small binary files (≤8 bytes) at the first `0x00` (null) byte
- Root cause: `FormData.Field.value` used `bun.Semver.String`, whose inline storage mode scans for null bytes to determine string length (C-string semantics)
- Fix: Replace `Field.value` with a raw `[]const u8` slice into the input buffer, bypassing `Semver.String` entirely

## Test plan
- [x] Added regression test `test/regression/issue/27478.test.ts` with 4 cases:
  - Gzip header bytes (`[0x1f, 0x8b, 0x08, 0x00]`) - original issue repro
  - All-null-byte file (`[0x00, 0x00, 0x00, 0x00]`)
  - Single null byte file (`[0x00]`)
  - 8-byte file with interleaved nulls (`[0x01, 0x00, 0x02, 0x00, ...]`)
- [x] Tests pass with `bun bd test` and fail with `USE_SYSTEM_BUN=1 bun test`
- [x] Existing FormData tests pass (342 pass in body.test.ts, 110 pass in FormData.test.ts, 2 pass in form-data-set-append.test.js)

Closes oven-sh#27478

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
The multipart parser stored part name, filename, and content-type as `bun.Semver.String`, which packs offset and length into 32-bit fields (with bit 31 of length stolen as a tag). For any part whose header sat past 4 GiB in the request body, the offset wrapped and the parser read garbage: field names came back as bytes from the middle of the preceding file body, and the part was unreachable by name.

This is the remaining half of the bug behind #21490. The file body itself (`Field.value`) was already switched to a raw slice in #27483, which fixed the 2 GiB length truncation, but `name`/`filename`/`content_type` were still u32-indexed. Switch all `Field` slices to raw `[]const u8` and drop the subslicer.

Also remove a leftover `@as(u32, @intCast(bytes.len))` in `ArrayBuffer.fromBytes` whose target fields are `usize`; it panicked on >4 GiB buffers in debug builds on the client-side FormData serialization path.

Fixes #21490
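The 4 GiB and 2 GiB limits mentioned in this commit message follow from the field widths. The TypeScript sketch below models only that arithmetic, under the stated assumption of a 32-bit offset and a length with bit 31 reserved. `storeOffsetU32` and `storeLengthU31` are hypothetical names and do not reflect Bun's actual struct layout.

```typescript
// Hypothetical model of 32-bit offset/length packing; not Bun's actual layout.
const GIB = 2 ** 30;

// An unsigned 32-bit field keeps only the low 32 bits of an offset.
function storeOffsetU32(offset: number): number {
  return offset >>> 0;
}

// With bit 31 reserved as a tag, only the low 31 bits of a length survive.
function storeLengthU31(length: number): number {
  return length & 0x7fffffff;
}

const headerOffset = 4 * GIB + 100; // a part header just past 4 GiB
console.log(storeOffsetU32(headerOffset)); // 100: points back near the start of the body

const bodyLength = 2 * GIB + 10; // a file body just over 2 GiB
console.log(storeLengthU31(bodyLength)); // 10: almost the entire body is dropped
```

A raw slice indexed by `usize` has neither limit on a 64-bit target, which is why switching the remaining fields removes this class of failure.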
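Summary
- `Request.formData()` truncated small binary files (≤8 bytes) at the first `0x00` (null) byte
- Root cause: `FormData.Field.value` used `bun.Semver.String`, whose inline storage mode scans for null bytes to determine string length (C-string semantics)
- Fix: Replace `Field.value` with a raw `[]const u8` slice into the input buffer, bypassing `Semver.String` entirely

Test plan
- Added regression test `test/regression/issue/27478.test.ts` with 4 cases:
  - Gzip header bytes (`[0x1f, 0x8b, 0x08, 0x00]`) - original issue repro
  - All-null-byte file (`[0x00, 0x00, 0x00, 0x00]`)
  - Single null byte file (`[0x00]`)
  - 8-byte file with interleaved nulls (`[0x01, 0x00, 0x02, 0x00, ...]`)
- Tests pass with `bun bd test` and fail with `USE_SYSTEM_BUN=1 bun test`
- Existing FormData tests pass (342 pass in body.test.ts, 110 pass in FormData.test.ts, 2 pass in form-data-set-append.test.js)

Closes #27478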
🤖 Generated with Claude Code