fix(formdata): preserve binary data with null bytes in multipart parsing #27483

Merged
Jarred-Sumner merged 2 commits into main from claude/fix-formdata-null-byte-truncation-27478 on Feb 28, 2026

robobun commented Feb 26, 2026

Collaborator

Summary

  • Request.formData() truncated small binary files (≤8 bytes) at the first 0x00 (null) byte
  • Root cause: FormData.Field.value used bun.Semver.String, whose inline storage mode scans for null bytes to determine string length (C-string semantics)
  • Fix: Replace Field.value with a raw []const u8 slice into the input buffer, bypassing Semver.String entirely
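
To make the root cause concrete, here is a minimal sketch of the failure mode. It is a hypothetical TypeScript model of an 8-byte inline slot that recovers its length by scanning for the first zero byte, as the summary describes. It is not Bun's `Semver.String`, and the names `packInline`, `readInline`, and `readRawSlice` are made up for illustration.

```typescript
// Hypothetical model of an 8-byte inline string slot. This is NOT Bun's
// Semver.String; it only mimics the C-string length rule described above.
const INLINE_CAPACITY = 8;

// Copy up to 8 bytes into a zero-padded slot; no explicit length is stored.
function packInline(bytes: Uint8Array): Uint8Array {
  if (bytes.length > INLINE_CAPACITY) {
    throw new Error("too long for inline storage");
  }
  const slot = new Uint8Array(INLINE_CAPACITY); // zero-filled by default
  slot.set(bytes);
  return slot;
}

// Recover the content by scanning for the first 0x00 (C-string semantics).
function readInline(slot: Uint8Array): Uint8Array {
  const end = slot.indexOf(0);
  return slot.subarray(0, end === -1 ? slot.length : end);
}

// A raw slice carries its own bounds, so embedded zeros are harmless.
function readRawSlice(input: Uint8Array, start: number, end: number): Uint8Array {
  return input.subarray(start, end);
}

const gzipHeader = new Uint8Array([0x1f, 0x8b, 0x08, 0x00]);
const viaInline = readInline(packInline(gzipHeader)); // stops before the 0x00
const viaSlice = readRawSlice(gzipHeader, 0, gzipHeader.length); // all 4 bytes
```

In this model the gzip header reads back as 3 bytes through the inline slot and as 4 bytes through the raw slice, which matches the truncation reported in the issue.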

Test plan

  • Added regression test test/regression/issue/27478.test.ts with 4 cases:
    • Gzip header bytes ([0x1f, 0x8b, 0x08, 0x00]) - original issue repro
    • All-null-byte file ([0x00, 0x00, 0x00, 0x00])
    • Single null byte file ([0x00])
    • 8-byte file with interleaved nulls ([0x01, 0x00, 0x02, 0x00, ...])
  • Tests pass with bun bd test and fail with USE_SYSTEM_BUN=1 bun test
  • Existing FormData tests pass (342 pass in body.test.ts, 110 pass in FormData.test.ts, 2 pass in form-data-set-append.test.js)
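
For readers who want to reproduce the cases by hand, the following is a rough sketch of how such multipart bodies could be assembled. It is not the contents of `test/regression/issue/27478.test.ts`. The helper names, the field name `file`, and the filename `test.bin` are assumptions, and the last four bytes of the 8-byte payload are an illustrative guess because the test plan elides them.

```typescript
// Hypothetical helper: encode an ASCII header string as bytes.
function ascii(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) out[i] = text.charCodeAt(i);
  return out;
}

// Build a single-part multipart/form-data body around raw file bytes.
// Field name and filename are illustrative, not taken from the real test.
function buildMultipartBody(boundary: string, fileBytes: Uint8Array): Uint8Array {
  const head = ascii(
    `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="file"; filename="test.bin"\r\n` +
      `Content-Type: application/octet-stream\r\n\r\n`,
  );
  const tail = ascii(`\r\n--${boundary}--\r\n`);
  const body = new Uint8Array(head.length + fileBytes.length + tail.length);
  body.set(head, 0);
  body.set(fileBytes, head.length); // payload copied verbatim, zeros included
  body.set(tail, head.length + fileBytes.length);
  return body;
}

// Payloads mirroring the four cases in the test plan. The tail of the
// last one is assumed; the source only shows its first four bytes.
const payloads: Uint8Array[] = [
  new Uint8Array([0x1f, 0x8b, 0x08, 0x00]),
  new Uint8Array([0x00, 0x00, 0x00, 0x00]),
  new Uint8Array([0x00]),
  new Uint8Array([0x01, 0x00, 0x02, 0x00, 0x03, 0x00, 0x04, 0x00]),
];

const bodies = payloads.map((p) => buildMultipartBody("boundary27478", p));
```

Each body would then be passed as the `body` of a `new Request(...)` with `Content-Type: multipart/form-data; boundary=boundary27478`, and the bytes of the file returned by `await request.formData()` compared against the original payload.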

Closes #27478

🤖 Generated with Claude Code

`Field.value` used `bun.Semver.String` which treats inline data (<=8 bytes)
as null-terminated strings, truncating binary file content at the first 0x00
byte. Replace with a raw `[]const u8` slice into the input buffer.

Closes #27478

Co-Authored-By: Claude <noreply@anthropic.com>
robobun commented Feb 26, 2026

Collaborator Author
Updated 1:47 PM PT - Feb 26th, 2026

@autofix-ci[bot], your commit 7d19568 has 4 failures in Build #38179 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 27483

That installs a local version of the PR into your bun-27483 executable, so you can run:

bun-27483 --bun

coderabbitai Bot commented Feb 26, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9e3330a and 7d19568.

📒 Files selected for processing (3)
  • docs/bundler/executables.mdx
  • src/url.zig
  • test/regression/issue/27478.test.ts

Walkthrough

This PR preserves null bytes in multipart form-data handling by changing the FormData field value type from a string type to a raw binary slice. It also includes documentation formatting updates and regression test coverage for binary file parsing.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: docs/bundler/executables.mdx | Formatting adjustments to the Supported targets table (spacing and alignment) and expanded explanatory content for AVX2 baseline warning with details on cross-compilation and detection behavior. |
| Core FormData Binary Support: src/url.zig | Modified Field.value declaration from Semver.String to []const u8 to support binary data with null bytes; updated field value handling in multipart body parsing and entry processing to work directly with binary slices. |
| Test Coverage: test/regression/issue/27478.test.ts | Added regression tests verifying that multipart/form-data formData() preserves null bytes in binary files across various edge cases, including null-byte distributions and exact byte-length validation. |

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: fixing FormData to preserve binary data with null bytes during multipart parsing, which aligns with the core bug fix and code changes. |
| Description check | ✅ Passed | The description includes both required sections (What does this PR do and How did you verify your code works), with detailed explanations of the bug, root cause, fix, and comprehensive test coverage. |
| Linked Issues check | ✅ Passed | The PR fully addresses the linked issue #27478 by replacing Semver.String with raw []const u8 slices to preserve null bytes, adding regression tests covering the reported bug case and edge cases, and verifying backward compatibility. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing issue #27478: src/url.zig modifies FormData.Field to preserve binary data, docs/bundler/executables.mdx contains only formatting updates unrelated to the fix, and test file adds regression tests. |


claude Bot left a comment

Contributor

LGTM — targeted fix replacing Semver.String (which truncates at null bytes) with a raw slice for binary field values, well-contained and well-tested.

Extended reasoning...

Overview

This PR fixes a bug where Request.formData() truncated small binary files at the first 0x00 byte. The change touches two files: src/url.zig (the fix) and a new regression test. The fix replaces FormData.Field.value from bun.Semver.String to []const u8, and updates the two sites where it is assigned and read.

Security risks

No security concerns. This fix actually improves correctness by preserving binary data integrity — the previous behavior silently truncated data, which could be considered a data integrity issue.

Level of scrutiny

Production-critical code (multipart form data parsing), but the change is minimal and well-scoped. I verified that FormData.Field is only used within url.zig itself — forEachMultipartEntry is called only from toJSFromMultipartData in the same file. The body slice points into the input buffer passed to the parser, which has sufficient lifetime throughout the callback execution.
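
The lifetime point above can be illustrated with a loose TypeScript analogy, assuming `Uint8Array.prototype.subarray` as a stand-in for a Zig slice (a view into existing memory) and `slice` as a stand-in for an owned copy. The buffer contents are invented for the example, and overwriting the buffer only simulates it being reused.

```typescript
// The request body as received; the parser only reads from it.
const input = new Uint8Array([0x2d, 0x2d, 0x1f, 0x8b, 0x08, 0x00, 0x0d, 0x0a]);

// subarray() is a view: same memory, explicit start and end, no copy.
const view = input.subarray(2, 6);

// slice() is an independent copy of the same four bytes.
const copy = input.slice(2, 6);

// Overwriting the buffer stands in for the backing memory being reused
// while something still holds the view.
input.fill(0xff);

const viewFirst = view[0]; // now reflects the overwritten buffer
const copyFirst = copy[0]; // still the original byte
```

The view is only as valid as the buffer behind it, which is why the review checks that the input buffer outlives the callback that consumes the field value.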

Other factors

  • The filename and content_type fields correctly remain as bun.Semver.String since they contain text, not arbitrary binary data.
  • Four regression test cases cover the original issue, all-null bytes, single null byte, and 8-byte files with interleaved nulls (the Semver.String inline storage boundary).
  • Existing FormData test suites reportedly pass (342 + 110 + 2 tests).

claude Bot left a comment

Contributor

LGTM — targeted fix replacing Semver.String (C-string semantics) with a raw byte slice for FormData field values, with good regression tests.

Extended reasoning...

Overview

This PR fixes a bug where Request.formData() truncated small binary files at the first null byte. The core change is in src/url.zig: FormData.Field.value is changed from bun.Semver.String (which uses C-string null-termination for inline storage) to []const u8 (a raw byte slice). Two usage sites are updated accordingly. The PR also includes a benign markdown table formatting fix in docs/bundler/executables.mdx and a new regression test file.

Security risks

No security risks. The change actually fixes a data integrity issue — binary data was being silently truncated. The new code uses a direct slice into the input buffer, which has correct lifetime semantics (the input buffer outlives the callback).

Level of scrutiny

This is production-critical code (HTTP body parsing), so high scrutiny is warranted. However, the change is minimal (3 lines of Zig), the Field type is only used internally within url.zig (verified via grep — only two usage sites, both updated), and the fix is straightforward: replacing a string type that has null-termination semantics with a raw byte slice for binary data.

Other factors

  • No external callers access Field.value directly — the type is internal to the multipart parsing logic.
  • Regression tests cover the original issue (gzip header bytes), all-null files, single null byte, and 8-byte files with interleaved nulls.
  • The PR description confirms existing FormData tests continue to pass (342 + 110 + 2 tests).
  • CodeRabbit found no actionable issues.

Jarred-Sumner merged commit a870e7b into main Feb 28, 2026
60 of 64 checks passed
Jarred-Sumner deleted the claude/fix-formdata-null-byte-truncation-27478 branch February 28, 2026 10:42
structwafel pushed a commit to structwafel/bun that referenced this pull request Apr 25, 2026
…ing (oven-sh#27483)

## Summary
- `Request.formData()` truncated small binary files (≤8 bytes) at the
first `0x00` (null) byte
- Root cause: `FormData.Field.value` used `bun.Semver.String`, whose
inline storage mode scans for null bytes to determine string length
(C-string semantics)
- Fix: Replace `Field.value` with a raw `[]const u8` slice into the
input buffer, bypassing `Semver.String` entirely

## Test plan
- [x] Added regression test `test/regression/issue/27478.test.ts` with 4
cases:
  - Gzip header bytes (`[0x1f, 0x8b, 0x08, 0x00]`) - original issue repro
  - All-null-byte file (`[0x00, 0x00, 0x00, 0x00]`)
  - Single null byte file (`[0x00]`)
  - 8-byte file with interleaved nulls (`[0x01, 0x00, 0x02, 0x00, ...]`)
- [x] Tests pass with `bun bd test` and fail with `USE_SYSTEM_BUN=1 bun
test`
- [x] Existing FormData tests pass (342 pass in body.test.ts, 110 pass
in FormData.test.ts, 2 pass in form-data-set-append.test.js)

Closes oven-sh#27478

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Bot <claude-bot@bun.sh>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
robobun added a commit that referenced this pull request May 14, 2026
The multipart parser stored part name, filename, and content-type as
bun.Semver.String, which packs offset and length into 32-bit fields
(with bit 31 of length stolen as a tag). For any part whose header sat
past 4 GiB in the request body, the offset wrapped and the parser read
garbage — field names came back as bytes from the middle of the
preceding file body and the part was unreachable by name.

This is the remaining half of the bug behind #21490. The file body
itself (Field.value) was already switched to a raw slice in #27483,
which fixed the 2 GiB length truncation, but name/filename/content_type
were still u32-indexed.

Switch all Field slices to raw []const u8 and drop the subslicer. Also
remove a leftover @as(u32, @intCast(bytes.len)) in ArrayBuffer.fromBytes
whose target fields are usize — it panicked on >4 GiB buffers in debug
builds on the client-side FormData serialization path.

Fixes #21490
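
The offset wrap described in this commit message can be sketched with plain arithmetic. This is a hypothetical model that assumes only what the message states (32-bit offset and length fields, with bit 31 of the length used as a tag). `packRef` and the constants are illustrative names, not Bun identifiers.

```typescript
const GIB = 1024 * 1024 * 1024;
const U32_RANGE = 4294967296; // 2^32: number of values a 32-bit offset can hold
const MAX_LEN = 2147483647; // 2^31 - 1: largest length if bit 31 is a tag

interface PackedRef {
  offset: number;
  length: number;
}

// Model of storing an offset in a 32-bit field: only the low 32 bits survive.
function packRef(offset: number, length: number): PackedRef {
  return { offset: offset % U32_RANGE, length };
}

// A part header that starts 100 bytes past the 4 GiB mark.
const headerOffset = 4 * GIB + 100;
const packed = packRef(headerOffset, 4);

// A raw slice keeps the full-width offset instead.
const raw: PackedRef = { offset: headerOffset, length: 4 };
```

In this model `packed.offset` comes out as 100, so a lookup lands near the start of the body rather than at the real header, consistent with field names being read from the middle of an earlier part. `MAX_LEN` is just under 2 GiB, in line with the 2 GiB length limit mentioned above.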
robobun added a commit that referenced this pull request May 21, 2026
robobun added a commit that referenced this pull request May 24, 2026
robobun added a commit that referenced this pull request May 27, 2026
robobun added a commit that referenced this pull request Jun 5, 2026
Development

Successfully merging this pull request may close these issues.

Request.formData truncates tiny multipart binary files at first null byte
