Skip to content

blob: stop aliasing allocated content_type in dupeWithContentType#29910

Merged
Jarred-Sumner merged 8 commits into
mainfrom
farm/bf5f0ad5/blob-dupe-content-type-uaf
Apr 29, 2026
Merged

blob: stop aliasing allocated content_type in dupeWithContentType#29910
Jarred-Sumner merged 8 commits into
mainfrom
farm/bf5f0ad5/blob-dupe-content-type-uaf

Conversation

@robobun

@robobun robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

What

Blob.dupeWithContentType guarded its content_type handling on duped.isHeapAllocated() immediately after calling duped.setNotHeapAllocated(), so both branches were dead. When the source Blob's content_type is heap-allocated, the bitwise-copied dupe aliased the same allocation while both sides had content_type_allocated == true.

This is a regression from #23015: the pre-refactor code checked duped.allocator != null before clearing it at the end of the function; the refactor moved the clear to the top but left the (renamed) guard in place.

Repro

const file = Bun.file(path, { type: "application/x-custom-type-not-in-registry-abcdefghijklm" });
const response = new Response(file);               // body holds a dupe that aliases file.content_type
await file.write("hello", { type: "application/x-..." }); // frees file.content_type
response.headers.get("content-type");              // reads freed memory

On ASAN builds:

==716==ERROR: AddressSanitizer: use-after-poison on address 0x71df454301c0
    #1 in Zig::toStringCopy(ZigString) helpers.h:217
    #2 in WebCore__FetchHeaders__put bindings.cpp:2082
    #5 in bun.js.webcore.Response.getOrCreateHeaders Response.zig:358

On release builds the freed slot gets reused and the read produces garbage:

TypeError: Header '25' has invalid value: 'ion/x-custom-type-not-in-registry-abcdefghijklm'

Fix

Drop the isHeapAllocated() guard and always deep-copy an allocated content_type in dupeWithContentType. The old !include_content_type branch's "resolve to static mime or fall back to empty" is gone — it would have dropped FormData's multipart/form-data; boundary=... (and any non-registry type) on Response.clone(), and the branch itself was marked // TODO: fix this / this is a bug. The include_content_type parameter is now a no-op.

Since every dupe now owns its content_type copy, Blob.deinit() frees it. That in turn required closing a few places that held a bitwise-copied Blob alongside the live owner:

  • fromJSWithoutDeferGC move=true: deep-copy name/content_type into the moved-out value so the source JS Blob keeps sole ownership; the BuildArtifact arm now dupe()s (its "move" only nulled the store on a local copy).
  • getSliceFrom(): free the dupe's copy before overwriting it with the slice's own type.
  • doWrite/getWriter: clear content_type_allocated after the in-place free so a registry-resolved static string isn't later freed by deinit().
  • BlobOrStringOrBuffer.deinitAndUnprotect: only deref the store (matching its deinit()) since .blob is a raw view of a live JS Blob.

Verified

  • bun bd test test/js/web/fetch/blob.test.ts — 16/16 pass
  • New UAF test fails on both debug/ASAN (use-after-poison) and system bun (garbage header) without the fix, passes with it
  • New clone test guards against dropping FormData's boundary on Response.clone()
  • ASAN stress: 1k× Response.clone/blob.slice/createObjectURL+revoke/new Response([blob])/write({type}) — no double-free
  • RSS is flat across 50k × 1KB-type Response.clone() and blob.slice()

The ref-count refactor in #23015 replaced the old `duped.allocator != null`
guard with `duped.isHeapAllocated()`, but also moved `setNotHeapAllocated()`
above the check, so the guard is always false and both branches that were
supposed to deep-copy or neutralize a heap-allocated content_type became
dead code.

As a result, every dupe of a Blob with content_type_allocated=true ended up
pointing at the same heap allocation as the source, with both sides claiming
ownership. Freeing one side (e.g. via `file.write(data, {type})`) left the
other with a dangling pointer, surfacing as a use-after-free when the
Response body's content_type is later read into headers.

Drop the isHeapAllocated() guard entirely — whether the *source* Blob lives
on the heap has no bearing on whether its content_type buffer needs copying.
If content_type_allocated is set, we must always take our own copy (or fall
back to a static mime string) to avoid aliased ownership.
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6b458f70-368a-4f81-93e7-7f2a34371de7

📥 Commits

Reviewing files that changed from the base of the PR and between 664fa0e and fd37755.

📒 Files selected for processing (2)
  • src/bun.js/webcore/Blob.zig
  • src/bun.js/webcore/blob/read_file.zig

Walkthrough

Blob cleanup semantics adjusted: deinit now releases only shared blob store references; Blob content_type ownership is standardized and deep-copied when necessary; JS→Zig blob move paths avoid double-free by duplicating owned fields; two regression tests for Blob duplication and Response.clone were added.

Changes

Cohort / File(s) Summary
Blob store cleanup
src/bun.js/node/types.zig
BlobOrStringOrBuffer.deinitAndUnprotect now releases only the shared store reference via blob.store.deref() instead of calling blob.deinit().
Blob content_type ownership & duplication
src/bun.js/webcore/Blob.zig
Tightened content_type ownership: allocate/copy content_type for FormData and other paths, maintain content_type_allocated state, ensure deinit frees allocated content_type, make dupeWithContentType deep-copy when allocated, and remove move path that could double-free by duplicating owned fields on JS→Zig conversions.
Read-file Blob cleanup
src/bun.js/webcore/blob/read_file.zig
run() now defers a blob.deinit() after takeOwnership() to guarantee cleanup of the captured Blob in handler context.
Tests: blob duplication & clone
test/js/web/fetch/blob.test.ts
Added two regression tests: one verifying dupeWithContentType does not alias an allocated content_type during Bun.fileResponse duplication, and another ensuring Response.clone() preserves allocated content-type for FormData responses.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing the aliasing bug in dupeWithContentType where allocated content_type was being shared between dupe and original.
Description check ✅ Passed The description comprehensively covers both required template sections with detailed technical context, reproduction steps, fix explanation, and verification results.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 3/5 reviews remaining, refill in 14 minutes and 46 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 6:49 AM PT - Apr 29th, 2026

@robobun, your commit 58a78af has 4 failures in Build #49058 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29910

That installs a local version of the PR into your bun-29910 executable, so you can run:

bun-29910 --bun

Comment thread src/bun.js/webcore/Blob.zig Outdated
Comment thread test/js/web/fetch/blob.test.ts Outdated
The previous commit made the `!include_content_type` branch reachable, which
surfaced its pre-existing "clear to empty string on registry miss" fallback.
That drops any content_type the mime registry doesn't know exactly — most
importantly the `multipart/form-data; boundary=...` string attached to
FormData bodies — so `new Response(formData).clone()` lost its Content-Type
header when headers hadn't been materialized on the original yet.

Since every caller of dupe()/dupeWithContentType() needs the content_type
preserved, collapse both paths to an unconditional deep copy and ignore the
`include_content_type` parameter. Also drop the explicit 60s test timeout
per test conventions.
Comment thread src/bun.js/webcore/Blob.zig
robobun and others added 2 commits April 29, 2026 03:44
Now that dupe() allocates a fresh content_type copy, those copies were
leaked because Blob.deinit() never freed content_type. Adding the free
exposed several places that held a bitwise-copied Blob with
content_type_allocated=true alongside the live owner:

- fromJSWithoutDeferGC move path: deep-copy name/content_type into the
  moved-out Blob so the source JS Blob keeps sole ownership of its
  buffers (and fix the BuildArtifact arm which never actually moved).
- getSliceFrom(): free the dupe's content_type copy before overwriting
  it with the slice's own type.
- doWrite/getWriter: clear content_type_allocated after freeing so a
  registry-resolved static mime string isn't later freed by deinit().
- BlobOrStringOrBuffer.deinitAndUnprotect: the .blob arm is a raw view
  of a live JS Blob; only deref the store, matching deinit().

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/webcore/Blob.zig`:
- Around line 3571-3575: The deinit path currently always frees content_type
with bun.default_allocator even though fromDOMFormData() may have allocated the
multipart boundary using a different allocator; change the Blob struct/logic so
you record which allocator was used for content_type (e.g., store a
content_type_allocator or a flag with the allocator) when setting content_type
in fromDOMFormData(), and in deinit()/the block that checks
content_type_allocated free using that recorded allocator instead of
unconditionally calling bun.default_allocator; ensure this uses the same symbol
names (content_type, content_type_allocated, fromDOMFormData, deinit) so the
allocator-mismatch assert is avoided and memory is freed with the allocator that
created it.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 17ae8ea4-1b16-4cbc-a799-2962896ccec4

📥 Commits

Reviewing files that changed from the base of the PR and between f62e43d and 1541600.

📒 Files selected for processing (3)
  • src/bun.js/node/types.zig
  • src/bun.js/webcore/Blob.zig
  • test/js/web/fetch/blob.test.ts

Comment thread src/bun.js/webcore/Blob.zig
Blob.deinit() frees content_type with bun.default_allocator, so the
allocation must use the same allocator. fromDOMFormData() took an
allocator parameter and used it for the boundary string; the only
caller passes bun.default_allocator so there was no mismatch in
practice, but pin the allocation explicitly so future callers can't
trip the allocator-mismatch assert.
Comment thread src/bun.js/webcore/Blob.zig

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/webcore/Blob.zig`:
- Around line 3574-3578: The blob's MIME buffer (content_type) is freed by
finalizing the source Blob but async operations like doReadFile() and
S3BlobDownloadTask.init() make raw bitwise copies that leave content_type
dangling; implement an owning copy/cleanup API on the Blob (add a dupe() that
deep-copies content_type and any allocated fields and a deinit() that frees them
using content_type_allocated logic) and replace places where the code does
struct copies (references: doReadFile(), S3BlobDownloadTask.init(), and any
callers like toFormDataWithBytes()) to call blob.dupe() when capturing for async
work and call blob.deinit() during teardown to avoid use-after-free.
- Around line 3536-3551: The dupeWithContentType/dupe() call now allocates a
duplicated content_type that detach() alone doesn't free, so update
writeFileInternal() (and any same-file temporaries) to perform full teardown:
replace the use of detach() on source_blob and ensure destination_blob is
deinitialized (call the Blob deinit/free method used in this file, e.g.
blob.deinit() or the existing deinit symbol) after use and also release the
store reference if needed so the duplicated content_type buffer is freed and no
leak occurs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ce8d0f88-01a3-440c-bf5f-42c9ae0c526f

📥 Commits

Reviewing files that changed from the base of the PR and between 1541600 and 664fa0e.

📒 Files selected for processing (1)
  • src/bun.js/webcore/Blob.zig

Comment thread src/bun.js/webcore/Blob.zig
Comment thread src/bun.js/webcore/Blob.zig
doReadFile() and S3BlobDownloadTask stashed a raw `this.*` copy of the
source JS Blob for use in the async completion callback. Now that
Blob.deinit() frees an allocated content_type, the source being GC'd
before the callback runs would leave the stashed copy's content_type
pointing at freed memory — reachable via e.g. toFormDataWithBytes ->
getFormDataEncoding -> getContentType.

Use dupe() so the context owns its own store ref, name ref and
content_type copy, and deinit() it on teardown.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior findings are resolved and the latest revision looks correct, but this rewires content_type ownership across enough Blob call sites (deinit, dupe, slice, doReadFile, S3 download, fromJS move, BlobOrStringOrBuffer) — with one acknowledged deferred leak in writeFileInternal — that it's worth a human pass before merging.

Extended reasoning...

Overview

This PR fixes a UAF where Blob.dupeWithContentType left a bitwise-copied dupe aliasing the source's heap-allocated content_type (a regression from #23015's dead-guard refactor). The fix evolved across four rounds:

  1. Make dupeWithContentType deep-copy any allocated content_type unconditionally (the include_content_type parameter is now a no-op).
  2. Add the corresponding free to Blob.deinit() and free-before-overwrite in getSliceFrom().
  3. Close newly-exposed double-ownership at sites that held a bitwise copy alongside the live owner: fromJSWithoutDeferGC (move path now deep-copies name/content_type; BuildArtifact arm now dupe()s), doWrite/getWriter (clear content_type_allocated after in-place free), BlobOrStringOrBuffer.deinitAndUnprotect (deref store only, since .blob is a raw view).
  4. Replace raw this.* struct copies in doReadFile() and S3BlobDownloadTask.init() with owning dupe() + deinit() so async contexts outliving the source can't dangle.

Two regression tests are added (UAF repro via Bun.file({type})Responsewrite({type}), and FormData boundary preservation across Response.clone()).

Security risks

The original bug is a use-after-free reachable from user JS (Bun.file + Response + file.write), so the fix is security-relevant. The new code introduces no injection/auth/data-exposure surface; the risk profile is purely memory-safety (double-free / UAF / leak) in the rewired ownership paths. ASAN stress runs reported clean.

Level of scrutiny

High. This is a non-mechanical change to a core ownership contract used by Body/Response/Request, Bun.write, blob.slice(), ObjectURL, S3, and async file reads. Each review round so far surfaced a real follow-on issue (behavior regression on Response.clone(), per-dupe leak, dangling async copies), which is the signature of a change whose blast radius is hard to bound by inspection alone. The author has explicitly deferred one remaining bounded leak in writeFileInternal to a follow-up because its per-branch ownership is path-dependent.

Other factors

  • All my prior inline findings and CodeRabbit's are resolved; the bug-hunting system found nothing on the current revision.
  • CI on the previous commit showed unrelated pre-existing flakes (serve-stream-reject-flush-leak, bun-create, dev-and-prod); no failures attributable to this change were called out.
  • The doReadFile change adds a defer blob.deinit() in NewReadFileHandler.run after takeOwnership(); combined with .context = this.dupe(), the store now gets one extra ref/deref vs. before, which is correct but worth a maintainer's eye since it interacts with Store refcounting on the hot file-read path.

Given the breadth, the iterative fix history, and the acknowledged follow-up, this should not be auto-approved.

@Jarred-Sumner Jarred-Sumner merged commit 0f42aed into main Apr 29, 2026
73 of 77 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/bf5f0ad5/blob-dupe-content-type-uaf branch April 29, 2026 10:00
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…en-sh#29910)

## What

`Blob.dupeWithContentType` guarded its content_type handling on
`duped.isHeapAllocated()` immediately *after* calling
`duped.setNotHeapAllocated()`, so both branches were dead. When the
source Blob's `content_type` is heap-allocated, the bitwise-copied dupe
aliased the same allocation while both sides had `content_type_allocated
== true`.

This is a regression from oven-sh#23015: the pre-refactor code checked
`duped.allocator != null` *before* clearing it at the end of the
function; the refactor moved the clear to the top but left the (renamed)
guard in place.

## Repro

```js
const file = Bun.file(path, { type: "application/x-custom-type-not-in-registry-abcdefghijklm" });
const response = new Response(file);               // body holds a dupe that aliases file.content_type
await file.write("hello", { type: "application/x-..." }); // frees file.content_type
response.headers.get("content-type");              // reads freed memory
```

On ASAN builds:
```
==716==ERROR: AddressSanitizer: use-after-poison on address 0x71df454301c0
    oven-sh#1 in Zig::toStringCopy(ZigString) helpers.h:217
    oven-sh#2 in WebCore__FetchHeaders__put bindings.cpp:2082
    oven-sh#5 in bun.js.webcore.Response.getOrCreateHeaders Response.zig:358
```

On release builds the freed slot gets reused and the read produces
garbage:
```
TypeError: Header '25' has invalid value: 'ion/x-custom-type-not-in-registry-abcdefghijklm'
```

## Fix

Drop the `isHeapAllocated()` guard and always deep-copy an allocated
`content_type` in `dupeWithContentType`. The old `!include_content_type`
branch's "resolve to static mime or fall back to empty" is gone — it
would have dropped FormData's `multipart/form-data; boundary=...` (and
any non-registry type) on `Response.clone()`, and the branch itself was
marked `// TODO: fix this / this is a bug`. The `include_content_type`
parameter is now a no-op.

Since every dupe now owns its `content_type` copy, `Blob.deinit()` frees
it. That in turn required closing a few places that held a
bitwise-copied Blob alongside the live owner:

- `fromJSWithoutDeferGC` `move=true`: deep-copy `name`/`content_type`
into the moved-out value so the source JS Blob keeps sole ownership; the
BuildArtifact arm now `dupe()`s (its "move" only nulled the store on a
local copy).
- `getSliceFrom()`: free the dupe's copy before overwriting it with the
slice's own type.
- `doWrite`/`getWriter`: clear `content_type_allocated` after the
in-place free so a registry-resolved static string isn't later freed by
`deinit()`.
- `BlobOrStringOrBuffer.deinitAndUnprotect`: only deref the store
(matching its `deinit()`) since `.blob` is a raw view of a live JS Blob.

## Verified

- `bun bd test test/js/web/fetch/blob.test.ts` — 16/16 pass
- New UAF test fails on both debug/ASAN (use-after-poison) and system
bun (garbage header) without the fix, passes with it
- New clone test guards against dropping FormData's boundary on
`Response.clone()`
- ASAN stress: 1k×
`Response.clone`/`blob.slice`/`createObjectURL`+revoke/`new
Response([blob])`/`write({type})` — no double-free
- RSS is flat across 50k × 1KB-type `Response.clone()` and
`blob.slice()`

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants