blob: stop aliasing allocated content_type in dupeWithContentType#29910
Conversation
The ref-count refactor in #23015 replaced the old `duped.allocator != null` guard with `duped.isHeapAllocated()`, but also moved `setNotHeapAllocated()` above the check, so the guard is always false and both branches that were supposed to deep-copy or neutralize a heap-allocated content_type became dead code. As a result, every dupe of a Blob with content_type_allocated=true ended up pointing at the same heap allocation as the source, with both sides claiming ownership. Freeing one side (e.g. via `file.write(data, {type})`) left the other with a dangling pointer, surfacing as a use-after-free when the Response body's content_type is later read into headers. Drop the isHeapAllocated() guard entirely — whether the *source* Blob lives on the heap has no bearing on whether its content_type buffer needs copying. If content_type_allocated is set, we must always take our own copy (or fall back to a static mime string) to avoid aliased ownership.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughBlob cleanup semantics adjusted: deinit now releases only shared blob store references; Blob Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Review rate limit: 3/5 reviews remaining, refill in 14 minutes and 46 seconds. Comment |
|
Updated 6:49 AM PT - Apr 29th, 2026
❌ @robobun, your commit 58a78af has 4 failures in
🧪 To try this PR locally: bunx bun-pr 29910That installs a local version of the PR into your bun-29910 --bun |
The previous commit made the `!include_content_type` branch reachable, which surfaced its pre-existing "clear to empty string on registry miss" fallback. That drops any content_type the mime registry doesn't know exactly — most importantly the `multipart/form-data; boundary=...` string attached to FormData bodies — so `new Response(formData).clone()` lost its Content-Type header when headers hadn't been materialized on the original yet. Since every caller of dupe()/dupeWithContentType() needs the content_type preserved, collapse both paths to an unconditional deep copy and ignore the `include_content_type` parameter. Also drop the explicit 60s test timeout per test conventions.
Now that dupe() allocates a fresh content_type copy, those copies were leaked because Blob.deinit() never freed content_type. Adding the free exposed several places that held a bitwise-copied Blob with content_type_allocated=true alongside the live owner: - fromJSWithoutDeferGC move path: deep-copy name/content_type into the moved-out Blob so the source JS Blob keeps sole ownership of its buffers (and fix the BuildArtifact arm which never actually moved). - getSliceFrom(): free the dupe's content_type copy before overwriting it with the slice's own type. - doWrite/getWriter: clear content_type_allocated after freeing so a registry-resolved static mime string isn't later freed by deinit(). - BlobOrStringOrBuffer.deinitAndUnprotect: the .blob arm is a raw view of a live JS Blob; only deref the store, matching deinit().
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/webcore/Blob.zig`:
- Around line 3571-3575: The deinit path currently always frees content_type
with bun.default_allocator even though fromDOMFormData() may have allocated the
multipart boundary using a different allocator; change the Blob struct/logic so
you record which allocator was used for content_type (e.g., store a
content_type_allocator or a flag with the allocator) when setting content_type
in fromDOMFormData(), and in deinit()/the block that checks
content_type_allocated free using that recorded allocator instead of
unconditionally calling bun.default_allocator; ensure this uses the same symbol
names (content_type, content_type_allocated, fromDOMFormData, deinit) so the
allocator-mismatch assert is avoided and memory is freed with the allocator that
created it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 17ae8ea4-1b16-4cbc-a799-2962896ccec4
📒 Files selected for processing (3)
src/bun.js/node/types.zigsrc/bun.js/webcore/Blob.zigtest/js/web/fetch/blob.test.ts
Blob.deinit() frees content_type with bun.default_allocator, so the allocation must use the same allocator. fromDOMFormData() took an allocator parameter and used it for the boundary string; the only caller passes bun.default_allocator so there was no mismatch in practice, but pin the allocation explicitly so future callers can't trip the allocator-mismatch assert.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/webcore/Blob.zig`:
- Around line 3574-3578: The blob's MIME buffer (content_type) is freed by
finalizing the source Blob but async operations like doReadFile() and
S3BlobDownloadTask.init() make raw bitwise copies that leave content_type
dangling; implement an owning copy/cleanup API on the Blob (add a dupe() that
deep-copies content_type and any allocated fields and a deinit() that frees them
using content_type_allocated logic) and replace places where the code does
struct copies (references: doReadFile(), S3BlobDownloadTask.init(), and any
callers like toFormDataWithBytes()) to call blob.dupe() when capturing for async
work and call blob.deinit() during teardown to avoid use-after-free.
- Around line 3536-3551: The dupeWithContentType/dupe() call now allocates a
duplicated content_type that detach() alone doesn't free, so update
writeFileInternal() (and any same-file temporaries) to perform full teardown:
replace the use of detach() on source_blob and ensure destination_blob is
deinitialized (call the Blob deinit/free method used in this file, e.g.
blob.deinit() or the existing deinit symbol) after use and also release the
store reference if needed so the duplicated content_type buffer is freed and no
leak occurs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ce8d0f88-01a3-440c-bf5f-42c9ae0c526f
📒 Files selected for processing (1)
src/bun.js/webcore/Blob.zig
doReadFile() and S3BlobDownloadTask stashed a raw `this.*` copy of the source JS Blob for use in the async completion callback. Now that Blob.deinit() frees an allocated content_type, the source being GC'd before the callback runs would leave the stashed copy's content_type pointing at freed memory — reachable via e.g. toFormDataWithBytes -> getFormDataEncoding -> getContentType. Use dupe() so the context owns its own store ref, name ref and content_type copy, and deinit() it on teardown.
There was a problem hiding this comment.
All prior findings are resolved and the latest revision looks correct, but this rewires content_type ownership across enough Blob call sites (deinit, dupe, slice, doReadFile, S3 download, fromJS move, BlobOrStringOrBuffer) — with one acknowledged deferred leak in writeFileInternal — that it's worth a human pass before merging.
Extended reasoning...
Overview
This PR fixes a UAF where Blob.dupeWithContentType left a bitwise-copied dupe aliasing the source's heap-allocated content_type (a regression from #23015's dead-guard refactor). The fix evolved across four rounds:
- Make
dupeWithContentTypedeep-copy any allocatedcontent_typeunconditionally (theinclude_content_typeparameter is now a no-op). - Add the corresponding free to
Blob.deinit()and free-before-overwrite ingetSliceFrom(). - Close newly-exposed double-ownership at sites that held a bitwise copy alongside the live owner:
fromJSWithoutDeferGC(move path now deep-copies name/content_type; BuildArtifact arm nowdupe()s),doWrite/getWriter(clearcontent_type_allocatedafter in-place free),BlobOrStringOrBuffer.deinitAndUnprotect(deref store only, since.blobis a raw view). - Replace raw
this.*struct copies indoReadFile()andS3BlobDownloadTask.init()with owningdupe()+deinit()so async contexts outliving the source can't dangle.
Two regression tests are added (UAF repro via Bun.file({type}) → Response → write({type}), and FormData boundary preservation across Response.clone()).
Security risks
The original bug is a use-after-free reachable from user JS (Bun.file + Response + file.write), so the fix is security-relevant. The new code introduces no injection/auth/data-exposure surface; the risk profile is purely memory-safety (double-free / UAF / leak) in the rewired ownership paths. ASAN stress runs reported clean.
Level of scrutiny
High. This is a non-mechanical change to a core ownership contract used by Body/Response/Request, Bun.write, blob.slice(), ObjectURL, S3, and async file reads. Each review round so far surfaced a real follow-on issue (behavior regression on Response.clone(), per-dupe leak, dangling async copies), which is the signature of a change whose blast radius is hard to bound by inspection alone. The author has explicitly deferred one remaining bounded leak in writeFileInternal to a follow-up because its per-branch ownership is path-dependent.
Other factors
- All my prior inline findings and CodeRabbit's are resolved; the bug-hunting system found nothing on the current revision.
- CI on the previous commit showed unrelated pre-existing flakes (
serve-stream-reject-flush-leak,bun-create,dev-and-prod); no failures attributable to this change were called out. - The
doReadFilechange adds adefer blob.deinit()inNewReadFileHandler.runaftertakeOwnership(); combined with.context = this.dupe(), the store now gets one extra ref/deref vs. before, which is correct but worth a maintainer's eye since it interacts withStorerefcounting on the hot file-read path.
Given the breadth, the iterative fix history, and the acknowledged follow-up, this should not be auto-approved.
…pe-content-type-uaf
…pe-content-type-uaf
…en-sh#29910) ## What `Blob.dupeWithContentType` guarded its content_type handling on `duped.isHeapAllocated()` immediately *after* calling `duped.setNotHeapAllocated()`, so both branches were dead. When the source Blob's `content_type` is heap-allocated, the bitwise-copied dupe aliased the same allocation while both sides had `content_type_allocated == true`. This is a regression from oven-sh#23015: the pre-refactor code checked `duped.allocator != null` *before* clearing it at the end of the function; the refactor moved the clear to the top but left the (renamed) guard in place. ## Repro ```js const file = Bun.file(path, { type: "application/x-custom-type-not-in-registry-abcdefghijklm" }); const response = new Response(file); // body holds a dupe that aliases file.content_type await file.write("hello", { type: "application/x-..." }); // frees file.content_type response.headers.get("content-type"); // reads freed memory ``` On ASAN builds: ``` ==716==ERROR: AddressSanitizer: use-after-poison on address 0x71df454301c0 oven-sh#1 in Zig::toStringCopy(ZigString) helpers.h:217 oven-sh#2 in WebCore__FetchHeaders__put bindings.cpp:2082 oven-sh#5 in bun.js.webcore.Response.getOrCreateHeaders Response.zig:358 ``` On release builds the freed slot gets reused and the read produces garbage: ``` TypeError: Header '25' has invalid value: 'ion/x-custom-type-not-in-registry-abcdefghijklm' ``` ## Fix Drop the `isHeapAllocated()` guard and always deep-copy an allocated `content_type` in `dupeWithContentType`. The old `!include_content_type` branch's "resolve to static mime or fall back to empty" is gone — it would have dropped FormData's `multipart/form-data; boundary=...` (and any non-registry type) on `Response.clone()`, and the branch itself was marked `// TODO: fix this / this is a bug`. The `include_content_type` parameter is now a no-op. Since every dupe now owns its `content_type` copy, `Blob.deinit()` frees it. That in turn required closing a few places that held a bitwise-copied Blob alongside the live owner: - `fromJSWithoutDeferGC` `move=true`: deep-copy `name`/`content_type` into the moved-out value so the source JS Blob keeps sole ownership; the BuildArtifact arm now `dupe()`s (its "move" only nulled the store on a local copy). - `getSliceFrom()`: free the dupe's copy before overwriting it with the slice's own type. - `doWrite`/`getWriter`: clear `content_type_allocated` after the in-place free so a registry-resolved static string isn't later freed by `deinit()`. - `BlobOrStringOrBuffer.deinitAndUnprotect`: only deref the store (matching its `deinit()`) since `.blob` is a raw view of a live JS Blob. ## Verified - `bun bd test test/js/web/fetch/blob.test.ts` — 16/16 pass - New UAF test fails on both debug/ASAN (use-after-poison) and system bun (garbage header) without the fix, passes with it - New clone test guards against dropping FormData's boundary on `Response.clone()` - ASAN stress: 1k× `Response.clone`/`blob.slice`/`createObjectURL`+revoke/`new Response([blob])`/`write({type})` — no double-free - RSS is flat across 50k × 1KB-type `Response.clone()` and `blob.slice()` --------- Co-authored-by: robobun <robobun@users.noreply.github.com> Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
What
Blob.dupeWithContentTypeguarded its content_type handling onduped.isHeapAllocated()immediately after callingduped.setNotHeapAllocated(), so both branches were dead. When the source Blob'scontent_typeis heap-allocated, the bitwise-copied dupe aliased the same allocation while both sides hadcontent_type_allocated == true.This is a regression from #23015: the pre-refactor code checked
duped.allocator != nullbefore clearing it at the end of the function; the refactor moved the clear to the top but left the (renamed) guard in place.Repro
On ASAN builds:
On release builds the freed slot gets reused and the read produces garbage:
Fix
Drop the
isHeapAllocated()guard and always deep-copy an allocatedcontent_typeindupeWithContentType. The old!include_content_typebranch's "resolve to static mime or fall back to empty" is gone — it would have dropped FormData'smultipart/form-data; boundary=...(and any non-registry type) onResponse.clone(), and the branch itself was marked// TODO: fix this / this is a bug. Theinclude_content_typeparameter is now a no-op.Since every dupe now owns its
content_typecopy,Blob.deinit()frees it. That in turn required closing a few places that held a bitwise-copied Blob alongside the live owner:fromJSWithoutDeferGCmove=true: deep-copyname/content_typeinto the moved-out value so the source JS Blob keeps sole ownership; the BuildArtifact arm nowdupe()s (its "move" only nulled the store on a local copy).getSliceFrom(): free the dupe's copy before overwriting it with the slice's own type.doWrite/getWriter: clearcontent_type_allocatedafter the in-place free so a registry-resolved static string isn't later freed bydeinit().BlobOrStringOrBuffer.deinitAndUnprotect: only deref the store (matching itsdeinit()) since.blobis a raw view of a live JS Blob.Verified
bun bd test test/js/web/fetch/blob.test.ts— 16/16 passResponse.clone()Response.clone/blob.slice/createObjectURL+revoke/new Response([blob])/write({type})— no double-freeResponse.clone()andblob.slice()