s3: protect stream error JSValue before handing it to ByteStream#29822
s3: protect stream error JSValue before handing it to ByteStream#29822robobun wants to merge 7 commits into
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughUnified stream error handling: weak JS error references were replaced with strong conversions, explicit lifecycle cleanup (deinit) was added for stream errors, and Readable/Byte/Resumable stream error paths now ensure JS errors remain alive. A new S3 GC test verifies error objects persist until streams are finalized. Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Review rate limit: 0/5 reviews remaining, refill in 5 minutes and 24 seconds. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/s3/client.zig`:
- Around line 654-659: The code protects err_value (from err.toJS) before
calling readable.ptr.Bytes.onData, but if onData fails the protected JS cell is
never unprotected and leaks; update the error handling around
readable.ptr.Bytes.onData so that on any error path you call
err_value.unprotect() (or equivalent) before propagating the error—i.e., wrap
the try in a catch that unprotects err_value and then returns the error, or use
a defer/unprotect pattern scoped so successful onData transfers ownership while
failures unprotect the value; reference err_value, err.toJS,
err_value.protect(), readable.ptr.Bytes.onData and the opaqueCallback/error
propagation path when making the change.
In `@test/js/bun/s3/s3-stream-error-gc.test.ts`:
- Around line 45-47: Remove the brittle strict stderr assertion for the JS
subprocess test: drop or relax the expect(stderr).toBe("") check and rely on
validating stdout and exitCode instead; keep expect(stdout).toBe("") and
expect(exitCode).toBe(0) (or change stderr assertion to a non-failing check like
allowing stderr to be non-empty for bunExe() -e runs with bunEnv), so the test
no longer fails due to debug/ASAN warning lines appearing on stderr.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 19f2fe1f-f8c2-4c58-aeee-a9581813fe33
📒 Files selected for processing (2)
src/s3/client.zigtest/js/bun/s3/s3-stream-error-gc.test.ts
| expect(stderr).toBe(""); | ||
| expect(stdout).toBe(""); | ||
| expect(exitCode).toBe(0); |
There was a problem hiding this comment.
Remove strict empty-stderr assertion for this subprocess crash test
For bunExe() -e JS subprocesses, debug/ASAN builds can emit a known warning on stderr, so expect(stderr).toBe("") can fail even when behavior is correct.
Suggested fix
- expect(stderr).toBe("");
expect(stdout).toBe("");
+ if (exitCode !== 0) {
+ console.error(stderr);
+ }
expect(exitCode).toBe(0);Based on learnings: In Bun JS/TS tests that spawn bunExe() with bunEnv, strict expect(stderr).toBe("") is brittle for JS-running subprocesses because ASAN warning lines may appear; crash regressions should be guarded by stdout + exitCode.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/js/bun/s3/s3-stream-error-gc.test.ts` around lines 45 - 47, Remove the
brittle strict stderr assertion for the JS subprocess test: drop or relax the
expect(stderr).toBe("") check and rely on validating stdout and exitCode
instead; keep expect(stdout).toBe("") and expect(exitCode).toBe(0) (or change
stderr assertion to a non-failing check like allowing stderr to be non-empty for
bunExe() -e runs with bunEnv), so the test no longer fails due to debug/ASAN
warning lines appearing on stderr.
|
CI failures are pre-existing Windows flakes unrelated to this change:
The new |
Jarred-Sumner
left a comment
There was a problem hiding this comment.
can you make it a strong JSValue so we know whether it's protected or not instead of having to remember to protec/tunprotect?
|
Done — Verified: 0/30 crashes on the repro, regression test green, |
There was a problem hiding this comment.
Thanks — all three lifecycle gaps I flagged are now closed (append-overwrite, deinit hoist, RequestContext.onPipe). I'm not auto-approving because the fix grew from a call-site protect into a StreamError.JSValue → jsc.Strong.Optional ownership refactor across the whole ByteStream consumer set (fetch, S3, Bun.serve piping, body buffering), and the Windows serve-stream-reject-flush-leak assertion on 13bc6ca is in-domain enough that I'd want a human to confirm it's unrelated before merge.
Extended reasoning...
Overview
What started as a one-line .protect() at the S3 call site evolved into changing streams.Result.StreamError.JSValue from a raw jsc.JSValue to jsc.Strong.Optional, removing the WeakJSValue variant and toJSWeak/WasStrong, and updating every producer (s3/client.zig, FetchTasklet.zig, Body.zig) and every consumer (ByteStream.append/onData/deinit/toBufferedValue, ResumableSink.onStreamPipe, Body.ValueBufferer.onStreamPipe, RequestContext.onPipe, Result.fulfillPromise/toJS, BufferAction.reject, ReadableStream.processResult). 8 source files + 1 new GC regression test.
Prior review status
I left three inline comments on earlier revisions:
append()overwrotepending.resultwithout deinit +onCancel/deinitguard skipped the release → addressed bypending.result.deinit()before both.errassignments and by hoistingpending.result.deinit()out of theif (!this.done)guard inByteStream.deinit().- Same path still open after the Strong rework → addressed by the same hoist in 13bc6ca.
RequestContext.onPipe(thirdPipe.Wrapconsumer) not updated → addressed in 2663735 withif (stream_ == .err) stream_.err.deinit();in the defer.
I grepped for all Pipe.Wrap sites and confirmed only the three exist; all now release the Strong. The bug-hunting pass on the current head found nothing.
Security risks
None — this is a GC-rooting/lifecycle change. No auth, parsing, or trust-boundary code touched.
Level of scrutiny
High. Strong.Optional heap-allocates an Impl per .create(); the new contract requires every path that receives or abandons a .err.JSValue to call deinit() (or toJS(), which now consumes). Missing one is a permanent JSC root + heap leak; double-consuming is a UAF/double-free. The change threads through fetch body error delivery, S3 streaming, Bun.serve proxied-body piping, and Response.body buffering — all production hot paths.
Other factors
- CI on 13bc6ca shows
test/js/bun/http/serve-stream-reject-flush-leak.test.tsfailing on three Windows targets withinternal assertion failure: allocators do not match. That test (added in #29865) exercises exactly the stream-reject path this PR rewires. The author's earlier flake-dismissal comment predates this failure appearing. CI for 2663735 isn't reflected in the timeline yet. - CodeRabbit's note on
expect(stderr).toBe("")in the new test being brittle under ASAN remains unaddressed (minor). BufferAction.rejectandResult.toJSnow take theStreamErrorby value, copy it, anddeinitthe copy's Strong — correct as long as no caller subsequently deinit's the original; I traced the reachable call sites and didn't find one, but the by-value-copy-then-free-shared-heap-pointer pattern is subtle enough to merit a second pair of eyes.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/bun.js/webcore/Body.zig (1)
1561-1583:⚠️ Potential issue | 🟠 MajorDon't drop terminal byte-stream errors here.
Line 1582 still reports
nulltoonFinishedBufferingeven whenstream_ == .err. With this path, buffered body consumers will resolve with truncated bytes instead of rejecting on stream failure; the newdeinit()only makes that loss permanent. Thread the error through the callback before releasing it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/bun.js/webcore/Body.zig` around lines 1561 - 1583, onStreamPipe currently calls sink.onFinishedBuffering(sink.ctx, bytes, null, true) even when stream_ == .err, causing terminal stream errors to be dropped; before deinitializing stream_.err you must pass the error through to onFinishedBuffering so buffered consumers reject instead of receiving truncated bytes. Update onStreamPipe to detect stream_ == .err (or !stream.isDone() with .err state) and call sink.onFinishedBuffering(sink.ctx, bytes, stream_.err, true) (or otherwise forward the captured error value) before the defer deinit runs, ensuring stream_.err is not released prior to being reported.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/webcore/streams.zig`:
- Around line 219-243: StreamError now holds a jsc.Strong.Optional making it
single-owner; the code currently copies the union (e.g., using `var that =
this.*`, `var err_ = err`, or passing `err` by value) and then calls
`.deinit()`/`.toJS()` which consumes the strong reference out of the copy
leaving the original with a stale pointer and risking double-free/UAF. Fix by
moving the union out of the owning container and operating on the moved value
(use pointer receivers like `*StreamError`/`*Result`, take `this.result` by
value before handing to callbacks, or otherwise replace the field with a neutral
sentinel before calling `toJS()`/`deinit()`), and update call sites that do
shallow copies (mentions: StreamError.toJS, StreamError.deinit, Result handling
at the callback handoff sites) so ownership is transferred exactly once.
---
Outside diff comments:
In `@src/bun.js/webcore/Body.zig`:
- Around line 1561-1583: onStreamPipe currently calls
sink.onFinishedBuffering(sink.ctx, bytes, null, true) even when stream_ == .err,
causing terminal stream errors to be dropped; before deinitializing stream_.err
you must pass the error through to onFinishedBuffering so buffered consumers
reject instead of receiving truncated bytes. Update onStreamPipe to detect
stream_ == .err (or !stream.isDone() with .err state) and call
sink.onFinishedBuffering(sink.ctx, bytes, stream_.err, true) (or otherwise
forward the captured error value) before the defer deinit runs, ensuring
stream_.err is not released prior to being reported.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d93f9136-8086-4587-945d-ff50851cf176
📒 Files selected for processing (9)
src/bun.js/api/server/RequestContext.zigsrc/bun.js/webcore/Body.zigsrc/bun.js/webcore/ByteStream.zigsrc/bun.js/webcore/ReadableStream.zigsrc/bun.js/webcore/ResumableSink.zigsrc/bun.js/webcore/fetch/FetchTasklet.zigsrc/bun.js/webcore/streams.zigsrc/s3/client.zigtest/js/bun/s3/s3-stream-error-gc.test.ts
| pub const StreamError = union(enum) { | ||
| Error: Syscall.Error, | ||
| AbortReason: jsc.CommonAbortReason, | ||
| JSValue: jsc.Strong.Optional, | ||
|
|
||
| // TODO: use an explicit jsc.Strong.Optional here. | ||
| JSValue: jsc.JSValue, | ||
| WeakJSValue: jsc.JSValue, | ||
|
|
||
| const WasStrong = enum { | ||
| Strong, | ||
| Weak, | ||
| }; | ||
| pub fn deinit(this: *StreamError) void { | ||
| if (this.* == .JSValue) { | ||
| this.JSValue.deinit(); | ||
| } | ||
| } | ||
|
|
||
| pub fn toJSWeak(this: *const @This(), globalObject: *jsc.JSGlobalObject) struct { jsc.JSValue, WasStrong } { | ||
| /// Returns the error as a JSValue. For the `.JSValue` variant this | ||
| /// releases the strong reference, so the returned value is only | ||
| /// rooted by the caller's stack. Safe to call `deinit` afterwards. | ||
| pub fn toJS(this: *StreamError, globalObject: *jsc.JSGlobalObject) jsc.JSValue { | ||
| return switch (this.*) { | ||
| .Error => |err| { | ||
| return .{ err.toJS(globalObject) catch return .{ .zero, WasStrong.Weak }, WasStrong.Weak }; | ||
| }, | ||
| .JSValue => .{ this.JSValue, WasStrong.Strong }, | ||
| .WeakJSValue => .{ this.WeakJSValue, WasStrong.Weak }, | ||
| .AbortReason => |reason| { | ||
| const value = reason.toJS(globalObject); | ||
| return .{ value, WasStrong.Weak }; | ||
| .Error => |err| err.toJS(globalObject) catch .zero, | ||
| .JSValue => |*strong| { | ||
| const value = strong.get() orelse .js_undefined; | ||
| strong.deinit(); | ||
| return value; | ||
| }, | ||
| .AbortReason => |reason| reason.toJS(globalObject), | ||
| }; | ||
| } |
There was a problem hiding this comment.
StreamError.JSValue is now owning, but this file still consumes copied aliases.
Making JSValue a jsc.Strong.Optional changes StreamError/Result from “cheaply copyable” to “single-owner in practice”. The new toJS()/deinit() helpers release that strong, but this file still does shallow copies before consuming it — e.g. Line 551 (var that = this.*; that.deinit()), Line 590 (var err_ = err; err_.toJS(...)), and Line 1553 (var err_ = err). Those paths free the shared strong through the copy while the original payload still holds the stale pointer, so a later deinit() or second conversion on the original becomes a double-release/UAF.
Please make the error path consume the original storage instead of a copy (for example, use *Result / *StreamError receivers and move this.result out before handing it off).
Possible direction
-pub fn toJS(this: *const Result, globalThis: *JSGlobalObject) bun.JSError!JSValue {
+pub fn toJS(this: *Result, globalThis: *JSGlobalObject) bun.JSError!JSValue {
...
- .err => |err| {
- var err_ = err;
- const js_err = err_.toJS(globalThis);
+ .err => |*err| {
+ const js_err = err.toJS(globalThis);
js_err.ensureStillAlive();
return jsc.JSPromise.rejectedPromise(globalThis, js_err).toJS();
},- pub fn reject(this: *BufferAction, global: *jsc.JSGlobalObject, err: Result.StreamError) bun.JSTerminated!void {
- var err_ = err;
- return this.swap().reject(global, err_.toJS(global));
+ pub fn reject(this: *BufferAction, global: *jsc.JSGlobalObject, err: *Result.StreamError) bun.JSTerminated!void {
+ const value = err.toJS(global);
+ value.ensureStillAlive();
+ return this.swap().reject(global, value);
}For callback handoff sites, move the Result out of the owning field first so ownership transfers exactly once.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/bun.js/webcore/streams.zig` around lines 219 - 243, StreamError now holds
a jsc.Strong.Optional making it single-owner; the code currently copies the
union (e.g., using `var that = this.*`, `var err_ = err`, or passing `err` by
value) and then calls `.deinit()`/`.toJS()` which consumes the strong reference
out of the copy leaving the original with a stale pointer and risking
double-free/UAF. Fix by moving the union out of the owning container and
operating on the moved value (use pointer receivers like
`*StreamError`/`*Result`, take `this.result` by value before handing to
callbacks, or otherwise replace the field with a neutral sentinel before calling
`toJS()`/`deinit()`), and update call sites that do shallow copies (mentions:
StreamError.toJS, StreamError.deinit, Result handling at the callback handoff
sites) so ownership is transferred exactly once.
|
CI failures on 2663735 are all pre-existing on merged PRs #29904 and #29907:
The new |
StreamError.JSValue is the strong variant - Result.deinit, processResult and fulfillPromise all unprotect it when consuming the pending result. The S3 download stream callback was storing the freshly-allocated error instance without protecting it, so if a GC ran before the stream was read or finalized the stored pointer went stale and the ByteStream finalizer dereferenced a dead cell.
Protecting at the S3 call site would leak when onData takes the buffer_action or already-done paths, which consume the error without unprotecting. Move the protect() into append() where the value is actually parked in pending.result across GC; deinit/processResult unprotect it from there.
The raw-JSValue + manual protect/unprotect contract was easy to get wrong (producers forgot to protect, some consume paths forgot to unprotect). Use a real Strong so the type carries the ownership. toJSWeak + WasStrong collapse into a single consuming toJS(), and the unused WeakJSValue variant is dropped.
… unconditionally in deinit BufferAction.reject copies the StreamError internally so onCancel's existing call site stays unchanged. Hoist pending.result.deinit() out of the !done guard in ByteStream.deinit so a parked error Strong is released even when onCancel already marked the stream done without a pending pull.
Third Pipe.Wrap consumer; now matches Body.onStreamPipe and ResumableSink.onStreamPipe so a .err.JSValue Strong handed to a proxied Bun.serve response body is released after the response is ended.
… timing Counting surviving Error instances after GC is deterministic on both debug and release builds: 300/300 retained with the fix, 0/300 without. Also checks they are released once the streams are dropped.
3bf320f to
3003048
Compare
Fuzzilli found a flaky null-VM dereference in
MarkedBlock::vm()triggered by:What happens
When
S3Client.file(...).stream()fails to sign the request (e.g. no credentials),S3DownloadStreamWrapper.callbackis invoked synchronously with the error. It creates a JS error instance and pushes it into theByteStreamasStreamError.JSValue, which lands inpending.resultviaappend().StreamError.JSValueis the strong variant —Result.deinit(),processResult()andfulfillPromise()all.unprotect()it when the pending result is consumed — but the S3 callback never.protect()ed it. If a GC ran before the stream was read or finalized, the error object was swept whilepending.resultstill held a raw pointer to it, and the ByteStream finalizer would dereference a dead cell.Fix
Protect the error value before handing it to
onData, matching the contract the consumers already rely on.Repro
Before: ~90% crash rate on debug+asan. After: 0/50.