Fix double-free in S3 static methods when path is passed as string#28592
Fix double-free in S3 static methods when path is passed as string#28592robobun wants to merge 1 commit into
Conversation
|
Updated 6:09 PM PT - May 8th, 2026
❌ @robobun, your commit c48596e has 2 failures in
🧪 To try this PR locally: bunx bun-pr 28592That installs a local version of the PR into your bun-28592 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughUnconditionally deinitialize parsed PathLike/path resources across S3-related Zig codepaths (including constructInternal), fix Changes
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/webcore/S3File.zig`:
- Around line 86-87: The constructor currently transfers path ownership only
after constructS3FileInternalStore returns, so if
constructS3FileWithS3Credentials does further fallible work after
Blob.Store.initS3() (e.g., getTruthyComptime("type"), file_type.toSlice()) under
errdefer store.deinit(), a thrown error causes a double-free because the outer
errdefer still sees .path; fix by performing the ownership handoff inside the
constructor (move setting of path_or_blob to .{ .blob = blob } into
constructS3FileInternalStore so the moved path is hidden from outer errdefer) or
alternatively move all fallible operations (getTruthyComptime,
file_type.toSlice, etc.) to before calling Blob.Store.initS3() so no fallible
work runs after initS3; update the same pattern at the other occurrences (lines
referenced around constructS3FileInternalStore,
constructS3FileWithS3Credentials, and the errdefer store.deinit usage).
In `@test/js/bun/s3-presign-double-free.test.ts`:
- Around line 8-16: The test's calls array currently uses Date() inputs which
rely on external S3 behavior to trigger the post-initS3() ownership-unwind path;
replace those Date() arguments with a deterministic local throwing object (an
object whose type getter throws) so each call to Bun.S3Client.presign / unlink /
size / exists / stat / write will synchronously throw inside the post-initS3()
path and reliably hit the constructS3FileInternalStore() ownership bug; update
the calls array entries that pass Date() to instead pass this throwing-type
object so the test deterministically reproduces the double-free/unwind scenario.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7a910e43-869e-405f-83e7-65d0c0bc3f44
📒 Files selected for processing (2)
src/bun.js/webcore/S3File.zigtest/js/bun/s3-presign-double-free.test.ts
| const calls = [ | ||
| () => Bun.S3Client.presign(Date()), | ||
| () => Bun.S3Client.presign(Date(), Date()), | ||
| () => Bun.S3Client.unlink(Date()), | ||
| () => Bun.S3Client.size(Date()), | ||
| () => Bun.S3Client.exists(Date()), | ||
| () => Bun.S3Client.stat(Date()), | ||
| () => Bun.S3Client.write(Date(), "data"), | ||
| ]; |
There was a problem hiding this comment.
Use a deterministic local throw in the post-initS3() path.
These inputs only reach the risky unwind if the call happens to fail after constructS3FileInternalStore() has already taken ownership, which currently depends on ambient S3 config or network behavior. A throwing type getter gives you a local, deterministic failure in that path and will reliably catch this ownership bug.
Example repro shape
+ const throwingOptions = {
+ get type() {
+ throw new Error("boom");
+ },
+ };
const calls = [
- () => Bun.S3Client.presign(Date()),
- () => Bun.S3Client.presign(Date(), Date()),
- () => Bun.S3Client.unlink(Date()),
- () => Bun.S3Client.size(Date()),
- () => Bun.S3Client.exists(Date()),
- () => Bun.S3Client.stat(Date()),
- () => Bun.S3Client.write(Date(), "data"),
+ () => Bun.S3Client.presign(Date(), throwingOptions),
+ () => Bun.S3Client.unlink(Date(), throwingOptions),
+ () => Bun.S3Client.size(Date(), throwingOptions),
+ () => Bun.S3Client.exists(Date(), throwingOptions),
+ () => Bun.S3Client.stat(Date(), throwingOptions),
+ () => Bun.S3Client.write(Date(), "data", throwingOptions),
];🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/js/bun/s3-presign-double-free.test.ts` around lines 8 - 16, The test's
calls array currently uses Date() inputs which rely on external S3 behavior to
trigger the post-initS3() ownership-unwind path; replace those Date() arguments
with a deterministic local throwing object (an object whose type getter throws)
so each call to Bun.S3Client.presign / unlink / size / exists / stat / write
will synchronously throw inside the post-initS3() path and reliably hit the
constructS3FileInternalStore() ownership bug; update the calls array entries
that pass Date() to instead pass this throwing-type object so the test
deterministically reproduces the double-free/unwind scenario.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/js/bun/s3-presign-double-free.test.ts (1)
11-19:⚠️ Potential issue | 🟠 MajorMake the regression path deterministic after ownership transfer.
Line 12–Line 18 currently rely on
Date()inputs and ambient behavior, so this can pass without reliably entering the post-constructS3FileInternalStore()unwind path that previously double-freed memory. Use a local throwing options getter and pass it to each static method to force a deterministic local failure in that path.Suggested deterministic repro shape
async function main() { + const throwingOptions = { + get type() { + throw new Error("boom"); + }, + }; const calls = [ - () => Bun.S3Client.presign(Date()), - () => Bun.S3Client.presign(Date(), Date()), - () => Bun.S3Client.unlink(Date()), - () => Bun.S3Client.size(Date()), - () => Bun.S3Client.exists(Date()), - () => Bun.S3Client.stat(Date()), - () => Bun.S3Client.write(Date(), "data"), + () => Bun.S3Client.presign(Date(), throwingOptions), + () => Bun.S3Client.unlink(Date(), throwingOptions), + () => Bun.S3Client.size(Date(), throwingOptions), + () => Bun.S3Client.exists(Date(), throwingOptions), + () => Bun.S3Client.stat(Date(), throwingOptions), + () => Bun.S3Client.write(Date(), "data", throwingOptions), ];🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/js/bun/s3-presign-double-free.test.ts` around lines 11 - 19, Replace the non-deterministic Date() inputs used in the calls array for Bun.S3Client.presign, presign (2-arg), unlink, size, exists, stat, and write with a deterministic failing "options" supplier to force the post-constructS3FileInternalStore() unwind path; create a local function that always throws (e.g., getOptionsThatThrows()) and pass its result or the function (as appropriate for the overloads) into each static method invocation so each call deterministically triggers the same local failure and exercises the regression path reliably.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/js/bun/s3-presign-double-free.test.ts`:
- Around line 11-19: Replace the non-deterministic Date() inputs used in the
calls array for Bun.S3Client.presign, presign (2-arg), unlink, size, exists,
stat, and write with a deterministic failing "options" supplier to force the
post-constructS3FileInternalStore() unwind path; create a local function that
always throws (e.g., getOptionsThatThrows()) and pass its result or the function
(as appropriate for the overloads) into each static method invocation so each
call deterministically triggers the same local failure and exercises the
regression path reliably.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 238927f0-c92f-4854-9df2-d02d4c5b464d
📥 Commits
Reviewing files that changed from the base of the PR and between 0968ae7 and 0c75e2120755c660b13065f029f35ca19c0b3d31.
📒 Files selected for processing (1)
test/js/bun/s3-presign-double-free.test.ts
There was a problem hiding this comment.
The fix is targeted and the logic appears sound, but this is a memory-safety fix in production Zig code — worth a human look to confirm the ownership transfer semantics are correct in all error paths.
Extended reasoning...
Overview
This PR modifies src/bun.js/webcore/S3File.zig to fix a use-after-poison (double-free) in 6 S3Client static methods (presign, unlink, write, size, exists, stat). The fix adds path_or_blob = .{ .blob = blob }; after constructS3FileInternalStore succeeds, preventing the errdefer guard from freeing path memory that has already been transferred to the store. A regression test is also added.
Security risks
No security-sensitive surfaces (auth, crypto, permissions) are introduced. The fix removes a memory-safety bug, which is a net positive for security.
Level of scrutiny
High. This is a memory management fix in Zig, where ownership semantics are manual and subtle. The fix itself is mechanical (same one-liner repeated across 6 functions), but the correctness depends on understanding the ownership contract between PathLike, Store.initS3, blob.deinit(), and the errdefer guard. The described root cause (toThreadSafe being a no-op for encoded_slice) is plausible but not trivially verifiable without tracing through the allocator chain.
Other factors
The two inline bug reports are nits (missing expect(stderr).toBe("") assertion, test placed in wrong directory). Neither blocks the fix. The PR description is thorough and the code comments explain intent clearly. Given the memory-safety nature of the change and the low cost of a human review pass on a 6-line fix, deferring is the appropriate call.
| env: bunEnv, | ||
| stderr: "pipe", | ||
| }); | ||
|
|
||
| const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]); | ||
|
|
||
| expect(stdout).toBe("OK\n"); | ||
| expect(exitCode).toBe(0); |
There was a problem hiding this comment.
🟡 The test pipes and collects stderr but never asserts expect(stderr).toBe(""), violating the explicit pattern in test/CLAUDE.md. This means sanitizer warnings that do not abort the process (ASAN/UBSAN output with exit code 0) would be silently ignored. Add expect(stderr).toBe("") before the existing stdout/exitCode assertions.
Extended reasoning...
What the bug is: The test at test/js/bun/s3-presign-double-free.test.ts (lines 29-36) explicitly pipes stderr with stderr: "pipe" and collects it via proc.stderr.text(), but the collected value is never asserted upon. The assertions only check stdout and exitCode, leaving stderr unverified.
The specific code path: Lines 29-36 collect [stdout, stderr, exitCode] in a Promise.all. The variable stderr is bound but unused in any assertion. expect(stdout).toBe("OK\n") and expect(exitCode).toBe(0) are present, but no expect(stderr).toBe("") follows.
Why existing code does not prevent it: The test correctly validates the crash scenario (a double-free would cause a non-zero exit or no "OK" output). However, sanitizer tools like AddressSanitizer can emit diagnostic output to stderr and still let the process exit with code 0. Without asserting on stderr, such diagnostics would be silently swallowed.
CLAUDE.md violation: test/CLAUDE.md lines 36-51 (single-file template) and lines 60-96 (multi-file template) both explicitly show expect(stderr).toBe("") as part of the standard assertion pattern for spawned Bun processes. One refutation claimed CLAUDE.md prohibits this check at line 99 -- this is factually wrong. Line 99 of CLAUDE.md actually reads: "Generally, await using or using is a good idea to ensure proper resource cleanup." The refutation fabricated a non-existent rule.
On the S3-errors-to-stderr objection: A refutation argued that failed S3 calls (no credentials) would produce stderr output. In practice, S3 errors in Bun propagate as JS exceptions, which are entirely swallowed by try { await call(); } catch(e) {}. No JS exception output reaches stderr; only native crashes or sanitizer warnings would. The subprocess stderr should be empty under normal conditions, making expect(stderr).toBe("") a sound assertion.
Step-by-step proof: (1) A hypothetical regression introduces a use-after-poison that only triggers ASAN warning mode (not abort). (2) The subprocess prints an ASAN report to stderr but still exits 0 and prints "OK" to stdout. (3) Both existing assertions pass. (4) The ASAN warning is never detected and CI appears green. Adding expect(stderr).toBe("") would catch this case.
How to fix: Add expect(stderr).toBe(""); after the Promise.all, before the existing stdout/exitCode assertions, matching the CLAUDE.md convention shown in its templates.
0c75e21 to
375d788
Compare
| var path = pathlike; | ||
| // this actually protects/refs the pathlike | ||
| path.toThreadSafe(); | ||
| // For encoded_slice, toThreadSafe is a no-op, so the store would alias | ||
| // the caller's memory. Clone it so the store owns an independent copy. | ||
| if (path == .encoded_slice and path.encoded_slice.isAllocated()) { | ||
| path = .{ .encoded_slice = try path.encoded_slice.toOwned(allocator) }; | ||
| } | ||
|
|
||
| const store = Blob.Store.new(.{ | ||
| .data = .{ |
There was a problem hiding this comment.
🟣 Pre-existing bug: in Store.S3.path(), the trailing-slash stripping branch path_name = path_name[0..path_name.len] is a no-op — it produces the identical slice instead of removing the last character. The symmetric backslash branch at the next line correctly uses path_name.len - 1. As a result, S3 keys ending in / are silently passed verbatim to all S3 operations (presign, stat, exists, unlink, write, delete), potentially causing 404s since S3 treats prefix/ and prefix as distinct keys. Fix: change to path_name[0..path_name.len - 1].
Extended reasoning...
What the bug is and how it manifests
In Store.S3.path() (Store.zig, the path method of the S3 struct), the branch intended to strip a trailing / from the S3 path contains a logic error:
if (strings.endsWith(path_name, "/")) {
path_name = path_name[0..path_name.len]; // BUG: no-op
} else if (strings.endsWith(path_name, "\\")) {
path_name = path_name[0 .. path_name.len - 1]; // Correct
}path_name[0..path_name.len] creates a slice from index 0 to the full length — identical to the original. No character is stripped. The comment above the block says "normalize start and ending", and the symmetric backslash branch correctly uses path_name.len - 1, confirming the intent was to strip one character.
The specific code path that triggers it
Any S3 URL that resolves (after bun.URL.parse().s3Path()) to a string ending in / will hit this branch and return the path unchanged. For example, s3://bucket/prefix/ would yield prefix/ instead of prefix as the key used in all downstream S3 API calls.
Why existing code does not prevent it
There is no other sanitization of the trailing slash downstream. The path() method is called directly before constructing S3 API requests (presign, stat, exists, unlink, write, delete), so whatever path() returns is used verbatim as the S3 object key.
Impact
S3 treats prefix/ and prefix as completely different object keys. Operations on prefix/ when the intended key is prefix will silently 404 or operate on an unexpected key. This affects all six public S3 APIs: presign, stat, exists, size, unlink, and write.
Step-by-step proof
- User calls
Bun.S3Client.stat("s3://bucket/prefix/")(trailing slash). Store.S3.path()callsbun.URL.parse("s3://bucket/prefix/").s3Path()— returns"prefix/"(or"/prefix/"before the leading-slash strip).- The
strings.endsWith(path_name, "/")check is true. path_name = path_name[0..path_name.len]— no-op,path_nameis still"prefix/".- The leading-slash strip does not apply.
"prefix/"is returned and used as the S3 key.- The S3 HEAD request targets
prefix/, notprefix— silently wrong.
How to fix
Change path_name[0..path_name.len] to path_name[0..path_name.len - 1] to match the intent and the symmetric backslash branch. This is a pre-existing bug not introduced by this PR; the PR does not modify S3.path(). However, the PR extensively modifies the S3 path handling infrastructure in the same file, making this a natural opportunity to flag it.
9538183 to
7fde217
Compare
| () => Bun.S3Client.presign(Date()), | ||
| () => Bun.S3Client.presign(Date(), Date()), | ||
| () => Bun.S3Client.unlink(Date()), | ||
| () => Bun.S3Client.size(Date()), | ||
| () => Bun.S3Client.exists(Date()), | ||
| () => Bun.S3Client.stat(Date()), | ||
| () => Bun.S3Client.write(Date(), "data"), |
There was a problem hiding this comment.
🟡 The regression test uses Date() (without new) as the options argument, which returns a string — not a JS object. Because opts.isObject() returns false for a string, the getTruthyComptime("type") branch inside constructS3FileWithS3Credentials is never reached, so the test only exercises the no-credentials failure path (Path 1) and not the throwing-type-getter failure path (Path 2) described in the PR. Adding a case like () => Bun.S3Client.presign(Date(), { get type() { throw new Error("boom"); } }) would deterministically trigger the post-initS3() code path the PR claims to fix.
Extended reasoning...
What the bug is and how it manifests
In JavaScript, Date() called without new returns a plain string (e.g., "Thu Mar 26 2026 ...") rather than a Date object. The test passes this string as the options argument to every call in the calls array (e.g., Bun.S3Client.presign(Date(), Date())). Inside constructS3FileWithS3Credentials, the options object is only interrogated when opts.isObject() is true. Since a string is not an object, opts.isObject() returns false and the entire getTruthyComptime("type") block — which is the code that could exercise the post-initS3() failure path — is skipped unconditionally.
The specific code path that triggers Path 2
The PR description identifies two double-free paths. Path 2 occurs when constructS3FileWithS3Credentials itself fails after Blob.Store.initS3() has already aliased the path: (1) initS3 runs and the store holds a pointer to the path buffer; (2) errdefer store.deinit() is armed; (3) opts.getTruthyComptime(globalObject, "type") invokes a throwing getter; (4) the store errdefer fires and frees the path buffer; (5) error.JSError propagates to the caller before path_or_blob = .{ .blob = blob } is reached; (6) the outer errdefer still sees path_or_blob == .path and calls path_or_blob.path.deinit() on already-freed memory.
Why the current test does not prevent it
All entries in the calls array pass either no options or a Date() string as options. A string fails opts.isObject(), so getTruthyComptime is never invoked. The test therefore only exercises the path where the constructor succeeds but getPresignUrlFrom / the subsequent operation fails due to missing S3 credentials. This is Path 1, not Path 2.
Addressing the refutation
One verifier argued that Store.zig was not modified in commit 9538183, so Path 2 was never actually fixed — and therefore not testing it is consistent. This is a fair observation about implementation completeness. However, it cuts both ways: if Path 2 remains broken, a test that attempts to trigger it (with { get type() { throw new Error("boom"); } }) would either crash under ASAN (revealing the unfixed bug) or pass (revealing the fix was effective through some other mechanism). Either outcome is more informative than silently skipping the path. The PR description explicitly mentions Path 2 as something that "existed" and the coderabbitai reviewer correctly identified the exact missing test case.
Impact
The coverage gap means the described Path 2 scenario remains unverified by CI. Under an ASAN build the double-free would manifest as a use-after-poison crash if Path 2 is genuinely unfixed. The missing test would catch any regression reintroducing the bug in either path.
How to fix
Add a throwingOptions object to the test and include it as the options argument in the calls array:
const throwingOptions = { get type() { throw new Error("boom"); } };
// then add:
() => Bun.S3Client.presign(Date(), throwingOptions),
() => Bun.S3Client.unlink(Date(), throwingOptions),
// etc.This deterministically exercises the post-initS3() failure path on every run without requiring real S3 credentials or network access.
| } | ||
| const options = args.nextEat(); | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store | ||
| defer blob.deinit(); | ||
| return try getPresignUrlFrom(&blob, globalThis, options); | ||
| }, |
There was a problem hiding this comment.
🔴 The double-free fix was applied to S3File.zig static methods but not to the structurally identical S3Client.zig instance methods (presign, exists, size, stat, write, unlink). In those methods, errdefer path.deinit() remains armed after constructS3FileWithS3CredentialsAndOptions succeeds and the store aliases the original path buffer — if any downstream call (e.g. getPresignUrlFrom with a throwing method getter) throws, both defer blob.detach() and errdefer path.deinit() fire on the same memory, causing use-after-poison. Apply the same path_or_blob = .{ .blob = blob } neutralization pattern to the corresponding .path => branches in all six S3Client.zig instance methods.
Extended reasoning...
What the bug is and how it manifests
The PR fixes a double-free in S3File.zig static methods (e.g. presign, unlink, write, size, exists, stat) by adding path_or_blob = .{ .blob = blob } after constructS3FileInternalStore succeeds. This changes the union tag so the outer errdefer (which only fires when path_or_blob == .path) is disabled once the store takes ownership. However, the structurally identical pattern in S3Client.zig instance methods was not updated — errdefer path.deinit() remains always armed after blob construction succeeds.
The specific code path that triggers it
In each S3Client.zig instance method (e.g. presign):
PathLike.fromJS(...)allocates anencoded_slice(pointer A).errdefer path.deinit()is registered — no disabling mechanism exists.constructS3FileWithS3CredentialsAndOptions(...)succeeds; inside,Blob.Store.initS3storespathlikedirectly (no clone for the non-allocated case), aliasing pointer A into the store.defer blob.detach()is registered withstore.ref_count = 1.- A downstream call (e.g.
getPresignUrlFromwith amethodgetter that throws) raises a JS error.
Why existing code does not prevent it
The path_or_blob = .{ .blob = blob } assignment that disables the errdefer in S3File.zig is absent from S3Client.zig. There is no other mechanism to communicate to the outer errdefer that ownership has already been transferred. The S3File.zig fix is purely call-site level — it works because it resets the union tag after the successful constructor return — but S3Client.zig was never updated with the equivalent reset.
Impact
All six S3Client instance methods are affected: presign, exists, size, stat, write, unlink. The crash is deterministic and user-triggerable from JavaScript without any special permissions or credentials.
How to fix
In each S3Client.zig instance method, add the ownership-transfer line immediately after constructS3FileWithS3CredentialsAndOptions succeeds, neutralizing the errdefer path.deinit() the same way the static methods do it:
var blob = try constructS3FileWithS3CredentialsAndOptions(...);
path_neutralized = blob; // or equivalent tag-flip to disable errdefer
defer blob.detach();Step-by-step proof (concrete trigger for presign)
const client = new Bun.S3Client({ bucket: "test" });
client.presign("mykey", { get method() { throw new Error("boom"); } });PathLike.fromJSallocatesencoded_slicefor"mykey"→ pointer A.errdefer path.deinit()armed.constructS3FileWithS3CredentialsAndOptionssucceeds; store aliases pointer A (pathlike.encoded_slice = pointer A).defer blob.detach()registered;store.ref_count = 1.getPresignUrlFromcallsoptions.getTruthyComptime(globalObject, "method")→ invokes the getter → throwsError("boom").defer blob.detach()fires →store.deref()→ ref1→0→store.deinit()→s3.deinit()→pathlike.deinit()→ frees pointer A.errdefer path.deinit()fires (because error occurred) →path.deinit()→ frees pointer A again → use-after-poison.
1e28007 to
9d46dbd
Compare
aed9a81 to
8b8c5ae
Compare
| } | ||
| const options = args.nextEat(); | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🔴 🔴 The diff no longer matches the PR description — only the original v1 tag-flip in S3File.zig and the test file remain. The encoded_slice clone in Store.initS3/initS3WithReferencedCredentials, the errdefer→defer conversions in S3File.zig/S3Client.zig, and the Blob.zig defers described in the PR body and CodeRabbit walkthrough are all absent (likely lost when rebasing onto the src/bun.js/→src/runtime/ restructure in c8b4c36/e643d7b). Concretely, the Path-2 double-free that was marked "✅ Addressed in commit 9538183" is reachable again, and the earlier review comments at Store.zig:74 and Blob.zig:2123 now give advice predicated on a clone that no longer exists — applying them as-is would introduce new double-frees.
Extended reasoning...
What the bug is
The PR description's Fix section states: "Clone encoded_slice in initS3/initS3WithReferencedCredentials so the store always owns an independent copy" and "Change errdefer to defer for path cleanup in S3File functions". The CodeRabbit walkthrough additionally lists changes to S3Client.zig (8 methods), Blob.zig (constructBunFile + findOrCreateFileFromPath), and stat() error-message corrections. However, the actual diff contains only two files: src/runtime/webcore/S3File.zig (six path_or_blob = .{ .blob = blob } tag-flips) and the test. Verified against the current tree at the post-restructure paths:
src/runtime/webcore/blob/Store.zig:68-124—initS3/initS3WithReferencedCredentialscontain onlyvar path = pathlike; path.toThreadSafe();with notoOwned/clone logic.src/runtime/webcore/S3Client.zig:138,154,172,189,206,220,254— all still useerrdefer path.deinit(), notdefer.src/runtime/webcore/S3File.zig— the sixerrdefer { if (path_or_blob == .path) ... }blocks are unchanged (not converted todefer);stat()still says"get size"at both branches.
git log on Store.zig shows it was only touched by the pure git mv restructure (c8b4c36), never by this PR's fix commits. The edits to the old src/bun.js/webcore/ paths almost certainly did not survive the rebase onto the src/runtime/webcore/ restructure.
Concrete consequence: the Path-2 double-free is back
Inline comment 2996090258 was marked "✅ Addressed in commit 9538183", but with the Store.zig clone gone, the tag-flip alone cannot fix Path 2 because ownership transfers inside the try expression before the tag is flipped. Step-by-step, for Bun.S3Client.presign(<path>, <opts>) where opts has a type getter that throws on its second read (the first read happens earlier in getCredentialsWithOptions) and <path> produces an .encoded_slice:
PathOrBlob.fromJSNoCopy→path_or_blob = .pathwithencoded_slicebuffer A.- Outer
errdefer { if (path_or_blob == .path) path_or_blob.path.deinit() }armed. try constructS3FileInternalStore(...)→constructS3FileWithS3Credentials:Blob.Store.initS3(path, ...)runs. Store.zig:97-122 doesvar path = pathlike; path.toThreadSafe();—PathLike.toThreadSafe()(types.zig:557-569) falls through toelse => {}for.encoded_slice, a no-op, sostore.data.s3.pathlikealiases buffer A.errdefer store.deinit()armed.try opts.getTruthyComptime(globalObject, "type")invokes the getter → throws →error.JSError.errdefer store.deinit()fires →Store.deinit()(Store.zig:179-201) →s3.deinit()(Store.zig:454-459) →pathlike.deinit()→ frees A.
- The error propagates through the
tryat S3File.zig:86. Line 87 (path_or_blob = .{ .blob = blob }) never executes. - Outer
errdeferfires:path_or_blob == .pathis still true →path_or_blob.path.deinit()→ frees A again → use-after-poison.
Why the existing change does not prevent it
The only Zig change in the diff is the tag-flip placed after try constructS3FileInternalStore(...). By Zig semantics, an assignment after a try does not run when the try propagates an error. So the tag-flip helps Path 1 (constructor succeeds, downstream op fails) but cannot help Path 2 (constructor fails after initS3). The intended fix for Path 2 was the Store.zig clone, which is missing.
Secondary consequence: stale review comments now give dangerous advice
Earlier comments at Store.zig:74 (#2996283354) and Blob.zig:2123 (#2996546999) instruct adding defer path.deinit() / moving defer path.deinitAndUnprotect() on the assumption that initS3 clones the path and the caller must free its own copy. With the clone gone, initS3 once again takes ownership of the caller's buffer; applying those suggestions as written would make both the caller and the store free the same buffer — a new double-free. Those comments only make sense once the Store.zig clone is restored.
Addressing the "duplicate" objection
One reviewer noted overlap with existing comments: the S3Client instance-method double-free is already tracked in #2997528729, and the stat() error message in #2996283366. That is true, but neither of those captures the root cause here — that the diff has regressed to its v1 state and no longer contains what the PR description and "✅ Addressed" markers claim. Without surfacing this explicitly, the author may believe commits 9538183+ are present when they are not, and may apply the now-dangerous follow-up suggestions on top of the wrong base. The actionable fix is different in kind ("restore the dropped commits at the new src/runtime/webcore/ paths") from any individual open comment.
How to fix
Re-apply the dropped changes at the post-restructure paths: (1) clone encoded_slice via toOwned(allocator) in Store.initS3/initS3WithReferencedCredentials; (2) convert the six errdefer blocks in S3File.zig to defer; (3) change errdefer path.deinit() → defer path.deinit() across the S3Client.zig instance/static methods; (4) add the Blob.zig defers (constructBunFile, findOrCreateFileFromPath); (5) fix the stat() error strings. Then re-evaluate the still-open follow-up comments against the restored base.
5546cfc to
225dfc3
Compare
b7b4ebd to
6299907
Compare
| return globalThis.throwInvalidArguments("Expected a S3 or path to upload", .{}); | ||
| } | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🟣 🟣 🟣 Pre-existing companion to #3201498669 at the same fetch.zig→uploadStream boundary, different (worse) symptom: fetch.zig:1335 does _ = try bun.S3.uploadStream(...), discarding the return value, then returns its own s3_stream.promise at line 1353. uploadStream's three early returns at client.zig:468/473/485 (isDisturbed / .Invalid / pending .err) produce a rejected-promise JSValue before the MultiPartUpload task and callback are created at line 491+, so Wrapper.resolve — the only place s3_stream is freed and its promise settled (fetch.zig:1274-1319) — is never invoked. Net: the rejection is silently discarded, the heap Wrapper + JSPromise.Strong + url_proxy_buffer are leaked, and await fetch('s3://...', {method:'PUT', body:disturbedStream}) hangs forever. All other uploadStream callers (Blob.zig:1278/1353/1561/1624/2605) do return S3.uploadStream(...); only fetch.zig discards it. Fix: capture the return and, if it's already-rejected, reject s3_stream.promise and free s3_stream (or drop the wrapper entirely and return try bun.S3.uploadStream(...) with a thin Response-building callback).
Extended reasoning...
What the bug is
At fetch.zig:1324-1353, the S3 + ReadableStream-body branch of fetch() does:
const promise = jsc.JSPromise.Strong.init(globalThis); // line 1324
const s3_stream = Wrapper.new(.{
.url = url, .url_proxy_buffer = url_proxy_buffer,
.promise = promise, .global = globalThis,
}); // 1326-1331
const promise_value = promise.value();
_ = try bun.S3.uploadStream( // 1335 — RETURN VALUE DISCARDED
credentialsWithOptions.credentials.dupe(),
url.s3Path(), body.ReadableStream.get(globalThis).?,
..., @ptrCast(&Wrapper.resolve), s3_stream,
);
url = .{}; url_proxy_buffer = "";
return promise_value; // 1353 — s3_stream.promiseuploadStream (client.zig:449-518) returns bun.JSError!JSValue. On the happy path it returns the multipart task's own promise and later invokes callback (Wrapper.resolve, which settles s3_stream.promise and frees the wrapper). But on its three early-return paths at client.zig:468 (readable_stream.isDisturbed(globalThis)), 473 (.Invalid), and 485 (.File|.Bytes with pending.result == .err), it returns jsc.JSPromise.rejectedPromise(...).toJS() before the MultiPartUpload task and S3UploadStreamWrapper (which holds callback/callback_context) are created at line 491+. fetch.zig:1335 discards that rejected-promise return value via _ =.
The specific code path that triggers it
fetch('s3://bucket/key', { method: 'PUT', body: <stream>, s3: {...} }) reaches fetch.zig:1263 (body == .ReadableStream) → line 1335. The early returns trigger when:
- The body stream has already been read/locked (
isDisturbed— e.g. the user already calledreader.read()or passed the same stream twice). - The stream's
.ptrtag is.Invalid. - The body is a
Bun.file(...)/bytes-backed stream whose source already has a pending.err(e.g.Bun.file('/nonexistent').stream()after the open fails).
Why existing safeguards don't prevent it
Wrapper.resolve (fetch.zig:1274-1319) is the only place that does defer bun.destroy(self) (line 1276), defer bun.default_allocator.free(self.url_proxy_buffer) (line 1277), and self.promise.resolve(...)/reject(...) (lines 1293/1316). It is reached solely via the MultiPartUpload callback chain that begins at client.zig:491+ — the early returns at 468/473/485 are before that, so the callback is never registered. There is no errdefer s3_stream.deinit() or fallback rejection between fetch.zig:1326 and 1353.
The other 5 uploadStream callers (Blob.zig:1278/1353/1561/1624/2605) all do return S3.uploadStream(...), so they propagate the rejected promise to their caller. Only fetch.zig wraps its own outer promise and discards uploadStream's return.
Impact
When any early return fires:
- The rejected promise is discarded (
_ =) → the rejection is never observed (silent unhandled rejection inside native code). Wrapper.resolveis never called → the heapWrapperstruct, itsJSPromise.Strong(aBun__StrongRefnative allocation that permanently roots the promise in the GC), and theurl_proxy_bufferallocation are leaked.s3_stream.promiseis never settled →fetch()returnspromise_valuewhich hangs forever;awaitnever resolves or rejects, and the leaked Strong roots the entire await chain.
This is in addition to the credentials/this.ref() leak that #3201498669 already flags for these same early returns — fixing that (moving this.ref() below the checks, per #3201498669) would not fix the forever-pending fetch promise or the Wrapper/Strong/url_proxy_buffer leak, since the fix locus is fetch.zig (the discarding caller), not client.zig.
Step-by-step proof
const s = new ReadableStream({ pull(c){ c.enqueue(new Uint8Array([1])); c.close(); } });
await s.getReader().read(); // disturb the stream
await fetch('s3://bucket/key', { method: 'PUT', body: s,
s3: { accessKeyId: 'x', secretAccessKey: 'y' } });
// ↑ HANGS FOREVER- fetch.zig:1263:
body == .ReadableStream, URL is S3 → enters the branch. - Line 1324:
JSPromise.Strong.init(globalThis)allocates aBun__StrongRefand roots a fresh JSPromise P. - Lines 1326-1331:
Wrapper.new(.{ .promise = P, .url_proxy_buffer = <buf>, ... })heap-allocates W. - Line 1335:
_ = try bun.S3.uploadStream(creds.dupe(), path, s, ..., &Wrapper.resolve, W). - client.zig:465:
this.ref()(the credentials leak — #3201498669). Line 467:readable_stream.isDisturbed(globalThis)→ true. Line 468:return jsc.JSPromise.rejectedPromise(globalThis, ...).toJS(). TheMultiPartUploadtask at line 491 andS3UploadStreamWrapperat line 512 are never created, soWrapper.resolveis never registered as a callback. - fetch.zig:1335: the rejected promise JSValue is assigned to
_— discarded. - Line 1353:
return promise_value(the JSValue handle for P). P is never resolved or rejected. - The user's
await fetch(...)hangs forever. W, the Strong root on P, andurl_proxy_bufferare permanently leaked.
How to fix
Either (a) capture the return and bridge it on synchronous rejection:
const r = try bun.S3.uploadStream(...);
if (r.asAnyPromise()) |p| if (p.status(globalThis.vm()) == .rejected) {
s3_stream.promise.reject(globalThis, p.result(globalThis.vm())) catch {};
bun.default_allocator.free(s3_stream.url_proxy_buffer);
s3_stream.promise.deinit();
bun.destroy(s3_stream);
return r; // or return promise_value
}or (b) drop Wrapper/s3_stream entirely and return try bun.S3.uploadStream(...) directly (the pattern used at every other call site), with a thin callback that builds the Response and frees url_proxy_buffer.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch fetch.zig, and this is not on the call path of any PR-modified function (S3File.write routes through Blob.writeFileInternal, whose uploadStream callers do return S3.uploadStream(...)). Flagged because it is the explicit companion to #3201498669, which is already accepted on this PR and references fetch.zig:1336 and the same client.zig:468/473/485 early returns: that comment covers the credentials this.ref() leak on those returns; this covers the discarded _ = return value, the s3_stream Wrapper/JSPromise.Strong/url_proxy_buffer leak, and the user-visible forever-pending fetch promise, none of which #3201498669 (or any other comment on this PR) mentions. Anchored at S3File.zig:156 alongside #3201498669/#3199552705 since the PR diff doesn't touch fetch.zig.
| } | ||
| const options = args.nextEat(); | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🟣 🟣 🟣 Pre-existing companion to #3201498669 (which already directs the author to fetch.zig:1336, 37 lines above): in the fetch("s3://...") non-stream-body + proxy branch, fetch.zig:1373 does bun.copy(u8, buffer[proxy_.href.len..], proxy_.href) — the destination offset should be buffer[result.url.len..] (lines 1376-1377 read from that intended layout). When proxy_.href.len > result.url.len the destination slice (length result.url.len) is shorter than the source → Debug assert(output.len >= input.len) panic / release memmove heap overflow; when shorter, the proxy bytes splice into the middle of the signed URL and buffer[result.url.len..] is left uninitialized, so line 1377 parses garbage as the proxy host. Reachable via fetch("s3://b/k", { proxy: "http://...", method: "GET"|"PUT"|"DELETE"|"HEAD", s3: {...} }). Fix: change line 1373 to bun.copy(u8, buffer[result.url.len..], proxy_.href);.
Extended reasoning...
What the bug is
In the fetch("s3://...") non-presigned path with a proxy configured (fetch.zig:1367-1377 — sibling to the stream-body S3 branch at fetch.zig:1336 already referenced by #3201498669), the joined URL/proxy buffer is built with the wrong destination offset on the second copy:
// fetch.zig:1371-1377
var buffer = bun.handleOom(allocator.alloc(u8, result.url.len + proxy_.href.len));
bun.copy(u8, buffer[0..result.url.len], result.url);
bun.copy(u8, buffer[proxy_.href.len..], proxy_.href); // ← BUG: should be buffer[result.url.len..]
url_proxy_buffer = buffer;
url = ZigURL.parse(url_proxy_buffer[0..result.url.len]);
proxy = ZigURL.parse(url_proxy_buffer[result.url.len..]); // reads where proxy was *supposed* to goThe intended layout is [signed_url][proxy_url], and lines 1376-1377 read from exactly that layout — but line 1373 writes proxy_.href starting at offset proxy_.href.len instead of result.url.len. The two analogous proxy-buffer joins earlier in this function (the non-S3 path) write to buffer[url.href.len..]; only the S3 branch's copy uses the wrong index variable.
Concrete consequences
bun.copy → bun.memmove (bun.zig:3468-3478): in Debug it asserts output.len >= input.len; in release it does c.memmove(dest.ptr, src.ptr, src.len). The destination slice buffer[proxy_.href.len..] has length (result.url.len + proxy_.href.len) - proxy_.href.len = result.url.len.
Case A — proxy_.href.len > result.url.len (e.g. short bucket/key + short endpoint, long proxy URL with embedded auth like http://user:longpassword@proxy.corp.internal:3128/): the destination slice has length result.url.len < proxy_.href.len. Debug/ReleaseSafe: assert(output.len >= input.len) panics. ReleaseFast: memmove writes proxy_.href.len bytes starting at buffer.ptr + proxy_.href.len, i.e. end-of-write at offset 2*proxy_.href.len, which is proxy_.href.len - result.url.len bytes past the end of the allocation — heap buffer overflow.
Case B — proxy_.href.len < result.url.len (the common case — signRequest(..., false, null) here is not presigned so result.url is just https://<endpoint>/<bucket>/<key>): line 1373 overwrites buffer[proxy_.href.len .. 2*proxy_.href.len] (the middle of the signed URL just written at line 1372), and buffer[result.url.len..] is never written (allocator.alloc does not zero). Line 1376 then parses a corrupted signed URL (proxy bytes spliced into the middle of the host/path), and line 1377 calls ZigURL.parse on uninitialized heap memory — the resulting fetch is dispatched to a garbage proxy host/port.
Case C — proxy_.href.len == result.url.len: works by coincidence.
Reachability
proxy is set from the user's { proxy: "..." } option earlier in fetch.zig. The S3+proxy branch at line 1367 is reached for any fetch("s3://bucket/key", { proxy: "http://...", method: "GET"|"PUT"|"DELETE"|"HEAD" }) whose body is not a ReadableStream (the stream-body case takes the uploadStream branch at line 1336 instead). E.g.:
await fetch("s3://b/k", {
proxy: "http://user:verylongpasswordxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@proxy:3128",
s3: { accessKeyId: "x", secretAccessKey: "y", endpoint: "http://s3.local" },
});with a proxy URL longer than the signed URL → heap overflow; with a shorter one → request issued with a corrupted URL to an uninitialized-memory "proxy".
Why existing safeguards don't prevent it
Lines 1376-1377 read from the intended offsets ([0..result.url.len] / [result.url.len..]), confirming the layout intent and the typo. Nothing re-derives the proxy URL after line 1377; whatever ZigURL.parse extracts from the uninitialized region is used verbatim as the HTTP CONNECT target.
Step-by-step proof (Case A, heap overflow)
Take result.url = "http://s3.local/b/k" (19 bytes) and proxy_.href = "http://user:passpasspasspasspass@proxy:3128/" (44 bytes):
- Line 1371:
buffer = alloc(19 + 44) = alloc(63). - Line 1372:
buffer[0..19] = result.url— OK. - Line 1373: destination =
buffer[44..], a slice of length63 - 44 = 19. Source length = 44. Debug:assert(19 >= 44)→ panic. Release:c.memmove(buffer.ptr + 44, proxy_.href.ptr, 44)writes bytes at offsets44..88, but the allocation ends at offset63→ 25-byte heap buffer overflow.
(Case B, uninitialized read): with result.url.len = 60, proxy_.href.len = 20: line 1373 writes proxy bytes at buffer[20..40] (clobbering the middle of the URL written at step 2); buffer[60..80] is never written. Line 1377 parses buffer[60..80] — uninitialized heap — as the proxy URL.
How to fix
Change fetch.zig:1373 to bun.copy(u8, buffer[result.url.len..], proxy_.href);.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch fetch.zig, and the trigger requires the fetch("s3://...") API rather than the Bun.S3Client.* static methods this PR modifies. Flagged because (a) it sits in the same S3-request subsystem this PR's review thread has already enumerated end-to-end — #3201498669 already directs the author to fetch.zig:1336, 37 lines above, in the sibling stream-body S3 branch; (b) it is a crash/memory-safety bug (heap overflow + uninitialized read) rather than a leak, so strictly higher severity than most existing pre-existing comments on this PR; and (c) none of the 30 existing comments mention fetch.zig:1373, the proxy-buffer join, buffer[proxy_.href.len..], or fetch+S3+proxy at all.
3221391 to
548bb95
Compare
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store | ||
| defer blob.deinit(); | ||
| return try getPresignUrlFrom(&blob, globalThis, options); |
There was a problem hiding this comment.
🟣 🟣 Pre-existing correction to #3201670545: that comment's recommended one-line fix (add .content_encoding = credentialsWithOptions.content_encoding to the signRequest literal at S3File.zig:505-513) is a no-op for presign, because signRequest's presigned-URL branch never reads content_encoding. getPresignUrlFrom passes a non-null third arg (S3File.zig:513) → signQuery = true (credentials.zig:362), and the signQuery branch hardcodes X-Amz-SignedHeaders=host (line 471/553) and emits only response-content-disposition (513-517/555-557/620-622) and response-content-type (519-523/559-561/624-626) — there is no encoded_content_encoding buffer and no response-content-encoding={s} append. The credentials.zig:464/654/741-745 references #3201670545 cites are all in the non-signQuery else branch (used by executeSimpleS3Request). Fix: in addition to #3201670545's S3File.zig change, add a response-content-encoding encode + query-param block in the signQuery branch between lines 557 and 559 (and 622/624 in url_query_parts).
Extended reasoning...
What the bug is
Existing comment #3201670545 correctly observes that getPresignUrlFrom's signRequest struct literal at S3File.zig:505-513 omits .content_encoding while passing .content_disposition and .content_type, and recommends adding .content_encoding = credentialsWithOptions.content_encoding. That recommendation is accurate as far as it goes — but it is insufficient on its own: applying it produces a byte-identical presigned URL, because signRequest's presigned-URL code path never reads signOptions.content_encoding.
The specific code path
getPresignUrlFrom calls credentialsWithOptions.credentials.signRequest(.{...}, false, .{ .expires = expires }) (S3File.zig:513). The non-null third argument makes signQuery = true (credentials.zig:362: const signQuery = signQueryOption != null). Execution then takes the if (signQuery) branch at credentials.zig:498-645. That branch:
- Line 471:
const signed_headers = if (signQuery) "host" else SignedHeaders.get(header_key);— hardcodes"host". Soheader_key.content_encoding(computed at line 464) is dead on this path; even ifcontent_encodingwere non-null, it could never appear inSignedHeadersfor a presigned URL. - Lines 513-517: URI-encodes
content_dispositionintoencoded_content_disposition. - Lines 519-523: URI-encodes
content_typeintoencoded_content_type. - Lines 555-557 / 620-622: emits
response-content-disposition={s}into bothquery_parts(canonical request) andurl_query_parts(final URL). - Lines 559-561 / 624-626: emits
response-content-type={s}likewise. - There is no
encoded_content_encodingbuffer, noresponse-content-encoding={s}append, and the alphabetical-order comments at lines 531/594 do not list it.
The lines #3201670545 cites as evidence that "signRequest consumes it" — credentials.zig:464 (SignedHeaders.Key), 654 (canonical-request format), 741-745 (content-encoding header emission) — all live in the non-signQuery else branch, which is reached only when signQueryOption == null (i.e. executeSimpleS3Request at simple_request.zig:378). The signQuery branch returns early at credentials.zig:680-692, so 741-745 is unreachable for presign.
Why #3201670545's fix alone is ineffective
Trace for Bun.S3Client.presign('key', { contentEncoding: 'gzip', bucket: 'b', accessKeyId: 'x', secretAccessKey: 'y' }) with #3201670545's one-line fix applied:
getPresignUrlFrom→ S3File.zig:501:credentialsWithOptions = s3.getCredentialsWithOptions(options, ...)→credentialsWithOptions.content_encoding = 'gzip'(parsed/validated at credentials_jsc.zig:222-239).- (with fix) S3File.zig:~513:
signRequest({..., .content_encoding = 'gzip'}, false, .{ .expires }). - credentials.zig:351-354:
const content_encoding = signOptions.content_encoding→'gzip'. Line 362:signQuery = true. Line 471:signed_headers = "host"—content_encodingis not folded in. - Line 498: enters
if (signQuery). Lines 513-523: encodecontent_disposition/content_typeonly. Lines 529-575: buildquery_parts—response-content-disposition(555-557) andresponse-content-type(559-561) are appended iff non-null;content_encodingis never read. Same forurl_query_partsat 592-626. - Returned URL has no
response-content-encodingquery param andX-Amz-SignedHeaders=hostonly. Output is identical to before the fix.
So the author would apply #3201670545, see it compile, and believe contentEncoding is plumbed through to presign — when it is still silently dropped at the second of two layers.
Impact
response-content-encoding is a standard S3 GetObject response-override query parameter (alongside response-content-disposition and response-content-type, which Bun already supports). Without it, there is no way to request a presigned GET URL that overrides the served Content-Encoding header — the user's { contentEncoding: 'gzip' } is parsed, allocated, and CR/LF-validated (credentials_jsc.zig:231), then discarded. Functional bug, not memory-safety; same "option silently dropped at consumer call site" class as #3200790776/#3201670545/#3202066991.
How to fix
In addition to #3201670545's S3File.zig change, in signRequest's signQuery branch:
- After credentials.zig:523, add an encode block:
var content_encoding_encoded_buffer: [256]u8 = undefined; var encoded_content_encoding: ?[]const u8 = null; if (content_encoding) |ce| { encoded_content_encoding = encodeURIComponent(ce, &content_encoding_encoded_buffer, true) catch return error.FailedToGenerateSignature; }
- Between lines 557 and 559 (and 622/624 in
url_query_parts), insert (alphabetical: disposition < encoding < type):if (encoded_content_encoding) |ce| { try query_parts.append(try std.fmt.allocPrint(allocator, "response-content-encoding={s}", .{ce})); }
- Bump the
BoundedArraycapacities at lines 529/592 by one and update the alphabetical-order comments at 531/594.
Relationship to this PR / not a duplicate
Pre-existing — neither S3File.zig:505-513 nor credentials.zig:498-645 are in the diff. This is a correction to #3201670545's recommended fix, in the same vein as #3201498672 (correcting #3199552705) and #3200173448 (refining #3189832498) already on this PR: same user-visible symptom (contentEncoding dropped from presign) but a materially different fix locus — #3201670545 targets only the S3File.zig literal, but the second layer of the drop is in credentials.zig's signQuery branch, and applying #3201670545 alone leaves the symptom unresolved while giving false confidence. None of the existing PR comments mention response-content-encoding, the signQuery branch's response-override params, credentials.zig:555-561/620-626, or that X-Amz-SignedHeaders=host is hardcoded for presign.
| } | ||
| const options = args.nextEat(); | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🟣 🟣 🟣 Pre-existing, download-side companion to the upload/request-side ownership defects already on this PR (#3199552705/#3201498672/#3200790769/#3201498669): doReadFromS3 (Blob.zig:119) wraps the consumer with lifetime = .clone while its file-read sibling NewReadFileHandler (read_file.zig:26) correctly uses .temporary; the .clone contract assumes "the store owns the memory" (comment at Blob.zig:3812), but for S3 the store holds only credentials+path. onS3DownloadResolved (Blob.zig:2398-2412) extracts response.body.list.items and never frees response.body (whose ownership simple_request.zig:300-307 explicitly transfers to the callback — see the doc at line 20: "body is owned and dont need to be copied, but dont forget to free it"). Under .clone, toArrayBufferViewWithBytes (3980) copies via Bun__createArrayBufferForCopy, and toStringWithBytes's all-ASCII arm (3810-3813) installs Store.external as the finalizer — which only does this.deref() (Store.zig:63-67) and never frees the byte pointer. Net: every successful s3file.text()/.arrayBuffer()/.bytes()/.json()/.formData() leaks the entire downloaded body buffer — by far the largest per-call leak in the subsystem. Fix: change Blob.zig:119 from .clone to .temporary. None of the PR-modified static methods trigger a download, so this is unrelated to the PR's actual changes.
Extended reasoning...
What the bug is
doReadFromS3 (Blob.zig:114-123) is the entry point for s3file.text() / .arrayBuffer() / .bytes() / .json() / .formData() (via toString/toArrayBufferView/toJSON/toFormData → isS3() → doReadFromS3). It binds the bytes consumer with the wrong lifetime:
// Blob.zig:117-121
const WrappedFn = struct {
pub fn wrapped(b: *Blob, g: *JSGlobalObject, by: []u8) jsc.JSValue {
return jsc.toJSHostCall(g, @src(), Function, .{ b, g, by, .clone });
}
};
return S3BlobDownloadTask.init(global, this, WrappedFn.wrapped);while its file-read sibling NewReadFileHandler (read_file.zig:24-26), which feeds the same Function set, correctly uses .temporary. The .clone contract assumes buf lives inside this.store — see the comment at Blob.zig:3812: "the store owns the memory" and the assert at 3817 (store.data == .bytes) for the .transfer arm. For an S3 store, this.store.data == .s3, which holds only credentials+path — never the downloaded body.
Where the bytes come from and why nothing frees them
S3HttpSimpleTask.onResponse for .download (simple_request.zig:300-313) moves the accumulated response_buffer into the callback's result.success.body and resets its own copy to empty (this.response_buffer = .{ .allocator = ..., .list = .{ .items = &.{}, .capacity = 0 } }), so its own defer this.deinit() → response_buffer.deinit() is a no-op. The S3DownloadResult doc at simple_request.zig:20 explicitly states: "body is owned and dont need to be copied, but dont forget to free it".
onS3DownloadResolved (Blob.zig:2398-2412) does:
.success => |response| {
const bytes = response.body.list.items;
...
try jsc.AnyPromise.wrap(..., S3BlobDownloadTask.callHandler, .{ this, bytes });
},with no defer response.body.deinit(). S3BlobDownloadTask.deinit() (2445-2450) frees the task struct, derefs the S3 store, and deinits the Strong — but never sees response.body. So the only way the body buffer can be freed is if the consumer (toStringWithBytes/toArrayBufferViewWithBytes/etc.) adopts it. Under .clone, none of them do.
Why .clone leaks for every consumer
toArrayBufferViewWithBytes.clonearm (Blob.zig:3980):jsc.ArrayBuffer.create(global, buf, ...)→ array_buffer.zig:150-156 →Bun__createArrayBufferForCopy/Bun__createUint8ArrayForCopy, which copybufinto a fresh JS-managed allocation. The originalbufis never freed. (Contrast.temporaryat 4015:ArrayBuffer.fromBytes(buf, ...).toJS(global)adoptsbufand frees it via the default allocator on GC.)toStringWithBytesall-ASCII.clonearm (Blob.zig:3810-3813):this.store.?.ref(); return ZigString.init(buf).external(global, this.store.?, Store.external)— creates a JS string that points directly intobufwith the S3 store as finalizer context.Store.external(Store.zig:63-67) is{ if (ptr == null) return; bun.cast(*Store, ptr).deref(); }— it ignores the byte pointer entirely. So when the JS string is GC'd, the S3 store is deref'd butbufis never freed.toStringWithBytesnon-ASCII branch (Blob.zig:3789-3801): allocates UTF-16 and returns;raw_bytesis freed only whenlifetime == .temporary(3797-3799). Under.clone, the original buffer leaks.toJSONWithBytes/toFormDataWithBytes: same pattern (they call into the same lifetime switch or copy-only paths).
Impact
Every successful s3file.text() / .json() / .arrayBuffer() / .bytes() / .formData() leaks the entire downloaded body buffer (object size + ArrayList over-capacity). For a service that loops await client.file(key).arrayBuffer() over many objects, native heap grows by the sum of all downloaded sizes — orders of magnitude larger than any other leak flagged on this PR (which are per-call structs, paths, or credential strings).
Step-by-step proof — await Bun.s3('bucket/key').arrayBuffer() (1 MB object)
Blob.toArrayBufferView→isS3()→doReadFromS3(toArrayBufferWithBytes, global)(Blob.zig:4040).WrappedFn.wrappedbindslifetime = .clone.S3BlobDownloadTask.init(Blob.zig:2414+):.blob = blob.dupe()(S3 store rc+1), schedulesS3.download.- HTTP 200.
S3HttpSimpleTask.onResponse(simple_request.zig:300-307) moves the 1 MBresponse_bufferintoresult.success.bodyand resets its own to empty;defer this.deinit()later frees only the empty buffer. Ownership of the 1 MB allocation (ptr B) is now with the callback. onS3DownloadResolved(Blob.zig:2402):bytes = response.body.list.items(~1 MB heap, ptr B). Nodefer response.body.deinit().- Handler →
toArrayBufferViewWithBytes(&task.blob, global, bytes, .clone, .ArrayBuffer)→ line 3980:ArrayBuffer.create(global, buf, .ArrayBuffer)→Bun__createArrayBufferForCopyallocates a fresh JS ArrayBuffer and memcpysbufinto it. - Promise resolves with the copied ArrayBuffer.
defer this.deinit()(Blob.zig:2399) frees the task struct, derefs the S3 store, deinits the Strong — but never frees ptr B. - Net: 1 MB leaked. Repeat in a loop and native heap grows unboundedly.
For the all-ASCII text() case, replace step 5 with ZigString.init(buf).external(global, s3_store, Store.external): the JS string points directly at B; on GC, Store.external(s3_store, B.ptr, B.len) runs and only does s3_store.deref() — B is still never freed.
How to fix
Change Blob.zig:119 from .clone to .temporary, matching NewReadFileHandler at read_file.zig:26. Under .temporary: toArrayBufferViewWithBytes adopts buf via ArrayBuffer.fromBytes(...).toJS() (Blob.zig:4015, default-allocator-freed on GC); toStringWithBytes adopts via toExternalValue (3842) or clone+free (3832-3839); the non-ASCII branch frees raw_bytes at 3797-3799. (Optionally also have onS3DownloadResolved pass response.body.list.allocatedSlice() truncated to .items.len so the freed pointer matches the allocation base, though with mimalloc the items pointer suffices.)
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch doReadFromS3, S3BlobDownloadTask, onS3DownloadResolved, or any download-side code, and none of the six PR-modified static methods (presign/unlink/write/size/exists/stat) trigger a download. Flagged because (a) it is the download-side counterpart of the upload/request-side ownership defects already enumerated on this PR (#3199552705/#3201498672 destination_blob, #3200790769 small-upload Wrapper, #3201498669 credentials.dupe()), in the same "native buffer ownership confused with store ownership" class; (b) it is by far the largest per-call leak in the subsystem (full object body vs. a struct/path/credentials); and (c) the fix is a one-token change in the same file as several existing comments. None of the existing PR comments mention doReadFromS3, S3BlobDownloadTask, onS3DownloadResolved, Lifetime.clone, Store.external, response.body, or the download path at all.
548bb95 to
44417f0
Compare
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store | ||
| defer blob.deinit(); | ||
| return try getPresignUrlFrom(&blob, globalThis, options); |
There was a problem hiding this comment.
🟣 🟣 Pre-existing companion to #3202066971 (the <Code>/<Message> inverted-slice crash) at a user-config-triggered site that comment does not list: guessRegion (credentials.zig:268-282, reached from signRequest at line 373 whenever region.len == 0) does endpoint[start+3..end] where start = indexOf(endpoint, "s3.") and end = indexOf(endpoint, ".amazonaws.com"), both scanning from offset 0 with no start+3 <= end guard. For endpoint: "https://s3.amazonaws.com" (the documented legacy/global S3 endpoint, normalized to "s3.amazonaws.com" at credentials_jsc.zig:74) this yields endpoint[3..2] → Debug/ReleaseSafe panic, ReleaseFast UB (underflowed .len fed into bufPrint at line 484 → wild OOB read/segfault); same for "<bucket>.s3.amazonaws.com". Reachable from this PR-modified line via getPresignUrlFrom → signRequest (e.g. Bun.S3Client.presign("k", { endpoint: "https://s3.amazonaws.com", accessKeyId: "x", secretAccessKey: "y", bucket: "b" }) with no region — the trigger is a valid, documented AWS endpoint string in user options/env, not a malformed server response). Fix: guard if (start + 3 <= end) return endpoint[start+3..end] else return "us-east-1".
Extended reasoning...
What the bug is and how it manifests
S3Credentials.guessRegion (src/s3_signing/credentials.zig:268-282) is called from signRequest at line 373 (const region = if (this.region.len > 0) this.region else guessRegion(this.endpoint)) whenever no region is configured. It does:
if (strings.indexOf(endpoint, ".amazonaws.com")) |end| {
if (strings.indexOf(endpoint, "s3.")) |start| {
return endpoint[start + 3 .. end];
}
}Both indexOf calls scan from offset 0 of the same string, with no check that start + 3 <= end. For the legacy/global S3 endpoint "s3.amazonaws.com" — which AWS still documents and supports — indexOf(".amazonaws.com") matches at index 2 (end = 2) and indexOf("s3.") matches at index 0 (start = 0), giving endpoint[3..2]. In Debug/ReleaseSafe Zig panics (start index 3 is larger than end index 2); in ReleaseFast (Bun's shipped build) it is undefined behaviour: the resulting slice has .ptr = endpoint.ptr+3 and .len ≈ 2^64-1. That region is then fed into std.fmt.bufPrint(&tmp_buffer, "{s}{s}{s}", .{ region, "s3", secretAccessKey }) at line 484, which walks off the end of mapped memory → segfault.
The same applies to regionless virtual-hosted endpoints "<bucket>.s3.amazonaws.com" (the canonical us-east-1 vhost form): for "mybucket.s3.amazonaws.com", end = 11, start = 9, → endpoint[12..11] → same crash.
Reachability from the PR-modified code
endpoint is normalized via bun.URL.parse(...).hostWithPath() at both entry points (credentials_jsc.zig:74-77 for the options.endpoint JS property; env_loader.zig for S3_ENDPOINT/AWS_ENDPOINT), so endpoint: "https://s3.amazonaws.com" becomes exactly "s3.amazonaws.com". From the PR's tag-flip at S3File.zig:87:
Bun.S3Client.presign("key", { endpoint: "https://s3.amazonaws.com", accessKeyId: "x", secretAccessKey: "y", bucket: "b" });
// (no `region`, no AWS_REGION/S3_REGION in env — the PR's own test explicitly strips both)→ S3File.zig:86 constructS3FileInternalStore builds a store with credentials.endpoint = "s3.amazonaws.com", credentials.region = "" → S3File.zig:89 getPresignUrlFrom → S3File.zig:505 signRequest → credentials.zig:373: this.region.len == 0 → guessRegion("s3.amazonaws.com") → endpoint[3..2] → crash. Same path for unlink/write/size/exists/stat (S3File.zig:120/160/199/233/585) via executeSimpleS3Request → signRequest.
Why existing safeguards don't prevent it
There is no start + 3 <= end guard. The function only works when the endpoint embeds a region between s3. and .amazonaws.com (e.g. "s3.us-west-2.amazonaws.com" → correctly returns "us-west-2"). The regionless forms — which are exactly what a user reaches for when they don't know the region (precisely the region.len == 0 case that triggers guessRegion) — are not handled. The R2 check at line 270 and the "auto" fallback at line 277 sit on different branches and do not cover this case.
Impact
A user-config-triggerable process crash from a valid, documented AWS endpoint string. This is the same inverted-slice defect class as #3202066971, but with materially higher practical exposure: #3202066971 requires a malformed server response (close-tag-before-open-tag XML) from a user-configured endpoint, whereas this requires only that the user set endpoint to the AWS global endpoint without also setting region — a configuration that the AWS docs and many tutorials explicitly show.
Step-by-step proof
endpoint = "s3.amazonaws.com"(16 bytes),region = "".- credentials.zig:373:
this.region.len == 0→guessRegion("s3.amazonaws.com"). - Line 270:
endsWith(".r2.cloudflarestorage.com")→ false. - Line 271:
indexOf("s3.amazonaws.com", ".amazonaws.com")→ matches at index 2 →end = 2. - Line 272:
indexOf("s3.amazonaws.com", "s3.")→ matches at index 0 →start = 0. - Line 273:
return endpoint[0+3 .. 2]=endpoint[3..2]. 3 > 2 → Debug/ReleaseSafe panic; ReleaseFast:.ptr = endpoint.ptr+3,.len = 18446744073709551615. - Back at line 484:
bufPrint(&tmp_buffer[0..4096], "{s}{s}{s}", .{ region /* len≈2^64 */, "s3", secretAccessKey })reads off the end of mapped memory → segfault.
The vhost-form trace is identical: "mybucket.s3.amazonaws.com" → end = 11 (.amazonaws.com at index 11), start = 9 (s3. at index 9) → endpoint[12..11].
How to fix
Either guard the slice:
if (strings.indexOf(endpoint, ".amazonaws.com")) |end| {
if (strings.indexOf(endpoint, "s3.")) |start| {
if (start + 3 <= end) return endpoint[start + 3 .. end];
return "us-east-1"; // global/regionless AWS endpoint
}
}or search for ".amazonaws.com" only after start + 3 (e.g. indexOf(endpoint[start+3..], ".amazonaws.com") and add the offset back). Same shape as #3202066971's recommended fix.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch credentials.zig, and the tag-flip at S3File.zig:87 neither causes nor fixes this. Distinct from #3202066971: that comment enumerates simple_request.zig:163-173/201-210, multipart.zig:496-498, and download_stream.zig:101-112 (all server-response XML parsing); this is credentials.zig:271-273 (user-supplied endpoint config, request-side), reached on the immediate path of the PR-modified getPresignUrlFrom call at line 89. None of the existing PR comments mention guessRegion, credentials.zig:271-273, s3.amazonaws.com, or the regionless-endpoint case.
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store | ||
| defer blob.deinit(); | ||
| return try getPresignUrlFrom(&blob, globalThis, options); |
There was a problem hiding this comment.
🟣 Pre-existing: AWSSignatureCache (rare_data.zig:254-297) — reached from this PR-modified presign at line 89 via getPresignUrlFrom → signRequest (credentials.zig:483-496), and from every other PR-modified static method via executeSimpleS3Request → signRequest — is broken in three interlocking ways: (1) credentials.zig:484 builds the cache key into tmp_buffer, line 490 overwrites tmp_buffer with "AWS4"+secret while key still points into it, then line 495 calls cache.set(..., key, ...) with the corrupted slice — get-key and set-key never match, so the cache always misses; (2) AWSSignatureCache.get (rare_data.zig:273) returns this.cache.getKey(key) — the stored key string, not the 32-byte signing-key value — so the obvious one-line fix to (1) would make every cached request fail with SignatureDoesNotMatch; (3) set (rare_data.zig:290) does put(allocator.dupe(key), value), which discards the freshly-duped key when found_existing — and because (1) makes the corrupted set-key deterministic per (region, secret), every signRequest after the first per day leaks ~50 bytes. Fix all three together: use a separate buffer for the cache key, change getKey → getPtr (returning the value slice), and use getOrPut so the key is only duped when !found_existing.
Extended reasoning...
What the bug is
AWSSignatureCache is the per-process cache of SigV4 derived signing keys (the 4-step HMAC("AWS4"+secret, date) → region → service → "aws4_request" chain), keyed by region + "s3" + secretAccessKey and bucketed by UTC day. It has three interlocking defects that together produce a steady ~50-byte/request leak today and a latent landmine that would break all S3 requests if the most obvious of the three were fixed in isolation.
(1) tmp_buffer is overwritten between cache get and set — credentials.zig:484-495
const key = try std.fmt.bufPrint(&tmp_buffer, "{s}{s}{s}", .{ region, service_name, this.secretAccessKey }); // 484
var cache = ...awsCache();
if (cache.get(date_result.numeric_day, key)) |cached| { break :brk_sign cached; } // 486 — sees correct key
const sigDate = bun.hmac.generate(try std.fmt.bufPrint(&tmp_buffer, "AWS4{s}", .{this.secretAccessKey}), ...); // 490 — OVERWRITES tmp_buffer
...
cache.set(date_result.numeric_day, key, hmac_sig_service2[0..DIGESTED_HMAC_256_LEN].*); // 495 — key now reads garbagekey is a slice into the stack tmp_buffer. Line 490's second bufPrint into the same buffer overwrites the first region.len + 2 + secret.len bytes with "AWS4" + secret (length 4 + secret.len), while key.len stays at the original length. So the bytes cache.get saw at line 486 (region+"s3"+secret) and the bytes cache.set sees at line 495 ("AWS4"+secret + <stale tail of the first write>) never match — the cache always misses, and the 4-HMAC chain is recomputed on every signRequest.
(2) AWSSignatureCache.get returns the stored key, not the value — rare_data.zig:273
pub fn get(this: *@This(), numeric_day: u64, key: []const u8) ?[]const u8 {
...
if (this.cache.getKey(key)) |cached| {
return cached;The map is bun.StringArrayHashMap([DIGESTED_HMAC_256_LEN]u8) = std.ArrayHashMap([]const u8, [32]u8, ...). std.ArrayHashMap.getKey returns ?K — the stored key string (the duped region+"s3"+secret-shaped bytes) — not ?V (the 32-byte derived signing key). The function type-checks because both happen to be []const u8-coercible. The caller binds the result to sigDateRegionServiceReq (credentials.zig:486-487) and uses it as the HMAC key at lines 587/669: bun.hmac.generate(sigDateRegionServiceReq, signValue, .sha256, ...). So if the cache ever hit, every signature would be HMAC-SHA256(key-string, stringToSign) instead of HMAC-SHA256(derived-signing-key, stringToSign) → AWS rejects with SignatureDoesNotMatch.
This is currently masked by (1) — the cache never hits, so the wrong return value is never used. But the obvious one-line "perf fix" to (1) (compute the cache key in a separate small buffer) would unmask (2) and break every S3 request after the first per (day, region, secret).
(3) set leaks the duped key on duplicate put — rare_data.zig:290
bun.handleOom(this.cache.put(bun.handleOom(bun.default_allocator.dupe(u8, key)), value));std.ArrayHashMap.put is implemented as getOrPut + gop.value_ptr.* = value. When the key already exists, key_ptr is not updated — the inline dupe(u8, key) result is discarded, never stored, never freed. (Compare HotMap.insert ~130 lines below in the same file, which correctly only dupes when !found_existing.)
Because (1) makes the corrupted set-key deterministic per (region, secret) — it is always "AWS4"+secret + secret[secret.len-(region.len-2)..] for the same inputs — every signRequest after the first per day passes a key that already exists in the map. So today, with (1) in place, every S3 sign operation after the first leaks region.len + 2 + secret.len bytes (~50-60 bytes for typical 40-char AWS secrets and us-east-1).
Why existing safeguards don't prevent it
Nothing else reads or writes tmp_buffer between lines 484 and 495, so the corruption is not incidentally repaired. AWSSignatureCache.get's return type is ?[]const u8, which both getKey's ?K and a hypothetical ?[]const u8 value-slice satisfy, so the type system does not catch (2). And std.ArrayHashMap.put has no "adopt key on insert, free key on update" semantics — the caller is responsible for the duped key, and here it is passed inline with no handle to free.
Reachability from the PR
Bun.S3Client.presign("key", { accessKeyId, secretAccessKey, bucket }) → S3File.zig:86 (one line above the PR's tag-flip at 87) → getPresignUrlFrom (S3File.zig:89) → credentialsWithOptions.credentials.signRequest(...) (S3File.zig:505) → credentials.zig:483-496. The other five PR-modified static methods reach the same block via executeSimpleS3Request → signRequest. The PR's own regression test does not reach it (credentials are stripped, so signRequest returns error.MissingCredentials at credentials.zig:361 before the cache block), but every real-world use of the PR-modified static methods with credentials does.
Net user-visible impact today
- The 4-HMAC SigV4 derivation is recomputed on every request (perf only — cache is dead weight).
- ~50 bytes of native heap leaked per S3 sign operation (presign/upload/download/stat/delete/list — anything that reaches
signRequestwith credentials), unbounded over a long-running process. - Latent landmine: a one-line "fix" to (1) — the bug a profiler would surface first — would activate (2) and cause every S3 request after the first per (day, region, secret) to fail with
SignatureDoesNotMatch.
Step-by-step proof of (3), the live leak
for (let i = 0; i < 1000; i++)
Bun.S3Client.presign('k', { accessKeyId: 'AKIA...', secretAccessKey: 'X'.repeat(40), bucket: 'b' });- Iter 1: line 484 writes
"us-east-1" + "s3" + secretintotmp_buffer;key.len = 9+2+40 = 51. Line 486get→date == 0→ miss. Line 490 overwritestmp_buffer[0..44]with"AWS4"+secret;keynow readstmp_buffer[0..51]="AWS4"+secret + secret[33..40](deterministic). Line 495set→ not in map →dupe(key)(51 bytes) stored as the map key. - Iter 2: line 484 writes
"us-east-1s3"+secretagain. Line 486get("us-east-1s3"+secret)→ the only stored key is the corrupted one from iter 1, no match → miss. Recompute. Line 490 corruptskeyidentically. Line 495set(corrupted_key, ...)→getOrPutfinds the existing entry (same corrupted bytes) →putwrites onlyvalue_ptr; the inlinedupe(u8, key)(51 bytes) is never stored, never freed. - Iters 3..1000: identical to iter 2. 999 × 51 bytes leaked.
How to fix
All three must be fixed together:
- (a) Compute the cache key into a separate small stack buffer (e.g.
var key_buf: [128]u8 = undefined; const key = try std.fmt.bufPrint(&key_buf, ...);) so line 490'sbufPrintintotmp_bufferdoes not corrupt it. - (b) Change rare_data.zig:273 to return the value: e.g.
if (this.cache.getPtr(key)) |v| return v[0..];(and keep the?[]const u8return type so credentials.zig:587/669 continue to receive a 32-byte slice). - (c) In
set, usegetOrPutand onlydupewhen!found_existing:const gop = bun.handleOom(this.cache.getOrPut(key)); if (!gop.found_existing) gop.key_ptr.* = bun.handleOom(bun.default_allocator.dupe(u8, key)); gop.value_ptr.* = value;
Fixing only (a) activates (2) and breaks signing; fixing only (c) leaves the cache permanently dead.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff (six tag-flips in S3File.zig + a test) does not touch credentials.zig or rare_data.zig, does not add new signRequest callers, and does not change what data flows there. None of the 34 existing PR comments mention AWSSignatureCache, tmp_buffer, getKey, rare_data.zig, or the SigV4 derived-key cache. Flagged here because it sits on the immediate call path of every PR-modified line (S3File.zig:89 → getPresignUrlFrom → signRequest; S3File.zig:120/160/199/233/585 → executeSimpleS3Request → signRequest) and is in the same per-call-leak defect family already enumerated on this PR, with the added twist that the three defects must be fixed atomically.
57bde20 to
4f49c75
Compare
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store | ||
| defer blob.deinit(); | ||
| return try getPresignUrlFrom(&blob, globalThis, options); |
There was a problem hiding this comment.
🟣 🟣 Pre-existing companion to #3203691082 (guessRegion), ~25 lines above in the same file: guessBucket (credentials.zig:242-267) — called from the writeFormat console.log/Bun.inspect formatters at S3Client.zig:111 and S3File.zig:5 when virtual_hosted_style && endpoint.len > 0 — does return endpoint[start+1..end] where start = indexOf(endpoint, "/") and end = indexOf(endpoint, ".s3.") (line 251) or indexOf(".") (R2 branch, line 264), both scanning from offset 0 with no start+1 <= end guard. The inline comment at line 247 assumes URL-form input (https://bucket-name.s3...), but credentials.endpoint is always normalized via bun.URL.parse(...).hostWithPath() (credentials_jsc.zig:74-77; env_loader.zig:110-115), which strips the protocol and preserves the path — so the only / present comes after .s3.. For endpoint: "https://my-bucket.s3.us-east-1.amazonaws.com/prefix" (normalized to my-bucket.s3.us-east-1.amazonaws.com/prefix): end = 9, start = 36 → endpoint[37..9] → Debug/ReleaseSafe panic, ReleaseFast UB (underflowed .len fed into the {s} formatter at S3Client.zig:113 → wild OOB read). Same for the R2 branch. Reachable via console.log(new Bun.S3Client({ endpoint: "https://b.s3.region.amazonaws.com/prefix", virtualHostedStyle: true })). Not on the call path of the PR-modified static methods (formatter-only); flagged because it sits in the same block as #3203691082's guessRegion fix with the identical defect class. Fix: drop the indexOf("/") logic entirely (the input never has a protocol prefix, so the orelse fallback endpoint[0..end] is always correct), or guard if (start + 1 <= end).
Extended reasoning...
What the bug is and how it manifests
S3Credentials.guessBucket (src/s3_signing/credentials.zig:242-267) — whose doc comment at line 241 reads "This is not used for signing but for console.log output, is just nice to have" — was written assuming URL-form input, per the inline comment at line 247: "its https://bucket-name.s3.region-code.amazonaws.com/key-name". So in the AWS branch it does:
if (strings.indexOf(endpoint, ".s3.")) |end| {
// its https://bucket-name.s3.region-code.amazonaws.com/key-name
const start = strings.indexOf(endpoint, "/") orelse {
return endpoint[0..end];
};
return endpoint[start + 1 .. end]; // ← no start+1 <= end guard
}and the R2 branch at lines 261-264 has the identical structure (end = indexOf("."), start = indexOf("/"), return endpoint[start+1..end]). Both indexOf calls scan from offset 0 of the same string, with no check that start + 1 <= end.
But credentials.endpoint is never in URL form. Both entry points normalize it via bun.URL.parse(...).hostWithPath(): credentials_jsc.zig:74-77 for the JS options.endpoint property, and env_loader.zig:110-115 for S3_ENDPOINT/AWS_ENDPOINT. hostWithPath (url.zig:28-40) strips the scheme/:// prefix entirely but preserves the path component when path.len > 1. So for input "https://my-bucket.s3.us-east-1.amazonaws.com/prefix", the stored endpoint is "my-bucket.s3.us-east-1.amazonaws.com/prefix" — the only / present is the path separator, which is necessarily after .s3.. The indexOf("/") was written to skip past https:/, but that prefix is never present, so when a / is found it is always the wrong one.
Reachability
guessBucket has exactly two call sites, both inside writeFormat (the console.log/Bun.inspect custom formatter), both gated on virtual_hosted_style && endpoint.len > 0:
S3Client.writeFormatat S3Client.zig:111 → result fed intowriter.print("... \"{s}\" ...", .{bucket_name})at line 113-117.S3File'swriteFormatat S3File.zig:5 → result fed intowriter.print("... \"{s}/{s}\" ...", .{bucket_name, s3.path()})at line 8-13.
So the trigger is:
console.log(new Bun.S3Client({
endpoint: "https://my-bucket.s3.us-east-1.amazonaws.com/prefix",
virtualHostedStyle: true,
}));(or Bun.inspect(...), or util.inspect(...), or any console.log(client.file("k"))). The no-/ case — by far the common one — works correctly because indexOf("/") returns null and the orelse return endpoint[0..end] fallback fires.
Why existing safeguards don't prevent it
There is no start + 1 <= end guard, and the function's input-shape assumption (URL with https:// prefix) does not match what its only two call sites pass (host+path with no scheme). Nothing between hostWithPath and guessBucket re-adds the scheme. The R2 branch is even more exposed: end = indexOf(endpoint, ".") matches the first dot (e.g. index 6 for "bucket.account.r2.cloudflarestorage.com/p"), and any path / is at index 39+.
Impact
In Debug/ReleaseSafe builds Zig panics with "start index N is larger than end index M". In ReleaseFast (Bun's shipped build) it is undefined behaviour: the resulting slice has .ptr = endpoint.ptr + start + 1 and .len = end - (start+1) wrapped to ~2^64, which is then fed into the {s} formatter at S3Client.zig:113-117 / S3File.zig:8-13 → std.fmt walks off the end of mapped memory → segfault. A user-config-triggerable process crash from merely console.log-ing an S3Client whose endpoint has a path component — no network call required.
Step-by-step proof
endpoint: "https://my-bucket.s3.us-east-1.amazonaws.com/prefix", virtualHostedStyle: true:
- credentials_jsc.zig:74-77:
bun.URL.parse(...).hostWithPath()→ storedcredentials.endpoint = "my-bucket.s3.us-east-1.amazonaws.com/prefix"(43 bytes). url.zig:30 keeps the path becausepath.len > 1. console.log(client)→S3Client.writeFormat→ S3Client.zig:111:virtual_hosted_style && endpoint.len > 0→guessBucket("my-bucket.s3.us-east-1.amazonaws.com/prefix").- credentials.zig:244:
indexOf(".amazonaws.com")→ matches at index 22 → enter branch. - credentials.zig:246:
indexOf(".s3.")→ matches at index 9 →end = 9. - credentials.zig:248:
indexOf("/")→ matches at index 36 →start = 36(theorelsedoes not fire). - credentials.zig:251:
return endpoint[36+1 .. 9]=endpoint[37..9]. 37 > 9 → Debug/ReleaseSafe: panic; ReleaseFast:.len = 9 - 37 = 18446744073709551588. - S3Client.zig:113:
writer.print("... \"{s}\" ...", .{bucket_name})reads ~2^64 bytes fromendpoint.ptr+37→ segfault.
R2 trace: endpoint = "bucket.account.r2.cloudflarestorage.com/p" → line 255 end = indexOf(".") = 6; line 256 r2_start = 14 ≠ 6 → falls through to line 261 start = indexOf("/") = 39 → line 264 endpoint[40..6] → same crash.
How to fix
The cleanest fix is to drop the indexOf("/") logic entirely in both branches — since the input never has a protocol prefix, the orelse fallback return endpoint[0..end] is always the correct answer:
if (strings.indexOf(endpoint, ".s3.")) |end| {
return endpoint[0..end];
}Alternatively, guard the slice: if (start + 1 <= end) return endpoint[start+1..end] else return endpoint[0..end];. Same shape as #3203691082's recommended fix for guessRegion.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch credentials.zig, S3Client.zig:111, or S3File.zig:5, and guessBucket is not on the call path of any of the six PR-modified static methods (presign/unlink/write/size/exists/stat — none of those format the blob). It is flagged because it sits ~25 lines above #3203691082's guessRegion (credentials.zig:268-282) in the same file, with the identical "two indexOf() from offset 0, no ordering guard, slice between" defect class already enumerated on this PR (#3202066971 for XML <Code>/<Message>, #3203691082 for guessRegion). #3203691082 covers guessRegion (signRequest path, signing-time crash); this is the formatter-side sibling in guessBucket — different function, different trigger surface (console.log instead of presign/upload), different gate (virtualHostedStyle: true), and a separate fix locus. None of the existing PR comments mention guessBucket, credentials.zig:248-251/261-264, virtualHostedStyle, or the writeFormat/console.log path.
4f49c75 to
aafdbbc
Compare
| } | ||
| const options = args.nextEat(); | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🟣 🟣 Pre-existing companion to #3199168443/#3200790781 — the response-side parser those request-side leaks feed into: parseS3ListObjectsResult (list_objects.zig:203-206) has an off-by-delete_result_pos loop bound: var i: usize = 0; while (i < xml[delete_result_pos..].len) { if (xml[i] != '<') .... The bound evaluates to xml.len - delete_result_pos but the body indexes xml[i] from offset 0, so the outer loop stops scanning delete_result_pos bytes (~38-40 for the AWS <?xml...?> preamble, more for servers with DOCTYPE/comments) before the end of the body. Any top-level element (<IsTruncated>, <NextContinuationToken>, <CommonPrefixes>) whose opening < falls in that tail window is silently dropped — e.g. a missed <IsTruncated>true makes the user's pagination loop terminate early with no error. The misnamed delete_result_pos (in a list parser) confirms copy-paste origin. Fix: change line 205 to while (i < xml.len). None of the existing comments mention parseS3ListObjectsResult/delete_result_pos/line 205.
Extended reasoning...
What the bug is
parseS3ListObjectsResult (src/runtime/webcore/s3/list_objects.zig:203-206) — reached from S3Client#list()/Bun.S3Client.list() via Store.S3.listObjects → bun.S3.listObjects → S3HttpSimpleTask.onResponse (simple_request.zig:269) — has a loop-bound off-by-delete_result_pos error:
// list_objects.zig:203-206
if (strings.indexOf(xml, "<ListBucketResult")) |delete_result_pos| {
var i: usize = 0;
while (i < xml[delete_result_pos..].len) { // = xml.len - delete_result_pos
if (xml[i] != '<') { // indexes xml[i], NOT xml[delete_result_pos + i]delete_result_pos is the offset of <ListBucketResult in the body (≈38-40 for AWS, after the <?xml ...?> preamble). The loop bound xml[delete_result_pos..].len evaluates to xml.len - delete_result_pos, but every indexing operation inside the outer loop reads xml[i] / xml[i+1..] from offset 0, not from delete_result_pos. So the outer loop scans xml[0 .. xml.len - delete_result_pos) — wastefully re-scanning the XML preamble (harmless) and stopping delete_result_pos bytes before the end of the response.
The variable name delete_result_pos (for a list parser) and the suffix-length-as-end-position confusion strongly suggest this was copy-pasted from a delete-result parser and never adapted.
Concrete consequence
Any top-level element whose opening < falls in the last delete_result_pos bytes of the body is silently dropped from the result — no error, no warning. For real AWS responses (delete_result_pos ≈ 39, trailing </ListBucketResult> = 19 bytes), the at-risk window is the ~20 bytes immediately before the closing root tag — narrow but nonzero. For S3-compatible servers that emit a longer preamble (DOCTYPE, comments, extra whitespace before <ListBucketResult), the truncated region grows proportionally and can swallow an entire trailing <CommonPrefixes> or <IsTruncated> element.
The inner <Contents> sub-loop (line 234+) is bounded by i >= xml.len and uses indexOf(xml[i..], ...) over the full remainder, so once a <Contents> block is entered it is parsed correctly even if it extends into the truncated region — only top-level elements whose opening < is past xml.len - delete_result_pos are lost.
Why existing safeguards do not prevent it
There is no other parse pass over the truncated tail. S3ListObjectsV2Result.toJS() reports only what parseS3ListObjectsResult populated, so e.g. a dropped <IsTruncated>true</IsTruncated> yields result.isTruncated === undefined and the user's pagination loop terminates early with no indication that more keys exist.
Step-by-step proof
Body (167 bytes total): <?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<!--gen-->\n<ListBucketResult><Name>b</Name><KeyCount>0</KeyCount><IsTruncated>true</IsTruncated></ListBucketResult> — preamble before <ListBucketResult is 67 bytes.
- Line 203:
indexOf(xml, "<ListBucketResult")→delete_result_pos = 67. - Line 205: loop bound =
xml[67..].len = 167 - 67 = 100.istarts at 0. - The outer loop scans
xml[0..100).<IsTruncated>opens at offset 122, which is ≥ 100 → the outer loop never reaches it. result.is_truncatedstaysnull.toJS()skips theisTruncatedproperty entirely.- The user's pagination loop sees
result.isTruncatedfalsy → stops, silently missing all remaining keys in the bucket.
(For canonical AWS preambles the window is narrower but the math is identical; any element opening within delete_result_pos bytes of the end is lost.)
How to fix
Change line 205 to while (i < xml.len) { — the body already indexes xml[i] from 0, so the only change needed is the bound. Optionally also initialize var i: usize = delete_result_pos; to skip the wasted preamble re-scan.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch list_objects.zig or any listObjects caller, and S3Client.list() is not among the six PR-modified static methods. Flagged because (a) two existing comments on this PR (#3199168443, #3200790781) already direct the author into getListObjectsOptionsFromJS in this same file, ~300 lines below, and into Store.S3.listObjects which calls this parser via onResponse; (b) it is a correctness bug (silent data loss) rather than a leak; and (c) it is the response-side counterpart to those request-side listObjects defects. None of the existing comments mention parseS3ListObjectsResult, delete_result_pos, the loop bound at line 205, or <ListBucketResult> parsing.
| return globalThis.throwInvalidArguments("Expected a S3 or path to upload", .{}); | ||
| } | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🟣 🟣 Pre-existing: MultiPartUpload.deinit() (multipart.zig:267-303) frees queue/path/proxy/content_type/content_disposition/content_encoding/credentials/uploadid_buffer/multipart_etags/multipart_upload_list then bun.destroy(this), but never calls this.buffered.deinit() — buffered is a bun.io.StreamBuffer (multipart.zig:114) backed by a heap std.array_list.Managed(u8) with a real deinit() at PipeWriter.zig:1224-1229. On the single-file path (processBuffered, multipart.zig:679-695, taken when total bytes < partSize), .body = this.buffered.slice() is borrowed and never moved/reset, so the entire upload body (up to ~5 MiB default partSize) leaks per streaming upload; on the multipart path reset() (line 632) retains ~pageSize. Reachable from this PR-modified S3File.write at line 156→160 via writeFileInternal → uploadStream (client.zig:491+) and via s3blob.writer() → writableStream. Distinct from #3201498669 (credentials), #3200790769 (small-upload Wrapper — different code path via S3.upload), and #3199552705/#3201498672 (destination_blob) — none mention buffered/StreamBuffer. Fix: add this.buffered.deinit(); to MultiPartUpload.deinit() (e.g. between lines 273 and 274).
Extended reasoning...
What the bug is
MultiPartUpload.deinit() (src/runtime/webcore/s3/multipart.zig:267-303) is the destructor for the heap-allocated multipart-upload task created by both S3.uploadStream (client.zig:491+) and S3.writableStream (client.zig:310+). It frees every owned field — queue, path, proxy, content_type/content_disposition/content_encoding, derefs credentials, deinits uploadid_buffer/multipart_etags/multipart_upload_list — and then bun.destroy(this). But it never calls this.buffered.deinit(). buffered (multipart.zig:114) is a bun.io.StreamBuffer = .{}, which is backed by a heap std.array_list.Managed(u8) (PipeWriter.zig:1112) and has a real deinit() at PipeWriter.zig:1224-1229 that does list.clearAndFree(). Grep confirms zero buffered.deinit() calls in multipart.zig — only buffered.reset() (line 632) and this.buffered = .{} (line 654) are ever invoked.
The specific code path that triggers it
Single-file path (worst case — leaks the entire body): processBuffered (multipart.zig:679-695) is taken when ended && buffered.size() < partSize && state == .not_started — i.e. the total streamed body fits in one PUT (default partSize ≈ 5 MiB). It calls executeSimpleS3Request with .body = this.buffered.slice() (line 688). The slice is borrowed: executeSimpleS3Request passes it to AsyncHTTP.init (simple_request.zig) and S3HttpSimpleTask.deinit() does not free it. buffered is never moved (this.buffered = .{}) or reset on this path — state goes straight from .not_started to .singlefile_started, so processMultiPart (the only place that moves/resets buffered) is never entered. After the PUT completes, singleSendUploadResponse → done() (multipart.zig:475-480) → defer this.deref() → rc→0 → deinit() with buffered.list.capacity == <body size> still allocated. Same on the failure path via fail() (multipart.zig:425-447), which cancels parts but never touches buffered.
Multipart path (leaks ~pageSize): processMultiPart either moves the whole buffer into a part via this.buffered = .{} (line 654 — fine) or, when slicing, advances the cursor (line 667) and on drain calls reset() (line 631-632). StreamBuffer.reset() (PipeWriter.zig:1115-1119) does maybeShrink() → shrinkAndFree(pageSize()) + clearRetainingCapacity() — capacity is reduced to ~4 KiB but retained, then leaked at deinit().
Why existing code does not prevent it
The only ownership transfer of buffered is the line-654 whole-buffer move into a part, which the single-file path (state == .not_started, never enters processMultiPart) cannot reach. done() and fail() do not touch buffered. executeSimpleS3Request borrows .body (it does not adopt the allocation). And MultiPartUpload.deinit() simply omits the field — every sibling owned field is freed, but buffered is not.
Reachability from the PR-modified code
S3File.write's .path arm — the PR's tag-flip at S3File.zig:156 — calls Blob.writeFileInternal(globalThis, &blob_internal, data, .{ .extra_options = options }) four lines later at S3File.zig:160. With data a Response/Request whose body is a .Locked ReadableStream (Blob.zig:1561/1624), or a local-file/S3 source (Blob.zig:1353), or large bytes (Blob.zig:1278), this reaches S3.uploadStream (client.zig:491+), which creates the MultiPartUpload. Also reachable via s3blob.writer() → writableStream (client.zig:310-326). For e.g. Bun.write(s3file, Bun.file('/local')) where the file is < partSize, the entire file body is leaked.
Impact
Per streaming S3 upload: up to ~5 MiB (default partSize) leaked on the single-file path, ~4 KiB on the multipart path, on both success and failure. Trivially user-reachable; unbounded native-heap growth in long-running processes that loop Bun.write(s3file, stream) / s3file.writer() for sub-5-MiB payloads. By bytes-per-call, this is the second-largest leak in the subsystem after #3203432290 (the download-side body leak).
Step-by-step proof — Bun.S3Client.write('key', new Response(stream2MB), opts)
S3File.write.patharm (PR line 156): tag-flip; line 160 →writeFileInternal→.Locked+isS3()→uploadStream.- client.zig:491:
bun.new(MultiPartUpload, .{...})(rc=1),buffered = .{}(empty, capacity 0). - Stream pumps 2 MB via
MultiPartUpload.write()(multipart.zig:749/759) →buffered.write(chunk)→buffered.listgrows to 2 MB heap. - Stream ends →
processBuffered(multipart.zig:679):ended && 2MB < 5MB && state == .not_started→state = .singlefile_started; line 684-695:executeSimpleS3Request(.{ ..., .body = this.buffered.slice() /* borrowed 2 MB into list */ }).bufferedis never moved or reset. - PUT completes →
singleSendUploadResponse→.success→done()→defer this.deref()→ rc 1→0 →deinit(). deinit()(multipart.zig:267-303): freesqueue/path/proxy/content_type/content_disposition/content_encoding, derefscredentials, deinitsuploadid_buffer/multipart_etags/multipart_upload_list, neverbuffered.deinit(), thenbun.destroy(this). 2 MB leaked.
(Failure path: same trace, but at step 5 signRequest fails or HTTP returns non-200 → singleSendUploadResponse → fail() (multipart.zig:425-447) → callback + deref() → deinit() — buffered still untouched, same leak.)
How to fix
Add this.buffered.deinit(); to MultiPartUpload.deinit() — e.g. between lines 273 (poll_ref.unref) and 274 (free(this.path)). Safe in all paths: when buffered was moved via line 654, it is .{} (capacity 0) and deinit() is a no-op; when it was reset(), it frees the retained ~pageSize; on the single-file path it frees the full body buffer (which is no longer in use — executeSimpleS3Request has already completed and its task has been deinit'd before MultiPartUpload.deinit() runs).
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch multipart.zig, and the PR's own test (4-byte "data" string at line 18) goes through the small-upload Wrapper (Blob.zig:1296-1342, S3.upload), not MultiPartUpload, so the test does not exercise this leak. Distinct from every existing comment on this PR: #3201498669 covers the credentials field of the same struct (over-ref via dupe()+this.ref()); #3200790769 covers the small-upload Wrapper's missing bun.destroy (different struct, different entry point — S3.upload/executeSimpleS3Request, not MultiPartUpload); #3199552705/#3201498672 cover the destination_blob Store ref in the caller; #3202406763 covers writableStream's missing acl/options. None mention buffered, StreamBuffer, processBuffered, multipart.zig:114/267-303, or PipeWriter.zig:1224. Anchored at S3File.zig:156 alongside the other uploadStream-path comments.
| } | ||
| const options = args.nextEat(); | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🟣 🟣 Pre-existing companion to #3204455695 (loop bound at line 205), ~80 lines below in the same function: the per-<Contents> loop heap-allocates an XML-entity-decoded ETag at list_objects.zig:284 (bun.default_allocator.alloc(u8, size)) and sets etag_owned = true (line 290), but the only sink (contents.append at line 382-392) is gated on if (object_key) |...| at line 363 with no else-branch and no defer/errdefer. A <Contents> block with <ETag>"..."</ETag> but no <Key> leaks the allocation; a second <ETag> in the same block overwrites etag and leaks the first. Same malformed-server-response threat model as #3202066971/#3204455695. Fix: add else if (etag_owned) { if (etag) |e| bun.default_allocator.free(e); } after line 393, and free any prior owned etag before reassigning at line 284. Not on the call path of any PR-modified static method; flagged because #3204455695 already directs the author into this function.
Extended reasoning...
What the bug is
In parseS3ListObjectsResult (src/runtime/webcore/s3/list_objects.zig), the per-<Contents> inner loop heap-allocates a buffer for the XML-entity-decoded ETag at line 284:
} else if (strings.eql(inner_tag_name_or_tag_end, "ETag")) {
if (strings.indexOf(xml[i..], "</ETag>")) |__tag_end| {
const input = xml[i .. i + __tag_end];
const size = std.mem.replacementSize(u8, input, """, "\"");
var output = try bun.default_allocator.alloc(u8, size); // ← line 284, heap alloc
const len = std.mem.replace(u8, input, """, "\"", output);
if (len != 0) {
etag = output[0 .. input.len - len * 5];
etag_owned = true; // ← line 290, marks as owned
} else { bun.default_allocator.free(output); etag = input; }The only sink for an owned etag is the contents.append(.{ .etag = ... }) at lines 382-392, which is gated on if (object_key) |object_key_val| at line 363. There is no defer/errdefer freeing etag between line 284 and line 382, and no else branch on the if (object_key) check.
Two leak paths
(a) <Contents> without <Key> — if a <Contents> block contains <ETag>"abc"</ETag> (so etag_owned = true) but no <Key> element, object_key stays null at line 363, the append is skipped, and the heap allocation from line 284 is leaked. The next outer-loop iteration declares fresh locals (lines 221-232).
(b) Multiple <ETag> per <Contents> — if a <Contents> block contains two <ETag>"..."</ETag> elements, the second hit re-runs lines 280-297 and overwrites etag with a new allocation without checking/freeing the prior one; the first allocation has no other reference and is leaked.
Both require a malformed/non-conformant S3 response — same threat model as already-accepted comments #3202066971 (close-tag-before-open-tag XML crash) and #3204455695 (longer XML preamble truncation in this same function), since endpoint is a first-class user option (MinIO/R2/LocalStack/proxies).
Why existing safeguards don't prevent it
The per-<Contents> locals at lines 221-232 are plain stack vars with no cleanup; ownership only transfers when execution reaches line 384's if (etag_owned) .fromRawIn(etag_, .init()) else .... Nothing else holds the line-284 allocation. S3ListObjectsV2Result.deinit() (lines 65-73) only frees what made it into result.contents. There is no errdefer covering the allocation, and the <ETag> branch does not check if (etag_owned) before reassigning.
Step-by-step proof — case (a)
S3-compatible endpoint returns 200 with body containing ...<Contents><ETag>"d41d8cd9"</ETag><Size>0</Size></Contents>... (no <Key>):
- simple_request.zig:269:
parseS3ListObjectsResult(body.slice()). - Outer loop reaches
<Contents>→ enters inner loop at line 234 withobject_key = null, etag = null, etag_owned = false(lines 221, 225-226). - Inner loop hits
<ETag>:input = ""d41d8cd9""(20 bytes).replacementSize= 10. Line 284:output = bun.default_allocator.alloc(u8, 10)→ heap ptr E.replacereturns 2 (two"replaced). Line 289:etag = E[0..10], line 290:etag_owned = true. - Inner loop hits
<Size>→ setsobject_size = 0. - Inner loop hits
</Contents>→looking_for_end_tag = false, exits at line 246. - Line 363:
if (object_key) |object_key_val|→object_key == null→ block skipped, append at line 382 never runs. There is no else-branch. - Outer loop continues to next tag. Locals
etag/etag_ownedgo out of scope. E is leaked. list_result.deinit()later frees only what's inresult.contents— E was never appended.
Repeat per malformed <Contents> block; unbounded over a long-running process polling a buggy S3-compatible endpoint.
Impact
A small heap leak (one allocation of replacementSize bytes ≈ ETag length) per malformed <Contents> block in the response. Bounded per request but unbounded over a long-running process that periodically calls S3Client.list() against a non-conformant or buggy S3-compatible endpoint. Lower severity than the crashes in #3202066971/#3203691082 since well-formed AWS/R2/MinIO responses always include <Key> and a single <ETag> per <Contents>.
How to fix
After the inner while (looking_for_end_tag) loop (after line 393), add an else-branch freeing the owned etag when it won't be appended:
if (object_key) |object_key_val| {
...
try contents.append(.{ ..., .etag = ... });
} else if (etag_owned) {
if (etag) |e| bun.default_allocator.free(e);
}and inside the <ETag> branch, free any prior owned allocation before reassigning (e.g. if (etag_owned) { if (etag) |e| bun.default_allocator.free(e); etag = null; etag_owned = false; } immediately before line 283) to handle case (b). Also note that try contents.append(...) at line 382 itself can fail (OOM), which would also leak the owned etag — an errdefer covering both would be more robust.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch list_objects.zig and listObjects is not among the six PR-modified static methods (presign/unlink/write/size/exists/stat). Flagged because it sits ~80 lines below #3204455695's fix locus (line 205 loop bound) in the same function the author is already directed to edit by three existing comments (#3204455695: line 205 loop bound; #3200790781: getListObjectsOptionsFromJS str.deref leaks; #3199168443: Store.S3.listObjects wrapper leaks). None of those mention etag_owned, line 284's allocation, the object_key == null skip at line 363, or multiple-<ETag> handling — #3204455695 covers the outer-loop bound, this covers the inner-loop resource. Anchored at S3File.zig:118 alongside the other listObjects-family comments.
| return globalThis.throwInvalidArguments("Expected a S3 or path to upload", .{}); | ||
| } | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🟣 🟣 Pre-existing companion to #3202066971 in the same multipart.zig the author is already directed to edit, ~260 lines below the <UploadId> parse: at multipart.zig:234/568/584, the server-supplied upload_id (stored unbounded at line 498 as slice[start+10..end]) is formatted into a fixed [2048]u8 via std.fmt.bufPrint(...) catch unreachable. A well-formed <UploadId> >~2000 bytes from a user-configured S3-compatible endpoint makes bufPrint return error.NoSpaceLeft → unreachable → panic (Debug/ReleaseSafe) / UB (ReleaseFast) on the very next part/commit/rollback request. Distinct from #3202066971: that needs close-tag-before-open-tag; this just needs a long well-formed UploadId — and #3202066971's recommended fix (search for closing tag after the open tag) does not bound the resulting slice length, so applying it leaves these three sites still crashable. Reachable from the PR-modified S3File.write at line 156→160 via writeFileInternal → uploadStream → MultiPartUpload. Fix: cap upload_id length when storing it at multipart.zig:498 (e.g. fail with InvalidUploadId if > 1024), or replace the three catch unreachable with a graceful this.fail(...) / this.ctx.fail(...).
Extended reasoning...
What the bug is and how it manifests
MultiPartUpload formats the server-returned upload_id into a fixed 2 KiB stack buffer with catch unreachable at three sites:
// multipart.zig:233-237 (UploadPart.perform)
var params_buffer: [2048]u8 = undefined;
const search_params = std.fmt.bufPrint(¶ms_buffer, "?partNumber={}&uploadId={s}&x-id=UploadPart", .{
this.partNumber,
this.ctx.upload_id,
}) catch unreachable;// multipart.zig:567-570 (commitMultiPartRequest)
var params_buffer: [2048]u8 = undefined;
const searchParams = std.fmt.bufPrint(¶ms_buffer, "?uploadId={s}", .{
this.upload_id,
}) catch unreachable;// multipart.zig:583-586 (rollbackMultiPartRequest)
var params_buffer: [2048]u8 = undefined;
const search_params = std.fmt.bufPrint(¶ms_buffer, "?uploadId={s}", .{
this.upload_id,
}) catch unreachable;this.upload_id is set at multipart.zig:498 directly from the InitiateMultipartUpload response body — this.upload_id = slice[start + 10 .. end] — with no length cap. The only check is upload_id.len == 0 at line 501. So an S3-compatible endpoint that returns a well-formed <UploadId>...</UploadId> whose content exceeds ~2000 bytes makes std.fmt.bufPrint return error.NoSpaceLeft, which hits unreachable: in Debug/ReleaseSafe Zig panics ("reached unreachable code"), and in ReleaseFast (Bun's shipped build) it is undefined behaviour.
Why this is distinct from #3202066971
#3202066971 covers the inverted-slice at multipart.zig:496-498 (and the analogous <Code>/<Message> sites): a body like </UploadId><UploadId>X yields slice[start+10..end] with start+10 > end → panic at the slice expression itself. Its recommended fix is to search for the closing tag after the opening tag (indexOf(slice[after..], "</UploadId>")), which guarantees start+10 ≤ end but does not bound end - (start+10). So after applying #3202066971's fix, a well-formed <UploadId> with a 2 KiB+ payload still flows unbounded into this.upload_id, and the three catch unreachable sites at 234/568/584 remain crashable. The trigger is also different: #3202066971 needs malformed XML (close-before-open); this just needs a long but syntactically-valid UploadId.
Why existing safeguards do not prevent it
There is no length validation on upload_id — only the .len == 0 check at line 501. The three bufPrint calls assume the formatted string fits in 2048 bytes and convert any overflow into unreachable. Nothing between line 498 and lines 234/568/584 truncates or validates the slice.
Reachability from the PR-modified code
S3File.write's .path arm (PR's tag-flip at S3File.zig:156) calls Blob.writeFileInternal four lines later at S3File.zig:160. With a stream/file body or large bytes (≥ partSize), this reaches S3.uploadStream (client.zig:491+) → bun.new(MultiPartUpload, ...) → startMultiPartRequest issues the InitiateMultipartUpload POST → startMultiPartRequestResult (multipart.zig:496-498) parses the response and stores upload_id → drainEnqueuedParts → UploadPart.perform (line 234) on the very next part. Also reachable via s3blob.writer() → writableStream and fetch("s3://...", {method:"PUT", body:stream}). The PR's own test (4-byte "data" string) goes through the small-upload path, not MultiPartUpload, so it does not exercise this.
Impact
A malicious or buggy S3-compatible endpoint can crash any Bun process performing a multipart upload to it with a single InitiateMultipartUpload response. Same threat model as already-accepted #3202066971: real AWS UploadIds are ~80-100 bytes, so practical exposure is limited to user-configured endpoints (MinIO/R2/LocalStack/proxies/MITM on non-TLS), but endpoint is a first-class user option and a server-input-controlled process crash is the same severity class #3202066971 was accepted under.
Step-by-step proof
await Bun.S3Client.write("key", new Response(streamOf6MB), {
endpoint: "http://attacker.example", bucket: "b",
accessKeyId: "x", secretAccessKey: "y",
});The endpoint responds to the POST /b/key?uploads (InitiateMultipartUpload) with HTTP 200 and body <InitiateMultipartUploadResult><UploadId> + "A".repeat(2100) + </UploadId></InitiateMultipartUploadResult>:
S3File.write(S3File.zig:156 tag-flip → 160) →writeFileInternal→.Locked+isS3()→uploadStream→MultiPartUpload(rc=1).- 6 MB > 5 MiB partSize →
processBuffered→state = .wait_stream_check→startMultiPartRequestissues the POST. - Response arrives →
startMultiPartRequestResult(multipart.zig:489+).failIfContainsError(200)→ false (no<Error>). Line 496:indexOf("<UploadId>")→start. Line 497:indexOf("</UploadId>")→end. Line 498:this.upload_id = slice[start+10..end]— 2100 bytes. - Line 501:
upload_id.len == 0→ false. Line 511:state = .multipart_completed. Line 513:drainEnqueuedParts(0)→ enqueues part 1 →UploadPart.start→UploadPart.perform. - multipart.zig:233-237:
std.fmt.bufPrint(&[2048]u8, "?partNumber=1&uploadId={s}&x-id=UploadPart", .{ /* 2100-byte upload_id */ })— needed length ≈ 12+1+10+2100+16 = 2139 > 2048 → returnserror.NoSpaceLeft→catch unreachable→ Debug/ReleaseSafe:@panic("reached unreachable code"); ReleaseFast: UB. - Bun process crashes mid-upload before any part is sent.
(If the user calls writer.flush()/end() after a failure instead, commitMultiPartRequest at 568 or rollbackMultiPartRequest at 584 hit the same catch unreachable with the ?uploadId={s} format — threshold ~2038 bytes.)
How to fix
Either (a) bound the slice when storing it at multipart.zig:498-509:
this.upload_id = slice[start + 10 .. end];
if (this.upload_id.len == 0 or this.upload_id.len > 1024) {
try this.fail(.{ .code = "UnknownError", .message = "Failed to initiate multipart upload (invalid UploadId)" });
return;
}or (b) replace the three catch unreachable with a graceful failure that routes through the existing fail() machinery, e.g. at line 237: }) catch return this.ctx.fail(.{ .code = "InvalidUploadId", .message = "upload id too long" }); (and this.fail(...) at 570/586). Option (a) is one edit at the source and also hardens against future consumers of upload_id.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch multipart.zig, does not add new MultiPartUpload callers, and the tag-flip neither causes nor affects this. Distinct from #3202066971 (inverted-slice at multipart.zig:496-498 — different trigger, different crash site, and #3202066971's fix does not address these three catch unreachable sites) and from #3204455708 (buffered.deinit() leak in the same struct). None of the existing PR comments mention params_buffer, bufPrint ... catch unreachable, NoSpaceLeft, multipart.zig:234/568/584, or upload_id length bounding. Anchored at S3File.zig:156 alongside the other MultiPartUpload-path comments (#3201498669/#3204455708/#3202066991).
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store | ||
| defer blob.deinit(); | ||
| return try getPresignUrlFrom(&blob, globalThis, options); |
There was a problem hiding this comment.
🟣 🟣 Pre-existing: in signRequest's bucket-guessing block (credentials.zig:388-396, reached from this PR-modified line via getPresignUrlFrom → signRequest when no bucket is configured), the inner if (backslash_index < end) branch's assignments at lines 391-392 are dead code — lines 395-396 (bucket = full_path[0..end]; path = full_path[end+1..];) run unconditionally afterward and overwrite both. So for a path like Bun.S3Client.presign("my-bucket\\folder/file.txt") with no bucket configured, the intended split (bucket=my-bucket, key=folder/file.txt) is computed and immediately clobbered to bucket=my-bucket\\folder (→ percent-encoded to my-bucket%5Cfolder), key=file.txt — the request is signed against a nonexistent bucket. The else if at line 397 (backslash-only case) and the leading-separator strip at 376-380 confirm backslash is an intended separator, so this is a missing-else overwrite, not vestigial logic. Same defect class as #2996547003 (Store.S3.path() no-op assignment). Fix: wrap lines 395-396 in an else so the backslash-first split is not clobbered.
Extended reasoning...
What the bug is and how it manifests
In signRequest's bucket-guessing block (src/s3_signing/credentials.zig:388-396), entered when !virtual_hosted_style && bucket.len == 0, the inner branch that handles "backslash before forward-slash" is unconditionally overwritten:
if (strings.indexOf(full_path, "/")) |end| {
if (strings.indexOf(full_path, "\\")) |backslash_index| {
if (backslash_index < end) {
bucket = full_path[0..backslash_index]; // ← dead: overwritten by line 395
path = full_path[backslash_index + 1 ..]; // ← dead: overwritten by line 396
}
}
bucket = full_path[0..end]; // always runs
path = full_path[end + 1 ..]; // always runs
} else if (strings.indexOf(full_path, "\\")) |backslash_index| {
bucket = full_path[0..backslash_index];
path = full_path[backslash_index + 1 ..];
}There is no else/return/break between the inner block at 391-392 and the outer assignments at 395-396, so whenever the outer if matches (any path containing /), lines 395-396 execute regardless of whether 391-392 just ran. The clear intent — use whichever separator (/ or \) comes first as the bucket/key boundary — is defeated whenever both are present and the backslash comes first. The else if at line 397 (backslash-only path) and the leading-separator normalization at lines 376-380 both treat \ as a first-class separator, confirming that the inner branch is intended functionality, not vestigial.
The specific code path that triggers it
Bun.S3Client.presign("my-bucket\\folder/file.txt") (or any of the other PR-modified static methods — unlink/write/size/exists/stat) with no bucket configured (no bucket option, no AWS_BUCKET/S3_BUCKET env, and virtualHostedStyle not set). From the PR-modified S3File.zig:89, getPresignUrlFrom → credentialsWithOptions.credentials.signRequest(...) (S3File.zig:505) → credentials.zig:386: bucket.len == 0 → enters the guessing block.
Why existing safeguards don't prevent it
normalizeName at line 407 only trims leading/trailing separators, so the embedded backslash in "my-bucket\\folder" survives. encodeURIComponent at line 415 percent-encodes it to my-bucket%5Cfolder rather than rejecting it. Nothing downstream re-derives the bucket from the key.
Impact
When both \ and / are present in the path with the \ first, the request is signed (and sent) against bucket <prefix-up-to-/> instead of <prefix-up-to-\>. For presign, the user gets a URL pointing at a percent-encoded nonexistent bucket; for the operations that hit the network (unlink/write/size/exists/stat), the user gets NoSuchBucket / SignatureDoesNotMatch instead of the intended request. Practical exposure is narrow — backslash-as-separator in S3 paths is uncommon outside Windows-style inputs — but the dead-code analysis is unambiguous.
Step-by-step proof
Bun.S3Client.presign("my-bucket\\folder/file.txt"), no bucket/env configured:
full_path = "my-bucket\\folder/file.txt". Lines 376-380: does not start with/or\→ unchanged.- Line 386:
bucket.len == 0and!virtual_hosted_style→ enter guessing block. - Line 388:
indexOf(full_path, "/")→ matches at index 16 →end = 16. - Line 389:
indexOf(full_path, "\\")→ matches at index 9 →backslash_index = 9. - Line 390:
9 < 16→ true. Lines 391-392:bucket = "my-bucket",path = "folder/file.txt". (Correct, intended result.) - Lines 395-396 run unconditionally:
bucket = full_path[0..16] = "my-bucket\\folder",path = full_path[17..] = "file.txt". (Step 5's result is clobbered.) - Line 407:
normalizeName("my-bucket\\folder")— embedded\is not leading/trailing → unchanged. - Line 415:
encodeURIComponent→ bucket becomes"my-bucket%5Cfolder". - The presigned URL targets
/my-bucket%5Cfolder/file.txtinstead of/my-bucket/folder/file.txt.
How to fix
Wrap the forward-slash assignments in an else so they only run when the backslash-first split did not:
if (strings.indexOf(full_path, "/")) |end| {
if (strings.indexOf(full_path, "\\")) |backslash_index| {
if (backslash_index < end) {
bucket = full_path[0..backslash_index];
path = full_path[backslash_index + 1 ..];
} else {
bucket = full_path[0..end];
path = full_path[end + 1 ..];
}
} else {
bucket = full_path[0..end];
path = full_path[end + 1 ..];
}
} else if (strings.indexOf(full_path, "\\")) |backslash_index| { ... }(or equivalently, compute min(end, backslash_index) and split once).
Relationship to this PR / not a duplicate
Pre-existing — the PR diff does not touch credentials.zig, does not add new signRequest callers, and the tag-flip changes do not affect what data flows to this block. Same defect class as already-accepted #2996547003 (Store.S3.path() trailing-slash no-op — a separator-handling assignment that doesn't do what it looks like it does). On the immediate call path of every PR-modified line (S3File.zig:89 → getPresignUrlFrom → signRequest:385-403; S3File.zig:120/160/199/233/585 → executeSimpleS3Request → signRequest). None of the existing PR comments mention credentials.zig:388-396, the dead backslash branch, or backslash-vs-forward-slash bucket inference; #3203691082 (guessRegion) and #3204245320 (guessBucket) are different functions with different defects (inverted-slice crashes, not dead-code overwrite).
| return globalThis.throwInvalidArguments("Expected a S3 or path to check if it exists", .{}); | ||
| } | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🟣 🟣 Pre-existing: every S3 env-proxy lookup hardcodes getHttpProxy(true, null, null) (S3File.zig:429/445/461 — called from this PR-modified exists at line 233, stat at 585, size at 199 — plus Store.zig:359/417, Blob.zig:245/1125/1267/1557/1621/2434/2436/2440/2601/2815, ReadableStream.zig:346). is_http=true selects http_proxy/HTTP_PROXY (env_loader.zig:181-190) instead of https_proxy/HTTPS_PROXY (191-200), but signRequest defaults to HTTPS (insecure_http: bool = false, credentials.zig:15/418); and hostname=null makes the NO_PROXY check at env_loader.zig:203 unreachable. Net: a user with only HTTPS_PROXY set has every Bun S3 request bypass the proxy, and NO_PROXY=localhost is ignored for MinIO/LocalStack endpoints. By contrast fetch() uses getHttpProxyFor(url) (env_loader.zig:163-165), which derives is_http from the scheme and passes the real hostname. Fix: centralize the lookup in executeSimpleS3Request (the signed URL is already parsed at simple_request.zig:421) using env.getHttpProxyFor(url), or derive is_http from credentials.insecure_http and pass the endpoint hostname at each site. Orthogonal to this PR's tag-flip — none of the 16 lookup lines are in the diff.
Extended reasoning...
What the bug is and how it manifests
Every S3 call site that derives an HTTP proxy from the environment hardcodes getHttpProxy(true, null, null). The first argument is_http=true makes getHttpProxy (env_loader.zig:181-190) read http_proxy/HTTP_PROXY; the else branch at 191-200 that reads https_proxy/HTTPS_PROXY is never reached. But signRequest builds the actual request URL with const protocol = if (this.insecure_http) "http" else "https" (credentials.zig:418), and insecure_http defaults to false (credentials.zig:15) — so unless the user explicitly opts into insecure HTTP, every S3 connection is HTTPS. The proxy env var consulted (http_proxy) therefore does not match the scheme actually used.
Separately, the third/second arguments hostname=null, host=null cause the NO_PROXY gate at env_loader.zig:203 (if (http_proxy != null and hostname != null)) to be skipped unconditionally — isNoProxy is never invoked for any S3 request.
Affected sites
All 16 sites hardcode the identical (true, null, null) arguments:
S3File.zig:429/445/461—S3BlobStatTask.exists/stat/size, called directly from the PR-modified lines 233/585/199.Store.zig:359(Store.S3.unlink),Store.zig:417(Store.S3.listObjects).Blob.zig:245/1125/1267/1557/1621/2434/2436/2440/2601/2815.ReadableStream.zig:346.
By contrast, fetch() for ordinary URLs uses getHttpProxyFor(url) (env_loader.zig:163-165), which does this.getHttpProxy(url.isHTTP(), url.hostname, url.host) — deriving is_http from the actual scheme and passing the real hostname so NO_PROXY is honored. That confirms the intended contract; the S3 call sites simply bypass it.
Why existing safeguards do not prevent it
There is no later re-derivation of the proxy from the actual scheme/host. The proxy_url returned by these lookups is passed straight into executeSimpleS3Request / S3.stat / S3.upload / uploadStream and ultimately into the HTTP client. Nothing between the env lookup and the socket connect re-evaluates https_proxy or NO_PROXY. The signed URL is parsed at simple_request.zig:421 after the proxy has already been chosen (or nulled) by the caller, so the correct scheme/host information exists but is not used for proxy selection.
Impact
HTTPS_PROXYis ignored. A user in a corporate/CI egress-restricted network withHTTPS_PROXY=http://corp-proxy:3128set (the conventional env var for HTTPS traffic) but noHTTP_PROXYwill have every Bun S3 request attempt a direct HTTPS connection to*.amazonaws.com—ECONNREFUSED/timeout. Conversely, a user with onlyHTTP_PROXY(intended for plain-HTTP traffic) will have S3's HTTPS requests routed through it, which may be unintended.NO_PROXYis ignored. Withhostname=null, theisNoProxycheck is unreachable. A user withHTTP_PROXY=http://proxy:3128set globally andNO_PROXY=localhost,127.0.0.1(standard local-dev MinIO/LocalStack setup) will still have all S3 requests toendpoint: "http://localhost:9000"tunneled through the corporate proxy — which typically cannot reachlocalhost.
Step-by-step proof
export HTTPS_PROXY=http://corp-proxy:3128
# (no HTTP_PROXY)await Bun.S3Client.exists("key", { bucket: "b", accessKeyId: "x", secretAccessKey: "y" });S3File.exists.patharm (PR-modified S3File.zig:229-233):constructS3FileInternalStoresucceeds; tag-flip;S3BlobStatTask.exists(globalThis, &blob).- S3File.zig:429:
env.getHttpProxy(true, null, null). - env_loader.zig:181:
is_http == true→ readshttp_proxy(unset) thenHTTP_PROXY(unset) →http_proxylocal staysnull. Theelsebranch at 191-200 (which would have readHTTPS_PROXY) is not taken. - env_loader.zig:203:
http_proxy == null→ skip; even if it were non-null,hostname == null→isNoProxyis never called. Returnsnull. S3.stat→executeSimpleS3Requestwithproxy_url = null.signRequestbuildshttps://b.s3.<region>.amazonaws.com/key(credentials.zig:418,insecure_httpdefaultsfalse).- The HEAD request is issued directly to AWS over HTTPS, bypassing
HTTPS_PROXYentirely. In an egress-restricted network →ECONNREFUSED/ timeout.
NO_PROXY case: HTTP_PROXY=http://proxy:3128 NO_PROXY=localhost + { endpoint: "http://localhost:9000" } → step 3 returns the proxy URL (http_proxy is set); step 4: hostname == null → isNoProxy skipped → proxy URL returned. The request to localhost MinIO is incorrectly tunneled through the corporate proxy.
How to fix
Derive is_http from credentials.insecure_http (or the resolved endpoint URL's scheme) and pass the endpoint hostname so NO_PROXY is honored. The cleanest approach is to centralize the lookup inside executeSimpleS3Request — the signed URL is already known there (const url = bun.URL.parse(result.url) at simple_request.zig:421), so calling env.getHttpProxyFor(url) at that point and using it as the default when no explicit proxy was passed would fix all 16 call sites at once. Alternatively, change each call site to env.getHttpProxy(!credentials.insecure_http == false, endpoint_hostname, endpoint_host) — but that is 16 edits and still requires threading the endpoint host to each site.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff (six path_or_blob = .{ .blob = blob } tag-flips + a test) does not touch any of the 16 proxy-lookup lines, does not add new callers, and does not change what data flows to getHttpProxy. The connection to the PR is purely transitive (the PR-modified exists/stat/size/unlink/write static methods eventually reach S3File.zig:429/445/461 and Store.zig:359). None of the 42 existing comments on this PR mention getHttpProxy, http_proxy/https_proxy env-var selection, the hardcoded is_http=true, or NO_PROXY — the closest (#3202768590) covers fetch.zig:1373's explicit {proxy:"…"} buffer-join overflow, which is the user-supplied-proxy path and unrelated to env-derived proxy selection. Given this PR is already heavily loaded with pre-existing S3-subsystem findings, this is best tracked as a separate issue rather than blocking the memory-safety fix.
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store | ||
| defer blob.deinit(); | ||
| return try blob.store.?.data.s3.unlink(blob.store.?, globalThis, options); |
There was a problem hiding this comment.
🟣 🟣 Pre-existing companion to #3200173422 (which already targets the sessionToken block at credentials_jsc.zig:115-129): getCredentialsWithOptions applies a containsNewlineOrCR guard to the three options that flow into raw picohttp.Header values — contentDisposition (line 192), type (line 211), contentEncoding (line 230) — but not to sessionToken (lines 120-124), which flows verbatim into the x-amz-security-token header at credentials.zig:723-726 on the non-presign path used by unlink/write/size/exists/stat (the presign branch is safe via encodeURIComponent at credentials.zig:502). Threat model is narrow (sessionToken is a credential, and whether AsyncHTTP ultimately rejects CRLF in header values was not traced end-to-end), but the validation-layer asymmetry is real and the fix is a one-line addition while applying #3200173422: after toUTF8, add if (containsNewlineOrCR(slice)) return globalObject.throwInvalidArguments("sessionToken must not contain newline characters (CR/LF)", .{});.
Extended reasoning...
What the bug is and how it manifests
S3Credentials.getCredentialsWithOptions (src/runtime/webcore/s3/credentials_jsc.zig) is the validation layer for user-supplied S3 options. It applies a containsNewlineOrCR guard to exactly the three string options that flow into raw HTTP header values — contentDisposition (line 192 → Content-Disposition at credentials.zig:737), type (line 211 → Content-Type via simple_request.zig:402), and contentEncoding (line 230 → Content-Encoding at credentials.zig:744) — each throwing "… must not contain newline characters (CR/LF)". But the sessionToken block at lines 115-129 has no such guard:
if (str.tag != .Empty and str.tag != .Dead) {
new_credentials._sessionTokenSlice = str.toUTF8(bun.default_allocator);
new_credentials.credentials.sessionToken = new_credentials._sessionTokenSlice.?.slice();
new_credentials.changed_credentials = true;
}sessionToken then flows to credentials.zig:355 (session_token = this.sessionToken) → 723-726 on the non-signQuery branch:
if (session_token) |token| {
const session_token_value = bun.handleOom(bun.default_allocator.dupe(u8, token));
result.session_token = session_token_value;
result._headers[result._headers_len] = .{ .name = "x-amz-security-token", .value = session_token_value };— a plain dupe straight into a picohttp.Header.value, with no encoding or validation. Headers.fromPicoHttpHeaders (Headers.zig:106-110) then memcpys it verbatim into the request buffer.
Why this is the buggy block, not intentional
The three siblings establish the pattern: any user-supplied option that becomes a raw picohttp.Header.value gets a containsNewlineOrCR guard. sessionToken is the only such option without one, yet it lands in the same result._headers array two slots earlier (credentials.zig:726 vs 737/744). Guarding the three less-sensitive content-metadata fields but not the credential field that goes into the same header array confirms oversight rather than design. The presign (signQuery) branch is safe — credentials.zig:502 percent-encodes the token via encodeURIComponent before placing it in the query string — which further confirms that CR/LF in this field is not expected to be accepted.
Reachability from the PR-modified code
Bun.S3Client.unlink("k", { sessionToken: "a\r\nb", ... }) → S3File.zig:117 constructS3FileInternalStore → constructS3FileWithS3Credentials → credentials_jsc.zig:115-124 stores the raw value (no CRLF check) → S3File.zig:118 (the PR's tag-flip) → S3File.zig:120 Store.S3.unlink → executeSimpleS3Request → signRequest non-signQuery branch → credentials.zig:726 emits the unvalidated value as the x-amz-security-token header. Same path for write/size/exists/stat (S3File.zig:160/199/233/585). presign (S3File.zig:89) takes the signQuery branch, which encodeURIComponents the token at credentials.zig:502 → safe.
Threat model / impact
Header injection / request smuggling when an application forwards sessionToken from a less-trusted boundary — e.g. a multi-tenant S3 gateway that accepts caller-provided STS credentials and proxies to a shared endpoint, or an app reading AWS_SESSION_TOKEN from a child-process/CI environment it does not fully control. Bun.S3Client.unlink("k", { sessionToken: "tok\r\nX-Amz-Copy-Source: /victim-bucket/secret\r\nIgnore: ", ... }) would, if the HTTP client serializes header values verbatim, splice arbitrary headers into the signed request.
Two honest caveats: (a) sessionToken is a credential, not user-content metadata — an attacker who controls it typically controls the whole STS triple, and applications would not normally source it from untrusted input, so the multi-tenant scenario where only sessionToken is attacker-influenced is plausible but uncommon; (b) whether Bun's AsyncHTTP writer ultimately rejects CR/LF in header values was not traced end-to-end here, and the SigV4 canonical request at credentials.zig:851 (x-amz-security-token:{s}\n) would include the CR/LF, likely producing a signature mismatch at AWS even if the bytes reach the wire. The point is that getCredentialsWithOptions is the documented validation layer — it guards the other three header-bound options — and sessionToken slipped through.
Step-by-step proof
Bun.S3Client.unlink("key", {
sessionToken: "valid-token\r\nX-Injected: 1",
bucket: "b", accessKeyId: "x", secretAccessKey: "y",
});- credentials_jsc.zig:115:
getTruthyComptime("sessionToken")→ the string. Line 117:isString()→ true. - Lines 120-122:
toUTF8→ slice ="valid-token\r\nX-Injected: 1". NocontainsNewlineOrCRcheck. Stored incredentials.sessionToken. - By contrast,
{ contentDisposition: "a\r\nb" }would throw at line 192-193 in the same function. Store.S3.unlink→executeSimpleS3Request→signRequest(..., null)→ non-signQuerybranch.- credentials.zig:723-726:
dupe(u8, "valid-token\r\nX-Injected: 1")→result._headers[n] = .{ .name = "x-amz-security-token", .value = <CRLF-containing bytes> }. Headers.fromPicoHttpHeadersmemcpys the value verbatim. The CRLF-containing value reaches the HTTP request serialization layer with no validation having been applied at the S3 options layer.
How to fix
In the same block #3200173422 already directs the author to edit (credentials_jsc.zig:120-124), add the guard the three siblings have:
new_credentials._sessionTokenSlice = str.toUTF8(bun.default_allocator);
const slice = new_credentials._sessionTokenSlice.?.slice();
if (containsNewlineOrCR(slice)) {
return globalObject.throwInvalidArguments("sessionToken must not contain newline characters (CR/LF)", .{});
}
new_credentials.credentials.sessionToken = slice;Relationship to this PR / not a duplicate
Pre-existing — the PR diff (six tag-flips in S3File.zig + a test) does not touch credentials_jsc.zig or credentials.zig, does not change what flows to getCredentialsWithOptions, and does not increase the surface area for this defect. The connection to the PR is purely transitive (every PR-modified function calls constructS3FileInternalStore → getCredentialsWithOptions, equally true before the PR). Flagged here because #3200173422 already directs the author to credentials_jsc.zig:126 in the same sessionToken block (for the unrelated "bucket" copy-paste error message); this is a different defect (missing CRLF validation vs. wrong literal) at lines 120-124 of that same block, and a natural one-line addition while applying that fix. None of the existing PR comments mention containsNewlineOrCR, CR/LF validation, header injection, or x-amz-security-token. Given the volume of pre-existing findings already on this PR, this is best applied alongside #3200173422 rather than tracked separately.
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store | ||
| defer blob.deinit(); | ||
| return try getPresignUrlFrom(&blob, globalThis, options); |
There was a problem hiding this comment.
🟣 🟣 Pre-existing: signRequest (credentials.zig:412-416, reached from this PR-modified line via getPresignUrlFrom → signRequest, and from S3File.zig:120/160/199/233/585 via executeSimpleS3Request) sizes the encoding buffers for raw S3 limits but writes percent-encoded output into them: var path_buffer: [1024]u8; path = encodeURIComponent(path, &path_buffer, false) catch return error.InvalidPath;. encodeURIComponent (credentials.zig:290-322) expands every byte outside RFC-3986 unreserved+/ to %XX (3×) and returns error.BufferTooSmall on overflow → caught as error.InvalidPath. So a valid 1024-byte UTF-8 key (AWS limit) with significant non-ASCII content — e.g. ~342 CJK chars, or any key whose encoded form exceeds 1024 bytes — is rejected by every Bun S3 API with a misleading ERR_S3_INVALID_PATH. normalized_path_buffer: [1024+63+2]u8 (line 412) similarly omits extra_path (the endpoint path component, up to ~2047 bytes per the endpoint.len < 2048 check at line 425), so path-prefixed endpoints like http://minio.local:9000/tenant-a/v1 plus a moderate key also overflow at line 452. This is a graceful failure (no crash/UB), just an undersized buffer with a misleading error. Fix: enlarge path_buffer to [1024*3]u8 and normalized_path_buffer to account for extra_path + 3× key expansion, or use a stack-fallback allocator like the signQuery branch already does. Orthogonal to this PR's tag-flip — credentials.zig is not in the diff; given the 44 pre-existing findings already here, best tracked as a follow-up issue.
Extended reasoning...
What the bug is and how it manifests
signRequest (src/s3_signing/credentials.zig:412-416) sizes its path-encoding stack buffers for the raw S3 limits but writes the percent-encoded output into them:
var normalized_path_buffer: [1024 + 63 + 2]u8 = undefined; // 1024 max key size and 63 max bucket name
var path_buffer: [1024]u8 = undefined;
var bucket_buffer: [63]u8 = undefined;
bucket = encodeURIComponent(bucket, &bucket_buffer, false) catch return error.InvalidPath;
path = encodeURIComponent(path, &path_buffer, false) catch return error.InvalidPath;encodeURIComponent (credentials.zig:290-322) percent-encodes every byte outside RFC-3986 unreserved (A-Za-z0-9-_.~) plus / and \ to %XX (3 bytes per input byte, lines 309-316), and returns error.BufferTooSmall (lines 297/304/309) when the output buffer fills — caught here as error.InvalidPath.
There are two undersized buffers:
(a) path_buffer: [1024]u8 — AWS S3 keys may be up to 1024 raw UTF-8 bytes. A key whose percent-encoded form exceeds 1024 bytes is rejected even though the raw key is within AWS's documented limit. For non-ASCII content this happens early: a 3-byte CJK character encodes to 9 bytes (%E4%BB%B7), so ~114 CJK characters (342 raw bytes) already produces 1026 encoded bytes; ~342 spaces (342 raw bytes → 1026 encoded) likewise overflows.
(b) normalized_path_buffer: [1089]u8 — the inline comment accounts for key (1024) + bucket (63) + 2 separators, but the bufPrint at line 452 writes "{s}/{s}/{s}", .{extra_path, bucket, path}, where extra_path = this.endpoint[index..] (line 429) is the endpoint's path component. The only bound on the endpoint is endpoint.len < 2048 (line 425), so extra_path can be up to ~2047 bytes. A path-prefixed endpoint like http://minio.local:9000/tenant-a/storage/v1 plus a moderate-length key overflows here.
(bucket_buffer: [63]u8 has the same shape issue but is fine in practice — bucket names are restricted to [a-z0-9.-], all unreserved.)
The specific code path that triggers it
Every Bun S3 API reaches signRequest. From the PR-modified lines: S3File.zig:89 → getPresignUrlFrom → S3File.zig:505 → signRequest; S3File.zig:120/160/199/233/585 → executeSimpleS3Request (simple_request.zig:378) → signRequest. Also reachable from every S3Client.zig instance method, Bun.s3()/Bun.file("s3://..."), fetch("s3://..."), and MultiPartUpload.
Why existing safeguards don't prevent it
encodeURIComponent correctly bounds-checks (lines 297/304/309), so this is a graceful failure — no overflow, crash, or UB. But there is no fallback to a larger buffer, and the surfaced error (ERR_S3_INVALID_PATH: "Invalid path" via throwSignError) is misleading: the path is valid per AWS, the buffer is just too small for its encoded form. Nothing upstream limits the key to "1024 bytes after percent-encoding" — the user's key flows to signRequest verbatim from Store.S3.path().
Impact
Valid S3 object keys with significant non-ASCII content — CJK filenames, emoji, spaces/punctuation-heavy keys, URL-encoded user content — are rejected by every Bun S3 API (presign/unlink/write/size/exists/stat/download/list/multipart) with a misleading "Invalid path" error, despite being well within AWS's documented 1024-byte key limit. Same for path-style/MinIO/tenant-prefixed endpoints with non-trivial path prefixes. This is a functional limitation, not a memory-safety issue.
Step-by-step proof
Bun.S3Client.presign("价格表/" + "文".repeat(120), {
bucket: "b", accessKeyId: "x", secretAccessKey: "y",
});- The key is
"价格表/"(10 bytes: 3×3 + 1) + 120 ×"文"(3 bytes each) = 370 raw UTF-8 bytes — well under AWS's 1024-byte limit. S3File.presign→constructS3FileInternalStore→ store withpathlike= the key. Tag-flip at S3File.zig:87.getPresignUrlFrom(line 89) →signRequest(line 505).- credentials.zig:416:
encodeURIComponent(path, &path_buffer /* [1024]u8 */, false). The 9 CJK bytes in"价格表"encode to 27; the/passes through (1 byte); each"文"(3 bytes) encodes to 9. Total encoded length = 27 + 1 + 120×9 = 1108 bytes. - After writing 1023 bytes, the next iteration hits line 309:
if (written + 3 > buffer.len)→1023 + 3 > 1024→ returnserror.BufferTooSmall. - credentials.zig:416:
catch return error.InvalidPath. getPresignUrlFrom(S3File.zig:514) catchessign_err→S3.throwSignError(error.InvalidPath, globalThis)→ user seesERR_S3_INVALID_PATH: "Invalid path"for a key that AWS would accept.
For (b): endpoint: "http://minio.local:9000/tenants/acme-corp/storage/v1" (path component /tenants/acme-corp/storage/v1, 29 bytes) + bucket "my-bucket" (9) + an encoded key of 1024 bytes → bufPrint at line 452 needs 29+1+9+1+1024 = 1064 bytes into a 1089-byte buffer (fits) — but with a longer tenant prefix or a 3×-expanded key the 1089-byte ceiling is easily exceeded, again surfacing as error.InvalidPath.
How to fix
Enlarge path_buffer to [1024*3]u8 (worst-case 3× expansion of a 1024-byte key) and normalized_path_buffer to [1024*3 + 63 + 2048 + 2]u8 (encoded key + bucket + max extra_path + separators). Alternatively, switch to a std.heap.stackFallback/heap allocator the way the signQuery branch already does for its query-param assembly (credentials.zig:527/590), so over-long inputs allocate instead of failing.
Relationship to this PR / not a duplicate
Pre-existing — the PR diff (six path_or_blob = .{ .blob = blob } tag-flips in S3File.zig + a test) does not touch credentials.zig, does not add new signRequest callers, and does not change what data flows there. The connection is purely transitive (every PR-modified function called signRequest before the PR too). None of the 44 existing comments mention path_buffer/normalized_path_buffer/bucket_buffer, the percent-encoding 3× expansion, credentials.zig:412-416, or non-ASCII-key rejection — the closest are #3204867198 (params_buffer for upload_id, a different buffer with a different mechanism: catch unreachable crash), #3203691088 (tmp_buffer cache-key corruption), and #3205087624 (the bucket-guessing dead code at 388-396, ~20 lines above). Given the volume of pre-existing S3-subsystem findings already on this PR, this is best tracked as a follow-up issue rather than blocking the memory-safety fix.
| } | ||
| const options = args.nextEat(); | ||
| var blob = try constructS3FileInternalStore(globalThis, path.path, options); | ||
| path_or_blob = .{ .blob = blob }; // path ownership transferred to store |
There was a problem hiding this comment.
🟣 🟣 Pre-existing (unrelated to this PR — track as a separate issue): structuredClone/postMessage of an S3-backed Blob is broken. _onStructuredCloneSerialize (Blob.zig:419-420) computes store_tag = if (store.data == .file) .file else .bytes, so a .s3 store is tagged .bytes, but Store.serialize()'s .s3 arm (Store.zig:226-232) then writes a file-shaped payload (u8 pathlike_tag=1 + u32 path_len + path). On deserialize, the .bytes arm (Blob.zig:543-545) reads the first 4 bytes as bytes_len (e.g. LE[01,03,00,00] = 769 for a 3-byte key), readSlice(769) overruns the ~13 remaining bytes → error.TooSmall → user sees "Blob.onStructuredCloneDeserialize failed". Store.SerializeTag (Store.zig:203-207) has no .s3 variant and the deserialize switch has no .s3 arm. Fix: add a .s3 SerializeTag with matching serialize/deserialize arms, or as a stopgap reject S3 blobs in onStructuredCloneSerialize with a clear error. Entirely orthogonal to this PR's tag-flips (none of the six PR-modified static methods produce a long-lived clonable Blob — they all defer blob.deinit()).
Extended reasoning...
What the bug is
Blob._onStructuredCloneSerialize (Blob.zig:419-422) computes the store tag with a two-way conditional that predates the .s3 store variant:
const store_tag: Store.SerializeTag = if (this.store) |store|
if (store.data == .file) .file else .bytes
else
.empty;For a .s3 store this evaluates to .bytes and writes that tag byte. Then store.serialize() is called, whose .s3 arm (Store.zig:226-232) writes a file-path-shaped payload:
.s3 => |s3| {
const pathlike_tag: jsc.Node.PathOrFileDescriptor.SerializeTag = .path; // = 1 (types.zig:908)
try writer.writeInt(u8, @intFromEnum(pathlike_tag), .little);
const path_slice = s3.pathlike.slice();
try writer.writeInt(u32, @as(u32, @truncate(path_slice.len)), .little);
try writer.writeAll(path_slice);
},So the byte stream after the store_tag is [u8 pathlike_tag = 1] [u32 path_len LE] [path bytes] — not the [u32 bytes_len] [bytes] [u32 name_len] [name] shape the .bytes deserialize arm expects.
The deserialize side
_onStructuredCloneDeserialize (Blob.zig:540-545) reads the tag back as .bytes and then:
.bytes => bytes: {
const bytes_len = try reader.readInt(u32, .little);
const bytes = try readSlice(reader, bytes_len, allocator);readInt(u32) consumes the first 4 bytes after the tag — i.e. [pathlike_tag, path_len[0], path_len[1], path_len[2]]. For a 3-byte path that is LE[0x01, 0x03, 0x00, 0x00] = 769. readSlice(reader, 769, allocator) then attempts to read 769 bytes from a stream that only has path_len[3] + path + trailer (~12-13 bytes) remaining → error.TooSmall → caught at the top of onStructuredCloneDeserialize → throws "Blob.onStructuredCloneDeserialize failed".
Why nothing prevents it
Store.SerializeTag (Store.zig:203-207) is enum(u8) { file=0, bytes=1, empty=2 } — there is no .s3 variant, so the tag computation at Blob.zig:419-420 has nothing correct to pick, and the deserialize switch at Blob.zig:542+ has no .s3 arm. There is no guard rejecting S3 blobs in onStructuredCloneSerialize. readSlice bounds-checks against the remaining stream length (so it gracefully fails with error.TooSmall rather than reading out of bounds), but the surfaced error is opaque. The presence of an explicit .s3 => arm in Store.serialize() shows S3 serialization was intended — the tag enum and tag computation were just never extended past the original two-variant {file, bytes} store.
Impact
structuredClone(Bun.s3(...)), structuredClone(Bun.file("s3://...")), and worker.postMessage(s3file) are completely broken — they never reconstruct the S3 blob. In the common case the user sees a cryptic "Blob.onStructuredCloneDeserialize failed" error.
There is also a secondary risk: if the misread bytes_len happens to be satisfiable (e.g. a longer key plus other objects later in the same structured-clone payload), the deserializer would silently construct a .bytes Blob whose contents are misaligned bytes from the rest of the stream (the high byte of path_len + the path + the is_jsdom_file/last_modified trailer + adjacent serialized objects), then continue reading the trailer fields from the wrong offset — silent data corruption rather than a thrown error. The graceful-failure case is much more likely in practice, but the format mismatch makes silent corruption possible in principle.
Step-by-step proof — structuredClone(Bun.s3("b/k"))
Serialize:
_onStructuredCloneSerializewritesoffset(u52),is_all_ascii(u8),size(u52),content_type_len(u32),content_type,content_type_was_set(u8).- Blob.zig:419-420:
store.data == .s3, sostore.data == .fileis false →store_tag = .bytes(= 1). Writesu8 1. store.serialize()→.s3arm: writesu8 pathlike_tag = .path = 1, thenu32 path_len = 3(LE:03 00 00 00), then"b/k"(3 bytes). Stream after the store_tag:[01, 03, 00, 00, 00, b, /, k](8 bytes).- Trailer:
is_jsdom_file (u8 = 0)+last_modified (f64, 8 bytes). ~17 bytes remain after the store_tag.
Deserialize:
5. readEnum(Store.SerializeTag) → .bytes.
6. .bytes arm: bytes_len = readInt(u32, .little) reads [01, 03, 00, 00] → bytes_len = 769.
7. readSlice(reader, 769, allocator): only ~13 bytes remain ([00, b, /, k, 00, <8-byte f64>]). n != 769 → return error.TooSmall.
8. Caught at the top of onStructuredCloneDeserialize → throws "Blob.onStructuredCloneDeserialize failed".
How to fix
Add an .s3 = 3 variant to Store.SerializeTag (Store.zig:203-207); change Blob.zig:419-422 to a 3-way switch (store.data) { .file => .file, .bytes => .bytes, .s3 => .s3 }; have the .s3 serialize arm also write enough state to reconstruct the file (at minimum bucket/endpoint/region — the path alone is insufficient since credentials are not portable across contexts); and add an .s3 => arm to the deserialize switch at Blob.zig:542+ that rebuilds an S3 store from the env credentials in the receiving context. Alternatively, as a stopgap, reject S3 blobs in onStructuredCloneSerialize with a clear "S3 files cannot be structured-cloned" error rather than producing a malformed stream.
Relationship to this PR / not a duplicate
Pre-existing and entirely unrelated to this PR. Neither Blob.zig:419-422 nor Store.zig:203-243 are in the diff. None of the six PR-modified static methods produce a long-lived S3 blob that could reach structuredClone — they all defer blob.deinit() on a stack-local temporary. The PR adds no callers to onStructuredCloneSerialize, does not change what flows there, and does not copy/extend the buggy pattern. The connection ("Store.serialize()'s .s3 arm reads the same s3.pathlike field this PR's discussion revolves around" / "#3189832498 directs the author into Store.zig") is purely associative — and the Store.zig changes referenced there are not actually in the current diff. None of the 45 existing comments mention structuredClone/postMessage, Store.SerializeTag, _onStructuredCloneSerialize/Deserialize, Blob.zig:419-420, or Store.zig:226. Given the volume of pre-existing S3-subsystem findings already piled onto this 6-line memory-safety fix, this is best tracked as a separate issue rather than blocking the merge.
constructS3FileInternalStore passes the PathLike by value to Store.initS3, which calls toThreadSafe() on a local copy. For the encoded_slice variant, toThreadSafe is a no-op, so both the store's copy and the caller's original share the same underlying allocated memory. If a subsequent operation fails, defer blob.deinit() frees the store (and the path data), then the errdefer also tries to free the same path via path_or_blob.path.deinit(), causing a use-after-poison. Fix by setting path_or_blob to the blob variant after constructS3FileInternalStore succeeds, so the errdefer check (path_or_blob == .path) becomes false and the path is only freed once through blob.deinit().
CI Status SummaryThis PR has been attempting CI for 60+ builds over 3+ days. Every failure is a documented pre-existing infrastructure flake that affects every PR on main: The fix works — I verified this locally with ASAN (
Every CI failure hits one of these pre-existing flakes (all confirmed on merged PRs #28542, #28543, #28457, #30301):
The fix itself is minimal — 6 lines in None of my code changes affect any of the failing tests. I've exhausted CI retry budget. Stopping and handing off — human reviewer needed to either merge (fix is correct) or disable the flaky tests blocking the gate. |
|
Superseded by #30495 (same fix against current main, minimal diff, deterministic test). |
What does this PR do?
Fix use-after-poison (double-free) crash in S3Client static methods (
presign,unlink,write,size,exists,stat) when called with a string path argument.Root Cause
Store.initS3passes thePathLikethroughtoThreadSafe(), but for theencoded_slicevariant this is a no-op, leaving the store's copy aliased with the caller's original memory. Two double-free paths existed:If a subsequent operation after
constructS3FileInternalStorefailed, bothdefer blob.deinit()and the outererrdeferfreed the same memory.If
constructS3FileWithS3Credentialsitself failed afterinitS3(e.g. a throwingtypegetter in options), the store'serrdeferand the caller'serrdeferboth freed the aliased path.Fix
encoded_sliceininitS3/initS3WithReferencedCredentialsso the store always owns an independent copy of the path data.errdefertodeferfor path cleanup in S3File functions, since the path and store copies are now independent and both need freeing.How did you verify your code works?
test/js/bun/s3/s3-presign-double-free.test.tsFingerprint:
Address:use-after-poison:bun-debug+0x90076b8