Skip to content

Fix bun.String.toOwnedSliceReturningAllASCII#23925

Merged
Jarred-Sumner merged 2 commits into
mainfrom
taylor.fish/fix-tosraa
Oct 22, 2025
Merged

Fix bun.String.toOwnedSliceReturningAllASCII#23925
Jarred-Sumner merged 2 commits into
mainfrom
taylor.fish/fix-tosraa

Conversation

@taylordotfish

@taylordotfish taylordotfish commented Oct 21, 2025

Copy link
Copy Markdown
Contributor

bun.String.toOwnedSliceReturningAllASCII is supposed to return a boolean indicating whether or not the string is entirely composed of ASCII characters. However, the current implementation frequently produces incorrect results:

  • If the string is a ZigString, it always returns true, even though ZigStrings can be UTF-16 or Latin-1.
  • If the string is a StaticZigString, it always returns false, even though StaticZigStrings can be all ASCII.
  • If the string is a 16-bit WTFStringImpl, it always returns false, even though 16-bit WTFStrings can be all ASCII.
  • If the string is empty, it always returns false, even though empty strings are valid ASCII strings.

toOwnedSliceReturningAllASCII is currently used in two places, both of which assume its answer is accurate:

  • bun.webcore.Blob.fromJSWithoutDeferGC
  • bun.api.ServerConfig.fromJS

(For internal tracking: fixes ENG-21249)

`bun.String.toOwnedSliceReturningAllASCII` is supposed to return a
boolean indicating whether or not the string is entirely composed of
ASCII characters. However, the current implementation frequently
produces incorrect results:

* If the string is a `ZigString`, it always returns true, even though
  `ZigString`s can be UTF-16 or Latin-1.
* If the string is a `StaticZigString`, it always returns false, even
  though `StaticZigStrings` can be all ASCII.
* If the string is a 16-bit `WTFStringImpl`, it always returns false,
  even though 16-bit `WTFString`s can be all ASCII.

`toOwnedSliceReturningAllASCII` is currently used in two places, both of
which assume its answer is accurate:

* `bun.webcore.Blob.fromJSWithoutDeferGC`
* `bun.api.ServerConfig.fromJS`
@robobun

robobun commented Oct 21, 2025

Copy link
Copy Markdown
Collaborator
Updated 7:42 PM PT - Oct 21st, 2025

@taylordotfish, your commit 7619c57 has 1 failures in Build #29930 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 23925

That installs a local version of the PR into your bun-23925 executable, so you can run:

bun-23925 --bun

@linear

linear Bot commented Oct 21, 2025

Copy link
Copy Markdown

@coderabbitai

coderabbitai Bot commented Oct 21, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Refactors ASCII tracking by introducing a new public AsciiStatus enum, replaces Blob.charset's previous Charset enum with strings.AsciiStatus, changes String.toOwnedSlice to return OOM![]u8 and adds an internal toOwnedSliceImpl returning bytes + AsciiStatus, and adjusts ZigString.Slice.mut to use @constcast for mutable views.

Changes

Cohort / File(s) Change Summary
ZigString binding update
src/bun.js/bindings/ZigString.zig
ZigString.Slice.mut now asserts the allocator is non-null and returns a mutable view by using @constCast on the internal pointer instead of reconstructing a mutable pointer from an integer.
Blob charset type change
src/bun.js/webcore/Blob.zig
Replaced the public Charset enum usage with strings.AsciiStatus for Blob.charset; removed the public Charset enum and replaced fromIsAllASCII calls with AsciiStatus.fromBool in initialization and setter paths.
String API and ASCII status refactor
src/string.zig, src/string/immutable.zig
Added public AsciiStatus enum (unknown, all_ascii, non_ascii) with fromBool(?bool). Changed pub fn toOwnedSlice signature to return OOM![]u8 and introduced toOwnedSliceImpl returning { []u8, AsciiStatus }; updated toOwnedSliceReturningAllASCII and internal string paths to use the new AsciiStatus result.

Possibly related PRs

Suggested reviewers

  • Jarred-Sumner
  • zackradisic
  • pfgithub

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description provides detailed context about the bug and explains the specific scenarios where ASCII detection fails, as well as identifying where the function is used. However, the description does not follow the required template structure specified in the repository. The template requires explicit sections for "What does this PR do?" and "How did you verify your code works?", but the description lacks these section headers and does not include any verification information about how the fix was tested or validated. The author should restructure the PR description to match the repository template. Add the "What does this PR do?" section header with the existing explanation of the ASCII detection bugs, and add a new "How did you verify your code works?" section describing the testing approach, test cases, or validation methods used to confirm that the fix correctly handles ZigString, StaticZigString, 16-bit WTFStringImpl, and empty string cases.
✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "Fix bun.String.toOwnedSliceReturningAllASCII" is specific and directly references the function being fixed. The title accurately reflects the main objective of the pull request, which is to address bugs in ASCII detection within this function. The title is clear and concise, allowing reviewers scanning history to immediately understand that this PR concerns fixing a specific bug in the toOwnedSliceReturningAllASCII function.
Linked Issues Check ✅ Passed The code changes address the requirements outlined in ENG-21249 by implementing proper ASCII detection across all string representations. The PR introduces a new AsciiStatus enum in string/immutable.zig, refactors string.zig to implement accurate ASCII status tracking via the toOwnedSliceImpl helper function, and updates Blob.zig (a key caller) to use the new AsciiStatus mechanism instead of its own Charset enum. These changes directly target fixing the documented bugs where ZigString, StaticZigString, 16-bit WTFStringImpl, and empty strings were returning incorrect ASCII detection results.
Out of Scope Changes Check ✅ Passed The changes appear to be a coordinated refactoring effort to implement proper ASCII status tracking across the codebase. The modifications to ZigString.zig (Slice.mut method), Blob.zig (charset field and related logic), string.zig (toOwnedSlice signatures and implementation), and string/immutable.zig (new AsciiStatus enum) are all interdependent parts of implementing the fix. Each change supports the core objective of accurately detecting ASCII across different string types, and the updates to callers (like Blob.zig) ensure consistency with the new implementation.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4d0fc9e and 7619c57.

📒 Files selected for processing (1)
  • src/bun.js/bindings/ZigString.zig (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/javascriptcore-class.mdc)

**/*.zig: Declare the extern C symbol in Zig and export a Zig-friendly alias for use
Wrap the Bun____toJS extern in a Zig method that takes a JSGlobalObject and returns JSC.JSValue

Files:

  • src/bun.js/bindings/ZigString.zig
src/bun.js/**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/zig-javascriptcore-classes.mdc)

src/bun.js/**/*.zig: In Zig binding structs, expose generated bindings via pub const js = JSC.Codegen.JS and re-export toJS/fromJS/fromJSDirect
Constructors and prototype methods should return bun.JSError!JSC.JSValue to integrate Zig error handling with JS exceptions
Use parameter name globalObject (not ctx) and accept (*JSC.JSGlobalObject, *JSC.CallFrame) in binding methods/constructors
Implement getters as get(this, globalObject) returning JSC.JSValue and matching the .classes.ts interface
Provide deinit() for resource cleanup and finalize() that calls deinit(); use bun.destroy(this) or appropriate destroy pattern
Access JS call data via CallFrame (argument(i), argumentCount(), thisValue()) and throw errors with globalObject.throw(...)
For properties marked cache: true, use the generated Zig accessors (NameSetCached/GetCached) to work with GC-owned values
In finalize() for objects holding JS references, release them using .deref() before destroy

Files:

  • src/bun.js/bindings/ZigString.zig
src/**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/building-bun.mdc)

When adding debug logs in Zig, create a scoped logger and log via Bun APIs: const log = bun.Output.scoped(.${SCOPE}, .hidden); then log("...", .{})

src/**/*.zig: Use private fields in Zig with the # prefix (e.g., struct { #foo: u32 };)
Prefer decl literals in Zig (e.g., const decl: Decl = .{ .binding = 0, .value = 0 };)
Prefer placing @import statements at the bottom of the Zig file (formatter may reorder automatically)
Prefer @import("bun") rather than @import("root").bun or @import("../bun.zig")

In Zig code, be careful with allocators and use defer for cleanup

Files:

  • src/bun.js/bindings/ZigString.zig
🧠 Learnings (2)
📚 Learning: 2025-08-30T00:13:36.815Z
Learnt from: CR
PR: oven-sh/bun#0
File: .cursor/rules/zig-javascriptcore-classes.mdc:0-0
Timestamp: 2025-08-30T00:13:36.815Z
Learning: Applies to src/bun.js/**/*.zig : In Zig binding structs, expose generated bindings via pub const js = JSC.Codegen.JS<ClassName> and re-export toJS/fromJS/fromJSDirect

Applied to files:

  • src/bun.js/bindings/ZigString.zig
📚 Learning: 2025-08-30T00:13:36.815Z
Learnt from: CR
PR: oven-sh/bun#0
File: .cursor/rules/zig-javascriptcore-classes.mdc:0-0
Timestamp: 2025-08-30T00:13:36.815Z
Learning: Applies to src/bun.js/bindings/generated_classes_list.zig : Update src/bun.js/bindings/generated_classes_list.zig to include new classes

Applied to files:

  • src/bun.js/bindings/ZigString.zig
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Format

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • UTF-16: Entity not found: Issue - Could not find referenced Issue.

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines 414 to 416
pub fn mut(this: Slice) []u8 {
return @as([*]u8, @ptrFromInt(@intFromPtr(this.ptr)))[0..this.len];
return @as([*]u8, @constCast(this.ptr))[0..this.len];
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thankfully this function looks unused, so we should delete

/// JavaScriptCore strings are either latin1 or UTF-16
/// When UTF-16, they're nearly always due to non-ascii characters
charset: Charset = .unknown,
charset: strings.AsciiStatus = .unknown,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should charset be renamed to ascii_status or is_ascii?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I think it's okay. charset == .all_ascii etc. still make sense.

@Jarred-Sumner Jarred-Sumner merged commit d846e9a into main Oct 22, 2025
58 of 60 checks passed
@Jarred-Sumner Jarred-Sumner deleted the taylor.fish/fix-tosraa branch October 22, 2025 01:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants