Fix bun.String.toOwnedSliceReturningAllASCII #23925
`bun.String.toOwnedSliceReturningAllASCII` is supposed to return a boolean indicating whether or not the string is entirely composed of ASCII characters. However, the current implementation frequently produces incorrect results:

* If the string is a `ZigString`, it always returns true, even though `ZigString`s can be UTF-16 or Latin-1.
* If the string is a `StaticZigString`, it always returns false, even though `StaticZigString`s can be all ASCII.
* If the string is a 16-bit `WTFStringImpl`, it always returns false, even though 16-bit `WTFString`s can be all ASCII.

`toOwnedSliceReturningAllASCII` is currently used in two places, both of which assume its answer is accurate:

* `bun.webcore.Blob.fromJSWithoutDeferGC`
* `bun.api.ServerConfig.fromJS`
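To illustrate why the storage width alone cannot answer the question, here is a minimal sketch. The helper names `isAllAsciiLatin1` and `isAllAsciiUtf16` are hypothetical and are not Bun's actual functions; the point is only that an accurate answer requires looking at the contents, whether the string is stored as 8-bit or 16-bit units.

```zig
const std = @import("std");

/// Hypothetical helper (not Bun's API): true if every Latin-1 byte is ASCII.
fn isAllAsciiLatin1(bytes: []const u8) bool {
    for (bytes) |b| {
        // Latin-1 bytes 0x80..0xFF are valid characters but not ASCII.
        if (b >= 0x80) return false;
    }
    return true;
}

/// Hypothetical helper (not Bun's API): true if every UTF-16 code unit is ASCII.
fn isAllAsciiUtf16(units: []const u16) bool {
    for (units) |u| {
        // A 16-bit string can still hold only code units below 0x80.
        if (u >= 0x80) return false;
    }
    return true;
}
```

Under this sketch, a 16-bit string holding only `'h', 'i'` reports all ASCII, while an 8-bit Latin-1 string containing `0xE9` does not, which is the opposite of what a width-based shortcut would report.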
Updated 7:42 PM PT - Oct 21st, 2025
❌ @taylordotfish, your commit 7619c57 has 1 failures in
🧪 To try this PR locally: `bunx bun-pr 23925`. That installs a local version of the PR as `bun-23925`, which you can run with `bun-23925 --bun`.
Walkthrough: Refactors ASCII tracking by introducing a new public `AsciiStatus` enum, replaces `Blob.charset`'s previous `Charset` enum with `strings.AsciiStatus`, changes `String.toOwnedSlice` to return `OOM![]u8` and adds an internal `toOwnedSliceImpl` returning bytes + `AsciiStatus`, and adjusts `ZigString.Slice.mut` to use `@constCast` for mutable views.
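A rough sketch of the shape described in the walkthrough, under stated assumptions: only `.unknown` and `.all_ascii` appear in this PR page, so the `non_ascii` variant, the `fromBool` helper, the `OwnedBytes` struct, and the `dupeWithStatus` function below are all hypothetical names chosen for illustration. This is not the PR's `toOwnedSliceImpl`.

```zig
const std = @import("std");

/// Three-state status: a plain bool cannot express "not checked yet".
const AsciiStatus = enum {
    unknown,
    all_ascii,
    non_ascii, // assumed variant name; not confirmed by the PR page

    /// Hypothetical convenience: null means the check was never performed.
    fn fromBool(is_all_ascii: ?bool) AsciiStatus {
        const value = is_all_ascii orelse return .unknown;
        return if (value) .all_ascii else .non_ascii;
    }
};

/// Hypothetical result type pairing owned bytes with their status.
const OwnedBytes = struct {
    bytes: []u8,
    status: AsciiStatus,
};

/// Copies `input` and reports a status computed from the actual bytes.
/// The caller owns `bytes` and must free it with the same allocator.
fn dupeWithStatus(allocator: std.mem.Allocator, input: []const u8) std.mem.Allocator.Error!OwnedBytes {
    const bytes = try allocator.dupe(u8, input);
    var status: AsciiStatus = .all_ascii;
    for (bytes) |b| {
        if (b >= 0x80) {
            status = .non_ascii;
            break;
        }
    }
    return .{ .bytes = bytes, .status = status };
}
```

Returning the status alongside the bytes lets a caller that does not care about ASCII-ness ignore it, while callers that do care get an answer derived from the contents rather than from the string's representation.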
      pub fn mut(this: Slice) []u8 {
    -     return @as([*]u8, @ptrFromInt(@intFromPtr(this.ptr)))[0..this.len];
    +     return @as([*]u8, @constCast(this.ptr))[0..this.len];
      }
Thankfully this function looks unused, so we should delete it.
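For context on the cast in this hunk, a minimal standalone sketch of `@constCast`, which removes `const` from a pointer or slice type without a round trip through an integer address. The function name `mutableView` is hypothetical. Writing through the result is only sound when the underlying memory is actually mutable, which may be part of why removing such a helper is attractive.

```zig
const std = @import("std");

/// Hypothetical helper: reinterpret a const slice as mutable.
/// Only safe to write through if the backing memory is really mutable.
fn mutableView(slice: []const u8) []u8 {
    return @constCast(slice);
}
```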
      /// JavaScriptCore strings are either latin1 or UTF-16
      /// When UTF-16, they're nearly always due to non-ascii characters
    - charset: Charset = .unknown,
    + charset: strings.AsciiStatus = .unknown,
should charset be renamed to ascii_status or is_ascii?
Maybe. I think it's okay. charset == .all_ascii etc. still make sense.
(For internal tracking: fixes ENG-21249)