Route large ANSI escape scans through the dispatched Highway kernel#31490
Conversation
findEscapeCharacter (shared by Bun.stripANSI, stringWidth, wrapAnsi and sliceAnsi) used WTF's SIMD helpers, which compile to whatever -march the translation unit is built with. On the linux-x64-baseline build that is -march=nehalem (SSE4.2, no AVX2), so the no-escape scan ran at SSE width while the rest of Bun's SIMD string kernels get AVX2/AVX-512 at runtime via Highway's HWY_DYNAMIC_DISPATCH. A 16 KB no-ANSI stripANSI scan regressed ~17% on baseline versus the default build as a result. Add IndexOfEscapeChar8Impl / IndexOfEscapeChar16Impl Highway kernels that replicate findEscapeCharacter's two-stage logic exactly: a broad range mask (c & 0x70) == 0x10 matches 0x10-0x1F and 0x90-0x9F, then an exact 8-value match (0x1B, 0x90, 0x98, 0x9B, 0x9C, 0x9D, 0x9E, 0x9F) refines a broad hit down to the real escape introducers. These are exported through HWY_DYNAMIC_DISPATCH so they pick the widest ISA at runtime regardless of the build's static -march. findEscapeCharacter delegates to the dispatched kernel once the scan is at least 1 KB; shorter scans keep the inlined path. stripANSI's regressing case is a single whole-string scan of a multi-KB string, whereas sliceAnsi and wrapAnsi call findEscapeCharacter once per short visible run — routing those tiny calls through the dispatch indirection would cost more than it saves, and there is no baseline width gap to close on a scan that fits in a few vectors. The new runtime-gated symbols are added to the static baseline-verify allowlists (x64, aarch64, windows).
StatusRoot cause: Fix: new Allowlist entries for the new runtime-gated symbols added to all three baseline-verify files. Tests: 276 pass in Waiting on CI. |
|
Updated 12:31 AM PT - May 28th, 2026
✅ @robobun, your commit cecca273505cca736019e791091977c8804fe9df passed in 🧪 To try this PR locally: bunx bun-pr 31490That installs a local version of the PR into your bun-31490 --bun |
|
Found 1 issue this PR may fix:
🤖 Generated with Claude Code |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (6)
WalkthroughAdds runtime-dispatched Highway SIMD kernels for locating the first ANSI escape introducer in u8/u16 buffers, introduces threshold-based dispatch in the C interface, implements broad-mask + exact-refinement SIMD scanners, updates allowlists across architectures, and adds large-input tests exercising the dispatched path. ChangesANSI Escape-Character Detection via Highway SIMD
Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/bun/util/stripANSI.test.ts`:
- Around line 521-527: Replace the manual for...of loop that iterates over
sequences and creates individual test(...) cases with a parameterized
describe.each(sequences) block so the suite uses the repo-standard matrix style;
locate the block referencing sequences, the test name `dispatched scan matches
inlined for ${JSON.stringify(seq)}`, and assertions calling Bun.stripANSI(prefix
+ seq) and Bun.stripANSI(prefix + seq + suffix) and move those assertions into a
describe.each handler (using the current seq variable, prefix and suffix) so
each sequence is a table-driven sub-suite rather than dynamically created tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: abbc6e4c-c0c4-4933-b2d7-e3b6f9184855
📒 Files selected for processing (6)
scripts/verify-baseline-static/allowlist-aarch64.txtscripts/verify-baseline-static/allowlist-x64-windows.txtscripts/verify-baseline-static/allowlist-x64.txtsrc/jsc/bindings/ANSIHelpers.hsrc/jsc/bindings/highway_strings.cpptest/js/bun/util/stripANSI.test.ts
|
Addressed CodeRabbit's review: switched the sequence-parity matrix to Not adding |
Note on testingThis is a behavior-preserving change: I verified this with a differential comparison against the release binary over ~9,000 inputs — single escapes at every offset crossing the 1 KB threshold and SIMD-chunk boundaries, every C1 introducer, broad-mask false positives, no-ANSI strings, UTF-16 (two-byte) strings, trailing partial sequences, random fuzz up to 256 KB: zero differences. Because there's no behavioral delta, the value of this PR is measured by the benchmark ( The added tests in Ready for review/merge by a maintainer; the perf win needs the canary to confirm. |
There was a problem hiding this comment.
Thanks for addressing both doc nits in 9f1cbd8. I didn't find any correctness issues, but a new multi-target Highway kernel on the stripANSI/stringWidth/sliceAnsi/wrapAnsi hot path (plus the threshold choice and per-ISA allowlist ceilings) is worth a human sign-off.
Extended reasoning...
Overview
This PR adds two runtime-dispatched Highway kernels (IndexOfEscapeChar8Impl / IndexOfEscapeChar16Impl) in src/jsc/bindings/highway_strings.cpp and routes ANSI::findEscapeCharacter (in ANSIHelpers.h) through them when the scan length is ≥ 1 KB, so the no-escape fast path uses the widest ISA the CPU supports at runtime instead of being pinned to the build's static -march. The new symbols are added to all three baseline-verify allowlists, and stripANSI.test.ts gains 11 tests covering the dispatched path (zero-copy, parity vs. inlined, ESC-at-every-offset, UTF-16, broad-mask false positives).
I previously left two inline doc nits (stale sliceAnsi/wrapAnsi rationale on the len < N guard, and a 0x90-0x97 vs 0x91-0x97 off-by-one in two comments); both were addressed in 9f1cbd8 and the threads are resolved. CodeRabbit's describe.each suggestion was also applied. No outstanding review comments remain.
Security risks
None identified. The kernel is a read-only scan over caller-provided string buffers with the same bounds discipline as the dozen sibling kernels in this file (simd_len = len - (len % N) loop, scalar tail over [i, len), len < N early-out). No untrusted-length arithmetic, no writes, no auth/crypto/permission surface.
Level of scrutiny
Moderate-to-high. This is a behaviour-preserving perf change, but it introduces a second implementation of the escape-introducer scan that must stay byte-for-byte equivalent to the inlined WTF-SIMD path it shadows, compiled across 6+ Highway targets per architecture. The two-stage broad-mask + 8-way exact-match logic checks out against isEscapeCharacter + 0x9C and the existing exactEscapeMatch, and the u16 broad mask (static_cast<uint16_t>(~0b10001111U) = 0xFF70) correctly rejects code units ≥ 0x100. The new tests are well-designed (the PR notes they fail when the kernel is deliberately broken). Still, the 1 KB threshold, the decision to keep two parallel implementations rather than always dispatch, and the conservative per-namespace feature-ceiling unions in the allowlists are design choices a maintainer should ratify.
Other factors
- All prior review threads (mine and CodeRabbit's) are resolved.
- The bug-hunting system found no issues.
- 276 tests pass locally per the author; CI build #58523 is in flight.
- No CODEOWNERS entries cover these paths.
- The allowlist edits are mechanical and follow the established pattern, but the feature ceilings for the new entries are intentionally supersets (per the PR description) and could be tightened later.
Given the scope (new SIMD kernel on a hot path shared by four user-facing APIs, multi-target codegen, and ISA allowlist additions), I'm deferring to a human reviewer rather than auto-approving.
|
@robobun update PR description with a comprehensive benchmark showing the performance difference. |
Routing on total remaining length alone regressed dense SGR input ~11% on the baseline build: stripANSI re-scans the (still large) remainder after each escape sequence, so every call dispatched to the Highway kernel and paid its per-call setup even though the next escape sat at offset 0. Peek the first vector chunk inline first; only delegate to the kernel when that chunk is clean. Escape-at-start stays on the cheap inlined path, while a genuinely long no-escape run still gets the runtime-dispatched wide SIMD. Behavior is unchanged (differential fuzz vs the non-dispatched build is byte-identical); the 16 KB no-ANSI scan stays ~2.3x faster on baseline and the dense long-ANSI case no longer regresses.
|
Updated the PR description with a comprehensive benchmark (4 release builds: Headline: 16 KB no-ANSI scan 404 → ~174 ns on baseline (2.3×), now at parity with the default build. All large no-ANSI sizes (1 KB–256 KB) improve. Two things the benchmarking surfaced, both now handled:
|
|
This is ready from my side, @Jarred-Sumner.
Since the output is identical before/after by design, the win is the benchmark/canary rather than a behavioral test. |
|
@robobun fix the conflicts |
…si-highway-escape-scan # Conflicts: # scripts/verify-baseline-static/allowlist-aarch64.txt # scripts/verify-baseline-static/allowlist-x64-windows.txt # scripts/verify-baseline-static/allowlist-x64.txt # src/jsc/bindings/highway_strings.cpp
|
Conflicts fixed, @Jarred-Sumner — merged latest |
|
CI re-triggered (cecca27, empty commit). The prior run's only failures were |
|
@Jarred-Sumner — this is ready to merge; the only red CI is infra flake on the macOS runners, not the diff. Current run (cecca27): 73 pass, 0 non-macOS failures. The single failure is Why it's not the diff: the failing lanes shift between re-runs and are macOS-only — last run The change is behavior-preserving (byte-identical |
Problem
Bun.stripANSI(andBun.stringWidth/Bun.sliceAnsi/Bun.wrapAnsi) scan for ANSI escape introducers viaANSI::findEscapeCharacterinsrc/jsc/bindings/ANSIHelpers.h, which uses WTF'sSIMD::helpers (wtf/SIMDHelpers.h, built on SIMDe). Those compile down to whatever-marchthe translation unit is built with, at a fixed 128-bit width (simde_uint8x16_t).On the
linux-x64-baselinebuild that is-march=nehalem(SSE4.2, no AVX2), so the no-escape scan runs at SSE width — while the rest of Bun's SIMD string kernels inhighway_strings.cpppick AVX2/AVX-512 at runtime via Google Highway'sHWY_DYNAMIC_DISPATCH. Thelinux-x64vslinux-x64-baselinecanary comparison showed a 16 KB no-ANSIBun.stripANSIscan regress ~17% on baseline (294 ns → 344 ns). The dispatched kernel that does the actual stripping is fine (the long-ANSI case is even ~9% faster on baseline); the gap was the-march-pinned scan in front of it.Fix
Add two runtime-dispatched Highway kernels in
src/jsc/bindings/highway_strings.cpp:IndexOfEscapeChar8Impl/IndexOfEscapeChar16Impl— index of the first ANSI escape introducer, orlenif none.They replicate
findEscapeCharacter's two-stage logic exactly:(c & 0x70) == 0x10matches0x10-0x1Fand0x90-0x9Fin one compare (the hot no-escape path pays only this per chunk).0x1B, 0x90, 0x98, 0x9B, 0x9C, 0x9D, 0x9E, 0x9F) refines a broad hit down to the real introducers —0x9C(C1 ST) included, matching the existing scalar contract (isEscapeCharacter+0x9C).Exported via
HWY_EXPORT/HWY_DYNAMIC_DISPATCH, so they use the widest ISA the CPU supports at runtime regardless of the build's static-march.findEscapeCharacterdelegates to the dispatched kernel only for a long scan whose first vector chunk is clean (len >= 1 KBand no escape candidate in the first 16 bytes). Two call patterns are deliberately kept on the inlined path:sliceAnsi/wrapAnsicallfindEscapeCharacteronce per short visible run) — the per-call dispatch indirection would cost more than it saves, and there's no baseline width gap to close on a scan that fits in a few vectors.\x1b[…mevery few bytes) makesstripANSIre-scan the still-large remainder after each sequence. Gating on total length alone made every one of those calls dispatch and pay the kernel's per-call setup even though the next escape sat at offset 0; peeking the first chunk inline keeps them cheap.The new runtime-gated symbols are added to the static baseline-verify allowlists (
allowlist-x64.txt,allowlist-aarch64.txt,allowlist-x64-windows.txt) with per-target feature ceilings (the conservative per-namespace union; the kernel's op set — load/And/Eq/Or/FindFirstTrue — is a strict subset, so a superset ceiling is safe and can be tightened from a baseline build's instruction report if desired).Benchmark
Release builds,
Bun.stripANSI, best-of-Nns/op(lower is better). Four binaries built from this branch:baseline/default× with-fix/without-fix. "baseline" =--baseline=on(-march=nehalem); "default" =-march=haswell. Measured on an AVX-512-capable x64 host, so the dispatched kernel runs AVX-512 at runtime while the WTF-SIMD path is 128-bit.\x1b[31mred\x1b[39m)Takeaways:
Reproduce with
bench/snippets/strip-ansi.mjs(or the self-contained script in the verification notes); the real baseline-vs-default delta is what the canary tracks.Verification
bun bd test test/js/bun/util/stripANSI.test.ts→ 276 pass, 0 fail (11 new tests covering the dispatched path).0x10-0x1F/0x90-0x9F) placed at every offset for lengths 1–200 (covering all the 4x-unrolled SIMD block boundaries), the same for UTF-16 (two-byte) strings, multi-escape adversarial patterns, no-ANSI strings of every size crossing the 1 KB threshold, trailing partial sequences, and random fuzz up to 256 KB → byte-for-byte identical output in every case. Behavior is unchanged; the dispatched path is equivalent to the inlined one it shadows.stripANSI+sliceAnsi+wrapAnsi+stringWidth) → 608 pass, 0 fail. (sliceAnsi-fuzz's "quadratic-slow" stress test times out under the slow debug+ASAN build here — confirmed it times out identically on cleanmain, i.e. pre-existing and unrelated to this change.)