Optimize Buffer.toString('hex'/'base64') for large buffers by robobun · Pull Request #31421 · oven-sh/bun

robobun · 2026-05-26T04:09:40Z

What does this PR do?

Buffer.prototype.toString regressed vs the last Zig-based release for large buffers on the binary-to-text encodings (bench/snippets/buffer-to-string.mjs: hex of a 110 KB buffer ~3.7× slower, base64/base64url ~17% slower on a quiet Sapphire Rapids host). Small-buffer cases were already faster than Zig and stay on their current paths.

Cause

hex: the Rust port of encode_bytes_to_hex replaced the Zig @Vector(16, u8) nibble-interlace fast path with a scalar, bounds-checked table loop (bytes_to_hex_lower) — the port even left a PERF(port) note about it. Every byte of a 110 KB buffer goes through that loop.
base64/base64url: the encode itself is simdutf::binary_to_base64 in both the Zig and Rust builds, so the kernel was never the problem. The Zig implementation encoded into a default_allocator (mimalloc) buffer and handed it to an external WTF string; the Rust port switched to BunString::create_uninitialized_latin1, which cycles a ~146 KB block through WTF's string allocator on every call. mimalloc reuses blocks of that size much more cheaply than WTF's allocator does.

Fix

New EncodeHexLowerImpl Highway kernel in highway_strings.cpp (same HWY_EXPORT + HWY_DYNAMIC_DISPATCH pattern as the existing kernels): split each byte into nibbles, map both nibble vectors through the 16-byte hex-digit table with TableLookupBytes, and write the interleaved digits with StoreInterleaved2. Exposed as bun_highway::encode_hex_lower; bun_core::strings::encode_bytes_to_hex dispatches to it for inputs ≥ 64 bytes and keeps the scalar LUT loop below that (small buffers were already ahead of Zig). Per-target symbols added to the verify-baseline-static allowlists.
Buffer.toString("base64"/"base64url") (to_bun_string_comptime) now encodes outputs ≥ 32 KB into a mimalloc-backed buffer wrapped in an external WTF string — the strategy the Zig implementation used — via a shared encode_base64_to_bun_string helper. Smaller outputs keep the uninitialized-WTF-string path, which is what makes the 16-byte cases faster than Zig today.

Output bytes are unchanged for every encoding and size (see verification).
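The hex half of the fix can be sketched in portable Rust. This is only an illustration of the nibble-split, table-lookup, and interleave steps plus the 64-byte dispatch described above; it is not the Highway kernel or Bun's code. The names `encode_hex`, `encode_hex_blocks`, `encode_hex_scalar`, and the fixed 16-lane block width are hypothetical, and plain arrays stand in for SIMD vectors.

```rust
/// Lowercase hex digits, indexed by nibble value 0..=15.
const HEX_DIGITS: &[u8; 16] = b"0123456789abcdef";

/// Inputs shorter than this stay on the scalar loop (threshold taken from the PR text).
const BULK_THRESHOLD: usize = 64;

/// Scalar table loop: one input byte becomes two output digits.
fn encode_hex_scalar(src: &[u8], dst: &mut [u8]) {
    for (i, &b) in src.iter().enumerate() {
        dst[2 * i] = HEX_DIGITS[(b >> 4) as usize];
        dst[2 * i + 1] = HEX_DIGITS[(b & 0x0F) as usize];
    }
}

/// Block-wise stand-in for the SIMD kernel (hypothetical, 16 lanes):
/// build a "vector" of high-nibble digits and one of low-nibble digits,
/// then interleave them into the output. The leftover bytes go through
/// the scalar loop, mirroring the kernel's scalar tail.
fn encode_hex_blocks(src: &[u8], dst: &mut [u8]) {
    const LANES: usize = 16;
    let mut chunks = src.chunks_exact(LANES);
    let mut out = 0;
    for chunk in &mut chunks {
        let mut hi = [0u8; LANES];
        let mut lo = [0u8; LANES];
        for lane in 0..LANES {
            hi[lane] = HEX_DIGITS[(chunk[lane] >> 4) as usize];
            lo[lane] = HEX_DIGITS[(chunk[lane] & 0x0F) as usize];
        }
        for lane in 0..LANES {
            dst[out + 2 * lane] = hi[lane];
            dst[out + 2 * lane + 1] = lo[lane];
        }
        out += 2 * LANES;
    }
    let tail = chunks.remainder();
    encode_hex_scalar(tail, &mut dst[out..out + 2 * tail.len()]);
}

/// Dispatch by input length, as the PR describes for encode_bytes_to_hex.
pub fn encode_hex(src: &[u8]) -> String {
    let mut dst = vec![0u8; src.len() * 2];
    if src.len() >= BULK_THRESHOLD {
        encode_hex_blocks(src, &mut dst);
    } else {
        encode_hex_scalar(src, &mut dst);
    }
    String::from_utf8(dst).expect("hex digits are ASCII")
}
```

For example, `encode_hex(&[0x00, 0xff, 0x1a])` returns `"00ff1a"` via the scalar path, and any input of 64 bytes or more takes the block path with identical output.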

Benchmarks

Shared/noisy Ice Lake Xeon container, medians of interleaved rounds; ratios are the signal, not absolute numbers. Zig is bun 1.3.14, before is canary at 49c97de, after is this PR (both release builds from this tree).

bench/snippets/buffer-to-string.mjs:

case	Zig 1.3.14	before	after
`Buffer(110000).toString('hex')`	34.6–37.6 µs	92.7–93.8 µs	24.0–24.8 µs
`Buffer(16).toString('hex')`	105–109 ns	117–120 ns	103–112 ns
`Buffer(110000).toString('base64')`	16.0–17.9 µs	15.3–16.1 µs	16.1–16.9 µs
`Buffer(110000).toString('base64url')`	16.9–18.7 µs	15.0–15.4 µs	16.1–16.7 µs
`Buffer(16).toString('base64')`	~119 ns	86–91 ns	104–120 ns (unchanged path, container noise)
`Buffer(110000).toString('ascii')`	13.4–14.0 µs	9.5–11.4 µs	11.6–11.8 µs (untouched path)

The 110 KB base64 delta (~1 µs on the reporting host) is below this container's noise floor, so it reads as a wash above; the allocation-strategy effect is visible once the output is big enough to clear the noise, in bench/snippets/buffer-base64.mjs (toString direction, two rounds):

size	Zig 1.3.14	before	after
64 KB	10.2 µs	10.5–15.6 µs	10.3–10.5 µs
512 KB	86.0 µs	96.5–107.0 µs	89.8–90.7 µs
8 MB	1.97 ms	2.18–2.24 ms	1.97–2.04 ms

Hex is ~3.8× faster than before this change and ~30–40% faster than the Zig build at 110 KB; large base64 recovers the gap to Zig (and beats it at 8 MB).

How did you verify your code works?

bun bd test test/js/node/buffer.test.js — 468 pass, including the new Buffer.prototype.toString binary-to-text encodings block: byte-for-byte comparison against pure-JS reference encoders across every length 0–130 plus 192/256/512/1024 ±1 (vector-width boundaries and the scalar tail), unaligned subarray views, every length % 3 padding shape, and a 110 000-byte buffer checked against SHA-256 digests that were cross-verified against bun 1.3.14 (pre-rewrite Zig) and the JS reference.
Vendored Node suites that exercise these paths: test-buffer-tostring.js, test-buffer-alloc.js, test-stringbytes-external.js (~500 KB hex/base64 encode+decode round-trips), and test/js/node/string_decoder/string-decoder.test.js — all pass with the debug (ASAN) build.
fs.readFile(path, "hex" | "base64" | "base64url") spot-checked against Buffer.toString output (the readFile hex path shares encode_bytes_to_hex).
cargo clippy -p bun_core -p bun_highway -p bun_runtime clean.
Baseline-ISA allowlist entries were generated by running scripts/verify-baseline-static against the locally built bun-profile and copying the feature sets it reported per target. The local toolchain doesn't compile the AVX10_2 target, so that entry mirrors the SPR/DL ceilings (+AVX512VL); if the baseline/aarch64 CI lanes report different sets I'll update the entries to match, same as the CopyAsciiPrefixImpl addition did.
A throughput regression guard in the same describe block asserts that toString('hex') of a 110 KB buffer stays within 6× the cost of a toString('latin1') copy of the same buffer (median of GC-isolated samples). The scalar per-byte loop sits at ≥10× that baseline while the SIMD kernel stays at ~2–3×, so this test fails on the pre-SIMD implementation and passes with this PR; it is skipped on debug/ASAN builds, where the unoptimized, instrumented native kernels make timing ratios meaningless. The full before/after picture is the benchmark tables above, reproducible via bench/snippets/buffer-to-string.mjs and bench/snippets/buffer-base64.mjs.
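The ratio check behind that guard can be sketched as follows. This is a minimal illustration of comparing medians against a multiple of a baseline, not the test in buffer.test.js (which is JavaScript); `median`, `within_ratio`, and the sample values used with them are hypothetical.

```rust
/// Median of timing samples (for example, nanoseconds per call).
/// Sorts a copy; for an even count this picks the upper of the two middle values.
pub fn median(samples: &[u64]) -> u64 {
    assert!(!samples.is_empty(), "need at least one sample");
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    sorted[sorted.len() / 2]
}

/// True when the candidate's median cost is at most `max_ratio` times the
/// baseline's median cost. Using medians keeps a single slow outlier
/// (a GC pause, a noisy neighbour) from deciding the result.
pub fn within_ratio(candidate: &[u64], baseline: &[u64], max_ratio: u64) -> bool {
    median(candidate) <= median(baseline).saturating_mul(max_ratio)
}
```

With a baseline median of 10 and a limit of 6×, a candidate whose median is 26 passes and one whose median is 101 fails, which is the shape of the SIMD-versus-scalar separation the guard relies on.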

Restore a SIMD hex encoder (Highway kernel, runtime-dispatched) for encode_bytes_to_hex, and encode large base64/base64url outputs into a mimalloc-backed buffer wrapped in an external WTF string instead of cycling large blocks through WTF's string allocator.

coderabbitai · 2026-05-26T04:09:46Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2e525239-3a58-4b00-a918-9541170270a6

📥 Commits

Reviewing files that changed from the base of the PR and between 81397ae and 5f0e161.

📒 Files selected for processing (3)

scripts/verify-baseline-static/allowlist-x64-windows.txt
src/highway/lib.rs
test/js/node/buffer.test.js

Walkthrough

This PR adds Highway-based SIMD acceleration for hex encoding with a 64-byte threshold, refactors base64 encoding into a length-aware helper, registers symbols across ARM64 and x64 platforms, and introduces comprehensive tests validating encoding correctness at SIMD and scalar boundaries.

Changes

Hex SIMD and Encoding Refactoring

Layer / File(s)	Summary
C++ Highway Hex Encoder Implementation `src/jsc/bindings/highway_strings.cpp`	`EncodeHexLowerImpl` uses Highway SIMD to convert bytes to lowercase hex via nibble splitting and table lookup, with scalar tail handling. Function is exported via dispatch table and wrapped by extern "C" `highway_encode_hex_lower`.
Rust FFI Binding and Public Wrapper `src/highway/lib.rs`	Adds `unsafe extern "C"` FFI binding and exposes a public `#[inline(always)]` wrapper `encode_hex_lower` with buffer size validation and early-exit for empty input.
Hex Encoding Path Integration `src/bun_core/string/immutable.rs`	`encode_bytes_to_hex` now dispatches to highway SIMD for inputs ≥64 bytes, falling back to scalar LUT implementation for smaller sizes.
Static Symbol Allowlist Registration `scripts/verify-baseline-static/allowlist-aarch64.txt`, `allowlist-x64.txt`, `allowlist-x64-windows.txt`	Registers `EncodeHexLowerImpl` symbols across all Highway variants: ARM64 SVE/SVE2, x64 AVX2/AVX3/AVX10, and Windows MSVC-mangled forms; section symbol counts incremented.
Base64 Encoding Length-Aware Refactoring `src/runtime/webcore/encoding.rs`	`to_bun_string_comptime` delegates base64/base64url to new `encode_base64_to_bun_string` helper that chooses strategy by output length: small outputs encode into uninitialized `BunString`, large outputs allocate `Vec`, fill via `fill_spare`, and return external string.
Comprehensive Encoding Test Suite `test/js/node/buffer.test.js`	Adds pure-JS reference encoders and deterministic tests validating `Buffer.prototype.toString` byte-exactness at SIMD boundaries, unaligned views, large buffers via SHA-256 digest, and small buffers across padding shapes.
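The length-aware choice in the base64 row can be sketched like this. It only shows the size arithmetic and the threshold decision; the actual allocation paths are not modelled. `Strategy`, `base64_len_padded`, and `choose_strategy` are hypothetical names, and reading "32 KB" as 32 × 1024 bytes is an assumption.

```rust
/// Assumed reading of the PR's "32 KB" threshold on the encoded output size.
const EXTERNAL_THRESHOLD: usize = 32 * 1024;

/// Which output path a given encode would take (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
pub enum Strategy {
    /// Encode straight into an uninitialized string allocation.
    InlineString,
    /// Encode into a separately allocated buffer wrapped as an external string.
    ExternalBuffer,
}

/// Padded base64 length: every 3 input bytes (rounded up) become 4 characters.
pub fn base64_len_padded(input_len: usize) -> usize {
    (input_len + 2) / 3 * 4
}

/// Pick the path from the encoded length, not the input length.
pub fn choose_strategy(encoded_len: usize) -> Strategy {
    if encoded_len >= EXTERNAL_THRESHOLD {
        Strategy::ExternalBuffer
    } else {
        Strategy::InlineString
    }
}
```

For the 110 000-byte benchmark buffer, `base64_len_padded(110_000)` is 146 668, which matches the "~146 KB block" mentioned in the PR description and lands on the external-buffer side; a 16-byte input encodes to 24 characters and stays on the inline path.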

Suggested reviewers

Jarred-Sumner
alii

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main objective: optimizing Buffer.toString for hex and base64 encodings on large buffers, which is the core focus of the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description comprehensively explains the regression, root causes, fixes, benchmarks, and verification for all encoding optimizations (hex and base64).

robobun · 2026-05-26T04:09:50Z

Updated 1:32 AM PT - May 26th, 2026

✅ @robobun, your commit 5f0e161821d79b253994f0e9af607a36edd2ffba passed in Build #58173! 🎉

🧪 To try this PR locally:

bunx bun-pr 31421

That installs a local version of the PR into your bun-31421 executable, so you can run:

bun-31421 --bun

github-actions · 2026-05-26T04:13:06Z

Found 1 issue this PR may fix:

Switch from Zig’s @Vector to Google Highway SIMD lib #8782 - This PR adds a new Highway SIMD kernel (EncodeHexLowerImpl) for hex encoding, directly implementing the migration from Zig's @Vector to Google Highway that this issue proposes

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #8782

🤖 Generated with Claude Code

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/highway/lib.rs`:
- Around line 265-275: The function encode_hex_lower currently uses
debug_assert! which is removed in release builds and can allow out-of-bounds FFI
writes; replace the debug-only check with a runtime capacity check: compute
needed = src.len().checked_mul(2) (handle overflow by treating as too-large),
then assert!(dst.len() >= needed) or return a Result indicating insufficient
capacity so the public API cannot call highway_encode_hex_lower with a too-small
dst; update encode_hex_lower to perform this checked validation before the
unsafe call to highway_encode_hex_lower and include a clear error message
referencing src/dst sizes.
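A sketch of the `Result`-returning variant this comment suggests, under stated assumptions: `encode_hex_lower_checked` and `CapacityError` are hypothetical names, and a scalar loop stands in for the unsafe call to `highway_encode_hex_lower`. Reviews further down in this thread indicate the PR adopted a runtime `assert!` instead.

```rust
/// Error for a destination that cannot hold 2 * src.len() bytes.
/// `needed` is None when 2 * src.len() overflows usize.
#[derive(Debug, PartialEq, Eq)]
pub struct CapacityError {
    pub needed: Option<usize>,
    pub available: usize,
}

/// Validate capacity at runtime (in release builds too) before writing.
/// Returns the number of bytes written on success.
pub fn encode_hex_lower_checked(src: &[u8], dst: &mut [u8]) -> Result<usize, CapacityError> {
    let available = dst.len();
    let needed = src
        .len()
        .checked_mul(2)
        .ok_or(CapacityError { needed: None, available })?;
    if available < needed {
        return Err(CapacityError { needed: Some(needed), available });
    }

    // Stand-in for the FFI kernel call: a plain scalar encode into dst[..needed].
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    for (pair, &b) in dst[..needed].chunks_exact_mut(2).zip(src) {
        pair[0] = DIGITS[(b >> 4) as usize];
        pair[1] = DIGITS[(b & 0x0F) as usize];
    }
    Ok(needed)
}
```

Returning an error keeps the decision with the caller, while an `assert!` is simpler when every caller already sizes `dst` exactly; either way the check survives release builds, which is the point of the comment.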

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 11eb8c5c-77bf-47db-8ff2-b490d6fe9928

📥 Commits

Reviewing files that changed from the base of the PR and between ec68d39 and 81397ae.

📒 Files selected for processing (8)

scripts/verify-baseline-static/allowlist-aarch64.txt
scripts/verify-baseline-static/allowlist-x64-windows.txt
scripts/verify-baseline-static/allowlist-x64.txt
src/bun_core/string/immutable.rs
src/highway/lib.rs
src/jsc/bindings/highway_strings.cpp
src/runtime/webcore/encoding.rs
test/js/node/buffer.test.js

claude

I didn't find any correctness issues, but this adds a new Highway SIMD kernel plus an allocation-strategy change on a hot path (and the ISA allowlist entries are partly speculative per the description), so it's worth a human pass.

Extended reasoning...

Overview

This PR restores large-buffer Buffer.toString('hex'/'base64') throughput that regressed in the Zig→Rust port. It adds a new EncodeHexLowerImpl Highway kernel in highway_strings.cpp (nibble split → TableLookupBytes → StoreInterleaved2), wires it through a Rust FFI wrapper in src/highway/lib.rs and a 64-byte threshold in encode_bytes_to_hex, and refactors the base64/base64url arms of to_bun_string_comptime into a helper that switches to a mimalloc-backed external WTF string above 32 KB of output. Supporting changes: per-target symbol entries in three verify-baseline-static allowlists, and a substantial new test block in buffer.test.js (byte-exact reference comparisons across vector-width boundaries, unaligned views, SHA-256 digests of a 110 KB buffer, and a timing-ratio regression guard).

Security risks

None apparent. Inputs are in-process Buffer bytes; outputs are freshly allocated strings. The one memory-safety surface — the FFI write into dst — is now guarded by a runtime assert!(dst.len() / 2 >= src.len()) (added after CodeRabbit's comment), and the only caller slices dst to exactly 2 * to_read. The C++ kernel's HWY_RESTRICT aliasing assumption is satisfied by the &/&mut split on the Rust side. The fill_spare usage for the large-base64 path correctly reserves to_len on an empty Vec before slicing spare[..to_len].

Level of scrutiny

This warrants human review rather than bot approval. It is a new hand-written SIMD kernel that will run on every ≥64-byte hex encode across SSE/AVX2/AVX-512/SVE targets, relying on Highway semantics (TableLookupBytes index-range behaviour, StoreInterleaved2 write bounds, ShiftRight<4> on u8) that look correct to me but are exactly the kind of thing a maintainer familiar with this file's other kernels should eyeball. The base64 change swaps the allocation/ownership model on a hot path (uninitialized WTF string → external mimalloc string with finalizer), which is a design choice worth a human nod. The PR description also explicitly flags that the AVX10_2 allowlist feature set is extrapolated and may need adjusting once the baseline CI lanes report.

Other factors

Both prior review comments (my subarray end-index nit and CodeRabbit's debug_assert!→assert! request) have been addressed in follow-up commits and the threads are resolved. Test coverage is thorough and includes round-trip decode checks. The new timing-based regression guard (hex ≤ 6× latin1) is skipped on debug/ASAN and uses a median of 13 GC-isolated samples with wide margin, which seems reasonable but is the kind of thing that occasionally flakes on shared CI — worth a maintainer's judgment on whether that belongs in the suite vs. the bench scripts.

…VX3_DL hex allowlist ceiling The in-suite timing baseline gets skewed by the surrounding tests' heap and GC activity, so run the hex-vs-latin1 measurement in a fresh process. The Windows baseline ISA scan reports AVX512DQ/AVX512VL encodings in the AVX3_DL hex kernel, so include them in that entry's feature ceiling.

claude

No bugs found, but this adds a new Highway SIMD kernel with unsafe FFI plumbing, changes the large-buffer base64 allocation strategy, and touches ISA allowlists across three platforms — worth a human pass on the kernel and the 32 KB / 64 B thresholds.

Extended reasoning...

Overview

This PR restores pre-Rust-port performance for Buffer.toString('hex'|'base64'|'base64url') on large buffers. It adds a new Highway kernel EncodeHexLowerImpl in highway_strings.cpp (nibble split → TableLookupBytes → StoreInterleaved2), wires it through a new extern "C" shim and a safe Rust wrapper in src/highway/lib.rs, and dispatches to it from encode_bytes_to_hex for inputs ≥ 64 bytes. Separately, to_bun_string_comptime's base64/base64url arms are refactored into encode_base64_to_bun_string, which switches from create_uninitialized_latin1 to a mimalloc-backed Vec + external WTF string when the encoded output is ≥ 32 KB. Three verify-baseline-static allowlists gain per-target EncodeHexLowerImpl entries, and buffer.test.js gains a ~190-line correctness suite plus a subprocess-based throughput regression guard.

Security risks

None apparent. The new FFI write path (highway_encode_hex_lower) is guarded by a runtime assert!(dst.len() / 2 >= src.len()) in the safe wrapper (added in 55b86a4 after CodeRabbit feedback), and the only caller slices dst to exactly 2 * src.len(). The HWY_RESTRICT on input/output is satisfied because the Rust wrapper takes &/&mut (non-overlapping by construction). The kernel's index vectors (ShiftRight<4> and And(..., 0x0F)) are bounded to [0,15], so TableLookupBytes behaves identically across PSHUFB / NEON TBL / AVX-512 VPERMB without high-bit zeroing surprises. No user-controlled data reaches allowlists or build config.

Level of scrutiny

High. Buffer.toString is one of the hottest user-facing paths in the runtime, and this change introduces (a) a brand-new per-ISA-target SIMD kernel that will execute on every ≥64-byte hex encode, (b) a heuristic allocation-strategy fork at 32 KB that trades WTF-string allocation for an external-string finalizer, and (c) hand-written ISA feature-ceiling entries in the baseline allowlists — one of which (N_AVX3_DL on Windows) already needed a follow-up commit after CI flagged a mismatch, and the PR description notes the AVX10_2 entry was extrapolated rather than measured. None of this is mechanical.

Other factors

Test coverage is strong: byte-exact comparison against pure-JS reference encoders across 0–130 and vector-width ±1 boundaries, unaligned subarray views, a 110 KB buffer cross-checked via SHA-256 against bun 1.3.14 (the pre-rewrite Zig build) and the JS reference, and decode round-trips. The throughput guard runs in a subprocess with GC disabled and uses a 6× ratio with wide margin, but timing-based tests are always a flake risk worth a second opinion. Both prior review threads (my subarray nit and CodeRabbit's debug_assert! → assert!) are resolved. CodeRabbit's walkthrough suggests Jarred-Sumner / alii as reviewers, which seems right for Highway + allocator changes.

…t ceilings The Windows baseline ISA scan reports the same AVX512DQ/AVX512VL encodings in the AVX3_SPR and AVX3_ZEN4 variants of EncodeHexLowerImpl as in AVX3_DL.

claude

No issues found after the fixes, but this adds a new Highway SIMD kernel, an unsafe FFI boundary, and changes the large-buffer base64 allocation strategy — worth a human pass on the kernel correctness and the 32 KB / 64 B thresholds.

Extended reasoning...

Overview

This PR restores large-buffer Buffer.toString('hex'/'base64') performance to (and past) the pre-Rust-port baseline. It adds a new EncodeHexLowerImpl Highway kernel in highway_strings.cpp with the standard HWY_EXPORT/HWY_DYNAMIC_DISPATCH plumbing, a Rust FFI wrapper in bun_highway, a 64-byte dispatch threshold in encode_bytes_to_hex, and a refactored encode_base64_to_bun_string that switches to a mimalloc-backed external string above 32 KB output. It also registers the new per-target symbols in all three verify-baseline-static allowlists and adds ~190 lines of byte-exact reference tests plus a throughput regression guard.

Security risks

None identified. The new unsafe surface is the highway_encode_hex_lower FFI call, which is now guarded by a runtime assert!(dst.len() / 2 >= src.len()) (upgraded from debug_assert! per CodeRabbit feedback), and the only caller slices dst to exactly 2 * src.len(). The base64 path's fill_spare usage commits only the bytes the encoder reports as written. Inputs are arbitrary user buffer bytes, but the encoders are pure data transforms with fixed-ratio output sizes.

Level of scrutiny

Medium-high. This is a hot-path performance change in Buffer.prototype.toString that introduces hand-written SIMD across six x86 targets and four ARM SVE targets, plus an allocation-strategy change whose correctness depends on create_external_globally_allocated_latin1 ownership semantics. The kernel itself (ShiftRight<4> / And 0x0F / TableLookupBytes / StoreInterleaved2) is a standard nibble-interlace and the scalar tail is straightforward, but SIMD kernels and ISA allowlist ceilings are exactly the kind of thing where a second pair of eyes pays off — the PR already needed two follow-up commits to get the Windows SPR/ZEN4 ceilings right, and the description notes the AVX10_2 entry is extrapolated rather than measured.

Other factors

Test coverage is strong: byte-for-byte comparison against pure-JS reference encoders across 0–130 and vector-width ±1 boundaries, unaligned subarray views (my earlier nit about the subarray end index was fixed in 77f5e52), SHA-256 cross-checks against a 110 KB buffer, and round-trip decodes. The throughput guard runs in a subprocess and is skipped on debug/ASAN. Both prior review comments (mine and CodeRabbit's) are resolved. Still, the combination of new SIMD code, unsafe FFI, magic thresholds (64 B, 32 KB), and hand-maintained per-ISA allowlist entries puts this outside what I'd auto-approve without human sign-off.

Jarred-Sumner

What about the buffer.write code path?

robobun · 2026-05-26T06:55:44Z

buffer.write() (and Buffer.from(string, encoding)) is the decode direction, and this PR doesn't touch it — write_u8's hex arm still goes through the scalar decode_hex_to_bytes_truncate, and base64/base64url writes go through bun_base64::decode (simdutf + the forgiving-base64 fallback), none of which are in this diff. The existing hexWrite/base64Write/base64urlWrite tests and the new round-trip decode checks in buffer.test.js cover it and are unchanged.

I did measure it to make sure there isn't a lurking port regression on that side. Same methodology as the PR benchmarks (110 000-byte payload, median of GC-isolated samples, three interleaved rounds on a noisy shared box — ranges below are min–max of the three rounds):

case	Zig 1.3.14	current canary (identical to this branch for these paths)
`buf.write(hex, "hex")` (220 000 chars)	119–250 µs	98–248 µs
`buf.write(b64, "base64")`	9.3–12.6 µs	7.2–12.0 µs
`buf.write(b64url, "base64url")`	440–556 µs	240–477 µs
`Buffer.from(hex, "hex")`	109–248 µs	138–258 µs
`Buffer.from(b64, "base64")`	16–27 µs	18–26 µs

So: write-hex and write/from-base64 are at parity with (or slightly ahead of) the Zig build — hex decode was a scalar loop in Zig too, and base64 decode is the same simdutf call in both. Buffer.from(hex) may be modestly behind Zig but it's within this box's noise. The one eyebrow-raiser is write(…, "base64url") being ~25–40× slower than write(…, "base64") in both builds: the write path calls the standard-alphabet decoder, which rejects -/_ and falls through to the scalar forgiving decoder, so it's a pre-existing quirk rather than a port regression.

Two natural follow-ups I'd keep out of this PR to keep it reviewable, but am happy to do next:

A decode-direction Highway kernel (SIMD hex decode) for write(…, "hex") / Buffer.from(…, "hex"), mirroring the encoder added here.
Pass the URL-safe flag through for write(…, "base64url") so it uses the simdutf base64url decoder instead of the forgiving fallback.
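The alphabet mismatch behind the base64url observation can be illustrated with a small classifier. This is a sketch of the two RFC 4648 alphabets only, not Bun's or simdutf's decoder; `Alphabet`, `sextet`, and `all_valid` are hypothetical names, and padding is deliberately not handled.

```rust
/// The two RFC 4648 alphabets differ only in the characters for values 62 and 63.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Alphabet {
    Standard, // '+' and '/'
    UrlSafe,  // '-' and '_'
}

/// Map one character to its 6-bit value, or None if it is not in the alphabet.
pub fn sextet(c: u8, alphabet: Alphabet) -> Option<u8> {
    match (c, alphabet) {
        (b'A'..=b'Z', _) => Some(c - b'A'),
        (b'a'..=b'z', _) => Some(c - b'a' + 26),
        (b'0'..=b'9', _) => Some(c - b'0' + 52),
        (b'+', Alphabet::Standard) | (b'-', Alphabet::UrlSafe) => Some(62),
        (b'/', Alphabet::Standard) | (b'_', Alphabet::UrlSafe) => Some(63),
        _ => None,
    }
}

/// True when every character belongs to the chosen alphabet.
/// A strict decoder that gets `false` here has to bail out to a slower path.
pub fn all_valid(input: &[u8], alphabet: Alphabet) -> bool {
    input.iter().all(|&c| sextet(c, alphabet).is_some())
}
```

A base64url string containing `-` or `_` fails `all_valid` under `Alphabet::Standard` but passes under `Alphabet::UrlSafe`, which is why selecting the right alphabet up front would avoid the fallback described above.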

robobun · 2026-05-26T07:07:19Z

CI note: the only red check on the current head is darwin-14-aarch64-test-bun, which expired waiting for an agent (no macOS 14 arm64 runner picked it up) rather than failing any test — the same suite is green on the other Darwin/Linux/Windows lanes, and everything else (73 checks) has passed. It just needs a job retry in Buildkite when an agent is available.

* oven/main (3 new commits):
  build: support cross-compiling Windows (x64 and arm64) from Linux (oven-sh#31300)
  Bun.serve: restore per-request GC memory accounting to fix elevated RSS under HTTP load (oven-sh#31422)
  Optimize Buffer.toString('hex'/'base64') for large buffers (oven-sh#31421)

Auto-merged: scripts/build.ts, scripts/build/bun.ts, scripts/build/deps/webkit.ts, scripts/build/flags.ts, scripts/build/rust.ts, scripts/build/source.ts, scripts/build/tools.ts

Resolved conflict in scripts/build/config.ts: kept OHOS closing block + upstream Windows cross-compilation block

### What does this PR do?

`Buffer.from(str, "hex")` (and `buf.write(str, "hex")`) currently decode two characters per loop iteration through a lookup table. This PR adds Highway SIMD kernels for hex decoding and routes the shared decoder through them, so whole vectors of input characters are validated and converted at a time.

- **`DecodeHex8Impl` / `DecodeHex16Impl`** in `highway_strings.cpp` (same `HWY_EXPORT` + `HWY_DYNAMIC_DISPATCH` pattern as the existing kernels): classify digits with two range checks, compute nibble values arithmetically, pair them with `ConcatEven`/`ConcatOdd`, and stop at the first vector block containing a non-hex character — the scalar tail then pinpoints the exact pair, preserving Node's "stop at the first invalid pair" semantics byte-for-byte. A capped 128-bit cleanup loop handles the 16–63-pair remainder on wide-vector targets so digest-sized inputs (md5/sha256 strings) still vectorize on AVX-512 machines.
- **`bun_highway::decode_hex` / `decode_hex_u16`** expose the kernels to Rust.
- **`_decode_hex_to_bytes`** (the shared helper behind `Buffer.from`, `buf.write`, CSRF token parsing, etc.) takes the Highway path for inputs of at least 16 byte pairs, for both Latin-1 and UTF-16 source strings; shorter inputs keep the existing scalar loop. The UTF-16 kernel classifies on the full code unit, so units above 0xFF whose low byte looks like a hex digit are still rejected, matching the scalar decoder.
- `verify-baseline-static` allowlists gain entries for the new dispatch targets, with feature ceilings taken from the tool's own report on the compiled kernels.

Complementary to #31421, which covers the encode direction (`toString('hex')`); the two touch the same files but different functions. Related to #8782 (adopting Highway for runtime-dispatched SIMD) — this covers the hex decode path only, so it is not marked as closing that issue.

### Benchmarks

`bench/snippets/buffer-from-hex.mjs` (added in this PR), median of 3 interleaved rounds on linux x64 (Ice Lake, AVX-512), comparing a release build of `main` (4324120) with this PR at the same commit:

| input | main | this PR | speedup |
|---|---|---|---|
| 32 chars → 16 B | 237 ns | 235 ns | ~1.0x |
| 64 chars → 32 B | 255 ns | 237 ns | 1.08x |
| 128 chars → 64 B | 273 ns | 243 ns | 1.12x |
| 1 KiB chars → 512 B | 796 ns | 375 ns | 2.1x |
| 128 KiB chars → 64 KB | 65.4 µs | 9.8 µs | 6.7x |
| 1 MiB chars → 512 KB | 513 µs | 88 µs | 5.8x |
| 16 MiB chars → 8 MB | 8.50 ms | 2.22 ms | 3.8x |

Small inputs are dominated by the fixed `Buffer.from` overhead, so they sit at parity; the decode itself is the win for everything from a few hundred characters up.

### How did you verify your code works?

- `bun bd test test/js/node/buffer.test.js` — 472 pass with the debug (ASAN) build; also passes with `USE_SYSTEM_BUN=1` since this is a behavior-preserving optimization. `test/js/bun/util/csrf.test.ts`, `buffer-utf16`, `buffer-concat`, and `buffer-from-encoding-leak` (with the ASAN quarantine accounted for) also pass.
- New regression coverage in `test/js/node/buffer.test.js` sweeps the SIMD block boundaries: every length around the vector widths (15–1024 pairs, with and without a trailing lone digit), an invalid character at every position of an 80-pair input for ten different invalid bytes, UTF-16 code units above 0xFF whose low byte is a valid hex digit placed at block boundaries, and `buf.write` destination-limit behavior — all cross-checked against a plain JS reference decoder, on both the Latin-1 and two-byte string paths. These tests pass before and after this change by design (there is no functional delta to assert); the before/after evidence for the optimization itself is the benchmark table above.
- A standalone harness compiled the kernels for every Highway target and checked them against a scalar reference for all input lengths 0–600, invalid characters at every position, and the UTF-16 high-byte cases, including "never writes past the reported count".
- `cargo clippy -p bun_core -p bun_highway` is clean.
- The baseline-ISA allowlist entries are the feature sets `verify-baseline-static` reports for the new kernels when compiled with the same LLVM toolchain; if the CI baseline builds report different sets I'll update the entries to match.

Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>
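The decode semantics that description says the kernels preserve can be sketched as a scalar reference. This is an illustration of "stop at the first invalid pair", the destination limit, and full-code-unit classification for UTF-16 input; it is not the Highway kernel or Bun's `_decode_hex_to_bytes`, and `nibble` and `decode_hex_units` are hypothetical names.

```rust
/// Classify one UTF-16 code unit as a hex digit. Matching on the full unit
/// means a unit such as 0x0141 is rejected even though its low byte is 'A'.
fn nibble(unit: u16) -> Option<u8> {
    match unit {
        0x30..=0x39 => Some((unit - 0x30) as u8),      // '0'..='9'
        0x41..=0x46 => Some((unit - 0x41 + 10) as u8), // 'A'..='F'
        0x61..=0x66 => Some((unit - 0x61 + 10) as u8), // 'a'..='f'
        _ => None,
    }
}

/// Decode digit pairs into `dst`, stopping at the first invalid pair or when
/// `dst` is full. A trailing lone digit is ignored. Returns bytes written.
pub fn decode_hex_units(src: &[u16], dst: &mut [u8]) -> usize {
    let mut written = 0;
    for (pair, out) in src.chunks_exact(2).zip(dst.iter_mut()) {
        match (nibble(pair[0]), nibble(pair[1])) {
            (Some(hi), Some(lo)) => {
                *out = (hi << 4) | lo;
                written += 1;
            }
            _ => break, // first invalid pair ends the decode
        }
    }
    written
}
```

Decoding the UTF-16 units of `"1azz34"` writes one byte (`0x1a`) and stops, and an input whose first unit is `0x0141` writes nothing, which are the two behaviours a vectorized path would have to reproduce exactly.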

github-actions Bot added the claude label May 26, 2026

[autofix.ci] apply automated fixes

81397ae

claude Bot reviewed May 26, 2026

Comment thread test/js/node/buffer.test.js Outdated

coderabbitai Bot reviewed May 26, 2026

Comment thread src/highway/lib.rs

robobun added 3 commits May 26, 2026 04:24

test: use fixed-length views in the unaligned hex subarray case

77f5e52

highway: enforce the hex destination size at runtime in the safe wrapper

55b86a4

test: add a throughput regression guard for the bulk hex encoder

5caf980

claude Bot reviewed May 26, 2026

Add AVX512DQ/AVX512VL to the Windows SPR and ZEN4 hex kernel allowlis…

5f0e161

…t ceilings The Windows baseline ISA scan reports the same AVX512DQ/AVX512VL encodings in the AVX3_SPR and AVX3_ZEN4 variants of EncodeHexLowerImpl as in AVX3_DL.

claude Bot reviewed May 26, 2026

Jarred-Sumner reviewed May 26, 2026

robobun mentioned this pull request May 26, 2026

Decode Buffer.from(str, 'hex') with Highway SIMD #31425

Merged

Jarred-Sumner merged commit cab4feb into main May 26, 2026
78 checks passed

Jarred-Sumner deleted the farm/265ef138/buffer-tostring-hex-base64 branch May 26, 2026 10:14

robobun mentioned this pull request May 27, 2026

Strip comment blocks longer than 3 lines from .rs files #31201

Closed

claude Bot mentioned this pull request May 29, 2026

sql: keep all-digit column names with '_' separators as named keys #31572

Merged

coderabbitai Bot mentioned this pull request Jun 1, 2026

base64: encode URL-safe via simdutf directly instead of the WTF C++ shim #31666

Merged

Jarred-Sumner merged 7 commits into main from farm/265ef138/buffer-tostring-hex-base64
