Optimize Buffer.toString('hex'/'base64') for large buffers #31421

Merged: Jarred-Sumner merged 7 commits into main from farm/265ef138/buffer-tostring-hex-base64 on May 26, 2026

robobun (Collaborator) commented May 26, 2026

What does this PR do?

Buffer.prototype.toString regressed vs the last Zig-based release for large buffers on the binary-to-text encodings (bench/snippets/buffer-to-string.mjs: hex of a 110 KB buffer ~3.7× slower, base64/base64url ~17% slower on a quiet Sapphire Rapids host). Small-buffer cases were already faster than Zig and stay on their current paths.

Cause

  • hex: the Rust port of encode_bytes_to_hex replaced the Zig @Vector(16, u8) nibble-interlace fast path with a scalar, bounds-checked table loop (bytes_to_hex_lower) — the port even left a PERF(port) note about it. Every byte of a 110 KB buffer goes through that loop.
  • base64/base64url: the encode itself is simdutf::binary_to_base64 in both the Zig and Rust builds, so the kernel was never the problem. The Zig implementation encoded into a default_allocator (mimalloc) buffer and handed it to an external WTF string; the Rust port switched to BunString::create_uninitialized_latin1, which cycles a ~146 KB block through WTF's string allocator on every call. mimalloc reuses blocks of that size much more cheaply than WTF's allocator does.
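The "~146 KB block" figure follows from the base64 size formula applied to the 110 000-byte benchmark input. A minimal sketch of that arithmetic (the function names are illustrative and are not taken from Bun's source):

```rust
/// Padded base64 output length for `n` input bytes: 4 * ceil(n / 3).
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Unpadded base64url output length for `n` input bytes: ceil(4n / 3).
fn base64url_len(n: usize) -> usize {
    (n * 4 + 2) / 3
}

/// Hex output length: two digits per input byte.
fn hex_len(n: usize) -> usize {
    n * 2
}
```

For the 110 000-byte buffer this gives 146 668 base64 characters, 146 667 base64url characters, and 220 000 hex digits, so every call on the pre-fix path allocated and released a string of roughly that size.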

Fix

  • New EncodeHexLowerImpl Highway kernel in highway_strings.cpp (same HWY_EXPORT + HWY_DYNAMIC_DISPATCH pattern as the existing kernels): split each byte into nibbles, map both nibble vectors through the 16-byte hex-digit table with TableLookupBytes, and write the interleaved digits with StoreInterleaved2. Exposed as bun_highway::encode_hex_lower; bun_core::strings::encode_bytes_to_hex dispatches to it for inputs ≥ 64 bytes and keeps the scalar LUT loop below that (small buffers were already ahead of Zig). Per-target symbols added to the verify-baseline-static allowlists.
  • Buffer.toString("base64"/"base64url") (to_bun_string_comptime) now encodes outputs ≥ 32 KB into a mimalloc-backed buffer wrapped in an external WTF string — the strategy the Zig implementation used — via a shared encode_base64_to_bun_string helper. Smaller outputs keep the uninitialized-WTF-string path, which is what makes the 16-byte cases faster than Zig today.
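The hex half of the fix can be illustrated in plain Rust. The sketch below is a scalar stand-in for the nibble-split, table-lookup, interleave sequence and for the size-based dispatch; it is not the Highway kernel and not Bun's code. The names `encode_hex_blocks`, `encode_hex_scalar`, and `WIDE_THRESHOLD` are hypothetical, and the fixed 16-byte block only mimics one possible vector width.

```rust
const HEX: &[u8; 16] = b"0123456789abcdef";

/// Hypothetical cutoff mirroring the 64-byte threshold described above.
const WIDE_THRESHOLD: usize = 64;

/// Scalar table loop: one byte in, two hex digits out.
fn encode_hex_scalar(src: &[u8], dst: &mut [u8]) {
    for (i, &b) in src.iter().enumerate() {
        dst[2 * i] = HEX[(b >> 4) as usize];
        dst[2 * i + 1] = HEX[(b & 0x0f) as usize];
    }
}

/// Block-wise stand-in for the SIMD kernel: split 16 bytes into high and low
/// nibble lanes, map both lanes through the table, then interleave them.
fn encode_hex_blocks(src: &[u8], dst: &mut [u8]) {
    let mut chunks = src.chunks_exact(16);
    let mut out = 0;
    for chunk in &mut chunks {
        let mut hi = [0u8; 16];
        let mut lo = [0u8; 16];
        for (lane, &b) in chunk.iter().enumerate() {
            hi[lane] = HEX[(b >> 4) as usize];
            lo[lane] = HEX[(b & 0x0f) as usize];
        }
        for lane in 0..16 {
            dst[out + 2 * lane] = hi[lane];
            dst[out + 2 * lane + 1] = lo[lane];
        }
        out += 32;
    }
    // Bytes that do not fill a whole block go through the scalar tail.
    encode_hex_scalar(chunks.remainder(), &mut dst[out..]);
}

/// Dispatch on input length, as the description above does at 64 bytes.
fn encode_hex(src: &[u8]) -> String {
    let mut dst = vec![0u8; src.len() * 2];
    if src.len() >= WIDE_THRESHOLD {
        encode_hex_blocks(src, &mut dst);
    } else {
        encode_hex_scalar(src, &mut dst);
    }
    String::from_utf8(dst).expect("hex digits are ASCII")
}
```

Both paths must produce identical bytes for every length, which is the property the verification section checks across the block boundaries and the scalar tail.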

Output bytes are unchanged for every encoding and size (see verification).

Benchmarks

Shared/noisy Ice Lake Xeon container, medians of interleaved rounds; ratios are the signal, not absolute numbers. Zig is bun 1.3.14, before is canary at 49c97de, after is this PR (both release builds from this tree).

bench/snippets/buffer-to-string.mjs:

| case | Zig 1.3.14 | before | after |
|---|---|---|---|
| Buffer(110000).toString('hex') | 34.6–37.6 µs | 92.7–93.8 µs | 24.0–24.8 µs |
| Buffer(16).toString('hex') | 105–109 ns | 117–120 ns | 103–112 ns |
| Buffer(110000).toString('base64') | 16.0–17.9 µs | 15.3–16.1 µs | 16.1–16.9 µs |
| Buffer(110000).toString('base64url') | 16.9–18.7 µs | 15.0–15.4 µs | 16.1–16.7 µs |
| Buffer(16).toString('base64') | ~119 ns | 86–91 ns | 104–120 ns (unchanged path, container noise) |
| Buffer(110000).toString('ascii') | 13.4–14.0 µs | 9.5–11.4 µs | 11.6–11.8 µs (untouched path) |

The 110 KB base64 delta (~1 µs on the reporting host) is below this container's noise floor, so it reads as a wash above; the allocation-strategy effect is visible once the output is big enough to clear the noise, in bench/snippets/buffer-base64.mjs (toString direction, two rounds):

| size | Zig 1.3.14 | before | after |
|---|---|---|---|
| 64 KB | 10.2 µs | 10.5–15.6 µs | 10.3–10.5 µs |
| 512 KB | 86.0 µs | 96.5–107.0 µs | 89.8–90.7 µs |
| 8 MB | 1.97 ms | 2.18–2.24 ms | 1.97–2.04 ms |

Hex is ~3.8× faster than before this change and ~30–40% faster than the Zig build at 110 KB; large base64 recovers the gap to Zig (and roughly matches it at 8 MB: 1.97–2.04 ms vs. 1.97 ms).

How did you verify your code works?

  • bun bd test test/js/node/buffer.test.js — 468 pass, including the new Buffer.prototype.toString binary-to-text encodings block: byte-for-byte comparison against pure-JS reference encoders across every length 0–130 plus 192/256/512/1024 ±1 (vector-width boundaries and the scalar tail), unaligned subarray views, every length % 3 padding shape, and a 110 000-byte buffer checked against SHA-256 digests that were cross-verified against bun 1.3.14 (pre-rewrite Zig) and the JS reference.
  • Vendored Node suites that exercise these paths: test-buffer-tostring.js, test-buffer-alloc.js, test-stringbytes-external.js (~500 KB hex/base64 encode+decode round-trips), and test/js/node/string_decoder/string-decoder.test.js — all pass with the debug (ASAN) build.
  • fs.readFile(path, "hex" | "base64" | "base64url") spot-checked against Buffer.toString output (the readFile hex path shares encode_bytes_to_hex).
  • cargo clippy -p bun_core -p bun_highway -p bun_runtime clean.
  • Baseline-ISA allowlist entries were generated by running scripts/verify-baseline-static against the locally built bun-profile and copying the feature sets it reported per target. The local toolchain doesn't compile the AVX10_2 target, so that entry mirrors the SPR/DL ceilings (+AVX512VL); if the baseline/aarch64 CI lanes report different sets I'll update the entries to match, same as the CopyAsciiPrefixImpl addition did.
  • A throughput regression guard in the same describe block asserts that toString('hex') of a 110 KB buffer stays within 6× the cost of a toString('latin1') copy of the same buffer (median of GC-isolated samples). The scalar per-byte loop sits at ≥10× that baseline while the SIMD kernel stays at ~2–3×, so this test fails on the pre-SIMD implementation and passes with this PR; it is skipped on debug/ASAN builds, where the unoptimized, instrumented native kernels make timing ratios meaningless. The full before/after picture is the benchmark tables above, reproducible via bench/snippets/buffer-to-string.mjs and bench/snippets/buffer-base64.mjs.
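The throughput guard in the last bullet compares medians rather than single timings. A minimal sketch of that shape, assuming a plain wall-clock harness (the helper names are hypothetical; the real test is JavaScript and isolates GC, which this sketch does not model):

```rust
use std::time::{Duration, Instant};

/// Median of a set of timing samples (upper median for even counts).
fn median(samples: &mut [Duration]) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// Run `work` `rounds` times and return the median wall-clock duration.
fn time_median<F: FnMut()>(mut work: F, rounds: usize) -> Duration {
    let mut samples: Vec<Duration> = (0..rounds)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect();
    median(&mut samples)
}

/// True when `candidate` costs at most `max_ratio` times `baseline`.
fn within_ratio(candidate: Duration, baseline: Duration, max_ratio: f64) -> bool {
    candidate.as_secs_f64() <= baseline.as_secs_f64() * max_ratio
}
```

With the figures quoted above, a kernel at 2–3× the latin1 baseline passes a 6× bound and a scalar loop at 10× or more fails it, which leaves margin on both sides of the threshold.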

Restore a SIMD hex encoder (Highway kernel, runtime-dispatched) for
encode_bytes_to_hex, and encode large base64/base64url outputs into a
mimalloc-backed buffer wrapped in an external WTF string instead of
cycling large blocks through WTF's string allocator.
coderabbitai (Bot, Contributor) commented May 26, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2e525239-3a58-4b00-a918-9541170270a6

📥 Commits

Reviewing files that changed from the base of the PR and between 81397ae and 5f0e161.

📒 Files selected for processing (3)
  • scripts/verify-baseline-static/allowlist-x64-windows.txt
  • src/highway/lib.rs
  • test/js/node/buffer.test.js

Walkthrough

This PR adds Highway-based SIMD acceleration for hex encoding with a 64-byte threshold, refactors base64 encoding into a length-aware helper, registers symbols across ARM64 and x64 platforms, and introduces comprehensive tests validating encoding correctness at SIMD and scalar boundaries.

Changes

Hex SIMD and Encoding Refactoring

| Layer / File(s) | Summary |
|---|---|
| C++ Highway Hex Encoder Implementation (src/jsc/bindings/highway_strings.cpp) | EncodeHexLowerImpl uses Highway SIMD to convert bytes to lowercase hex via nibble splitting and table lookup, with scalar tail handling. Function is exported via dispatch table and wrapped by extern "C" highway_encode_hex_lower. |
| Rust FFI Binding and Public Wrapper (src/highway/lib.rs) | Adds unsafe extern "C" FFI binding and exposes a public #[inline(always)] wrapper encode_hex_lower with buffer size validation and early-exit for empty input. |
| Hex Encoding Path Integration (src/bun_core/string/immutable.rs) | encode_bytes_to_hex now dispatches to highway SIMD for inputs ≥64 bytes, falling back to scalar LUT implementation for smaller sizes. |
| Static Symbol Allowlist Registration (scripts/verify-baseline-static/allowlist-aarch64.txt, allowlist-x64.txt, allowlist-x64-windows.txt) | Registers EncodeHexLowerImpl symbols across all Highway variants: ARM64 SVE/SVE2, x64 AVX2/AVX3/AVX10, and Windows MSVC-mangled forms; section symbol counts incremented. |
| Base64 Encoding Length-Aware Refactoring (src/runtime/webcore/encoding.rs) | to_bun_string_comptime delegates base64/base64url to new encode_base64_to_bun_string helper that chooses strategy by output length: small outputs encode into uninitialized BunString, large outputs allocate Vec, fill via fill_spare, and return external string. |
| Comprehensive Encoding Test Suite (test/js/node/buffer.test.js) | Adds pure-JS reference encoders and deterministic tests validating Buffer.prototype.toString byte-exactness at SIMD boundaries, unaligned views, large buffers via SHA-256 digest, and small buffers across padding shapes. |
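The reference-encoder approach in the test suite can be sketched as follows: a deliberately simple encoder is checked against the optimized one over a list of lengths chosen around likely vector widths. This is an illustration in Rust, not the suite's JavaScript; `base64_reference` and `boundary_lengths` are hypothetical names, and reading "192/256/512/1024 ±1" as w−1, w, w+1 is an assumption.

```rust
const B64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Straightforward padded base64 encoder (RFC 4648 standard alphabet).
fn base64_reference(src: &[u8]) -> String {
    let mut out = String::with_capacity((src.len() + 2) / 3 * 4);
    for chunk in src.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the rest.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        // The `len % 3` padding shapes: one or two '=' on a short final chunk.
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Every length 0..=130, plus each listed width with its two neighbours.
fn boundary_lengths() -> Vec<usize> {
    let mut lens: Vec<usize> = (0..=130).collect();
    for w in [192usize, 256, 512, 1024] {
        lens.extend([w - 1, w, w + 1]);
    }
    lens
}
```

A sweep then encodes a deterministic buffer at each length with both implementations and compares the results byte for byte.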

Suggested reviewers

  • Jarred-Sumner
  • alii
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main objective: optimizing Buffer.toString for hex and base64 encodings on large buffers, which is the core focus of the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description comprehensively explains the regression, root causes, fixes, benchmarks, and verification for all encoding optimizations (hex and base64). |


robobun (Collaborator, Author) commented May 26, 2026
Updated 1:32 AM PT - May 26th, 2026

@robobun, your commit 5f0e161821d79b253994f0e9af607a36edd2ffba passed in Build #58173! 🎉


🧪   To try this PR locally:

bunx bun-pr 31421

That installs a local version of the PR into your bun-31421 executable, so you can run:

bun-31421 --bun

github-actions (Contributor)

Found 1 issue this PR may fix:

  1. Switch from Zig’s @Vector to Google Highway SIMD lib #8782 - This PR adds a new Highway SIMD kernel (EncodeHexLowerImpl) for hex encoding, directly implementing the migration from Zig's @Vector to Google Highway that this issue proposes

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #8782

🤖 Generated with Claude Code

Comment thread test/js/node/buffer.test.js Outdated

coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/highway/lib.rs`:
- Around line 265-275: The function encode_hex_lower currently uses
debug_assert! which is removed in release builds and can allow out-of-bounds FFI
writes; replace the debug-only check with a runtime capacity check: compute
needed = src.len().checked_mul(2) (handle overflow by treating as too-large),
then assert!(dst.len() >= needed) or return a Result indicating insufficient
capacity so the public API cannot call highway_encode_hex_lower with a too-small
dst; update encode_hex_lower to perform this checked validation before the
unsafe call to highway_encode_hex_lower and include a clear error message
referencing src/dst sizes.
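The `Result`-returning option from this comment could look roughly like the sketch below. It is an illustration only: `encode_hex_lower_checked` is a hypothetical name, and a safe scalar loop stands in for the unsafe call to `highway_encode_hex_lower`.

```rust
/// Validate capacity at runtime (in release builds too) before writing.
/// Returns the number of bytes written, or a message naming both sizes.
fn encode_hex_lower_checked(src: &[u8], dst: &mut [u8]) -> Result<usize, String> {
    const HEX: &[u8; 16] = b"0123456789abcdef";

    // checked_mul treats an overflowing length as "too large" instead of wrapping.
    let needed = src
        .len()
        .checked_mul(2)
        .ok_or_else(|| format!("src.len()={} overflows when doubled", src.len()))?;
    if dst.len() < needed {
        return Err(format!(
            "dst too small: need {} bytes for {} input bytes, got {}",
            needed,
            src.len(),
            dst.len()
        ));
    }

    // Stand-in for the FFI kernel: nothing is written unless the check passed.
    for (pair, &b) in dst[..needed].chunks_exact_mut(2).zip(src) {
        pair[0] = HEX[(b >> 4) as usize];
        pair[1] = HEX[(b & 0x0f) as usize];
    }
    Ok(needed)
}
```

The alternative the comment mentions, a plain `assert!`, keeps the existing signature and panics instead of returning an error.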

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 11eb8c5c-77bf-47db-8ff2-b490d6fe9928

📥 Commits

Reviewing files that changed from the base of the PR and between ec68d39 and 81397ae.

📒 Files selected for processing (8)
  • scripts/verify-baseline-static/allowlist-aarch64.txt
  • scripts/verify-baseline-static/allowlist-x64-windows.txt
  • scripts/verify-baseline-static/allowlist-x64.txt
  • src/bun_core/string/immutable.rs
  • src/highway/lib.rs
  • src/jsc/bindings/highway_strings.cpp
  • src/runtime/webcore/encoding.rs
  • test/js/node/buffer.test.js

Comment thread src/highway/lib.rs

claude (Bot, Contributor) left a comment

I didn't find any correctness issues, but this adds a new Highway SIMD kernel plus an allocation-strategy change on a hot path (and the ISA allowlist entries are partly speculative per the description), so it's worth a human pass.

Extended reasoning...

Overview

This PR restores large-buffer Buffer.toString('hex'/'base64') throughput that regressed in the Zig→Rust port. It adds a new EncodeHexLowerImpl Highway kernel in highway_strings.cpp (nibble split → TableLookupBytes → StoreInterleaved2), wires it through a Rust FFI wrapper in src/highway/lib.rs and a 64-byte threshold in encode_bytes_to_hex, and refactors the base64/base64url arms of to_bun_string_comptime into a helper that switches to a mimalloc-backed external WTF string above 32 KB of output. Supporting changes: per-target symbol entries in three verify-baseline-static allowlists, and a substantial new test block in buffer.test.js (byte-exact reference comparisons across vector-width boundaries, unaligned views, SHA-256 digests of a 110 KB buffer, and a timing-ratio regression guard).

Security risks

None apparent. Inputs are in-process Buffer bytes; outputs are freshly allocated strings. The one memory-safety surface — the FFI write into dst — is now guarded by a runtime assert!(dst.len() / 2 >= src.len()) (added after CodeRabbit's comment), and the only caller slices dst to exactly 2 * to_read. The C++ kernel's HWY_RESTRICT aliasing assumption is satisfied by the &/&mut split on the Rust side. The fill_spare usage for the large-base64 path correctly reserves to_len on an empty Vec before slicing spare[..to_len].

Level of scrutiny

This warrants human review rather than bot approval. It is a new hand-written SIMD kernel that will run on every ≥64-byte hex encode across SSE/AVX2/AVX-512/SVE targets, relying on Highway semantics (TableLookupBytes index-range behaviour, StoreInterleaved2 write bounds, ShiftRight<4> on u8) that look correct to me but are exactly the kind of thing a maintainer familiar with this file's other kernels should eyeball. The base64 change swaps the allocation/ownership model on a hot path (uninitialized WTF string → external mimalloc string with finalizer), which is a design choice worth a human nod. The PR description also explicitly flags that the AVX10_2 allowlist feature set is extrapolated and may need adjusting once the baseline CI lanes report.

Other factors

Both prior review comments (my subarray end-index nit and CodeRabbit's debug_assert! → assert! request) have been addressed in follow-up commits and the threads are resolved. Test coverage is thorough and includes round-trip decode checks. The new timing-based regression guard (hex ≤ 6× latin1) is skipped on debug/ASAN and uses a median of 13 GC-isolated samples with wide margin, which seems reasonable but is the kind of thing that occasionally flakes on shared CI, so it is worth a maintainer's judgment on whether that belongs in the suite vs. the bench scripts.

…VX3_DL hex allowlist ceiling

The in-suite timing baseline gets skewed by the surrounding tests' heap and
GC activity, so run the hex-vs-latin1 measurement in a fresh process. The
Windows baseline ISA scan reports AVX512DQ/AVX512VL encodings in the AVX3_DL
hex kernel, so include them in that entry's feature ceiling.

claude (Bot, Contributor) left a comment

No bugs found, but this adds a new Highway SIMD kernel with unsafe FFI plumbing, changes the large-buffer base64 allocation strategy, and touches ISA allowlists across three platforms — worth a human pass on the kernel and the 32 KB / 64 B thresholds.

Extended reasoning...

Overview

This PR restores pre-Rust-port performance for Buffer.toString('hex'|'base64'|'base64url') on large buffers. It adds a new Highway kernel EncodeHexLowerImpl in highway_strings.cpp (nibble split → TableLookupBytes → StoreInterleaved2), wires it through a new extern "C" shim and a safe Rust wrapper in src/highway/lib.rs, and dispatches to it from encode_bytes_to_hex for inputs ≥ 64 bytes. Separately, to_bun_string_comptime's base64/base64url arms are refactored into encode_base64_to_bun_string, which switches from create_uninitialized_latin1 to a mimalloc-backed Vec + external WTF string when the encoded output is ≥ 32 KB. Three verify-baseline-static allowlists gain per-target EncodeHexLowerImpl entries, and buffer.test.js gains a ~190-line correctness suite plus a subprocess-based throughput regression guard.

Security risks

None apparent. The new FFI write path (highway_encode_hex_lower) is guarded by a runtime assert!(dst.len() / 2 >= src.len()) in the safe wrapper (added in 55b86a4 after CodeRabbit feedback), and the only caller slices dst to exactly 2 * src.len(). The HWY_RESTRICT on input/output is satisfied because the Rust wrapper takes &/&mut (non-overlapping by construction). The kernel's index vectors (ShiftRight<4> and And(..., 0x0F)) are bounded to [0,15], so TableLookupBytes behaves identically across PSHUFB / NEON TBL / AVX-512 VPERMB without high-bit zeroing surprises. No user-controlled data reaches allowlists or build config.

Level of scrutiny

High. Buffer.toString is one of the hottest user-facing paths in the runtime, and this change introduces (a) a brand-new per-ISA-target SIMD kernel that will execute on every ≥64-byte hex encode, (b) a heuristic allocation-strategy fork at 32 KB that trades WTF-string allocation for an external-string finalizer, and (c) hand-written ISA feature-ceiling entries in the baseline allowlists — one of which (N_AVX3_DL on Windows) already needed a follow-up commit after CI flagged a mismatch, and the PR description notes the AVX10_2 entry was extrapolated rather than measured. None of this is mechanical.

Other factors

Test coverage is strong: byte-exact comparison against pure-JS reference encoders across 0–130 and vector-width ±1 boundaries, unaligned subarray views, a 110 KB buffer cross-checked via SHA-256 against bun 1.3.14 (the pre-rewrite Zig build) and the JS reference, and decode round-trips. The throughput guard runs in a subprocess with GC disabled and uses a 6× ratio with wide margin, but timing-based tests are always a flake risk worth a second opinion. Both prior review threads (my subarray nit and CodeRabbit's debug_assert! → assert!) are resolved. CodeRabbit's walkthrough suggests Jarred-Sumner / alii as reviewers, which seems right for Highway + allocator changes.

…t ceilings

The Windows baseline ISA scan reports the same AVX512DQ/AVX512VL encodings
in the AVX3_SPR and AVX3_ZEN4 variants of EncodeHexLowerImpl as in AVX3_DL.

claude (Bot, Contributor) left a comment

No issues found after the fixes, but this adds a new Highway SIMD kernel, an unsafe FFI boundary, and changes the large-buffer base64 allocation strategy — worth a human pass on the kernel correctness and the 32 KB / 64 B thresholds.

Extended reasoning...

Overview

This PR restores large-buffer Buffer.toString('hex'/'base64') performance to (and past) the pre-Rust-port baseline. It adds a new EncodeHexLowerImpl Highway kernel in highway_strings.cpp with the standard HWY_EXPORT/HWY_DYNAMIC_DISPATCH plumbing, a Rust FFI wrapper in bun_highway, a 64-byte dispatch threshold in encode_bytes_to_hex, and a refactored encode_base64_to_bun_string that switches to a mimalloc-backed external string above 32 KB output. It also registers the new per-target symbols in all three verify-baseline-static allowlists and adds ~190 lines of byte-exact reference tests plus a throughput regression guard.

Security risks

None identified. The new unsafe surface is the highway_encode_hex_lower FFI call, which is now guarded by a runtime assert!(dst.len() / 2 >= src.len()) (upgraded from debug_assert! per CodeRabbit feedback), and the only caller slices dst to exactly 2 * src.len(). The base64 path's fill_spare usage commits only the bytes the encoder reports as written. Inputs are arbitrary user buffer bytes, but the encoders are pure data transforms with fixed-ratio output sizes.

Level of scrutiny

Medium-high. This is a hot-path performance change in Buffer.prototype.toString that introduces hand-written SIMD across six x86 targets and four ARM SVE targets, plus an allocation-strategy change whose correctness depends on create_external_globally_allocated_latin1 ownership semantics. The kernel itself (ShiftRight<4> / And 0x0F / TableLookupBytes / StoreInterleaved2) is a standard nibble-interlace and the scalar tail is straightforward, but SIMD kernels and ISA allowlist ceilings are exactly the kind of thing where a second pair of eyes pays off — the PR already needed two follow-up commits to get the Windows SPR/ZEN4 ceilings right, and the description notes the AVX10_2 entry is extrapolated rather than measured.

Other factors

Test coverage is strong: byte-for-byte comparison against pure-JS reference encoders across 0–130 and vector-width ±1 boundaries, unaligned subarray views (my earlier nit about the subarray end index was fixed in 77f5e52), SHA-256 cross-checks against a 110 KB buffer, and round-trip decodes. The throughput guard runs in a subprocess and is skipped on debug/ASAN. Both prior review comments (mine and CodeRabbit's) are resolved. Still, the combination of new SIMD code, unsafe FFI, magic thresholds (64 B, 32 KB), and hand-maintained per-ISA allowlist entries puts this outside what I'd auto-approve without human sign-off.

Jarred-Sumner (Collaborator) left a comment

What about the buffer.write code path?

robobun (Collaborator, Author) commented May 26, 2026

buffer.write() (and Buffer.from(string, encoding)) is the decode direction, and this PR doesn't touch it — write_u8's hex arm still goes through the scalar decode_hex_to_bytes_truncate, and base64/base64url writes go through bun_base64::decode (simdutf + the forgiving-base64 fallback), none of which are in this diff. The existing hexWrite/base64Write/base64urlWrite tests and the new round-trip decode checks in buffer.test.js cover it and are unchanged.

I did measure it to make sure there isn't a lurking port regression on that side. Same methodology as the PR benchmarks (110 000-byte payload, median of GC-isolated samples, three interleaved rounds on a noisy shared box — ranges below are min–max of the three rounds):

| case | Zig 1.3.14 | current canary (identical to this branch for these paths) |
|---|---|---|
| buf.write(hex, "hex") (220 000 chars) | 119–250 µs | 98–248 µs |
| buf.write(b64, "base64") | 9.3–12.6 µs | 7.2–12.0 µs |
| buf.write(b64url, "base64url") | 440–556 µs | 240–477 µs |
| Buffer.from(hex, "hex") | 109–248 µs | 138–258 µs |
| Buffer.from(b64, "base64") | 16–27 µs | 18–26 µs |

So: write-hex and write/from-base64 are at parity with (or slightly ahead of) the Zig build — hex decode was a scalar loop in Zig too, and base64 decode is the same simdutf call in both. Buffer.from(hex) may be modestly behind Zig but it's within this box's noise. The one eyebrow-raiser is write(…, "base64url") being ~25–40× slower than write(…, "base64") in both builds: the write path calls the standard-alphabet decoder, which rejects -/_ and falls through to the scalar forgiving decoder, so it's a pre-existing quirk rather than a port regression.
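The base64url slowdown comes down to two characters. A minimal sketch of why a standard-alphabet decoder gives up on URL-safe input at the first `-` or `_` (the function names are hypothetical, and this models only the alphabet check, not simdutf or the forgiving fallback):

```rust
/// 6-bit value of a character in the standard base64 alphabet.
fn std_value(c: u8) -> Option<u8> {
    match c {
        b'A'..=b'Z' => Some(c - b'A'),
        b'a'..=b'z' => Some(c - b'a' + 26),
        b'0'..=b'9' => Some(c - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// 6-bit value in the URL-safe alphabet: '-' and '_' replace '+' and '/'.
fn url_value(c: u8) -> Option<u8> {
    match c {
        b'-' => Some(62),
        b'_' => Some(63),
        b'+' | b'/' => None,
        _ => std_value(c),
    }
}

/// Index of the first character the given alphabet rejects, if any.
fn first_rejected(input: &[u8], value: fn(u8) -> Option<u8>) -> Option<usize> {
    input.iter().position(|&c| value(c).is_none())
}
```

The two bytes 0xFB 0xFF encode to `+/8=` in base64 and `-_8` in base64url, so the standard alphabet rejects the URL-safe form at index 0 while the URL-safe alphabet accepts all of it.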

Two natural follow-ups I'd keep out of this PR to keep it reviewable, but am happy to do next:

  1. A decode-direction Highway kernel (SIMD hex decode) for write(…, "hex") / Buffer.from(…, "hex"), mirroring the encoder added here.
  2. Pass the URL-safe flag through for write(…, "base64url") so it uses the simdutf base64url decoder instead of the forgiving fallback.

robobun (Collaborator, Author) commented May 26, 2026

CI note: the only red check on the current head is darwin-14-aarch64-test-bun, which expired waiting for an agent (no macOS 14 arm64 runner picked it up) rather than failing any test — the same suite is green on the other Darwin/Linux/Windows lanes, and everything else (73 checks) has passed. It just needs a job retry in Buildkite when an agent is available.

@Jarred-Sumner Jarred-Sumner merged commit cab4feb into main May 26, 2026
78 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/265ef138/buffer-tostring-hex-base64 branch May 26, 2026 10:14
springmin pushed a commit to springmin/bun that referenced this pull request May 26, 2026
* oven/main (3 new commits):
  build: support cross-compiling Windows (x64 and arm64) from Linux (oven-sh#31300)
  Bun.serve: restore per-request GC memory accounting to fix elevated RSS under HTTP load (oven-sh#31422)
  Optimize Buffer.toString('hex'/'base64') for large buffers (oven-sh#31421)

Auto-merged: scripts/build.ts, scripts/build/bun.ts, scripts/build/deps/webkit.ts, scripts/build/flags.ts, scripts/build/rust.ts, scripts/build/source.ts, scripts/build/tools.ts
Resolved conflict in scripts/build/config.ts: kept OHOS closing block + upstream Windows cross-compilation block
Jarred-Sumner added a commit that referenced this pull request May 26, 2026
### What does this PR do?

`Buffer.from(str, "hex")` (and `buf.write(str, "hex")`) currently decode
two characters per loop iteration through a lookup table. This PR adds
Highway SIMD kernels for hex decoding and routes the shared decoder
through them, so whole vectors of input characters are validated and
converted at a time.

- **`DecodeHex8Impl` / `DecodeHex16Impl`** in `highway_strings.cpp`
(same `HWY_EXPORT` + `HWY_DYNAMIC_DISPATCH` pattern as the existing
kernels): classify digits with two range checks, compute nibble values
arithmetically, pair them with `ConcatEven`/`ConcatOdd`, and stop at the
first vector block containing a non-hex character — the scalar tail then
pinpoints the exact pair, preserving Node's "stop at the first invalid
pair" semantics byte-for-byte. A capped 128-bit cleanup loop handles the
16–63-pair remainder on wide-vector targets so digest-sized inputs
(md5/sha256 strings) still vectorize on AVX-512 machines.
- **`bun_highway::decode_hex` / `decode_hex_u16`** expose the kernels to
Rust.
- **`_decode_hex_to_bytes`** (the shared helper behind `Buffer.from`,
`buf.write`, CSRF token parsing, etc.) takes the Highway path for inputs
of at least 16 byte pairs, for both Latin-1 and UTF-16 source strings;
shorter inputs keep the existing scalar loop. The UTF-16 kernel
classifies on the full code unit, so units above 0xFF whose low byte
looks like a hex digit are still rejected, matching the scalar decoder.
- `verify-baseline-static` allowlists gain entries for the new dispatch
targets, with feature ceilings taken from the tool's own report on the
compiled kernels.
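
The decode rules in the list above can be shown in scalar form. The sketch below is not the Highway kernel; it assumes one plausible reading of "two range checks" (a digit range plus a case-folded letter range), uses plain subtraction for the nibble value, and the names `nibble` and `decode_hex_units` are hypothetical.

```rust
/// Nibble value of one UTF-16 code unit, or `None` if it is not a hex digit.
/// The checks run on the full 16-bit unit, so a unit such as 0x0131 is
/// rejected even though its low byte is ASCII '1'.
fn nibble(unit: u16) -> Option<u8> {
    let folded = unit | 0x20; // maps 'A'..='F' onto 'a'..='f'
    if (0x30..=0x39).contains(&unit) {
        Some((unit - 0x30) as u8) // '0'..='9'
    } else if (0x61..=0x66).contains(&folded) {
        Some((folded - 0x61) as u8 + 10) // 'a'..='f' or 'A'..='F'
    } else {
        None
    }
}

/// Decode pairs of hex digits into `dst`, stopping at the first invalid pair
/// or when `dst` is full. Returns the number of bytes written; a trailing
/// lone digit is ignored.
fn decode_hex_units(src: &[u16], dst: &mut [u8]) -> usize {
    let mut written = 0;
    for pair in src.chunks_exact(2) {
        if written == dst.len() {
            break;
        }
        match (nibble(pair[0]), nibble(pair[1])) {
            (Some(hi), Some(lo)) => {
                dst[written] = (hi << 4) | lo;
                written += 1;
            }
            _ => break,
        }
    }
    written
}
```

A vectorized kernel applies the same classification to a whole block at once and hands the block containing the first invalid character to a loop like this one to locate the exact pair.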

Complementary to #31421, which covers the encode direction
(`toString('hex')`); the two touch the same files but different
functions. Related to #8782 (adopting Highway for runtime-dispatched
SIMD) — this covers the hex decode path only, so it is not marked as
closing that issue.

### Benchmarks

`bench/snippets/buffer-from-hex.mjs` (added in this PR), median of 3
interleaved rounds on linux x64 (Ice Lake, AVX-512), comparing a release
build of `main` (4324120) with this PR at the same commit:

| input | main | this PR | speedup |
|---|---|---|---|
| 32 chars → 16 B | 237 ns | 235 ns | ~1.0x |
| 64 chars → 32 B | 255 ns | 237 ns | 1.08x |
| 128 chars → 64 B | 273 ns | 243 ns | 1.12x |
| 1 KiB chars → 512 B | 796 ns | 375 ns | 2.1x |
| 128 KiB chars → 64 KB | 65.4 µs | 9.8 µs | 6.7x |
| 1 MiB chars → 512 KB | 513 µs | 88 µs | 5.8x |
| 16 MiB chars → 8 MB | 8.50 ms | 2.22 ms | 3.8x |

Small inputs are dominated by the fixed `Buffer.from` overhead, so they
sit at parity; the decode itself is the win for everything from a few
hundred characters up.

### How did you verify your code works?

- `bun bd test test/js/node/buffer.test.js` — 472 pass with the debug
(ASAN) build; also passes with `USE_SYSTEM_BUN=1` since this is a
behavior-preserving optimization. `test/js/bun/util/csrf.test.ts`,
`buffer-utf16`, `buffer-concat`, and `buffer-from-encoding-leak` (with
the ASAN quarantine accounted for) also pass.
- New regression coverage in `test/js/node/buffer.test.js` sweeps the
SIMD block boundaries: every length around the vector widths (15–1024
pairs, with and without a trailing lone digit), an invalid character at
every position of an 80-pair input for ten different invalid bytes,
UTF-16 code units above 0xFF whose low byte is a valid hex digit placed
at block boundaries, and `buf.write` destination-limit behavior — all
cross-checked against a plain JS reference decoder, on both the Latin-1
and two-byte string paths. These tests pass before and after this change
by design (there is no functional delta to assert); the before/after
evidence for the optimization itself is the benchmark table above.
- A standalone harness compiled the kernels for every Highway target and
checked them against a scalar reference for all input lengths 0–600,
invalid characters at every position, and the UTF-16 high-byte cases,
including "never writes past the reported count".
- `cargo clippy -p bun_core -p bun_highway` is clean.
- The baseline-ISA allowlist entries are the feature sets
`verify-baseline-static` reports for the new kernels when compiled with
the same LLVM toolchain; if the CI baseline builds report different sets
I'll update the entries to match.

Co-authored-by: Jarred Sumner <jarred@jarredsumner.com>