
perf(napi/parser, linter/plugins): speed up decoding strings in raw transfer#21021

Merged
graphite-app[bot] merged 1 commit into main from
om/04-03-perf_napi_parser_linter_plugins_speed_up_decoding_strings_in_raw_transfer
Apr 3, 2026

Conversation

@overlookmotel
Member

@overlookmotel overlookmotel commented Apr 3, 2026

Improve perf of deserializing strings in raw transfer. This PR combines several optimizations, which have been tested and benchmarked in https://github.com/overlookmotel/oxc-raw-str-bench. This PR implements the version "latin-slice-onebyte64" from that repo, which is the current winner.

String deserialization is the main bottleneck in raw transfer, so speeding it up will likely make a large impact on deserialization overall.

This work follows on from #20834, which produced a major speed-up on many files by making files which contain some non-ASCII characters take the fast path of slicing sourceText more often.

This PR tackles the remainder - speeding up the fallback path where the fast path can't be taken.

Optimizations

The optimizations in this PR are:

Latin1

When source is not 100% ASCII, decode source text from buffer as Latin1.

A Latin1-decoded string represents each UTF-8 byte as a single Latin1 character, so it can be indexed into using UTF-8 offsets.

So when we can't slice the string from sourceText because the UTF-8 and UTF-16 offsets differ (after any non-ASCII character), loop through the string's bytes and check if they're all ASCII. If they are, the string can be sliced from sourceTextLatin instead, with the original UTF-8 offsets.

This is way faster than calling textDecoder.decode, as it avoids a call into C++. Benchmarks (https://github.com/overlookmotel/oxc-raw-str-bench/blob/4f96275efa9a35d5d27615abb27f21a137149cc0/README.md#apply30-vs-latin-vs-latin-source64) show a speed-up of 55% on average, and up to 70% on some files.
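A minimal sketch of this fallback path, under the assumption that `buffer` is the raw transfer byte buffer and `sourceTextLatin` is the Latin1-decoded source text (variable and function names are illustrative; the generated deserializers are structured differently):

```javascript
// Hypothetical sketch of the Latin1 fallback path described above.
// Names are illustrative, not the generated code.
function deserializeStrSketch(buffer, sourceTextLatin, pos, len) {
  // Check whether every byte of the string is ASCII (high bit clear)
  let isAscii = true;
  for (let i = pos; i < pos + len; i++) {
    if (buffer[i] >= 128) {
      isAscii = false;
      break;
    }
  }
  if (isAscii) {
    // All ASCII: UTF-8 offsets are valid indexes into the Latin1 string,
    // even if earlier parts of the source contain non-ASCII characters
    return sourceTextLatin.slice(pos, pos + len);
  }
  // Non-ASCII string: fall back to a real UTF-8 decode
  return new TextDecoder().decode(buffer.subarray(pos, pos + len));
}
```

The key property is that the ASCII check loops over bytes in JS, which is cheap, while the slice itself is a constant-time substring operation in V8.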

Latin1 decoding method

It turns out that new TextDecoder("latin1").decode(arr) doesn't actually decode to Latin1!

Per the WHATWG Encoding Standard, "latin1" is mapped to "windows-1252".

The result is that with TextDecoder("latin1"):

  1. decode is quite complicated, requiring a first pass over the bytes to determine if they're all ASCII, followed by a 2nd pass to do the actual windows-1252 decoding. If the string does contain any non-ASCII characters (which it always does in our use case), Node.js implements the decoding in JS, not native code. Slow.
  2. decode produces a 2-byte-per-char string (TWO_BYTE in V8), which takes more memory, and is slower for all operations on it e.g. string comparison, hashing for use as an object key etc.

Instead, use Buffer.prototype.latin1Slice which:

  1. Does a pure Latin1 decode, which is just a single memcpy call.
  2. Produces a 1-byte-per-char string (ONE_BYTE in V8).

latin1Slice involves a call into C++, but we only do it once per file, so this cost is tiny in the context of deserializing the whole AST.

Latin1 string slicing

In the fast path, slice from the Latin1-decoded string, instead of sourceText. In the fast path, we know that all bytes of source comprising the string are ASCII, so no further checks are required.

This makes no difference on benchmarks for deserializeStr itself, but it may have beneficial effects downstream for code (e.g. lint rules) which access strings in the AST, e.g. Identifier names.

Because Latin1-decoded source text is ONE_BYTE-encoded, slices of it are too. In comparison, slices of sourceText may be ONE_BYTE or TWO_BYTE. If a file's source is pure ASCII, it'll be ONE_BYTE; if source contains any non-ASCII characters, it'll be TWO_BYTE. Files in a repo will likely be a mix of both, which makes strings returned from deserializeStr and placed in the AST a mix too. This in turn makes functions (e.g. lint rule visitors) polymorphic. V8 cannot optimize them as aggressively as if they see only ONE_BYTE strings.

We cannot make sure that all strings returned by deserializeStr are ONE_BYTE. Some strings may contain non-ASCII characters, and they have to be represented in TWO_BYTE form. But we can minimize it - now only strings which themselves contain non-ASCII characters are TWO_BYTE, whereas before they would be TWO_BYTE if the source text as a whole contained a single non-ASCII byte.

Code which accesses Identifier names, for example, will exclusively see ONE_BYTE strings and will be more heavily optimized, because Unicode Identifiers are rarer than hen's teeth in real-world code.

Remove string-concatenation loop

Previously, strings which fall outside of the source text were assembled byte-by-byte in a loop via concatenation.

Instead, check that all the bytes are ASCII first, copy them into an array and pass that array to String.fromCharCode with fromCharCode.apply(null, array).

To avoid allocating a fresh array every time, hold a stock of arrays for all string lengths that this path can require, and reuse them.

This is a variation on the approach that #20883 took, but without the massive switch. This produces much tighter assembly, and avoids regressing the fast path due to making deserializeStr a very large function.

Despite the complexity and multiple operations, this is up to 3x faster than the switch approach (https://github.com/overlookmotel/oxc-raw-str-bench/blob/4f96275efa9a35d5d27615abb27f21a137149cc0/README.md#apply30-vs-switch30), and gives an average 30% speed-up.
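The pooled-array approach can be sketched as follows, assuming a 64-byte cap on this path (the crossover threshold below); array and function names are illustrative:

```javascript
// Maximum string length handled by this JS path (assumed from the
// 64-byte native-call threshold; illustrative, not the generated code)
const MAX_LEN = 64;

// One reusable array per possible length, so `fromCharCode.apply`
// always receives an array of exactly the right size and no fresh
// array is allocated per string
const charArrays = [];
for (let len = 0; len <= MAX_LEN; len++) charArrays.push(new Array(len));

const { fromCharCode } = String;

function decodeShortAscii(buffer, pos, len) {
  const arr = charArrays[len];
  // Copy the bytes (all ASCII, so byte value === char code)
  for (let i = 0; i < len; i++) arr[i] = buffer[pos + i];
  // One call builds the whole string, instead of `len` concatenations
  return fromCharCode.apply(null, arr);
}
```

Reusing a fixed-size array per length keeps the arrays' element kinds stable, which helps V8 keep this path monomorphic.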

Increase native call threshold

The above optimizations make the slow path much faster. This shifts the tipping point at which it's faster to make a native call to TextDecoder.decode from 9 bytes to 64 bytes. Most strings now avoid the native call and stay in JS code which is heavily optimized by Turbofan.

The tipping point of 64 is something of a guesstimate. Benchmarking shows it's in the right ballpark, but we could finesse it, and probably squeeze out another couple of percent.
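Putting it together, the dispatch is just a length check. A sketch with illustrative names (the generated code inlines and specializes these paths):

```javascript
const textDecoder = new TextDecoder();

// Assumed crossover: below this, staying in JS beats a native call
const NATIVE_CALL_THRESHOLD = 64;

// Stand-in for the optimized short-string JS path
// (pooled arrays + fromCharCode in the real code)
function decodeShortJs(buffer, pos, len) {
  let s = "";
  for (let i = 0; i < len; i++) s += String.fromCharCode(buffer[pos + i]);
  return s;
}

function decodeStr(buffer, pos, len) {
  if (len >= NATIVE_CALL_THRESHOLD) {
    // Long enough that the fixed overhead of a native call pays off
    return textDecoder.decode(buffer.subarray(pos, pos + len));
  }
  // Short strings stay in JS, which Turbofan optimizes heavily
  return decodeShortJs(buffer, pos, len);
}
```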

Credit

The Latin1 string technique was cooked up by @joshuaisaact in overlookmotel/oxc-raw-str-bench#1. All credit to him for this masterstroke which cracks the whole problem!

Member Author

overlookmotel commented Apr 3, 2026


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent changes, fast-track this PR to the front of the merge queue

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions github-actions bot added A-linter Area - Linter A-parser Area - Parser A-cli Area - CLI A-ast-tools Area - AST tools A-linter-plugins Area - Linter JS plugins C-performance Category - Solution not expected to change functional behavior, only performance labels Apr 3, 2026
@overlookmotel overlookmotel self-assigned this Apr 3, 2026
@overlookmotel overlookmotel marked this pull request as ready for review April 3, 2026 20:50
@overlookmotel overlookmotel requested a review from camc314 as a code owner April 3, 2026 20:50
Copilot AI review requested due to automatic review settings April 3, 2026 20:50
Contributor

Copilot AI left a comment


Pull request overview

This PR speeds up raw-transfer string deserialization (a known hotspot) by introducing a Latin-1 decoded source string for fast ASCII slicing on non-ASCII sources, plus a faster short-string fallback path that avoids per-byte concatenation and reduces native TextDecoder.decode calls.

Changes:

  • Decode source bytes once into a Latin-1 “byte-addressable” string (sourceTextLatin) and prefer slicing it for ASCII substrings even when the original sourceText is non-ASCII.
  • Replace the byte-by-byte concatenation loop for short non-source strings with a reused temp-array + String.fromCharCode.apply approach.
  • Raise the native TextDecoder.decode crossover threshold to 64 bytes.

Reviewed changes

Copilot reviewed 1 out of 10 changed files in this pull request and generated 2 comments.

Changed files:

  • tasks/ast_tools/src/generators/raw_transfer.rs - Generator updates implementing Latin-1 source decoding, temp-array short-string decoding, and the new crossover threshold.
  • napi/parser/src-js/generated/deserialize/ts.js - Regenerated TS deserializer using sourceTextLatin, array-based short-string decode, and 64-byte crossover.
  • napi/parser/src-js/generated/deserialize/ts_range.js - Same regeneration for the range variant.
  • napi/parser/src-js/generated/deserialize/ts_range_parent.js - Same regeneration for the range + parent variant.
  • napi/parser/src-js/generated/deserialize/ts_parent.js - Same regeneration for the parent variant.
  • napi/parser/src-js/generated/deserialize/js.js - Regenerated JS deserializer with the new decoding strategy.
  • napi/parser/src-js/generated/deserialize/js_range.js - Same regeneration for the range variant.
  • napi/parser/src-js/generated/deserialize/js_range_parent.js - Same regeneration for the range + parent variant.
  • napi/parser/src-js/generated/deserialize/js_parent.js - Same regeneration for the parent variant.
  • apps/oxlint/src-js/generated/deserialize.js - Regenerated oxlint-side deserializer with the new Latin-1 and short-string decoding paths.

@graphite-app
Contributor

graphite-app bot commented Apr 3, 2026

Merge activity

@graphite-app graphite-app bot force-pushed the om/04-03-perf_napi_parser_linter_plugins_initialize_vars_as_0 branch from 2710dc5 to 55e1e9b Compare April 3, 2026 22:06
@graphite-app graphite-app bot force-pushed the om/04-03-perf_napi_parser_linter_plugins_speed_up_decoding_strings_in_raw_transfer branch from 683d3b3 to 012c924 Compare April 3, 2026 22:07
graphite-app bot pushed a commit that referenced this pull request Apr 3, 2026
…lier (#21025)

There's no need to defer clearing vars containing buffers and source text in raw transfer deserializer module until end of linting. It was a left-over from when this module contained code which lazily deserialized comments, but that's now done elsewhere.

Clear these vars as soon as deserialization is complete. Buffer and `sourceText` are still held in vars in other modules until linting completes, but `sourceTextLatin` (introduced in #21021) can be freed immediately.
Base automatically changed from om/04-03-perf_napi_parser_linter_plugins_initialize_vars_as_0 to main April 3, 2026 22:14
@graphite-app graphite-app bot merged commit 012c924 into main Apr 3, 2026
25 checks passed
@graphite-app graphite-app bot deleted the om/04-03-perf_napi_parser_linter_plugins_speed_up_decoding_strings_in_raw_transfer branch April 3, 2026 22:15
leaysgur pushed a commit that referenced this pull request Apr 7, 2026
### 🐛 Bug Fixes

- fc7f60c allocator: Revert changes to
`get_current_chunk_footer_field_offset` (#20964) (overlookmotel)
- 31316c8 semantic: Rebind class expressions before identifier checks
(#20916) (camc314)

### ⚡ Performance

- fb52383 napi/parser, linter/plugins: Clear buffers and source texts
earlier (#21025) (overlookmotel)
- 3b7dec4 napi/parser, linter/plugins: Use `utf8Slice` for decoding
UTF-8 strings (#21022) (overlookmotel)
- 012c924 napi/parser, linter/plugins: Speed up decoding strings in raw
transfer (#21021) (overlookmotel)
- 55e1e9b napi/parser, linter/plugins: Initialize vars as 0 (#21020)
(overlookmotel)
- c25ef02 napi/parser, linter/plugins: Simplify branch condition in
`deserializeStr` (#21019) (overlookmotel)
- 9f494c3 napi/parser, linter/plugins: Raw transfer use
`String.fromCharCode` in string decoding (#21018) (overlookmotel)
- 91cf105 allocator: Increase initial chunk size from 512B to 16KB
(#20968) (overlookmotel)
- cbc0c21 allocator: Add `#[cold]` to error handling functions
(#20967) (overlookmotel)
- 0503a78 napi/parser, linter/plugins: Faster deserialization of `raw`
fields (#20923) (overlookmotel)
- a24f75e napi/parser: Optimize string deserialization for non-ASCII
sources (#20834) (Joshua Tuddenham)

### 📚 Documentation

- c78a57a syntax: Fix typo (#21044) (camc314)
- f5e228f allocator: Fix typo in comment (#20972) (overlookmotel)
- 7159d51 allocator: Improve doc comment examples for `vec2::Vec`
(#20969) (overlookmotel)
- b1da750 allocator, data_structures: Correct comments (#20966)
(overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
leaysgur pushed a commit that referenced this pull request Apr 7, 2026
# Oxlint
### 💥 BREAKING CHANGES

- 22ce6af oxlint/lsp: [**BREAKING**] Show/fix safe suggestions by
default (#19816) (Sysix)

### 🚀 Features

- 7a7b7b8 oxlint/lsp: Add source.fixAllDangerous.oxc code action kind
(#20526) (bab)
- 9cfe57e linter/unicorn: Implement prefer-import-meta-properties rule
(#20662) (Irfan - ئىرفان)
- 1edb391 linter/eslint: Implement `no-restricted-exports` rule (#20592)
(Nicolas Le Cam)
- 0f12bcd linter/react: Implement `hook-use-state` rule (#20986) (Khaled
Labeb)
- 1513a9f oxlint/lsp: Show note field for lsp diagnostic (#20983)
(Sysix)
- 7fdf722 linter/unicorn: Implement `no-useless-iterator-to-array` rule
(#20945) (Mikhail Baev)
- 39c8f2c linter/jest: Implement padding-around-after-all-blocks
(#21034) (Sapphire)
- ac39e51 linter/eslint-vitest-plugin: Prefer importing vitest globals
(#20960) (Said Atrahouch)
- 0b84de1 oxlint: Support allow option for prefer-promise-reject-errors
(#20934) (camc314)
- 23db851 linter/consistent-return: Move rule from nursery to suspicious
(#20920) (camc314)
- 9a27e32 linter/no-unnecessary-type-conversion: Move rule from nursery
to suspicious (#20919) (camc314)
- 1ca7b58 linter/dot-notation: Move rule from nursery to style (#20918)
(camc314)
- 73ba81a linter/consistent-type-exports: Move rule from nursery to
style (#20917) (camc314)
- b9199b1 linter/unicorn: Implement switch-case-break-position (#20872)
(Mikhail Baev)
- 3435ff8 linter: Implements `prefer-snapshot-hint` rule in Jest and
Vitest (#20870) (Said Atrahouch)
- 98510d2 linter: Implement react/prefer-function-component (#19652)
(Connor Shea)
- 871f9d9 linter: Implement no-useless-assignment (#15466) (Zhaoting
Zhou)
- 0f01fbd linter: Implement eslint/object-shorthand (#17688) (yue)

### 🐛 Bug Fixes

- dd2df87 npm: Export package.json for oxlint and oxfmt (#20784) (kazuya
kawaguchi)
- 9bc77dd linter/no-unused-private-class-members: False positive with
await expr (#21067) (camc314)
- 60a57cd linter/const-comparisons: Detect equality contradictions
(#21065) (camc314)
- 2bb2be2 linter/no-array-index-key: False positive when index is passed
as function argument (#21012) (bab)
- 6492953 linter/no-this-in-sfc: Only flag `this` used as member
expression object (#20961) (bab)
- 9446dcc oxlint/lsp: Skip `node_modules` in oxlint config walker
(#21004) (copilot-swe-agent)
- af89923 linter/no-namespace: Support glob pattern matching against
basename (#21031) (bab)
- 64a1a7e oxlint: Don't search for nested config outside base config
(#21051) (Sysix)
- 3b953bc linter/button-has-type: Ignore `document.createElement` calls
(#21008) (Said Atrahouch)
- 8c36070 linter/unicorn: Add support for `Array.from()` for
`prefer-set-size` rule (#21016) (Mikhail Baev)
- c1a48f0 linter: Detect vitest import from vite-plus/test (#20976)
(Said Atrahouch)
- 5c32fd1 lsp: Prevent corrupted autofix output from overlapping text
edits (#19793) (Peter Wagenet)
- ca79960 linter/no-array-index-key: Move span to `key` property
(#20947) (camc314)
- 2098274 linter: Add suggestion for `jest/prefer-equality-matcher`
(#20925) (eryue0220)
- 6eb77ec linter: Allow default-import barrels in import/named (#20757)
(Bazyli Brzóska)
- 9c218ef linter/eslint-vitest-plugin: Remove pending fix status for
require-local-test-context-for-concurrent-snapshot (#20890) (Said
Atrahouch)

### ⚡ Performance

- fb52383 napi/parser, linter/plugins: Clear buffers and source texts
earlier (#21025) (overlookmotel)
- 3b7dec4 napi/parser, linter/plugins: Use `utf8Slice` for decoding
UTF-8 strings (#21022) (overlookmotel)
- 012c924 napi/parser, linter/plugins: Speed up decoding strings in raw
transfer (#21021) (overlookmotel)
- 55e1e9b napi/parser, linter/plugins: Initialize vars as 0 (#21020)
(overlookmotel)
- c25ef02 napi/parser, linter/plugins: Simplify branch condition in
`deserializeStr` (#21019) (overlookmotel)
- 9f494c3 napi/parser, linter/plugins: Raw transfer use
`String.fromCharCode` in string decoding (#21018) (overlookmotel)
- 0503a78 napi/parser, linter/plugins: Faster deserialization of `raw`
fields (#20923) (overlookmotel)
- a24f75e napi/parser: Optimize string deserialization for non-ASCII
sources (#20834) (Joshua Tuddenham)

### 📚 Documentation

- af72b80 oxlint: Fix typo for --tsconfig (#20889) (leaysgur)
- 70c53b1 linter: Highlight that tsconfig is not respected in type aware
linting (#20884) (camc314)
# Oxfmt
### 🚀 Features

- 35cf6e8 oxfmt: Add node version hint for ts config import failures
(#21046) (camc314)

### 🐛 Bug Fixes

- dd2df87 npm: Export package.json for oxlint and oxfmt (#20784) (kazuya
kawaguchi)
- 9d45511 oxfmt: Propagate file write errors instead of panicking
(#20997) (leaysgur)
- 139ddd9 formatter: Handle leading comment after array elision (#20987)
(leaysgur)
- 4216380 oxfmt: Support `.editorconfig` `tab_width` fallback (#20988)
(leaysgur)
- d10df39 formatter: Resolve pending space in fits measurer before
expanded-mode early exit (#20954) (Dunqing)
- f9ef1bd formatter: Avoid breaking after `=>` when arrow body has JSDoc
type cast (#20857) (bab)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
