From 012c924a046b2dc14c0f381b188743fc24226b79 Mon Sep 17 00:00:00 2001
From: overlookmotel <557937+overlookmotel@users.noreply.github.com>
Date: Fri, 3 Apr 2026 22:05:21 +0000
Subject: [PATCH] perf(napi/parser, linter/plugins): speed up decoding strings in raw transfer (#21021)

Improve perf of deserializing strings in raw transfer.

This PR combines several optimizations, which have been tested and benchmarked in https://github.com/overlookmotel/oxc-raw-str-bench. This PR implements the version "latin-slice-onebyte64" from that repo, which is the current winner.

String deserialization is the main bottleneck in raw transfer, so speeding it up will likely make a large impact on deserialization overall.

This work follows on from #20834, which produced a major speed-up in many files by making files which contain some non-ASCII characters take the fast path of slicing `sourceText` more often. This PR tackles the remainder - speeding up the fallback path where the fast path can't be taken.

## Optimizations

The optimizations in this PR are:

### Latin1

When source is not 100% ASCII, decode source text from buffer as Latin1. A Latin1-decoded string represents each UTF-8 byte as a single Latin1 character, so it can be indexed into using UTF-8 offsets.

So when we can't slice the string from `sourceText` because the UTF-8 and UTF-16 offsets differ (after any non-ASCII character), loop through the string's bytes and check if they're all ASCII. If they are, the string can be sliced from `sourceTextLatin` instead, with the original UTF-8 offsets.

This is way faster than calling `textDecoder.decode`, as it avoids a call into C++. [Benchmarks show](https://github.com/overlookmotel/oxc-raw-str-bench/blob/4f96275efa9a35d5d27615abb27f21a137149cc0/README.md#apply30-vs-latin-vs-latin-source64) a speed-up of 55% on average, and up to 70% on some files.

### Latin1 decoding method

It turns out that `new TextDecoder("latin1").decode(arr)` doesn't actually decode to Latin1!
Per the WHATWG Encoding Standard, "latin1" is mapped to "windows-1252". The result is that with `TextDecoder("latin1")`:

1. `decode` is quite complicated, requiring one pass over the bytes to determine if they're all ASCII, followed by a 2nd pass to do the actual `windows-1252` decoding. If the string *does* contain any non-ASCII characters (which it always does in our use case), Node.js implements the decoding in JS, not native code. Slow.
2. `decode` produces a 2-byte-per-char string (`TWO_BYTE` in V8), which takes more memory, and is slower for all operations on it e.g. string comparison, hashing for use as an object key etc.

Instead, use `Buffer.prototype.latin1Slice` which:

1. Does a pure Latin1 decode, which is just a single `memcpy` call.
2. Produces a 1-byte-per-char string (`ONE_BYTE` in V8).

`latin1Slice` involves a call into C++, but we only do it once per file, so this cost is tiny in the context of deserializing the whole AST.

### Latin1 string slicing

In the fast path, slice from the Latin1-decoded string, instead of `sourceText`. In the fast path, we know that all bytes of source comprising the string are ASCII, so no further checks are required.

This makes no difference on benchmarks for `deserializeStr` itself, but it may have beneficial effects downstream for code (e.g. lint rules) which accesses strings in the AST, e.g. `Identifier` names.

Because Latin1-decoded source text is `ONE_BYTE`-encoded, slices of it are too. In comparison, slices of `sourceText` may be `ONE_BYTE` or `TWO_BYTE`. If a file's source is pure ASCII, it'll be `ONE_BYTE`; if the source contains any non-ASCII characters, it'll be `TWO_BYTE`.

Files in a repo will likely be a mix of both, which makes strings returned from `deserializeStr` and placed in the AST a mix too. This in turn makes functions (e.g. lint rule visitors) polymorphic. V8 cannot optimize them as aggressively as if they see only `ONE_BYTE` strings.

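As a rough illustration of the Latin1 technique described above, here is a minimal Node.js sketch. The sample source text and the `sliceIfAscii` helper are made up for this example and are not part of the generated deserializer; only `latin1Slice` and the ASCII check mirror what the PR does.

```javascript
// Hypothetical source text: the "é" is 2 bytes in UTF-8 but 1 UTF-16 code unit.
const source = "const café = 1; let total = 2;";
const bytes = Buffer.from(source, "utf8");

// Pure Latin1 decode: exactly one character per byte.
// (`bytes.toString("latin1")` is the documented equivalent.)
const sourceLatin = bytes.latin1Slice(0, bytes.length);

// Return the string at a UTF-8 byte range, or `null` if it contains non-ASCII bytes
// (in which case a real decoder would fall back to `TextDecoder`).
function sliceIfAscii(pos, len) {
  for (let i = pos; i < pos + len; i++) if (bytes[i] >= 128) return null;
  return sourceLatin.slice(pos, pos + len);
}

const utf8Offset = bytes.indexOf("total"); // byte offset into the UTF-8 buffer
const utf16Offset = source.indexOf("total"); // UTF-16 offset, 1 less than the byte offset

console.log(sliceIfAscii(utf8Offset, 5)); // "total"
console.log(source.slice(utf8Offset, utf8Offset + 5) === "total"); // false
console.log(sliceIfAscii(bytes.indexOf("café"), 5)); // null
```

The byte offset works directly on the Latin1 string, while the same offset applied to the original string lands one character too far.
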
We cannot make sure that all strings returned by `deserializeStr` are `ONE_BYTE`. Some strings may contain non-ASCII characters, and they *have* to be represented in `TWO_BYTE` form. But we can minimize it - now only strings which *themselves* contain non-ASCII characters are `TWO_BYTE`, whereas before, they would be `TWO_BYTE` whenever the source text as a whole contained a single non-ASCII byte.

Code which accesses `Identifier` names, for example, will exclusively see `ONE_BYTE` strings and will be more heavily optimized, because Unicode `Identifier`s are rarer than hen's teeth in real-world code.

### Remove string-concatenation loop

Previously strings which are outside of source text were assembled byte-by-byte in a loop via concatenation.

Instead, check that all the bytes are ASCII first, copy them into an array and pass that array to `String.fromCharCode` with `fromCharCode.apply(null, array)`. To avoid allocating a fresh array every time, hold a stock of arrays for all string lengths that this path can require, and reuse them.

This is a variation on the approach that #20883 took, but without the massive switch. This produces much tighter assembly, and avoids regressing the fast path due to making `deserializeStr` a very large function.

Despite the complexity, and multiple operations, [this is up to 3x faster](https://github.com/overlookmotel/oxc-raw-str-bench/blob/4f96275efa9a35d5d27615abb27f21a137149cc0/README.md#apply30-vs-switch30) than the switch approach, and gives an average 30% speed-up.

### Increase native call threshold

The above optimizations make the slow path much faster. This shifts the tipping point at which it's faster to make a native call to `TextDecoder.decode` from 9 bytes to 64 bytes. Most strings now avoid the native call and stay in JS code which is heavily optimized by Turbofan.

The tipping point of 64 is something of a guesstimate. Benchmarking shows it's in the right ballpark, but we could finesse it, and probably squeeze out another couple of %.

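The `fromCharCode.apply` path and the 64-byte threshold described above can be sketched in isolation as follows. This is a simplified stand-in, not the generated code: `decodeBytes`, `MAX_JS_DECODE_LEN`, `tempArrays` and the sample input are names invented for this example, and the source-region fast paths are left out.

```javascript
// Strings longer than this go straight to `TextDecoder` (the PR's tipping point).
const MAX_JS_DECODE_LEN = 64;

const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true });
const { fromCharCode } = String;

// One reusable array per possible length (0 to 64), so no allocation per string.
const tempArrays = [];
for (let len = 0; len <= MAX_JS_DECODE_LEN; len++) tempArrays.push(new Array(len).fill(0));

// Decode `len` bytes of `bytes` starting at `pos` as a string.
function decodeBytes(bytes, pos, len) {
  if (len === 0) return "";
  const end = pos + len;
  // Long strings: the native call is worth its overhead.
  if (len > MAX_JS_DECODE_LEN) return textDecoder.decode(bytes.subarray(pos, end));
  // Short strings: copy bytes into the temp array, bailing out on any non-ASCII byte.
  const arr = tempArrays[len];
  for (let i = 0; i < len; i++) {
    const b = bytes[pos + i];
    if (b >= 128) return textDecoder.decode(bytes.subarray(pos, end));
    arr[i] = b;
  }
  // All ASCII: build the string in one call, without leaving JS.
  return fromCharCode.apply(null, arr);
}

const bytes = new TextEncoder().encode("use strict; naïve; " + "x".repeat(100));
console.log(decodeBytes(bytes, 0, 10)); // "use strict" (ASCII, temp array path)
console.log(decodeBytes(bytes, 12, 6)); // "naïve" (non-ASCII, falls back to TextDecoder)
console.log(decodeBytes(bytes, 20, 100).length); // 100 (over threshold, TextDecoder)
```

Because each temp array has exactly the length of the string being decoded, `apply` never sees stale entries left over from a previous, longer string.
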
## Credit

The Latin1 string technique was cooked up by @joshuaisaact in https://github.com/overlookmotel/oxc-raw-str-bench/pull/1. All credit to him for this masterstroke which cracks the whole problem!
---
 apps/oxlint/src-js/generated/deserialize.js   |  88 ++++++-------
 .../parser/src-js/generated/deserialize/js.js |  55 +++++---
 .../src-js/generated/deserialize/js_parent.js |  55 +++++---
 .../src-js/generated/deserialize/js_range.js  |  55 +++++---
 .../generated/deserialize/js_range_parent.js  |  55 +++++---
 .../parser/src-js/generated/deserialize/ts.js |  55 +++++---
 .../src-js/generated/deserialize/ts_parent.js |  55 +++++---
 .../src-js/generated/deserialize/ts_range.js  |  55 +++++---
 .../generated/deserialize/ts_range_parent.js  |  55 +++++---
 .../ast_tools/src/generators/raw_transfer.rs  | 122 ++++++++++++------
 10 files changed, 394 insertions(+), 256 deletions(-)

diff --git a/apps/oxlint/src-js/generated/deserialize.js b/apps/oxlint/src-js/generated/deserialize.js
index 716ce0875af5d..d7def457b7059 100644
--- a/apps/oxlint/src-js/generated/deserialize.js
+++ b/apps/oxlint/src-js/generated/deserialize.js
@@ -8,6 +8,7 @@ let uint8,
   uint32,
   float64,
   sourceText,
+  sourceTextLatin,
   sourceStartPos = 0,
   firstNonAsciiPos = 0,
   parent = null,
@@ -16,14 +17,18 @@ let uint8,
 const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true }),
   decodeStr = textDecoder.decode.bind(textDecoder),
   { fromCharCode } = String,
-  NodeProto = Object.create(Object.prototype, {
-    loc: {
-      get() {
-        return getLoc(this);
-      },
-      enumerable: true,
+  { latin1Slice } = Buffer.prototype,
+  stringDecodeArrays = Array(65).fill(null);
+for (let i = 0; i <= 64; i++) stringDecodeArrays[i] = Array(i).fill(0);
+
+const NodeProto = Object.create(Object.prototype, {
+  loc: {
+    get() {
+      return getLoc(this);
     },
-  });
+    enumerable: true,
+  },
+});
 
 export function deserializeProgramOnly(
   buffer,
@@ -41,20 +46,23 @@ function deserializeWith(buffer, sourceTextInput, sourceByteLen, getLocInput, de
   uint32 =
buffer.uint32;
   float64 = buffer.float64;
   sourceText = sourceTextInput;
-  if (sourceText.length === sourceByteLen) firstNonAsciiPos = sourceStartPos + sourceByteLen;
-  else {
+  if (sourceText.length === sourceByteLen) {
+    firstNonAsciiPos = sourceStartPos + sourceByteLen;
+    sourceTextLatin = sourceText;
+  } else {
     let i = sourceStartPos,
       sourceEndPos = sourceStartPos + sourceByteLen;
     for (; i < sourceEndPos && uint8[i] < 128; i++);
     firstNonAsciiPos = i;
+    sourceTextLatin = latin1Slice.call(uint8, sourceStartPos, sourceEndPos);
   }
   getLoc = getLocInput;
   return deserialize(uint32[536870900]);
 }
 
 export function resetBuffer() {
-  // Clear buffer and source text string to allow them to be garbage collected
-  uint8 = uint32 = float64 = sourceText = void 0;
+  // Clear buffer and source text strings to allow them to be garbage collected
+  uint8 = uint32 = float64 = sourceText = sourceTextLatin = void 0;
 }
 
 function deserializeProgram(pos) {
@@ -5880,40 +5888,30 @@ function deserializeStr(pos) {
     len = uint32[pos32 + 2];
   if (len === 0) return "";
   pos = uint32[pos32];
-  let end = pos + len;
-  // Note: Tried reducing this check to a single branch by making the comparison the equivalent of this Rust:
-  // `end.wrapping_sub(sourceStartPos) <= firstNonAsciiOffset`.
-  //
-  // The JS versions tried were:
-  // - `((end - sourceStartPos) >>> 0) <= firstNonAsciiOffset`
-  // - `((end - sourceStartPos) & 0x7FFF_FFFF) <= firstNonAsciiOffset`
-  // But it turned out that these are both slower by 5-10% on files which are all ASCII.
-  //
-  // `>>>` is slower as V8 can't assume result fits in an SMI (which is a 32-bit *signed* integer),
-  // as result could be greater or equal to `2 ** 31`. So it converts both the comparison's operands to `float64`s
-  // and does float compare (which is slower than integer compare).
-  //
-  // `& 0x7FFF_FFFF` is slower as it has a longer chain of data dependencies than the 2 independent
-  // branch comparisons.
-  //
-  // Both branches are very predictable, so 2 branches wins.
-  if (pos >= sourceStartPos && end <= firstNonAsciiPos)
-    return sourceText.substr(pos - sourceStartPos, len);
-  // Use `TextDecoder` for strings longer than 9 bytes.
-  // For shorter strings, the byte-by-byte loop below avoids native call overhead.
-  if (len > 9) return decodeStr(uint8.subarray(pos, end));
-  // Shorter strings decode by hand to avoid native call
-  let out = "",
-    c;
-  do {
-    c = uint8[pos++];
-    if (c < 128) out += fromCharCode(c);
-    else {
-      out += decodeStr(uint8.subarray(pos - 1, end));
-      break;
-    }
-  } while (pos < end);
-  return out;
+  let end = pos + len,
+    isInSourceRegion = pos >= sourceStartPos;
+  if (isInSourceRegion && end <= firstNonAsciiPos)
+    return sourceTextLatin.substr(pos - sourceStartPos, len);
+  // Use `TextDecoder` for strings longer than 64 bytes
+  if (len > 64) return decodeStr(uint8.subarray(pos, end));
+  // If string is in source region, use slice of `sourceTextLatin` if all ASCII
+  if (isInSourceRegion) {
+    // Check if all bytes are ASCII, use `TextDecoder` if not
+    for (let i = pos; i < end; i++) if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end));
+    // String is all ASCII, so slice from `sourceTextLatin`
+    return sourceTextLatin.substr(pos - sourceStartPos, len);
+  }
+  // String is not in source region - use `fromCharCode.apply` with a temp array of correct length.
+  // Copy bytes into temp array.
+  // If any byte is non-ASCII, use `TextDecoder`.
+  let arr = stringDecodeArrays[len];
+  for (let i = 0; i < len; i++) {
+    let b = uint8[pos + i];
+    if (b >= 128) return decodeStr(uint8.subarray(pos, end));
+    arr[i] = b;
+  }
+  // Call `fromCharCode` with temp array
+  return fromCharCode.apply(null, arr);
 }
 
 function deserializeVecDirective(pos) {
diff --git a/napi/parser/src-js/generated/deserialize/js.js b/napi/parser/src-js/generated/deserialize/js.js
index bfad0fa40a8b8..fe652b4b4cdf7 100644
--- a/napi/parser/src-js/generated/deserialize/js.js
+++ b/napi/parser/src-js/generated/deserialize/js.js
@@ -5,13 +5,19 @@ let uint8,
   uint32,
   float64,
   sourceText,
+  sourceTextLatin,
+  sourceEndPos = 0,
   firstNonAsciiPos = 0;
 
 const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true }),
   decodeStr = textDecoder.decode.bind(textDecoder),
-  { fromCharCode } = String;
+  { fromCharCode } = String,
+  { latin1Slice } = Buffer.prototype,
+  stringDecodeArrays = Array(65).fill(null);
+for (let i = 0; i <= 64; i++) stringDecodeArrays[i] = Array(i).fill(0);
 
 export function deserialize(buffer, sourceText, sourceByteLen) {
+  sourceEndPos = sourceByteLen;
   let data = deserializeWith(buffer, sourceText, sourceByteLen, null, deserializeRawTransferData);
   resetBuffer();
   return data;
@@ -22,18 +28,21 @@ function deserializeWith(buffer, sourceTextInput, sourceByteLen, getLocInput, de
   uint32 = buffer.uint32;
   float64 = buffer.float64;
   sourceText = sourceTextInput;
-  if (sourceText.length === sourceByteLen) firstNonAsciiPos = sourceByteLen;
-  else {
+  if (sourceText.length === sourceByteLen) {
+    firstNonAsciiPos = sourceByteLen;
+    sourceTextLatin = sourceText;
+  } else {
     let i = 0;
     for (; i < sourceByteLen && uint8[i] < 128; i++);
     firstNonAsciiPos = i;
+    sourceTextLatin = latin1Slice.call(uint8, 0, sourceByteLen);
   }
   return deserialize(uint32[536870900]);
 }
 
 export function resetBuffer() {
-  // Clear buffer and source text string to allow them to be garbage collected
-  uint8 = uint32 = float64 = sourceText = void 0;
+  // Clear buffer and source
text strings to allow them to be garbage collected
+  uint8 = uint32 = float64 = sourceText = sourceTextLatin = void 0;
 }
 
 function deserializeProgram(pos) {
@@ -4547,22 +4556,26 @@ function deserializeStr(pos) {
   if (len === 0) return "";
   pos = uint32[pos32];
   let end = pos + len;
-  if (end <= firstNonAsciiPos) return sourceText.substr(pos, len);
-  // Use `TextDecoder` for strings longer than 9 bytes.
-  // For shorter strings, the byte-by-byte loop below avoids native call overhead.
-  if (len > 9) return decodeStr(uint8.subarray(pos, end));
-  // Shorter strings decode by hand to avoid native call
-  let out = "",
-    c;
-  do {
-    c = uint8[pos++];
-    if (c < 128) out += fromCharCode(c);
-    else {
-      out += decodeStr(uint8.subarray(pos - 1, end));
-      break;
-    }
-  } while (pos < end);
-  return out;
+  if (end <= firstNonAsciiPos) return sourceTextLatin.substr(pos, len);
+  // Use `TextDecoder` for strings longer than 64 bytes
+  if (len > 64) return decodeStr(uint8.subarray(pos, end));
+  if (pos < sourceEndPos) {
+    // Check if all bytes are ASCII, use `TextDecoder` if not
+    for (let i = pos; i < end; i++) if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end));
+    // String is all ASCII, so slice from `sourceTextLatin`
+    return sourceTextLatin.substr(pos, len);
+  }
+  // String is not in source region - use `fromCharCode.apply` with a temp array of correct length.
+  // Copy bytes into temp array.
+  // If any byte is non-ASCII, use `TextDecoder`.
+  let arr = stringDecodeArrays[len];
+  for (let i = 0; i < len; i++) {
+    let b = uint8[pos + i];
+    if (b >= 128) return decodeStr(uint8.subarray(pos, end));
+    arr[i] = b;
+  }
+  // Call `fromCharCode` with temp array
+  return fromCharCode.apply(null, arr);
 }
 
 function deserializeVecComment(pos) {
diff --git a/napi/parser/src-js/generated/deserialize/js_parent.js b/napi/parser/src-js/generated/deserialize/js_parent.js
index 71f5298cf556b..7f9298492f290 100644
--- a/napi/parser/src-js/generated/deserialize/js_parent.js
+++ b/napi/parser/src-js/generated/deserialize/js_parent.js
@@ -5,14 +5,20 @@ let uint8,
   uint32,
   float64,
   sourceText,
+  sourceTextLatin,
+  sourceEndPos = 0,
   firstNonAsciiPos = 0,
   parent = null;
 
 const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true }),
   decodeStr = textDecoder.decode.bind(textDecoder),
-  { fromCharCode } = String;
+  { fromCharCode } = String,
+  { latin1Slice } = Buffer.prototype,
+  stringDecodeArrays = Array(65).fill(null);
+for (let i = 0; i <= 64; i++) stringDecodeArrays[i] = Array(i).fill(0);
 
 export function deserialize(buffer, sourceText, sourceByteLen) {
+  sourceEndPos = sourceByteLen;
   let data = deserializeWith(buffer, sourceText, sourceByteLen, null, deserializeRawTransferData);
   resetBuffer();
   return data;
@@ -23,18 +29,21 @@ function deserializeWith(buffer, sourceTextInput, sourceByteLen, getLocInput, de
   uint32 = buffer.uint32;
   float64 = buffer.float64;
   sourceText = sourceTextInput;
-  if (sourceText.length === sourceByteLen) firstNonAsciiPos = sourceByteLen;
-  else {
+  if (sourceText.length === sourceByteLen) {
+    firstNonAsciiPos = sourceByteLen;
+    sourceTextLatin = sourceText;
+  } else {
     let i = 0;
     for (; i < sourceByteLen && uint8[i] < 128; i++);
     firstNonAsciiPos = i;
+    sourceTextLatin = latin1Slice.call(uint8, 0, sourceByteLen);
   }
   return deserialize(uint32[536870900]);
 }
 
 export function resetBuffer() {
-  // Clear buffer and source text string to allow them to be garbage collected
-  uint8 = uint32 = float64 = sourceText
= void 0;
+  // Clear buffer and source text strings to allow them to be garbage collected
+  uint8 = uint32 = float64 = sourceText = sourceTextLatin = void 0;
 }
 
 function deserializeProgram(pos) {
@@ -5078,22 +5087,26 @@ function deserializeStr(pos) {
   if (len === 0) return "";
   pos = uint32[pos32];
   let end = pos + len;
-  if (end <= firstNonAsciiPos) return sourceText.substr(pos, len);
-  // Use `TextDecoder` for strings longer than 9 bytes.
-  // For shorter strings, the byte-by-byte loop below avoids native call overhead.
-  if (len > 9) return decodeStr(uint8.subarray(pos, end));
-  // Shorter strings decode by hand to avoid native call
-  let out = "",
-    c;
-  do {
-    c = uint8[pos++];
-    if (c < 128) out += fromCharCode(c);
-    else {
-      out += decodeStr(uint8.subarray(pos - 1, end));
-      break;
-    }
-  } while (pos < end);
-  return out;
+  if (end <= firstNonAsciiPos) return sourceTextLatin.substr(pos, len);
+  // Use `TextDecoder` for strings longer than 64 bytes
+  if (len > 64) return decodeStr(uint8.subarray(pos, end));
+  if (pos < sourceEndPos) {
+    // Check if all bytes are ASCII, use `TextDecoder` if not
+    for (let i = pos; i < end; i++) if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end));
+    // String is all ASCII, so slice from `sourceTextLatin`
+    return sourceTextLatin.substr(pos, len);
+  }
+  // String is not in source region - use `fromCharCode.apply` with a temp array of correct length.
+  // Copy bytes into temp array.
+  // If any byte is non-ASCII, use `TextDecoder`.
+  let arr = stringDecodeArrays[len];
+  for (let i = 0; i < len; i++) {
+    let b = uint8[pos + i];
+    if (b >= 128) return decodeStr(uint8.subarray(pos, end));
+    arr[i] = b;
+  }
+  // Call `fromCharCode` with temp array
+  return fromCharCode.apply(null, arr);
 }
 
 function deserializeVecComment(pos) {
diff --git a/napi/parser/src-js/generated/deserialize/js_range.js b/napi/parser/src-js/generated/deserialize/js_range.js
index 5c2cf2be3bf29..fe74dc7d4bcd3 100644
--- a/napi/parser/src-js/generated/deserialize/js_range.js
+++ b/napi/parser/src-js/generated/deserialize/js_range.js
@@ -5,13 +5,19 @@ let uint8,
   uint32,
   float64,
   sourceText,
+  sourceTextLatin,
+  sourceEndPos = 0,
   firstNonAsciiPos = 0;
 
 const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true }),
   decodeStr = textDecoder.decode.bind(textDecoder),
-  { fromCharCode } = String;
+  { fromCharCode } = String,
+  { latin1Slice } = Buffer.prototype,
+  stringDecodeArrays = Array(65).fill(null);
+for (let i = 0; i <= 64; i++) stringDecodeArrays[i] = Array(i).fill(0);
 
 export function deserialize(buffer, sourceText, sourceByteLen) {
+  sourceEndPos = sourceByteLen;
   let data = deserializeWith(buffer, sourceText, sourceByteLen, null, deserializeRawTransferData);
   resetBuffer();
   return data;
@@ -22,18 +28,21 @@ function deserializeWith(buffer, sourceTextInput, sourceByteLen, getLocInput, de
   uint32 = buffer.uint32;
   float64 = buffer.float64;
   sourceText = sourceTextInput;
-  if (sourceText.length === sourceByteLen) firstNonAsciiPos = sourceByteLen;
-  else {
+  if (sourceText.length === sourceByteLen) {
+    firstNonAsciiPos = sourceByteLen;
+    sourceTextLatin = sourceText;
+  } else {
     let i = 0;
     for (; i < sourceByteLen && uint8[i] < 128; i++);
     firstNonAsciiPos = i;
+    sourceTextLatin = latin1Slice.call(uint8, 0, sourceByteLen);
   }
   return deserialize(uint32[536870900]);
 }
 
 export function resetBuffer() {
-  // Clear buffer and source text string to allow them to be garbage collected
-  uint8 = uint32 = float64 = sourceText = void 0;
+  //
Clear buffer and source text strings to allow them to be garbage collected
+  uint8 = uint32 = float64 = sourceText = sourceTextLatin = void 0;
 }
 
 function deserializeProgram(pos) {
@@ -5089,22 +5098,26 @@ function deserializeStr(pos) {
   if (len === 0) return "";
   pos = uint32[pos32];
   let end = pos + len;
-  if (end <= firstNonAsciiPos) return sourceText.substr(pos, len);
-  // Use `TextDecoder` for strings longer than 9 bytes.
-  // For shorter strings, the byte-by-byte loop below avoids native call overhead.
-  if (len > 9) return decodeStr(uint8.subarray(pos, end));
-  // Shorter strings decode by hand to avoid native call
-  let out = "",
-    c;
-  do {
-    c = uint8[pos++];
-    if (c < 128) out += fromCharCode(c);
-    else {
-      out += decodeStr(uint8.subarray(pos - 1, end));
-      break;
-    }
-  } while (pos < end);
-  return out;
+  if (end <= firstNonAsciiPos) return sourceTextLatin.substr(pos, len);
+  // Use `TextDecoder` for strings longer than 64 bytes
+  if (len > 64) return decodeStr(uint8.subarray(pos, end));
+  if (pos < sourceEndPos) {
+    // Check if all bytes are ASCII, use `TextDecoder` if not
+    for (let i = pos; i < end; i++) if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end));
+    // String is all ASCII, so slice from `sourceTextLatin`
+    return sourceTextLatin.substr(pos, len);
+  }
+  // String is not in source region - use `fromCharCode.apply` with a temp array of correct length.
+  // Copy bytes into temp array.
+  // If any byte is non-ASCII, use `TextDecoder`.
+  let arr = stringDecodeArrays[len];
+  for (let i = 0; i < len; i++) {
+    let b = uint8[pos + i];
+    if (b >= 128) return decodeStr(uint8.subarray(pos, end));
+    arr[i] = b;
+  }
+  // Call `fromCharCode` with temp array
+  return fromCharCode.apply(null, arr);
 }
 
 function deserializeVecComment(pos) {
diff --git a/napi/parser/src-js/generated/deserialize/js_range_parent.js b/napi/parser/src-js/generated/deserialize/js_range_parent.js
index 379ef21801b77..932221b0a50ef 100644
--- a/napi/parser/src-js/generated/deserialize/js_range_parent.js
+++ b/napi/parser/src-js/generated/deserialize/js_range_parent.js
@@ -5,14 +5,20 @@ let uint8,
   uint32,
   float64,
   sourceText,
+  sourceTextLatin,
+  sourceEndPos = 0,
   firstNonAsciiPos = 0,
   parent = null;
 
 const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true }),
   decodeStr = textDecoder.decode.bind(textDecoder),
-  { fromCharCode } = String;
+  { fromCharCode } = String,
+  { latin1Slice } = Buffer.prototype,
+  stringDecodeArrays = Array(65).fill(null);
+for (let i = 0; i <= 64; i++) stringDecodeArrays[i] = Array(i).fill(0);
 
 export function deserialize(buffer, sourceText, sourceByteLen) {
+  sourceEndPos = sourceByteLen;
   let data = deserializeWith(buffer, sourceText, sourceByteLen, null, deserializeRawTransferData);
   resetBuffer();
   return data;
@@ -23,18 +29,21 @@ function deserializeWith(buffer, sourceTextInput, sourceByteLen, getLocInput, de
   uint32 = buffer.uint32;
   float64 = buffer.float64;
   sourceText = sourceTextInput;
-  if (sourceText.length === sourceByteLen) firstNonAsciiPos = sourceByteLen;
-  else {
+  if (sourceText.length === sourceByteLen) {
+    firstNonAsciiPos = sourceByteLen;
+    sourceTextLatin = sourceText;
+  } else {
     let i = 0;
     for (; i < sourceByteLen && uint8[i] < 128; i++);
     firstNonAsciiPos = i;
+    sourceTextLatin = latin1Slice.call(uint8, 0, sourceByteLen);
   }
   return deserialize(uint32[536870900]);
 }
 
 export function resetBuffer() {
-  // Clear buffer and source text string to allow them to be garbage collected
-  uint8 = uint32
= float64 = sourceText = void 0;
+  // Clear buffer and source text strings to allow them to be garbage collected
+  uint8 = uint32 = float64 = sourceText = sourceTextLatin = void 0;
 }
 
 function deserializeProgram(pos) {
@@ -5623,22 +5632,26 @@ function deserializeStr(pos) {
   if (len === 0) return "";
   pos = uint32[pos32];
   let end = pos + len;
-  if (end <= firstNonAsciiPos) return sourceText.substr(pos, len);
-  // Use `TextDecoder` for strings longer than 9 bytes.
-  // For shorter strings, the byte-by-byte loop below avoids native call overhead.
-  if (len > 9) return decodeStr(uint8.subarray(pos, end));
-  // Shorter strings decode by hand to avoid native call
-  let out = "",
-    c;
-  do {
-    c = uint8[pos++];
-    if (c < 128) out += fromCharCode(c);
-    else {
-      out += decodeStr(uint8.subarray(pos - 1, end));
-      break;
-    }
-  } while (pos < end);
-  return out;
+  if (end <= firstNonAsciiPos) return sourceTextLatin.substr(pos, len);
+  // Use `TextDecoder` for strings longer than 64 bytes
+  if (len > 64) return decodeStr(uint8.subarray(pos, end));
+  if (pos < sourceEndPos) {
+    // Check if all bytes are ASCII, use `TextDecoder` if not
+    for (let i = pos; i < end; i++) if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end));
+    // String is all ASCII, so slice from `sourceTextLatin`
+    return sourceTextLatin.substr(pos, len);
+  }
+  // String is not in source region - use `fromCharCode.apply` with a temp array of correct length.
+  // Copy bytes into temp array.
+  // If any byte is non-ASCII, use `TextDecoder`.
+  let arr = stringDecodeArrays[len];
+  for (let i = 0; i < len; i++) {
+    let b = uint8[pos + i];
+    if (b >= 128) return decodeStr(uint8.subarray(pos, end));
+    arr[i] = b;
+  }
+  // Call `fromCharCode` with temp array
+  return fromCharCode.apply(null, arr);
 }
 
 function deserializeVecComment(pos) {
diff --git a/napi/parser/src-js/generated/deserialize/ts.js b/napi/parser/src-js/generated/deserialize/ts.js
index b5bf97cd804b2..2616fb217a603 100644
--- a/napi/parser/src-js/generated/deserialize/ts.js
+++ b/napi/parser/src-js/generated/deserialize/ts.js
@@ -5,13 +5,19 @@ let uint8,
   uint32,
   float64,
   sourceText,
+  sourceTextLatin,
+  sourceEndPos = 0,
   firstNonAsciiPos = 0;
 
 const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true }),
   decodeStr = textDecoder.decode.bind(textDecoder),
-  { fromCharCode } = String;
+  { fromCharCode } = String,
+  { latin1Slice } = Buffer.prototype,
+  stringDecodeArrays = Array(65).fill(null);
+for (let i = 0; i <= 64; i++) stringDecodeArrays[i] = Array(i).fill(0);
 
 export function deserialize(buffer, sourceText, sourceByteLen) {
+  sourceEndPos = sourceByteLen;
   let data = deserializeWith(buffer, sourceText, sourceByteLen, null, deserializeRawTransferData);
   resetBuffer();
   return data;
@@ -22,18 +28,21 @@ function deserializeWith(buffer, sourceTextInput, sourceByteLen, getLocInput, de
   uint32 = buffer.uint32;
   float64 = buffer.float64;
   sourceText = sourceTextInput;
-  if (sourceText.length === sourceByteLen) firstNonAsciiPos = sourceByteLen;
-  else {
+  if (sourceText.length === sourceByteLen) {
+    firstNonAsciiPos = sourceByteLen;
+    sourceTextLatin = sourceText;
+  } else {
     let i = 0;
     for (; i < sourceByteLen && uint8[i] < 128; i++);
     firstNonAsciiPos = i;
+    sourceTextLatin = latin1Slice.call(uint8, 0, sourceByteLen);
   }
   return deserialize(uint32[536870900]);
 }
 
 export function resetBuffer() {
-  // Clear buffer and source text string to allow them to be garbage collected
-  uint8 = uint32 = float64 = sourceText = void 0;
+  // Clear buffer and source text
strings to allow them to be garbage collected
+  uint8 = uint32 = float64 = sourceText = sourceTextLatin = void 0;
 }
 
 function deserializeProgram(pos) {
@@ -4856,22 +4865,26 @@ function deserializeStr(pos) {
   if (len === 0) return "";
   pos = uint32[pos32];
   let end = pos + len;
-  if (end <= firstNonAsciiPos) return sourceText.substr(pos, len);
-  // Use `TextDecoder` for strings longer than 9 bytes.
-  // For shorter strings, the byte-by-byte loop below avoids native call overhead.
-  if (len > 9) return decodeStr(uint8.subarray(pos, end));
-  // Shorter strings decode by hand to avoid native call
-  let out = "",
-    c;
-  do {
-    c = uint8[pos++];
-    if (c < 128) out += fromCharCode(c);
-    else {
-      out += decodeStr(uint8.subarray(pos - 1, end));
-      break;
-    }
-  } while (pos < end);
-  return out;
+  if (end <= firstNonAsciiPos) return sourceTextLatin.substr(pos, len);
+  // Use `TextDecoder` for strings longer than 64 bytes
+  if (len > 64) return decodeStr(uint8.subarray(pos, end));
+  if (pos < sourceEndPos) {
+    // Check if all bytes are ASCII, use `TextDecoder` if not
+    for (let i = pos; i < end; i++) if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end));
+    // String is all ASCII, so slice from `sourceTextLatin`
+    return sourceTextLatin.substr(pos, len);
+  }
+  // String is not in source region - use `fromCharCode.apply` with a temp array of correct length.
+  // Copy bytes into temp array.
+  // If any byte is non-ASCII, use `TextDecoder`.
+  let arr = stringDecodeArrays[len];
+  for (let i = 0; i < len; i++) {
+    let b = uint8[pos + i];
+    if (b >= 128) return decodeStr(uint8.subarray(pos, end));
+    arr[i] = b;
+  }
+  // Call `fromCharCode` with temp array
+  return fromCharCode.apply(null, arr);
 }
 
 function deserializeVecComment(pos) {
diff --git a/napi/parser/src-js/generated/deserialize/ts_parent.js b/napi/parser/src-js/generated/deserialize/ts_parent.js
index 2b495eaf435f5..f8583be6b64bf 100644
--- a/napi/parser/src-js/generated/deserialize/ts_parent.js
+++ b/napi/parser/src-js/generated/deserialize/ts_parent.js
@@ -5,14 +5,20 @@ let uint8,
   uint32,
   float64,
   sourceText,
+  sourceTextLatin,
+  sourceEndPos = 0,
   firstNonAsciiPos = 0,
   parent = null;
 
 const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true }),
   decodeStr = textDecoder.decode.bind(textDecoder),
-  { fromCharCode } = String;
+  { fromCharCode } = String,
+  { latin1Slice } = Buffer.prototype,
+  stringDecodeArrays = Array(65).fill(null);
+for (let i = 0; i <= 64; i++) stringDecodeArrays[i] = Array(i).fill(0);
 
 export function deserialize(buffer, sourceText, sourceByteLen) {
+  sourceEndPos = sourceByteLen;
   let data = deserializeWith(buffer, sourceText, sourceByteLen, null, deserializeRawTransferData);
   resetBuffer();
   return data;
@@ -23,18 +29,21 @@ function deserializeWith(buffer, sourceTextInput, sourceByteLen, getLocInput, de
   uint32 = buffer.uint32;
   float64 = buffer.float64;
   sourceText = sourceTextInput;
-  if (sourceText.length === sourceByteLen) firstNonAsciiPos = sourceByteLen;
-  else {
+  if (sourceText.length === sourceByteLen) {
+    firstNonAsciiPos = sourceByteLen;
+    sourceTextLatin = sourceText;
+  } else {
     let i = 0;
     for (; i < sourceByteLen && uint8[i] < 128; i++);
     firstNonAsciiPos = i;
+    sourceTextLatin = latin1Slice.call(uint8, 0, sourceByteLen);
   }
   return deserialize(uint32[536870900]);
 }
 
 export function resetBuffer() {
-  // Clear buffer and source text string to allow them to be garbage collected
-  uint8 = uint32 = float64 = sourceText
= void 0;
+  // Clear buffer and source text strings to allow them to be garbage collected
+  uint8 = uint32 = float64 = sourceText = sourceTextLatin = void 0;
 }
 
 function deserializeProgram(pos) {
@@ -5414,22 +5423,26 @@ function deserializeStr(pos) {
   if (len === 0) return "";
   pos = uint32[pos32];
   let end = pos + len;
-  if (end <= firstNonAsciiPos) return sourceText.substr(pos, len);
-  // Use `TextDecoder` for strings longer than 9 bytes.
-  // For shorter strings, the byte-by-byte loop below avoids native call overhead.
-  if (len > 9) return decodeStr(uint8.subarray(pos, end));
-  // Shorter strings decode by hand to avoid native call
-  let out = "",
-    c;
-  do {
-    c = uint8[pos++];
-    if (c < 128) out += fromCharCode(c);
-    else {
-      out += decodeStr(uint8.subarray(pos - 1, end));
-      break;
-    }
-  } while (pos < end);
-  return out;
+  if (end <= firstNonAsciiPos) return sourceTextLatin.substr(pos, len);
+  // Use `TextDecoder` for strings longer than 64 bytes
+  if (len > 64) return decodeStr(uint8.subarray(pos, end));
+  if (pos < sourceEndPos) {
+    // Check if all bytes are ASCII, use `TextDecoder` if not
+    for (let i = pos; i < end; i++) if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end));
+    // String is all ASCII, so slice from `sourceTextLatin`
+    return sourceTextLatin.substr(pos, len);
+  }
+  // String is not in source region - use `fromCharCode.apply` with a temp array of correct length.
+  // Copy bytes into temp array.
+  // If any byte is non-ASCII, use `TextDecoder`.
+ let arr = stringDecodeArrays[len]; + for (let i = 0; i < len; i++) { + let b = uint8[pos + i]; + if (b >= 128) return decodeStr(uint8.subarray(pos, end)); + arr[i] = b; + } + // Call `fromCharCode` with temp array + return fromCharCode.apply(null, arr); } function deserializeVecComment(pos) { diff --git a/napi/parser/src-js/generated/deserialize/ts_range.js b/napi/parser/src-js/generated/deserialize/ts_range.js index 588f379a3f071..c40d72dabb786 100644 --- a/napi/parser/src-js/generated/deserialize/ts_range.js +++ b/napi/parser/src-js/generated/deserialize/ts_range.js @@ -5,13 +5,19 @@ let uint8, uint32, float64, sourceText, + sourceTextLatin, + sourceEndPos = 0, firstNonAsciiPos = 0; const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true }), decodeStr = textDecoder.decode.bind(textDecoder), - { fromCharCode } = String; + { fromCharCode } = String, + { latin1Slice } = Buffer.prototype, + stringDecodeArrays = Array(65).fill(null); +for (let i = 0; i <= 64; i++) stringDecodeArrays[i] = Array(i).fill(0); export function deserialize(buffer, sourceText, sourceByteLen) { + sourceEndPos = sourceByteLen; let data = deserializeWith(buffer, sourceText, sourceByteLen, null, deserializeRawTransferData); resetBuffer(); return data; @@ -22,18 +28,21 @@ function deserializeWith(buffer, sourceTextInput, sourceByteLen, getLocInput, de uint32 = buffer.uint32; float64 = buffer.float64; sourceText = sourceTextInput; - if (sourceText.length === sourceByteLen) firstNonAsciiPos = sourceByteLen; - else { + if (sourceText.length === sourceByteLen) { + firstNonAsciiPos = sourceByteLen; + sourceTextLatin = sourceText; + } else { let i = 0; for (; i < sourceByteLen && uint8[i] < 128; i++); firstNonAsciiPos = i; + sourceTextLatin = latin1Slice.call(uint8, 0, sourceByteLen); } return deserialize(uint32[536870900]); } export function resetBuffer() { - // Clear buffer and source text string to allow them to be garbage collected - uint8 = uint32 = float64 = sourceText = void 0; + // 
Clear buffer and source text strings to allow them to be garbage collected + uint8 = uint32 = float64 = sourceText = sourceTextLatin = void 0; } function deserializeProgram(pos) { @@ -5429,22 +5438,26 @@ function deserializeStr(pos) { if (len === 0) return ""; pos = uint32[pos32]; let end = pos + len; - if (end <= firstNonAsciiPos) return sourceText.substr(pos, len); - // Use `TextDecoder` for strings longer than 9 bytes. - // For shorter strings, the byte-by-byte loop below avoids native call overhead. - if (len > 9) return decodeStr(uint8.subarray(pos, end)); - // Shorter strings decode by hand to avoid native call - let out = "", - c; - do { - c = uint8[pos++]; - if (c < 128) out += fromCharCode(c); - else { - out += decodeStr(uint8.subarray(pos - 1, end)); - break; - } - } while (pos < end); - return out; + if (end <= firstNonAsciiPos) return sourceTextLatin.substr(pos, len); + // Use `TextDecoder` for strings longer than 64 bytes + if (len > 64) return decodeStr(uint8.subarray(pos, end)); + if (pos < sourceEndPos) { + // Check if all bytes are ASCII, use `TextDecoder` if not + for (let i = pos; i < end; i++) if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end)); + // String is all ASCII, so slice from `sourceTextLatin` + return sourceTextLatin.substr(pos, len); + } + // String is not in source region - use `fromCharCode.apply` with a temp array of correct length. + // Copy bytes into temp array. + // If any byte is non-ASCII, use `TextDecoder`. 
+ let arr = stringDecodeArrays[len]; + for (let i = 0; i < len; i++) { + let b = uint8[pos + i]; + if (b >= 128) return decodeStr(uint8.subarray(pos, end)); + arr[i] = b; + } + // Call `fromCharCode` with temp array + return fromCharCode.apply(null, arr); } function deserializeVecComment(pos) { diff --git a/napi/parser/src-js/generated/deserialize/ts_range_parent.js b/napi/parser/src-js/generated/deserialize/ts_range_parent.js index 4f6e2a7d0eaee..33da28813e615 100644 --- a/napi/parser/src-js/generated/deserialize/ts_range_parent.js +++ b/napi/parser/src-js/generated/deserialize/ts_range_parent.js @@ -5,14 +5,20 @@ let uint8, uint32, float64, sourceText, + sourceTextLatin, + sourceEndPos = 0, firstNonAsciiPos = 0, parent = null; const textDecoder = new TextDecoder("utf-8", { ignoreBOM: true }), decodeStr = textDecoder.decode.bind(textDecoder), - { fromCharCode } = String; + { fromCharCode } = String, + { latin1Slice } = Buffer.prototype, + stringDecodeArrays = Array(65).fill(null); +for (let i = 0; i <= 64; i++) stringDecodeArrays[i] = Array(i).fill(0); export function deserialize(buffer, sourceText, sourceByteLen) { + sourceEndPos = sourceByteLen; let data = deserializeWith(buffer, sourceText, sourceByteLen, null, deserializeRawTransferData); resetBuffer(); return data; @@ -23,18 +29,21 @@ function deserializeWith(buffer, sourceTextInput, sourceByteLen, getLocInput, de uint32 = buffer.uint32; float64 = buffer.float64; sourceText = sourceTextInput; - if (sourceText.length === sourceByteLen) firstNonAsciiPos = sourceByteLen; - else { + if (sourceText.length === sourceByteLen) { + firstNonAsciiPos = sourceByteLen; + sourceTextLatin = sourceText; + } else { let i = 0; for (; i < sourceByteLen && uint8[i] < 128; i++); firstNonAsciiPos = i; + sourceTextLatin = latin1Slice.call(uint8, 0, sourceByteLen); } return deserialize(uint32[536870900]); } export function resetBuffer() { - // Clear buffer and source text string to allow them to be garbage collected - uint8 = uint32 
= float64 = sourceText = void 0; + // Clear buffer and source text strings to allow them to be garbage collected + uint8 = uint32 = float64 = sourceText = sourceTextLatin = void 0; } function deserializeProgram(pos) { @@ -5987,22 +5996,26 @@ function deserializeStr(pos) { if (len === 0) return ""; pos = uint32[pos32]; let end = pos + len; - if (end <= firstNonAsciiPos) return sourceText.substr(pos, len); - // Use `TextDecoder` for strings longer than 9 bytes. - // For shorter strings, the byte-by-byte loop below avoids native call overhead. - if (len > 9) return decodeStr(uint8.subarray(pos, end)); - // Shorter strings decode by hand to avoid native call - let out = "", - c; - do { - c = uint8[pos++]; - if (c < 128) out += fromCharCode(c); - else { - out += decodeStr(uint8.subarray(pos - 1, end)); - break; - } - } while (pos < end); - return out; + if (end <= firstNonAsciiPos) return sourceTextLatin.substr(pos, len); + // Use `TextDecoder` for strings longer than 64 bytes + if (len > 64) return decodeStr(uint8.subarray(pos, end)); + if (pos < sourceEndPos) { + // Check if all bytes are ASCII, use `TextDecoder` if not + for (let i = pos; i < end; i++) if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end)); + // String is all ASCII, so slice from `sourceTextLatin` + return sourceTextLatin.substr(pos, len); + } + // String is not in source region - use `fromCharCode.apply` with a temp array of correct length. + // Copy bytes into temp array. + // If any byte is non-ASCII, use `TextDecoder`. 
+ let arr = stringDecodeArrays[len]; + for (let i = 0; i < len; i++) { + let b = uint8[pos + i]; + if (b >= 128) return decodeStr(uint8.subarray(pos, end)); + arr[i] = b; + } + // Call `fromCharCode` with temp array + return fromCharCode.apply(null, arr); } function deserializeVecComment(pos) { diff --git a/tasks/ast_tools/src/generators/raw_transfer.rs b/tasks/ast_tools/src/generators/raw_transfer.rs index 2d0cab965ebf9..94658a3dbbb5a 100644 --- a/tasks/ast_tools/src/generators/raw_transfer.rs +++ b/tasks/ast_tools/src/generators/raw_transfer.rs @@ -143,14 +143,26 @@ fn generate_deserializers( import {{ comments, initComments }} from '../plugins/comments.js'; /* END_IF */ - let uint8, uint32, float64, sourceText, sourceStartPos = 0, firstNonAsciiPos = 0; + let uint8, uint32, float64, sourceText, sourceTextLatin, + sourceStartPos = 0, sourceEndPos = 0, firstNonAsciiPos = 0; let parent = null; let getLoc; const textDecoder = new TextDecoder('utf-8', {{ ignoreBOM: true }}), - decodeStr = textDecoder.decode.bind(textDecoder), - {{ fromCharCode }} = String; + decodeStr = textDecoder.decode.bind(textDecoder); + + const {{ fromCharCode }} = String, + {{ latin1Slice }} = Buffer.prototype; + + const STRING_DECODE_CROSSOVER = 64; + + // Arrays used by `deserializeStr` for passing to `String.fromCharCode`. + // These arrays are reused over and over, avoiding allocating a new temporary array for each string. 
+ const stringDecodeArrays = new Array(STRING_DECODE_CROSSOVER + 1).fill(null); + for (let i = 0; i <= STRING_DECODE_CROSSOVER; i++) {{ + stringDecodeArrays[i] = new Array(i).fill(0); + }} /* IF LOC */ const NodeProto = Object.create(Object.prototype, {{ @@ -166,6 +178,7 @@ fn generate_deserializers( /* IF !LINTER */ export function deserialize(buffer, sourceText, sourceByteLen) {{ + sourceEndPos = sourceByteLen; const data = deserializeWith(buffer, sourceText, sourceByteLen, null, deserializeRawTransferData); resetBuffer(); return data; @@ -191,22 +204,29 @@ fn generate_deserializers( // Find first non-ASCII byte in source region. // `sourceText.substr()` can be used for strings which are within source text and ending before // this position, since byte offsets equal char offsets in the all-ASCII prefix. + // Also decode source text as Latin-1 (or reuse `sourceText` if it's all ASCII). if (LINTER) {{ if (sourceIsAscii === true) {{ firstNonAsciiPos = sourceStartPos + sourceByteLen; + sourceTextLatin = sourceText; }} else {{ let i = sourceStartPos; const sourceEndPos = sourceStartPos + sourceByteLen; for (; i < sourceEndPos && uint8[i] < 128; i++); firstNonAsciiPos = i; + + sourceTextLatin = latin1Slice.call(uint8, sourceStartPos, sourceEndPos); }} }} else {{ if (sourceIsAscii === true) {{ firstNonAsciiPos = sourceByteLen; + sourceTextLatin = sourceText; }} else {{ let i = 0; for (; i < sourceByteLen && uint8[i] < 128; i++); firstNonAsciiPos = i; + + sourceTextLatin = latin1Slice.call(uint8, 0, sourceByteLen); }} }} @@ -216,8 +236,8 @@ fn generate_deserializers( }} export function resetBuffer() {{ - // Clear buffer and source text string to allow them to be garbage collected - uint8 = uint32 = float64 = sourceText = undefined; + // Clear buffer and source text strings to allow them to be garbage collected + uint8 = uint32 = float64 = sourceText = sourceTextLatin = undefined; }} "); @@ -930,54 +950,70 @@ fn generate_primitive(primitive_def: &PrimitiveDef, code: &mut 
String, schema: & static STR_DESERIALIZER_BODY: &str = " const pos32 = pos >> 2, len = uint32[pos32 + 2]; + if (len === 0) return ''; pos = uint32[pos32]; const end = pos + len; - if (LINTER) { - // Note: Tried reducing this check to a single branch by making the comparison the equivalent of this Rust: - // `end.wrapping_sub(sourceStartPos) <= firstNonAsciiOffset`. - // - // The JS versions tried were: - // - `((end - sourceStartPos) >>> 0) <= firstNonAsciiOffset` - // - `((end - sourceStartPos) & 0x7FFF_FFFF) <= firstNonAsciiOffset` - // But it turned out that these are both slower by 5-10% on files which are all ASCII. - // - // `>>>` is slower as V8 can't assume result fits in an SMI (which is a 32-bit *signed* integer), - // as result could be greater or equal to `2 ** 31`. So it converts both the comparison's operands to `float64`s - // and does float compare (which is slower than integer compare). - // - // `& 0x7FFF_FFFF` is slower as it has a longer chain of data dependencies than the 2 independent - // branch comparisons. - // - // Both branches are very predictable, so 2 branches wins. - if (pos >= sourceStartPos && end <= firstNonAsciiPos) { - return sourceText.substr(pos - sourceStartPos, len); - } - } else { - if (end <= firstNonAsciiPos) return sourceText.substr(pos, len); + /* IF !LINTER */ + if (end <= firstNonAsciiPos) return sourceTextLatin.substr(pos, len); + /* END_IF */ + + /* IF LINTER */ + // Note: Tried reducing this check to a single branch by making the comparison the equivalent of this Rust: + // `end.wrapping_sub(sourceStartPos) <= firstNonAsciiOffset`. + // + // The JS versions tried were: + // - `((end - sourceStartPos) >>> 0) <= firstNonAsciiOffset` + // - `((end - sourceStartPos) & 0x7FFF_FFFF) <= firstNonAsciiOffset` + // But it turned out that these are both slower by 5-10% on files which are all ASCII. 
+ // + // `>>>` is slower as V8 can't assume result fits in an SMI (which is a 32-bit *signed* integer), + // as result could be greater or equal to `2 ** 31`. So it converts both the comparison's operands to `float64`s + // and does float compare (which is slower than integer compare). + // + // `& 0x7FFF_FFFF` is slower as it has a longer chain of data dependencies than the 2 independent + // branch comparisons. + // + // Both branches are very predictable, so 2 branches wins. + const isInSourceRegion = pos >= sourceStartPos; + if (isInSourceRegion && end <= firstNonAsciiPos) { + return sourceTextLatin.substr(pos - sourceStartPos, len); } + /* END_IF */ - // Use `TextDecoder` for strings longer than 9 bytes. - // For shorter strings, the byte-by-byte loop below avoids native call overhead. - if (len > 9) return decodeStr(uint8.subarray(pos, end)); - - // Shorter strings decode by hand to avoid native call - let out = '', - c; - do { - c = uint8[pos++]; - if (c < 0x80) { - out += fromCharCode(c); - } else { - out += decodeStr(uint8.subarray(pos - 1, end)); - break; + // Use `TextDecoder` for strings longer than 64 bytes + if (len > STRING_DECODE_CROSSOVER) return decodeStr(uint8.subarray(pos, end)); + + // If string is in source region, use slice of `sourceTextLatin` if all ASCII + /* IF !LINTER */ + const isInSourceRegion = pos < sourceEndPos; + /* END_IF */ + + if (isInSourceRegion) { + // Check if all bytes are ASCII, use `TextDecoder` if not + for (let i = pos; i < end; i++) { + if (uint8[i] >= 128) return decodeStr(uint8.subarray(pos, end)); } - } while (pos < end); - return out; + // String is all ASCII, so slice from `sourceTextLatin` + return sourceTextLatin.substr(LINTER ? pos - sourceStartPos : pos, len); + } + + // String is not in source region - use `fromCharCode.apply` with a temp array of correct length. + // Copy bytes into temp array. + // If any byte is non-ASCII, use `TextDecoder`. 
+ const arr = stringDecodeArrays[len]; + for (let i = 0; i < len; i++) { + const b = uint8[pos + i]; + if (b >= 128) return decodeStr(uint8.subarray(pos, end)); + arr[i] = b; + } + + // Call `fromCharCode` with temp array + return fromCharCode.apply(null, arr); "; /// Generate deserialize function for an `Option`.