Lazy compile
slevithan committed Feb 3, 2025
1 parent 4ca5ff4 commit 16c9bd0
Showing 10 changed files with 58 additions and 37 deletions.
14 changes: 13 additions & 1 deletion README.md
@@ -27,7 +27,7 @@ Oniguruma-To-ES deeply understands the hundreds of large and small differences b
 - [Examples](#-examples)
 - [Install and use](#️-install-and-use)
 - [API](#-api): [`toRegExp`](#toregexp), [`toRegExpDetails`](#toregexpdetails), [`toOnigurumaAst`](#toonigurumaast), [`EmulatedRegExp`](#emulatedregexp)
-- [Options](#-options): [`accuracy`](#accuracy), [`avoidSubclass`](#avoidsubclass), [`flags`](#flags), [`global`](#global), [`hasIndices`](#hasindices), [`rules`](#rules), [`target`](#target), [`verbose`](#verbose)
+- [Options](#-options): [`accuracy`](#accuracy), [`avoidSubclass`](#avoidsubclass), [`flags`](#flags), [`global`](#global), [`hasIndices`](#hasindices), [`lazyCompileLength`](#lazycompilelength), [`rules`](#rules), [`target`](#target), [`verbose`](#verbose)
 - [Supported features](#-supported-features)
 - [Unsupported features](#-unsupported-features)
 - [Unicode](#️-unicode)
@@ -106,6 +106,7 @@ type ToRegExpOptions = {
   flags?: string;
   global?: boolean;
   hasIndices?: boolean;
+  lazyCompileLength?: number;
   rules?: {
     allowOrphanBackrefs?: boolean;
     asciiWordBoundaries?: boolean;
@@ -176,6 +177,7 @@ The `rawOptions` property of `EmulatedRegExp` instances can be used for serializ
 ```ts
 type EmulatedRegExpOptions = {
   hiddenCaptures?: Array<number>;
+  lazyCompile?: boolean;
   strategy?: string | null;
   transfers?: Array<[number, Array<number>]>;
 };
@@ -239,6 +241,16 @@ Include JavaScript flag `g` (`global`) in the result.

 Include JavaScript flag `d` (`hasIndices`) in the result.

+### `lazyCompileLength`
+
+*Default: `Infinity`. In other words, lazy compilation is off by default.*
+
+Delay regex construction until first use if the transpiled pattern is at least this length.
+
+Although regex compilation in JavaScript is fast, it can sometimes be helpful to defer this cost for extremely long regexes. This option defers the time JavaScript spends inside the `RegExp` constructor on building the transpiled pattern into a regex object; it's not about transpilation or search performance.
+
+Lazy compilation is a feature of the `EmulatedRegExp` constructor, so enabling option `avoidSubclass` prevents lazy compilation.
+
 ### `rules`

 Advanced options that override standard behavior, error checking, and flags when enabled.
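The rule documented in the README hunk above (defer construction when the transpiled pattern is at least `lazyCompileLength` characters long) can be illustrated with a small standalone sketch. This is not the library's implementation: `LazyRegExp`, `isCompiled`, and `build` are hypothetical names invented for the example, and the sketch works on plain JavaScript patterns rather than transpiled Oniguruma ones.

```js
// Hypothetical wrapper, not part of Oniguruma-To-ES: stores the pattern and
// defers the `RegExp` constructor call until the first `exec`
class LazyRegExp {
  #compiled = null;
  #pattern;
  #flags;

  constructor(pattern, flags = '') {
    this.#pattern = pattern;
    this.#flags = flags;
  }

  // True once the underlying `RegExp` has been constructed
  get isCompiled() {
    return this.#compiled !== null;
  }

  exec(str) {
    // Pay the construction cost only on first use
    this.#compiled ??= new RegExp(this.#pattern, this.#flags);
    return this.#compiled.exec(str);
  }
}

// Hypothetical factory: defer only if the pattern is at least
// `lazyCompileLength` characters long; the default `Infinity` never defers
function build(pattern, flags = '', lazyCompileLength = Infinity) {
  return pattern.length >= lazyCompileLength ?
    new LazyRegExp(pattern, flags) :
    new RegExp(pattern, flags);
}

const eager = build('a+'); // Plain `RegExp`, compiled immediately
const lazy = build('a+', '', 0); // Threshold 0 defers every pattern
```

Because no finite string length reaches `Infinity`, the default leaves every pattern on the eager path, which matches the README's statement that lazy compilation is off by default.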
2 changes: 1 addition & 1 deletion demo/demo.css
@@ -171,7 +171,7 @@ pre, code, kbd, textarea {
 }

 #more-options-cols div {
-  margin-right: 5%;
+  margin-right: 3%;
 }

 #output, textarea {
6 changes: 5 additions & 1 deletion demo/demo.js
@@ -21,7 +21,7 @@ const state = {
 avoidSubclass: getValue('option-avoidSubclass'),
 global: getValue('option-global'),
 hasIndices: getValue('option-hasIndices'),
-lazyCompileMinLength: getValue('option-lazyCompileMinLength'),
+lazyCompileLength: getValue('option-lazyCompileLength'),
 rules: {
   allowOrphanBackrefs: getValue('option-allowOrphanBackrefs'),
   asciiWordBoundaries: getValue('option-asciiWordBoundaries'),
@@ -206,6 +206,10 @@ function getValue(id) {
   if (el.type === 'checkbox') {
     return el.checked;
   }
+  if (id === 'option-lazyCompileLength') {
+    // Turn dropdown values into numbers
+    return el.value === 'Infinity' ? Infinity : parseInt(el.value, 10);
+  }
   return el.value;
 }

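The special case for `'Infinity'` in `getValue` matters because `parseInt` does not recognize that string. The sketch below shows the behavior in isolation; `parseThreshold` is a hypothetical name for the same expression used in the hunk above, extracted so it can run without a DOM.

```js
// Hypothetical standalone version of the dropdown conversion in `getValue`
function parseThreshold(value) {
  return value === 'Infinity' ? Infinity : parseInt(value, 10);
}

// `parseInt` only reads leading digits, so 'Infinity' alone gives `NaN`
const naive = parseInt('Infinity', 10);

// Any `>=` comparison against `NaN` is false, so an unguarded `NaN`
// threshold would silently behave as "never lazy" rather than fail loudly
const comparedWithNaN = 5000 >= naive;

const off = parseThreshold('Infinity');
const threshold = parseThreshold('3000');
```

`Number(value)` would also map `'Infinity'` to `Infinity`; the explicit check in the commit keeps the conversion limited to the dropdown's known values.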
11 changes: 8 additions & 3 deletions demo/index.html
@@ -112,9 +112,14 @@ <h2>Try it</h2>
 </p>
 <p>
   <label>
-    <input type="number" id="option-lazyCompileMinLength" value="100" min="0" max="200" onchange="setOption('lazyCompileMinLength', +this.value)" onkeyup="setOption('lazyCompileMinLength', +this.value)">
-    <code>lazyCompileMinLength</code>
-    <span class="tip tip-xl">Pattern length threshold for delaying regex construction until first use</span>
+    <select id="option-lazyCompileLength" onchange="setOption('lazyCompileLength', this.value === 'Infinity' ? Infinity : parseInt(this.value, 10))">
+      <option value="Infinity" selected>Infinity</option>
+      <option value="3000">3000</option>
+      <option value="500">500</option>
+      <option value="0">0</option>
+    </select>
+    <code>lazyCompileLength</code>
+    <span class="tip tip-xl">Delay regex construction until first use if the transpiled pattern is at least this length</span>
   </label>
 </p>
 </div>
4 changes: 2 additions & 2 deletions scripts/utils.js
@@ -34,8 +34,8 @@ function err(i, msg) {

 /**
 @typedef {{
-  result: string | null;
-  index: number | null;
+  result: string?;
+  index: number?;
   error?: Error;
 }} MatchDetails
 */
4 changes: 2 additions & 2 deletions spec/toregexpdetails.spec.js
@@ -30,8 +30,8 @@ describe('toRegExpDetails', () => {
     });

     it('should include an options property when the pattern uses lazy compilation', () => {
-      expect(Object.keys(toRegExpDetails('a', {lazyCompileMinLength: 0}))).toEqual(extProps);
-      expect(Object.keys(toRegExpDetails('a', {lazyCompileMinLength: Infinity}))).toEqual(props);
+      expect(Object.keys(toRegExpDetails('a', {lazyCompileLength: 0}))).toEqual(extProps);
+      expect(Object.keys(toRegExpDetails('a', {lazyCompileLength: Infinity}))).toEqual(props);
     });
   });
 });
10 changes: 7 additions & 3 deletions src/index.js
@@ -44,7 +44,7 @@ function toOnigurumaAst(pattern, options) {
   flags?: string;
   global?: boolean;
   hasIndices?: boolean;
-  lazyCompileMinLength?: number;
+  lazyCompileLength?: number;
   rules?: {
     allowOrphanBackrefs?: boolean;
     asciiWordBoundaries?: boolean;
@@ -112,13 +112,17 @@ function toRegExpDetails(pattern, options) {
     pattern: atomicResult.pattern,
     flags: `${opts.hasIndices ? 'd' : ''}${opts.global ? 'g' : ''}${generated.flags}${generated.options.disable.v ? 'u' : 'v'}`,
   };
-  if (!opts.avoidSubclass) {
+  if (opts.avoidSubclass) {
+    if (opts.lazyCompileLength !== Infinity) {
+      throw new Error('Lazy compilation requires subclass');
+    }
+  } else {
     // Sort isn't required; only for readability when serialized
     const hiddenCaptures = atomicResult.hiddenCaptures.sort((a, b) => a - b);
     // Change the map to the `EmulatedRegExp` format, serializable as JSON
     const transfers = Array.from(atomicResult.captureTransfers);
     const strategy = regexAst._strategy;
-    const lazyCompile = details.pattern.length >= opts.lazyCompileMinLength;
+    const lazyCompile = details.pattern.length >= opts.lazyCompileLength;
     if (hiddenCaptures.length || transfers.length || strategy || lazyCompile) {
       details.options = {
         ...(hiddenCaptures.length && {hiddenCaptures}),
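The new branch in `toRegExpDetails` combines two decisions: reject a finite `lazyCompileLength` when `avoidSubclass` is on, and otherwise compare the transpiled pattern's length against the threshold. A minimal sketch of that logic in isolation follows; `shouldLazyCompile` is a hypothetical helper written for illustration, and returning `false` on the `avoidSubclass` path is an assumption standing in for the real code, which simply sets no subclass options there.

```js
// Hypothetical helper mirroring the branch added to `toRegExpDetails`
function shouldLazyCompile(patternLength, {avoidSubclass = false, lazyCompileLength = Infinity} = {}) {
  if (avoidSubclass) {
    // Lazy compilation lives in the subclass, so a finite threshold can't be honored
    if (lazyCompileLength !== Infinity) {
      throw new Error('Lazy compilation requires subclass');
    }
    return false;
  }
  // "At least this length": a pattern exactly at the threshold is deferred
  return patternLength >= lazyCompileLength;
}

const byDefault = shouldLazyCompile(10000);
const atThreshold = shouldLazyCompile(500, {lazyCompileLength: 500});
const belowThreshold = shouldLazyCompile(499, {lazyCompileLength: 500});
```

Under this reading, combining `avoidSubclass: true` with a finite threshold is reported as an error instead of being silently ignored.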
4 changes: 2 additions & 2 deletions src/options.js
@@ -41,8 +41,8 @@ function getOptions(options) {
 global: false,
 // Include JavaScript flag `d` (`hasIndices`) in the result.
 hasIndices: false,
-// Pattern length threshold for delaying regex construction until first use.
-lazyCompileMinLength: Infinity,
+// Delay regex construction until first use if the transpiled pattern is at least this length.
+lazyCompileLength: Infinity,
 // JavaScript version used for generated regexes. Using `auto` detects the best value based on
 // your environment. Later targets allow faster processing, simpler generated source, and
 // support for additional features.
38 changes: 17 additions & 21 deletions src/subclass.js
@@ -22,17 +22,23 @@ class EmulatedRegExp extends RegExp {
   */
   #captureMap = new Map();

+  /**
+  @type {RegExp | EmulatedRegExp | null}
+  */
+  #compiled = null;
+
+  /**
+  @type {string}
+  */
   #pattern;

   /**
-  @type {Map<number, string> | null}
+  @type {Map<number, string>?}
   */
   #nameMap = null;

-  #regexp;
-
   /**
-  @type {string | null}
+  @type {string?}
   */
   #strategy = null;

@@ -90,21 +96,21 @@ class EmulatedRegExp extends RegExp {
       this.rawOptions = options ?? {};
     }
     if (!lazyCompile) {
-      this.#regexp = this;
+      this.#compiled = this;
     }
   }

   /**
   Called internally by all String/RegExp methods that use regexes.
   @override
   @param {string} str
-  @returns {RegExpExecArray | null}
+  @returns {RegExpExecArray?}
   */
   exec(str) {
     // Lazy compilation
-    if (!this.#regexp) {
+    if (!this.#compiled) {
       const {lazyCompile, ...rest} = this.rawOptions;
-      this.#regexp = new EmulatedRegExp(this.#pattern, this.flags, rest);
+      this.#compiled = new EmulatedRegExp(this.#pattern, this.flags, rest);
     }

     const useLastIndex = this.global || this.sticky;
@@ -137,24 +143,14 @@ class EmulatedRegExp extends RegExp {
   */
   #execCore(str) {
     // Support lazy compilation
-    this.#regexp.lastIndex = this.lastIndex;
-    const match = super.exec.call(this.#regexp, str);
-    this.lastIndex = this.#regexp.lastIndex;
+    this.#compiled.lastIndex = this.lastIndex;
+    const match = super.exec.call(this.#compiled, str);
+    this.lastIndex = this.#compiled.lastIndex;

     if (!match || !this.#captureMap.size) {
       return match;
     }

-    // Treat use of lazy compilation as a license to more aggressively optimize for perf. By
-    // removing `groups` and forcing reliance on numbered subpatterns, we can avoid some work
-    // below. Note that `vscode-oniguruma` doesn't include subpatterns/indices by name in results
-    if (this.rawOptions.lazyCompile) {
-      match.groups = undefined;
-      if (this.hasIndices) {
-        match.indices.groups = undefined;
-      }
-    }
-
     const matchCopy = [...match];
     // Empty all but the first value of the array while preserving its other properties
     match.length = 1;
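The delegation in `#execCore` (copy `lastIndex` into the compiled regex, run it, copy `lastIndex` back) can be tried out with a reduced sketch. This is not the `EmulatedRegExp` code: `DeferredRegExp` is a hypothetical class, it delegates to a plain `RegExp` rather than another subclass instance, and passing an empty pattern to `super` while lazy is an assumption about how compilation of the real pattern might be avoided up front.

```js
// Hypothetical reduced version of the lazy-compile delegation pattern
class DeferredRegExp extends RegExp {
  #compiled = null;
  #pattern;

  constructor(pattern, flags = '', lazy = true) {
    // Assumption: while lazy, hand `super` an empty pattern so the real
    // pattern is not compiled during construction
    super(lazy ? '' : pattern, flags);
    this.#pattern = pattern;
    if (!lazy) {
      // Eager instances act as their own compiled regex
      this.#compiled = this;
    }
  }

  exec(str) {
    // Build the real regex on first use
    if (!this.#compiled) {
      this.#compiled = new RegExp(this.#pattern, this.flags);
    }
    // Copy position state in, run the delegate, then copy it back out so
    // callers only ever observe `lastIndex` on the outer instance
    this.#compiled.lastIndex = this.lastIndex;
    const match = super.exec.call(this.#compiled, str);
    this.lastIndex = this.#compiled.lastIndex;
    return match;
  }
}

const re = new DeferredRegExp('\\d+', 'g');
const first = re.exec('a1 b22');
const indexAfterFirst = re.lastIndex;
const second = re.exec('a1 b22');
const indexAfterSecond = re.lastIndex;
```

Without the two `lastIndex` copies, global and sticky searches on the outer instance would restart from the delegate's own position instead of the caller-visible one.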
2 changes: 1 addition & 1 deletion src/transform.js
@@ -15,7 +15,7 @@ import emojiRegex from 'emoji-regex-xs';
   flags: Object;
   options: Object;
   _originMap: Map<Object, Object>;
-  _strategy: string | null;
+  _strategy: string?;
 }} RegexAst
 */
 /**
