Skip to content

fix(ast/estree): fix serializing RegExpLiteral::value field#9028

Closed
overlookmotel wants to merge 1 commit intomainfrom
02-10-fix_ast_estree_fix_serializing_regexpliteral_value_field
Closed

fix(ast/estree): fix serializing RegExpLiteral::value field#9028
overlookmotel wants to merge 1 commit intomainfrom
02-10-fix_ast_estree_fix_serializing_regexpliteral_value_field

Conversation

@overlookmotel
Copy link
Member

@overlookmotel overlookmotel commented Feb 10, 2025

About half the remaining failing conformance tests relate to RegExps. This should fix them, but it's not working, and I don't know why.

It looks like the value field should be {} for a valid regexp, and null for an invalid one. So this PR enables parsing regexps in Oxc's parser, and sets value field accordingly depending on whether the regexp parser says it's valid or not.

The weird thing is that some cases which should pass, still don't. e.g.:

https://github.com/tc39/test262/blob/bc5c14176e2b11a78859571eb693f028c8822458/test/built-ins/RegExp/regexp-modifiers/add-ignoreCase.js

This test contains var re1 = /(?i:a)b/;, which is a valid regexp.

Sure enough, AST Explorer shows value: {}. https://ast.sxzz.dev/#eNoliTsKhDAQhq8if7ULgn2avcSy1TSz4xSRmAlJFES8uwN23+NEQsDCOzepsXSMKB5YrGZncZ5enxj4/Z/czf2kPAwElZV/Wlu0TAgeEndtnTA+v9lWRb9H0WevNm/JmfKF6wZsLyU5

But when running Acorn locally (v8.14.0, same as AST Explorer uses), it shows value: null. Whaaaat?

import { Parser } from "acorn";
const ast = Parser.parse('/(?i:a)b/', {ecmaVersion: "latest", sourceType: "module"});
console.log(ast.body[0].expression);

@sxzz As the author of AST Explorer, do you have any insight into this?

Copy link
Member Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@overlookmotel overlookmotel changed the base branch from 02-10-fix_ast_estree_fix_serializing_regexpliteral_flags to graphite-base/9028 February 10, 2025 19:57
@overlookmotel overlookmotel force-pushed the 02-10-fix_ast_estree_fix_serializing_regexpliteral_value_field branch from de58aec to d69951a Compare February 10, 2025 20:02
@overlookmotel overlookmotel changed the base branch from graphite-base/9028 to main February 10, 2025 20:03
@overlookmotel overlookmotel force-pushed the 02-10-fix_ast_estree_fix_serializing_regexpliteral_value_field branch from d69951a to 03fd1eb Compare February 10, 2025 20:03
@codspeed-hq
Copy link

codspeed-hq bot commented Feb 10, 2025

CodSpeed Performance Report

Merging #9028 will not alter performance

Comparing 02-10-fix_ast_estree_fix_serializing_regexpliteral_value_field (03fd1eb) with main (2b47299)

Summary

✅ 33 untouched benchmarks

@hi-ogawa
Copy link
Contributor

hi-ogawa commented Feb 11, 2025

Oh, this is pretty confusing. The reason why local run shows null is because such regex is not supported Node 22 while Ast explorer running on chrome probably already supports it. Such fallback is done by acorn side https://github.com/acornjs/acorn/blob/7b38623b5fdc364cd4945f03e50ea8dc19fb4502/acorn/src/tokenize.js#L436-L443.

Node 23 seems to catch up with this, so value gets correct.

$ node --version
v23.7.0
$ node repro.mjs 
Node {
  type: 'Literal',
  start: 0,
  end: 9,
  value: /(?i:a)b/,
  raw: '/(?i:a)b/',
  regex: { pattern: '(?i:a)b', flags: '' }
}

$ node --version
v22.13.1
$ node repro.mjs 
Node {
  type: 'Literal',
  start: 0,
  end: 9,
  value: null,
  raw: '/(?i:a)b/',
  regex: { pattern: '(?i:a)b', flags: '' }
}

Since serialization assumes json, supporting arbitrary Literal.value seems impossible unless doing extra traverse in js. I'm not sure what's the usage in the wild, but I think skip serializing non-json Literal.value and treat it as null is okay? (or empty object {} which is JSON.stringify of regexp).

@overlookmotel
Copy link
Member Author

@hi-ogawa Thanks very much for figuring this out.

Right, so there is no canonically correct JSON for RegExpLiteral then. What Acorn produces depends on what platform you run it on. This is rather unfortunate!

Also, when running Acorn from JS, you would never get {} as the value field. You would get an actual RegExp.

The code which handles this in Rollup is:

function literalRegExp(position, buffer): LiteralRegExpNode {
  const flags = buffer.convertString(buffer[position + 2]);
  const pattern = buffer.convertString(buffer[position + 3]);
  return {
    type: 'Literal',
    start: buffer[position],
    end: buffer[position + 1],
    raw: `/${pattern}/${flags}`,
    regex: { flags, pattern },
    value: new RegExp(pattern, flags)
  };
}

So rollup's parseAst will throw an error SyntaxError: Invalid regular expression if the regexp is not supported on the platform running Rollup. Not ideal!

Our primary target is matching Rollup, so here's what I think we should do:

  • Output "value": null in JSON for all RegExpLiterals.
  • Use JS-side code that runs after JSON.parse to set value field to an actual RegExp, same as we need to do for BigInts.
  • Amend acorn-test262 to also replace all RegExps with null, so that our tests pass.

We can improve on Rollup by using this code on JS side:

function convertRegexp(regexpLiteral) {
  try {
    regexpLiteral.value = new RegExp(regexpLiteral.regex.pattern, regexpLiteral.regex.flags);
  } catch (_err) {}
}

JS-side code:

Step 1: The quick-and-dirty first implementation can be a complete AST traversal on JS side, setting the value field of Literals for both RegExps and BigInts.

Step 2: Improve performance by compiling a list of "path"s to where RegExps and BigInts are in the AST in serializer on Rust side and sending that to JS. JS can then "surgically" go to where it needs to make corrections to the AST, rather than a full traversal. (as described in #8994 (comment))

Step 3: AST transfer will make the problem go away entirely, since it constructs all AST node Objects manually anyway. So it can produce correct value values while doing that.

I think this is at least an OK battle plan, so am going to go ahead and take the first step now. We can revise approach later on if we can figure out something better.

@overlookmotel overlookmotel deleted the 02-10-fix_ast_estree_fix_serializing_regexpliteral_value_field branch February 11, 2025 12:19
graphite-app bot pushed a commit that referenced this pull request Feb 11, 2025
Output `null` in JSON for `value` field of `RegExpLiteral`.

This PR also bumps the `acorn-estree` submodule, to include commit oxc-project/estree-conformance@bd99d40 which does the same in the JSON snapshots of Acorn's output.

This is first step as described in #9028 (comment).

A later PR will set the `value` field correctly on JS side.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-ast Area - AST C-bug Category - Bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants