How are you handling the division and regExp literal ambiguity? #29

bd82 · 2018-04-28T22:37:32Z

https://stackoverflow.com/questions/5519596/when-parsing-javascript-what-determines-the-meaning-of-a-slash

https://github.com/sweet-js/sweet-core/wiki/design

Is there a separate lexing phase, or is lexing done on demand, with context provided by the parser?

w-y · 2018-05-02T07:35:52Z

Jison provides a status stack (its lexer start conditions) in the lexing phase, which can be used to solve this problem.

In the lexing phase, you can return a different token to the BNF rules, or update the status stack, based on the current context (the status stack, the matched token and the lookahead token).

Here is the code:
https://github.com/w-y/ecma262-jison/blob/master/src/lex/regexp.js#L23
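For comparison, the simplest form of context check looks only at the previous significant token, which is the heuristic discussed in the Stack Overflow thread linked in the question. The sketch below is hypothetical: the token kinds, the `slashIsDivision` helper and the partial keyword list are illustrative and are not taken from ecma262-jison.

```javascript
// Hypothetical token kinds after which "/" must be division, because they can
// end an expression. Only IDENTIFIER, DIV and REGEXP are produced below.
const ENDS_EXPRESSION = new Set(["IDENTIFIER", "NUMBER", "STRING", "REGEXP", ")", "]", "}"]);

// Keywords are lexed as IDENTIFIER in this sketch, but these are followed by an
// expression, so a "/" after them starts a RegExp literal (partial list).
const KEYWORDS_BEFORE_EXPRESSION = new Set(["return", "typeof", "instanceof", "in", "new", "delete", "void", "throw", "case"]);

function slashIsDivision(prev) {
  if (prev === null) return false; // start of input: an operand is expected
  if (prev.type === "IDENTIFIER" && KEYWORDS_BEFORE_EXPRESSION.has(prev.value)) return false;
  return ENDS_EXPRESSION.has(prev.type);
}

// Tiny tokenizer that only knows identifiers, spaces, "/" and RegExp literals.
function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (/\s/.test(ch)) { i++; continue; }
    if (/[A-Za-z_$]/.test(ch)) {
      let j = i;
      while (j < src.length && /[\w$]/.test(src[j])) j++;
      tokens.push({ type: "IDENTIFIER", value: src.slice(i, j) });
      i = j;
      continue;
    }
    if (ch === "/") {
      const prev = tokens.length ? tokens[tokens.length - 1] : null;
      if (slashIsDivision(prev)) {
        tokens.push({ type: "DIV", value: "/" });
        i++;
      } else {
        // Naive RegExp body scan: no escapes, character classes or flags.
        const end = src.indexOf("/", i + 1);
        if (end < 0) throw new SyntaxError("unterminated RegExp at " + i);
        tokens.push({ type: "REGEXP", value: src.slice(i, end + 1) });
        i = end + 1;
      }
      continue;
    }
    throw new SyntaxError("unexpected character " + ch);
  }
  return tokens;
}

console.log(tokenize("a / /b/").map(t => t.type).join(" "));
// prints "IDENTIFIER DIV REGEXP"
```

This heuristic is known to fail in some places, e.g. after the ")" of an if condition (`if (x) /re/.test(s)`) or after a "}" that may close either a block or an object literal, which is why lexers built this way end up carrying extra state.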

Each lexical rule has a 'conditions' property, and its handler is triggered only when the top of the status stack matches one of those conditions. This way, the ambiguity of the symbol can be resolved, provided that all the possible states are enumerated. Everything is done in the lexing phase; in the BNF rules I just use different aliases for the same '/'.
https://github.com/w-y/ecma262-jison/blob/master/src/bnf/RegularExpressionLiteral.js
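The rule gating described above can be modelled in a few lines. This is a hypothetical, simplified model, not Jison's API and not the ecma262-jison rule set: each rule lists the conditions in which it is active, and only rules whose conditions contain the top of the status stack are tried. The two "/" rules return different aliases (DIV and LEFT_REGEXP_DIV, the names used in the trace further down this thread), which is what lets the BNF tell them apart.

```javascript
// Hypothetical, simplified model of condition-gated lexer rules (NOT Jison's API).
// The status stack is an array whose last element is the top.
const rules = [
  // After an identifier, "/" is division; leave the identifier_start state.
  { conditions: ["identifier_start"], pattern: /^\//, token: "DIV", action: s => s.pop() },
  // Where an operand is expected, "/" opens a RegExp literal.
  { conditions: ["INITIAL"], pattern: /^\//, token: "LEFT_REGEXP_DIV", action: s => s.push("regex_start") },
  { conditions: ["INITIAL"], pattern: /^[A-Za-z_$][\w$]*/, token: "IDENTIFIER", action: s => s.push("identifier_start") },
];

// Tries the active rules in order and returns the first match, or null.
function nextToken(input, stack) {
  const top = stack[stack.length - 1];
  for (const rule of rules) {
    if (!rule.conditions.includes(top)) continue; // gated out by the status stack
    const m = rule.pattern.exec(input);
    if (m) {
      rule.action(stack); // update the status stack as a side effect
      return { token: rule.token, text: m[0] };
    }
  }
  return null; // no active rule matches
}

// The same "/" lexes differently depending on the top of the stack.
console.log(nextToken("/b/", ["INITIAL", "identifier_start"]).token); // DIV
console.log(nextToken("/b/", ["INITIAL"]).token);                     // LEFT_REGEXP_DIV
```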

And here are my test cases. To be honest, it took me a lot of time to resolve the ambiguity:
https://github.com/w-y/ecma262-jison/blob/master/mock/regexp.js

w-y · 2018-05-02T08:24:11Z

This example explains how the status stack works:

a / /b/

token  top status (before)  status stack (after)          output
a      INITIAL              [identifier_start, INITIAL]   IDENTIFIER
/      identifier_start     [INITIAL]                     DIV
/      INITIAL              [regex_start, INITIAL]        LEFT_REGEXP_DIV
b      regex_start          [regex_start, INITIAL]        REGEX_CHAR
/      regex_start          [INITIAL]                     RIGHT_REGEXP_DIV
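The trace above can be reproduced with a small hand-written state machine. This is a hypothetical sketch that follows the table's state names, stack notation (top first) and output tokens; it is not the ecma262-jison implementation. It runs on a / /b/ with spaces because, in ECMAScript source, an unbroken // always starts a single-line comment; skipping whitespace outside RegExp bodies without touching the stack is an assumption.

```javascript
// Hypothetical re-implementation of the status-stack trace; not ecma262-jison code.
function trace(src) {
  const stack = ["INITIAL"]; // stack[0] is the top, matching the table's notation
  const rows = [];
  let i = 0;
  while (i < src.length) {
    // Assumption: spaces outside a RegExp body are skipped and leave the stack alone.
    if (stack[0] !== "regex_start" && src[i] === " ") { i++; continue; }
    const top = stack[0];
    let lexeme = src[i];
    let output;
    if (top === "regex_start") {
      // Inside a RegExp literal: every char is body until the closing "/".
      if (lexeme === "/") { stack.shift(); output = "RIGHT_REGEXP_DIV"; }
      else output = "REGEX_CHAR";
    } else if (top === "identifier_start" && lexeme === "/") {
      // A "/" right after an identifier can only be division.
      stack.shift();
      output = "DIV";
    } else if (top === "INITIAL" && lexeme === "/") {
      // A "/" where an operand is expected opens a RegExp literal.
      stack.unshift("regex_start");
      output = "LEFT_REGEXP_DIV";
    } else if (top === "INITIAL" && /[A-Za-z_$]/.test(lexeme)) {
      lexeme = src.slice(i).match(/^[A-Za-z_$][\w$]*/)[0];
      stack.unshift("identifier_start");
      output = "IDENTIFIER";
    } else {
      throw new SyntaxError(`not handled in this sketch: ${lexeme} in ${top}`);
    }
    rows.push([lexeme, top, `[${stack.join(", ")}]`, output]);
    i += lexeme.length;
  }
  return rows;
}

for (const row of trace("a / /b/")) console.log(row.join("\t"));
```

Each printed row matches one line of the table. In this sketch the stack is back at INITIAL after RIGHT_REGEXP_DIV, so a "/" following a RegExp literal would again be read as the start of a RegExp; handling that needs yet another state, which illustrates how the number of states grows.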

bd82 · 2018-05-02T10:10:31Z

Thank you.

I understand that you are using the lexer's previous context to figure out whether the "/" is a division operator or the start of a RegExp literal.

This approach may be hard to maintain in the long term, as the ECMAScript grammar changes every year and more edge cases may be added.

See: acornjs/acorn#589

I have created an ECMAScript 5 parser example using a parser library I've authored.
However, due to the complexity of the lexer, I've decided to use Acorn for scanning, at least initially.

Given the increasing complexity of the scanner logic, if and when I implement the scanner myself,
I will probably refactor the grammar to perform scanning on demand rather than as a separate step, so that the full parser context is available to disambiguate the "/".
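A minimal sketch of what scanning on demand could look like, assuming a toy grammar of identifiers, RegExp literals and "/" only. The parser passes a lexical goal to the scanner, in the spirit of the ECMAScript specification's InputElementDiv and InputElementRegExp goal symbols; the Scanner class, the "div"/"regexp" goal strings and parseExpr are hypothetical names for illustration. Comments, escapes inside RegExp bodies and all other tokens are left out.

```javascript
// Hypothetical on-demand scanner: the parser says which lexical goal applies.
// Toy grammar handled (only):
//   Expr    := Operand ("/" Operand)*
//   Operand := Identifier | RegExpLiteral
class Scanner {
  constructor(src) { this.src = src; this.pos = 0; }

  skipSpaces() { while (this.src[this.pos] === " ") this.pos++; }

  // goal "regexp": an operand may start here; goal "div": an operator may follow.
  next(goal) {
    this.skipSpaces();
    if (this.pos >= this.src.length) return { type: "EOF" };
    const rest = this.src.slice(this.pos);
    let m;
    if (goal === "regexp" && (m = /^\/[^\/\n]+\/[a-z]*/.exec(rest))) {
      this.pos += m[0].length;
      return { type: "REGEXP", value: m[0] };
    }
    if (goal === "div" && rest[0] === "/") {
      this.pos += 1;
      return { type: "DIV", value: "/" };
    }
    if ((m = /^[A-Za-z_$][\w$]*/.exec(rest))) {
      this.pos += m[0].length;
      return { type: "IDENTIFIER", value: m[0] };
    }
    throw new SyntaxError(`unexpected input at ${this.pos} (goal: ${goal})`);
  }
}

function parseExpr(src) {
  const scanner = new Scanner(src);
  const operand = () => {
    const tok = scanner.next("regexp"); // the parser knows an operand is expected
    if (tok.type !== "IDENTIFIER" && tok.type !== "REGEXP") throw new SyntaxError("operand expected");
    return tok.value;
  };
  let left = operand();
  let tok = scanner.next("div"); // after an operand, "/" can only be division
  while (tok.type === "DIV") {
    left = { op: "/", left, right: operand() };
    tok = scanner.next("div");
  }
  if (tok.type !== "EOF") throw new SyntaxError("unexpected " + tok.type);
  return left;
}

console.log(JSON.stringify(parseExpr("a / /b/")));
// prints {"op":"/","left":"a","right":"/b/"}
```

Because the parser already knows whether it expects an operand or an operator, the scanner needs no previous-token bookkeeping; the trade-off is that scanning can no longer run as a separate pass ahead of the parser.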

w-y · 2018-05-02T11:34:05Z

I am also trying to figure out a better way to organize the lexer code. When I extend the parser (e.g. with JSX support), the BNF part is easy, but the lexical part needs additional states and more ambiguity fixes.

w-y added the question label May 2, 2018

bd82 closed this as completed May 2, 2018
