How are you handling the division and regExp literal ambiguity? #29

Closed
bd82 opened this issue Apr 28, 2018 · 4 comments
@bd82

bd82 commented Apr 28, 2018

https://stackoverflow.com/questions/5519596/when-parsing-javascript-what-determines-the-meaning-of-a-slash

https://github.com/sweet-js/sweet-core/wiki/design

Is there a separate lexing phase, or is lexing done on demand with context provided by the parser?

@w-y
Owner

w-y commented May 2, 2018

@bd82

Jison provides a status stack (its lexer start conditions) in the lexing phase, which can be used to solve this problem.

In the lexing phase, you can return a different token for the BNF rules or update the status stack based on the current context (the status stack, the matched token, and the lookahead token).

Here is the code:
https://github.com/w-y/ecma262-jison/blob/master/src/lex/regexp.js#L23

Each lexical rule has a 'conditions' property, and its handler is triggered only when the top of the status stack matches one of them. This way, you can resolve the symbol ambiguity as long as you enumerate all the possible statuses. Everything is done in the lexing phase, and I just use different aliases for the same '/' in the BNF rules.
https://github.com/w-y/ecma262-jison/blob/master/src/bnf/RegularExpressionLiteral.js

And here are my test cases. To be honest, it took me a lot of time to solve the ambiguity:
https://github.com/w-y/ecma262-jison/blob/master/mock/regexp.js

@w-y w-y added the question label May 2, 2018
@w-y
Owner

w-y commented May 2, 2018

This example explains how the status stack works:

a//b/

| token | top status (before token) | status stack (after token) | output |
|---|---|---|---|
| a | INITIAL | [identifier_start, INITIAL] | IDENTIFIER |
| / | identifier_start | [INITIAL] | DIV |
| / | INITIAL | [regex_start, INITIAL] | LEFT_REGEXP_DIV |
| b | regex_start | [regex_start, INITIAL] | REGEX_CHAR |
| / | regex_start | [INITIAL] | RIGHT_REGEXP_DIV |
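
To make the trace concrete, here is a minimal standalone sketch of a condition-gated lexer that reproduces it. It does not use Jison and is not the ecma262-jison source: the `rules` table, the `tokenize` function, and the regular expressions are hypothetical, and only the status and token names are taken from the trace. It handles just enough input shapes to walk through `a//b/`.

```javascript
// Hypothetical rule table. A rule fires only when the top of the status
// stack is listed in its `conditions`. Index 0 of `stack` is the top.
const rules = [
  {
    conditions: ['INITIAL'],
    pattern: /^[A-Za-z_$][\w$]*/,
    action: (stack) => { stack.unshift('identifier_start'); return 'IDENTIFIER'; },
  },
  {
    // Right after an identifier, '/' can only be a division operator.
    conditions: ['identifier_start'],
    pattern: /^\//,
    action: (stack) => { stack.shift(); return 'DIV'; },
  },
  {
    // Where an expression may start, '/' opens a regular expression literal.
    conditions: ['INITIAL'],
    pattern: /^\//,
    action: (stack) => { stack.unshift('regex_start'); return 'LEFT_REGEXP_DIV'; },
  },
  {
    // Inside a regex body, '/' closes the literal.
    conditions: ['regex_start'],
    pattern: /^\//,
    action: (stack) => { stack.shift(); return 'RIGHT_REGEXP_DIV'; },
  },
  {
    // Any other character inside the regex body (escapes are not handled).
    conditions: ['regex_start'],
    pattern: /^[^\/\n]/,
    action: () => 'REGEX_CHAR',
  },
];

function tokenize(input) {
  const stack = ['INITIAL'];
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    const top = stack[0];
    let matched = false;
    for (const rule of rules) {
      if (!rule.conditions.includes(top)) continue; // gated by the stack top
      const m = rule.pattern.exec(rest);
      if (!m) continue;
      tokens.push({ type: rule.action(stack), text: m[0] });
      rest = rest.slice(m[0].length);
      matched = true;
      break;
    }
    if (!matched) throw new SyntaxError(`Unexpected "${rest[0]}" in status ${top}`);
  }
  return tokens;
}

console.log(tokenize('a//b/').map((t) => t.type).join(' '));
// prints: IDENTIFIER DIV LEFT_REGEXP_DIV REGEX_CHAR RIGHT_REGEXP_DIV
```

The same '/' character is matched by three rules; which one fires depends only on the status at the top of the stack, which is why every possible status has to be enumerated.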

@bd82
Author

bd82 commented May 2, 2018

Thank you.

I understand that you are using the lexer's previous context to figure out whether the "/" is a division or the start of a regExp.

This approach may be complicated to support in the long term, as the ECMAScript grammar
changes every year and more and more edge cases may be added.

See: acornjs/acorn#589

I have created an ECMA5 parser example using a parser library I've authored:
However, due to the complexity of the lexer, I've decided to initially use Acorn to scan.

The increasing complexity of the scanner logic means that if and when I want to implement the scanner myself,
I will probably attempt to refactor the grammar to perform scanning on demand rather than as a separate step, thus having the full parser context available to disambiguate the "/".
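
As a rough illustration of scanning on demand, the sketch below has the parser tell the scanner on every call whether a regExp may start at the current position. It is not taken from Acorn or from any of the libraries mentioned in this thread: `Scanner`, `parseExpr`, the `regexAllowed` flag, and the toy grammar (operands separated by "/") are all assumptions made for the example, and the regExp body pattern is deliberately simplified.

```javascript
// Hypothetical on-demand scanner: the parser passes `regexAllowed` per call.
class Scanner {
  constructor(input) {
    this.input = input;
    this.pos = 0;
  }

  next(regexAllowed) {
    // Skip whitespace before the next token.
    while (this.pos < this.input.length && /\s/.test(this.input[this.pos])) this.pos++;
    if (this.pos >= this.input.length) return { type: 'EOF', text: '' };

    const rest = this.input.slice(this.pos);
    let m;
    if ((m = /^[A-Za-z_$][\w$]*/.exec(rest))) return this.emit('IDENTIFIER', m[0]);
    if ((m = /^\d+/.exec(rest))) return this.emit('NUMBER', m[0]);
    if (rest[0] === '/') {
      if (regexAllowed) {
        // Simplified body: escapes allowed, no character classes.
        m = /^\/(?:\\.|[^\/\\\n])+\/[a-z]*/.exec(rest);
        if (!m) throw new SyntaxError('Unterminated regular expression');
        return this.emit('REGEXP', m[0]);
      }
      return this.emit('DIV', '/');
    }
    throw new SyntaxError(`Unexpected "${rest[0]}"`);
  }

  emit(type, text) {
    this.pos += text.length;
    return { type, text };
  }
}

// Toy grammar: Expr := Operand ('/' Operand)*
function parseExpr(input) {
  const scanner = new Scanner(input);
  const tokens = [];
  // An operand is expected first, so a '/' here would start a regExp.
  let tok = scanner.next(true);
  for (;;) {
    if (!['IDENTIFIER', 'NUMBER', 'REGEXP'].includes(tok.type)) {
      throw new SyntaxError(`Expected operand, got ${tok.type}`);
    }
    tokens.push(tok);
    // After an operand, a '/' can only be a division.
    tok = scanner.next(false);
    if (tok.type === 'EOF') return tokens;
    if (tok.type !== 'DIV') throw new SyntaxError(`Expected "/", got ${tok.type}`);
    tokens.push(tok);
    tok = scanner.next(true);
  }
}

console.log(parseExpr('a / /b/g').map((t) => t.type).join(' '));
// prints: IDENTIFIER DIV REGEXP
```

Here the scanner keeps no status of its own; the decision lives in the parser, which already knows whether it is expecting an operand or an operator.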

@bd82 bd82 closed this as completed May 2, 2018
@w-y
Owner

w-y commented May 2, 2018

I am also trying to figure out a better way to organize the code of the lexical part. When I try to extend the parser (for example, with JSX support), the BNF part is easy, but the lexical part needs new statuses and more ambiguity fixes.
