J2CL transpiled compiler Scanner.java doesn't understand non-ascii characters. #2383

Open
brad4d opened this issue Mar 21, 2017 · 2 comments


@brad4d
Contributor

brad4d commented Mar 21, 2017

JavaScript allows non-ASCII Unicode letters in identifiers, but the GWT-compiled version of Scanner.java doesn't recognize them because GWT's emulation of Character.isLetter() doesn't handle non-ASCII Unicode characters.

https://github.com/google/closure-compiler/blob/master/src/com/google/javascript/jscomp/parsing/parser/Scanner.java#L775

Esprima implements its own logic for this in TypeScript:
https://github.com/jquery/esprima/blob/master/src/character.ts

We should emulate this logic in Scanner.java instead of relying on Character, since it's buggy in GWT.
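A minimal sketch of what emulating that logic in Java might look like, assuming the goal is to avoid java.lang.Character entirely: an ASCII fast path, then a binary search over a sorted table of non-ASCII code-point ranges, roughly the shape of Esprima's isIdentifierStart. The class `IdentifierChars` and its table are hypothetical, and the table holds only a few illustrative ranges (Latin-1, Latin Extended, Greek, Cyrillic, CJK). A real table would need to be generated from the Unicode Character Database (for example, from the ID_Start property) and regenerated as Unicode evolves.

```java
/**
 * Hypothetical sketch: classify identifier-start code points without
 * calling java.lang.Character, so the result does not depend on how a
 * transpiler (GWT or J2CL) emulates Character.
 */
public final class IdentifierChars {

  // Sorted, non-overlapping, inclusive [start, end] ranges of non-ASCII
  // identifier-start code points. ILLUSTRATIVE SUBSET ONLY, not complete.
  private static final int[][] ID_START_RANGES = {
    {0x00AA, 0x00AA}, // FEMININE ORDINAL INDICATOR
    {0x00B5, 0x00B5}, // MICRO SIGN
    {0x00BA, 0x00BA}, // MASCULINE ORDINAL INDICATOR
    {0x00C0, 0x00D6}, // Latin-1 letters before the multiplication sign
    {0x00D8, 0x00F6}, // Latin-1 letters before the division sign
    {0x00F8, 0x02C1}, // Latin Extended, IPA, some modifier letters
    {0x0391, 0x03A1}, // Greek capitals (U+03A2 is unassigned)
    {0x03A3, 0x03F5}, // more Greek letters
    {0x03F7, 0x0481}, // end of Greek/Coptic, then Cyrillic letters
    {0x4E00, 0x9FFF}, // CJK Unified Ideographs block
  };

  private IdentifierChars() {}

  /** Returns true if cp may start a JavaScript identifier (subset only). */
  public static boolean isIdentifierStart(int cp) {
    // ASCII fast path: $, _, A-Z, a-z.
    if (cp < 0x80) {
      return cp == '$' || cp == '_'
          || (cp >= 'A' && cp <= 'Z')
          || (cp >= 'a' && cp <= 'z');
    }
    return inRanges(cp, ID_START_RANGES);
  }

  // Binary search over sorted, inclusive ranges.
  private static boolean inRanges(int cp, int[][] ranges) {
    int lo = 0;
    int hi = ranges.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (cp < ranges[mid][0]) {
        hi = mid - 1;
      } else if (cp > ranges[mid][1]) {
        lo = mid + 1;
      } else {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isIdentifierStart('\u00E9')); // true  (e with acute)
    System.out.println(isIdentifierStart('\u03BB')); // true  (Greek lambda)
    System.out.println(isIdentifierStart('\u00D7')); // false (multiplication sign)
    System.out.println(isIdentifierStart('1'));      // false
  }
}
```

Because the lookup uses only integer comparisons, it should behave the same on the JVM and in transpiled JavaScript; the cost is keeping the table current. An isIdentifierPart check would follow the same pattern with a second table (for example, one derived from ID_Continue).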

brad4d changed the title from "GWT transpiled compiler Scanner.java doesn't understand non-ascii characters." to "J2CL transpiled compiler Scanner.java doesn't understand non-ascii characters." on Apr 30, 2020
@ctjlewis
Contributor

ctjlewis commented Jul 20, 2020

> We should emulate this logic in Scanner.java instead of relying on Character, since it's buggy in GWT.

Does this still hold? I figured that exposing Character.isAlphabetic and constant fields such as Character.COMBINING_SPACING_MARK to J2CL would definitely help with this, and would let us rewrite the logic in Scanner.java to be future-proof. See google/j2cl#103, issue #3639, and PR #3645.

I'm focused on the isAlphabetic method because it single-handedly matches the Lu, Ll, Lt, Lm, Lo, and Nl characters for the IdentifierPart grammar in the ECMAScript spec. I'm happy to go through and hardcode all of this grammar, but it will break as new Unicode characters are added (hence #3639). It seems like the most long-term reward would come from getting Character to work well with J2CL, but I could be very wrong.
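For comparison, a sketch of the category-based approach, written against java.lang.Character as it behaves on the JVM (whether J2CL's Character emulation matches is the open question in this thread). It follows the ES5.1 grammar: UnicodeLetter is Lu, Ll, Lt, Lm, Lo, or Nl, and IdentifierPart additionally allows Mn, Mc, Nd, Pc, ZWNJ (U+200C), and ZWJ (U+200D). It uses Character.getType rather than isAlphabetic, because isAlphabetic also accepts code points with the Other_Alphabetic property, such as combining marks and circled letters, which are not in those letter categories. Later editions of the spec define identifiers via the Unicode ID_Start and ID_Continue properties instead, which differ slightly. The class `Es5Identifiers` is hypothetical, and escape sequences in identifiers are ignored.

```java
/**
 * Hypothetical sketch: ES5.1-style identifier checks by Unicode general
 * category, using java.lang.Character as it behaves on the JVM.
 */
public final class Es5Identifiers {

  private static final int ZWNJ = 0x200C; // ZERO WIDTH NON-JOINER
  private static final int ZWJ = 0x200D;  // ZERO WIDTH JOINER

  private Es5Identifiers() {}

  /** UnicodeLetter: general category Lu, Ll, Lt, Lm, Lo, or Nl. */
  static boolean isUnicodeLetter(int cp) {
    switch (Character.getType(cp)) {
      case Character.UPPERCASE_LETTER:  // Lu
      case Character.LOWERCASE_LETTER:  // Ll
      case Character.TITLECASE_LETTER:  // Lt
      case Character.MODIFIER_LETTER:   // Lm
      case Character.OTHER_LETTER:      // Lo
      case Character.LETTER_NUMBER:     // Nl
        return true;
      default:
        return false;
    }
  }

  /** IdentifierStart: $, _, or a UnicodeLetter (escapes not handled). */
  public static boolean isIdentifierStart(int cp) {
    return cp == '$' || cp == '_' || isUnicodeLetter(cp);
  }

  /** IdentifierPart: IdentifierStart plus Mn, Mc, Nd, Pc, ZWNJ, ZWJ. */
  public static boolean isIdentifierPart(int cp) {
    if (isIdentifierStart(cp) || cp == ZWNJ || cp == ZWJ) {
      return true;
    }
    switch (Character.getType(cp)) {
      case Character.NON_SPACING_MARK:       // Mn
      case Character.COMBINING_SPACING_MARK: // Mc
      case Character.DECIMAL_DIGIT_NUMBER:   // Nd
      case Character.CONNECTOR_PUNCTUATION:  // Pc
        return true;
      default:
        return false;
    }
  }

  /** Checks a whole identifier, iterating by code point (not by char). */
  public static boolean isIdentifier(String s) {
    if (s.isEmpty() || !isIdentifierStart(s.codePointAt(0))) {
      return false;
    }
    for (int i = Character.charCount(s.codePointAt(0)); i < s.length(); ) {
      int cp = s.codePointAt(i);
      if (!isIdentifierPart(cp)) {
        return false;
      }
      i += Character.charCount(cp);
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isIdentifier("caf\u00E9"));      // true
    System.out.println(isIdentifier("e\u0301t\u00E9")); // true: U+0301 is Mn
    System.out.println(isIdentifier("1abc"));           // false: Nd cannot start
    System.out.println(isIdentifier("\u00D7"));         // false: Sm
    System.out.println(isIdentifier("\u24B6"));         // false: circled A is So
  }
}
```

This keeps the code short and lets the JDK's Unicode tables do the work, which is the long-term benefit described above; the catch is that it is only as correct as the Character implementation the code runs against.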

@brad4d
Contributor Author

brad4d commented Jul 27, 2020

AFAIK, switching to J2CL did not make this situation better.
I doubt fixing it is ever going to be a priority.
Supporting the Java version of the compiler and the Graal-compiled native version is our focus.
The compiled-to-JS version is strictly a nice-to-have.
