Expose java.lang.Character.getType() and constant fields like COMBINING_SPACING_MARK #103

ctjlewis · 2020-07-20T01:55:07Z

It seems that J2CL is unable to resolve Character.getType() or the Unicode category constant fields (like Character.COMBINING_SPACING_MARK), and fails with a "symbol not found" error.

For reference, see google/closure-compiler#3639, where Closure Compiler was unable to interpret a composite Unicode sequence as a valid IdentifierPart. Part of CC's parsing process relies on Scanner.java, a class that determines whether a given token is an IdentifierStart or IdentifierPart in compliance with the ECMAScript spec. All Unicode category checks are currently done by testing whether the character falls within hard-coded Unicode ranges (see below). I replicated that approach for this fix, but it is neither as future-proof nor as legible as Character.getType(ch) == Character.COMBINING_SPACING_MARK, which will keep working as the Unicode standard evolves over time.

private static boolean isCombiningMark(char ch) {
  return (
      // 0300-036F
      (0x0300 <= ch && ch <= 0x036F)
      // 1AB0–1AFF
      || (0x1AB0 <= ch && ch <= 0x1AFF)
      // 1DC0–1DFF
      || (0x1DC0 <= ch && ch <= 0x1DFF)
      // 20D0–20FF
      || (0x20D0 <= ch && ch <= 0x20FF)
      // FE20–FE2F
      || (0xFE20 <= ch && ch <= 0xFE2F)
  );
  // TODO (ctjl): Implement in a more reliable and future-proofed way, i.e.:
  // return Character.getType(ch) == Character.NON_SPACING_MARK;
}

This hard-coded, manual approach is taken for every Unicode category check in the jsComp library because the J2CL compile must succeed in order to push a release (code using Character.getType() compiles with Maven, but not with Bazel). It would be beneficial for the CC library if J2CL could support these APIs.
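For comparison, here is a minimal sketch of what the category-based check might look like on a standard JVM. It is not the Closure Compiler implementation, and per this issue it would not compile under J2CL. The class name CombiningMarks is hypothetical, and treating a combining mark as "general category Mn or Mc" is an assumption based on my reading of the ECMAScript UnicodeCombiningMark definition.

```java
// Hypothetical helper, for illustration only. Requires a standard JRE;
// Character.getType() is the API this issue reports as unavailable in J2CL.
final class CombiningMarks {
    private CombiningMarks() {}

    /**
     * Returns true if ch has Unicode general category Mn (non-spacing mark)
     * or Mc (combining spacing mark), as reported by the running JRE's
     * Unicode tables rather than by hard-coded ranges.
     */
    static boolean isCombiningMark(char ch) {
        int type = Character.getType(ch);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK;
    }

    public static void main(String[] args) {
        // U+0301 COMBINING ACUTE ACCENT is category Mn.
        System.out.println(isCombiningMark('\u0301'));
        // U+0903 DEVANAGARI SIGN VISARGA is category Mc.
        System.out.println(isCombiningMark('\u0903'));
        // A plain ASCII letter is category Ll, so not a combining mark.
        System.out.println(isCombiningMark('a'));
    }
}
```

Unlike the range list, this also covers marks outside the five blocks above (U+0903, for example), and picks up new marks whenever the JRE's Unicode data is updated.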


ctjlewis · 2020-07-20T03:49:29Z

It seems like this might be a Guava issue rather than a J2CL one. Please close this if so.

gkdn · 2020-07-21T00:59:44Z

J2CL doesn't support various java.lang.Character APIs since they are too costly to support on the web.

ctjlewis mentioned this issue Jul 20, 2020

Added support for UnicodeCombiningMark, fixes #3639. google/closure-compiler#3645

Closed

ctjlewis mentioned this issue Jul 20, 2020

J2CL transpiled compiler Scanner.java doesn't understand non-ascii characters. google/closure-compiler#2383

Open

gkdn closed this as completed Jul 21, 2020

ctjlewis mentioned this issue Aug 12, 2020

feat: add full Unicode support for Javascript identifiers google/closure-compiler#3647

Open
