Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for UnicodeCombiningMark, fixes #3639. #3645

Closed
wants to merge 0 commits into from

Conversation

ctjlewis
Copy link
Contributor

Some notes:

  1. Annotating functions with spec grammar is likely wise (and will help with issues like this in the future), so I did not remove that JSDOC block.
  2. A utils/ directory was added, which contains a script for building the source (without GWT) and a script for calling that built version with the given args. This directory will be a good place to put bash scripts in the future, including runtime_tests.sh to execute the jsComp runtime tests (a separate issue I'm working on).
  3. There is more grammar to implement if we want to be consistent with the spec (isConnectorPunctuation, isZeroWidthJoiner, etc.), which will prevent issues like this in the future. I intend to do this over the coming days, I just wanted to get this PR in first.

This PR fixes #3639, and you can see that the source:

var bar = {
  : "foo"
};

now compiles as expected to:

var bar={"I\u0307":"foo"};

@ctjlewis
Copy link
Contributor Author

ctjlewis commented Jul 19, 2020

Build tests are failing because:

/src/com/google/javascript/jscomp/parsing/parser/Scanner.java:909: error: cannot find symbol
    return Character.getType(ch) == Character.NON_SPACING_MARK;
                    ^
  symbol:   method getType(char)
  location: class Character
/src/com/google/javascript/jscomp/parsing/parser/Scanner.java:909: error: cannot find symbol
    return Character.getType(ch) == Character.NON_SPACING_MARK;

Anybody have recommendations? Not sure why Character.getType would not be defined, I even added a manual import for java.lang.Character, and getType method has been defined since JDK 1.1.

I noticed the rest of the character checks did not use builtins like Character.NO_SPACING_MARK, but this syntax is much more legible on top of being future-proofed and likely just as performant.

Edit: Is this breaking because it's being fed into J2CL, which can't compile Character.getType and therefore doesn't import it?

@ctjlewis
Copy link
Contributor Author

Opened an issue at google/j2cl#103 regarding future-proofed Unicode category checks, just need J2CL to support Character.getType() and the category constants.

.gitignore Outdated
Comment on lines 17 to 20
/bazel-bin
/bazel-closure-compiler
/bazel-out
/bazel-testlogs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running ./build_test.sh created bazel output directories that were added to the git tree by default. These lines prevent that.

@@ -901,8 +901,65 @@ private static boolean isIdentifierStart(char ch) {
| (ch >= 0x03B1 & ch <= 0x03C9); // Greek lowercase letters
}

// Check if char is Unicode Category "Combining spacing mark (Mc)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not compliant (see #3647), per spec UnicodeCombiningMark also includes "Non-spacing mark (Mn)" category, but this addresses #3639 and a decent scope of similar possibilities with as minimal change as possible.

utils/run.sh Outdated
#!/bin/bash
# Run the local build of CC in target/. Make sure to run build.sh before running
# this script.
java -jar target/closure-compiler-1.0-SNAPSHOT.jar $@
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handy for debugging local build.

utils/build.sh Outdated Show resolved Hide resolved
@rishipal rishipal self-assigned this Jul 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parser error on "combining character" (U+0307)
4 participants