Require Unicode 8.0.0 #300

Closed

Conversation

littledan
Member

Interpretation of some basic things like whitespace changed after
Unicode 5.1. This patch requires the latest Unicode standard.

@mathiasbynens
Member

@littledan
Member Author

A note with respect to that thread: Chrome runs on Windows 7, but it supports Unicode 8.0.0 (with a couple exceptions in V8 using 7.0, but a fix is in progress). I think we should be OK with saying that software which doesn't update to pretty recent Unicode versions isn't implementing the latest ECMAScript spec. @bterlson What do you think, as Microsoft?

@bterlson
Member

I am personally a fan of this, although I have concerns about how well Chakra will be able to support it, since we depend on the platform in many cases. I will dig into this more. In the meantime, this is a simple change, yet it is the kind of thing that gets discussed in committee. I bet we can get a quick sign-off without going through the normal proposal process.

@bterlson added the "needs consensus" label (This needs committee consensus before it can be eligible to be merged.) Jan 20, 2016
@littledan
Member Author

At the January 2016 TC39 meeting, we reached consensus in support of this proposal. Is anything else needed to merge this? I fixed the merge conflict.

@mathiasbynens
Member

Relevant meeting notes: https://github.com/rwaldron/tc39-notes/blob/master/es7/2016-01/2016-01-26.md#unicode-fix-httpsgithubcomtc39ecma262pull300-de

It’s probably intentional, but just to make sure this is not being overlooked — this patch leaves the following section intact: https://tc39.github.io/ecma262/#sec-white-space

<p>ECMAScript implementations must recognize as <emu-nt><a href="#prod-WhiteSpace">WhiteSpace</a></emu-nt> code points listed in the “Separator, space” (Zs) category by Unicode 5.1. ECMAScript implementations may also recognize as <emu-nt><a href="#prod-WhiteSpace">WhiteSpace</a></emu-nt> additional category Zs code points from subsequent editions of the Unicode Standard.</p>

i.e. WhiteSpace is still based on Unicode 5.1.0 + Unicode 8 or later, meaning U+180E is considered whitespace. I’m not sure if this is a necessity for backwards compatibility.
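
To make the distinction concrete, here is a minimal sketch of the two rules as set operations. The code point lists are transcribed by hand from the Zs category, and the names `ZS_UNICODE_5_1`, `ZS_UNICODE_8_0`, `zsWhiteSpaceES6` and `zsWhiteSpaceES2016` are hypothetical; none of this is taken from the spec text or from any engine. It models only the Zs part of WhiteSpace, not TAB, VT, FF or U+FEFF.

```javascript
// "Separator, space" (Zs) code points as of Unicode 5.1 (18 code points).
const ZS_UNICODE_5_1 = new Set([
  0x0020, 0x00a0, 0x1680, 0x180e,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
  0x2006, 0x2007, 0x2008, 0x2009, 0x200a,
  0x202f, 0x205f, 0x3000,
]);

// Unicode 6.3 moved U+180E from Zs to Cf, so the 8.0 set is the same minus U+180E.
const ZS_UNICODE_8_0 = new Set(
  [...ZS_UNICODE_5_1].filter((cp) => cp !== 0x180e)
);

// Old wording: Zs from Unicode 5.1 is mandatory, plus whatever the engine adds.
function zsWhiteSpaceES6(engineZs) {
  return new Set([...ZS_UNICODE_5_1, ...engineZs]);
}

// Wording intended by this change: only Zs from the engine's Unicode version.
function zsWhiteSpaceES2016(engineZs) {
  return new Set(engineZs);
}

const hex = (cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
const es6 = zsWhiteSpaceES6(ZS_UNICODE_8_0);
const es2016 = zsWhiteSpaceES2016(ZS_UNICODE_8_0);

// Code points that are whitespace under the old rule only.
const onlyInES6 = [...es6].filter((cp) => !es2016.has(cp)).map(hex);
console.log(onlyInES6); // [ 'U+180E' ]
```

Under these assumptions the union in the old wording is what keeps U+180E in the set even for an engine that tracks Unicode 8.0.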

@littledan
Member Author

Oh, I missed that section. Actually, this patch was partly motivated by getting U+180E out of whitespace! This was all discussed pretty explicitly at the meeting, so I'll upload a new patch with that modified.

@mathiasbynens
Member

👍

mathiasbynens added a commit to mathiasbynens/regexpu-fixtures that referenced this pull request Feb 8, 2016
ECMAScript 6 required Unicode v5.1.0 `Zs` symbols to be recognized as whitespace in addition to any `Zs` symbols in whatever Unicode version the engine implemented.

Per tc39/ecma262#300 this is no longer the case in ES2016. 🎉

The only observable change is that U+180E is no longer considered whitespace.
mathiasbynens added a commit to mathiasbynens/regexpu-core that referenced this pull request Feb 8, 2016
ECMAScript 6 required Unicode v5.1.0 `Zs` symbols to be recognized as whitespace in addition to any `Zs` symbols in whatever Unicode version the engine implemented.

Per tc39/ecma262#300 this is no longer the case in ES2016. 🎉

The only observable change is that U+180E is no longer considered whitespace.
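
The observable change described in these commit messages can be probed from script. The following is a small sketch, not part of either commit; the helper names `matchesRegExpWhitespace` and `isTrimmed` are made up for illustration.

```javascript
const MVS = '\u180E'; // MONGOLIAN VOWEL SEPARATOR

// True if the engine's RegExp `\s` class matches the single character.
function matchesRegExpWhitespace(ch) {
  return /^\s$/.test(ch);
}

// True if String.prototype.trim strips the character entirely.
function isTrimmed(ch) {
  return ch.trim() === '';
}

console.log('U+0020:', matchesRegExpWhitespace(' '), isTrimmed(' '));
console.log('U+180E:', matchesRegExpWhitespace(MVS), isTrimmed(MVS));
```

On an engine that implements ES2016 or later, both checks should report `false` for U+180E while still reporting `true` for an ordinary space; an older engine following the ES6 rule would report `true` for both.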
@bterlson
Member

Committed as 24dad16. Thanks @littledan!

@bterlson closed this Feb 10, 2016
@littledan
Member Author

Thanks for the reviews and landing, everyone!

mathiasbynens added a commit to whatwg/javascript that referenced this pull request Feb 11, 2016
It’s now part of the ECMAScript spec: tc39/ecma262#300

Closes #28.
dilijev added a commit to dilijev/ChakraCore that referenced this pull request Dec 5, 2016
This character was recategorized from Zs to Cf in Unicode 6.3, and remains categorized as such in Unicode 9.0 (current target version). The current version of ECMAScript requires that only characters classed as Zs in the target version of Unicode be recognized as Whitespace. This change is now reflected in Test262 so making this change will improve our Test262 score rather than regress it.

Additionally, update comments about location of UnicodeData.txt and about the definition of Whitespace characters.

See: tc39/ecma262#300

See: tc39/test262@3a5a09e

See: mathiasbynens/regexpu-core@9b10d2a

Fixes chakra-core#2120
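
The recategorization mentioned in this commit message can be checked against an engine's own Unicode data with RegExp Unicode property escapes (a later ECMAScript feature, so this assumes a reasonably recent engine). This is an illustrative sketch and not ChakraCore code; the helper names are hypothetical.

```javascript
// True if the character's General_Category is Zs ("Separator, space").
function isSpaceSeparator(ch) {
  return /^\p{Zs}$/u.test(ch);
}

// True if the character's General_Category is Cf ("Other, format").
function isFormatCharacter(ch) {
  return /^\p{Cf}$/u.test(ch);
}

const samples = ['\u0020', '\u00A0', '\u180E'];
for (const ch of samples) {
  const label = 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  console.log(label, { Zs: isSpaceSeparator(ch), Cf: isFormatCharacter(ch) });
}
```

With Unicode data from version 6.3 or later, U+180E should come back as Cf rather than Zs, which is why an engine that derives WhiteSpace from its target Unicode version no longer treats it as whitespace.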
chakrabot pushed a commit to chakra-core/ChakraCore that referenced this pull request Dec 7, 2016
…PARATOR from Whitespace classification.

Merge pull request #2121 from dilijev:regex-ws

This character was recategorized from Zs to Cf in Unicode 6.3, and remains categorized as such in Unicode 9.0 (current target version). The current version of ECMAScript requires that only characters classed as Zs in the target version of Unicode be recognized as Whitespace. This change is now reflected in Test262 so making this change will improve our Test262 score rather than regress it.

Additionally, update comments about location of UnicodeData.txt and about the definition of Whitespace characters.

See: tc39/ecma262#300

See: tc39/test262@3a5a09e

See: mathiasbynens/regexpu-core@9b10d2a

Fixes #2120
lydell added a commit to lydell/js-tokens that referenced this pull request Jan 28, 2018
ES2016 and later require Unicode 8.0, which does not consider U+180E to
be whitespace, as opposed to earlier versions. See:

tc39/ecma262#300

Node.js seems to have incorporated this change some time after 8.1.2.

This commit simply removes the tests that expected U+180E to be
whitespace.
kisg pushed a commit to paul99/webkit-mips that referenced this pull request Nov 9, 2018
https://bugs.webkit.org/show_bug.cgi?id=191415

Reviewed by Saam Barati.

JSTests:

* ChakraCore/test/es5/regexSpace.baseline:
* ChakraCore/test/es6/unicode_whitespace.js:
Update tests to latest version.
(See chakra-core/ChakraCore@7c097b6.)

* test262.yaml:
* test262/config.yaml:
* test262/expectations.yaml:
Update expectations.

Source/JavaScriptCore:

Mongolian Vowel Separator stopped being a valid whitespace character as of ES2016.
(tc39/ecma262#300)

* parser/Lexer.h:
(JSC::Lexer<UChar>::isWhiteSpace):
* runtime/ParseInt.h:
(JSC::isStrWhiteSpace):
* yarr/create_regex_tables:

LayoutTests:

* js/ToNumber-expected.txt:
* js/parseFloat-expected.txt:
* js/script-tests/ToNumber.js:
* js/script-tests/parseFloat.js:
Update tests and expectations.

* sputnik/Conformance/09_Type_Conversion/9.3_ToNumber/9.3.1_ToNumber_from_String/S9.3.1_A2-expected.txt:
* sputnik/Conformance/09_Type_Conversion/9.3_ToNumber/9.3.1_ToNumber_from_String/S9.3.1_A3_T1-expected.txt:
* sputnik/Conformance/09_Type_Conversion/9.3_ToNumber/9.3.1_ToNumber_from_String/S9.3.1_A3_T2-expected.txt:
* sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.12_CharacterClassEscape/S15.10.2.12_A1_T1-expected.txt:
* sputnik/Conformance/15_Native_Objects/15.10_RegExp/15.10.2/15.10.2.12_CharacterClassEscape/S15.10.2.12_A2_T1-expected.txt:
* sputnik/Conformance/15_Native_Objects/15.1_The_Global_Object/15.1.2/15.1.2.2_parseInt/S15.1.2.2_A2_T10-expected.txt:
* sputnik/Conformance/15_Native_Objects/15.1_The_Global_Object/15.1.2/15.1.2.3_parseFloat/S15.1.2.3_A2_T10-expected.txt:
* sputnik/Unicode/Unicode_410/S15.10.2.12_A1_T6-expected.txt:
* sputnik/Unicode/Unicode_410/S15.10.2.12_A2_T6-expected.txt:
* sputnik/Unicode/Unicode_410/S7.2_A1.6_T1-expected.txt:
* sputnik/Unicode/Unicode_500/S15.10.2.12_A1_T6-expected.txt:
* sputnik/Unicode/Unicode_500/S15.10.2.12_A2_T6-expected.txt:
* sputnik/Unicode/Unicode_500/S7.2_A1.6_T1-expected.txt:
* sputnik/Unicode/Unicode_510/S15.10.2.12_A1_T6-expected.txt:
* sputnik/Unicode/Unicode_510/S15.10.2.12_A2_T6-expected.txt:
* sputnik/Unicode/Unicode_510/S7.2_A1.6_T1-expected.txt:
Let outdated sputnik checks fail.


git-svn-id: http://svn.webkit.org/repository/webkit/trunk@238004 268f45cc-cd09-0410-ab3c-d52691b4dbfc
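
The WebKit change above touches the lexer, `isStrWhiteSpace`, and the regex tables, and updates the `ToNumber` and `parseFloat` tests. A rough sketch of how those paths surface in script follows; it is not taken from the WebKit tests, and `parsesAsSource` is a hypothetical helper.

```javascript
const MVS = '\u180E'; // MONGOLIAN VOWEL SEPARATOR

// String-to-number conversions skip leading StrWhiteSpace characters only.
const conversions = {
  toNumber: Number(MVS + '1'),
  parseInt: parseInt(MVS + '1', 10),
  parseFloat: parseFloat(MVS + '1'),
};

// True if the engine's parser accepts `body` as a function body.
function parsesAsSource(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(conversions);
console.log(parsesAsSource('return 1 + 1'));
console.log(parsesAsSource('return 1' + MVS + '+ 1'));
```

On an engine that follows ES2016 or later, all three conversions should yield `NaN` and the source containing a raw U+180E between tokens should be rejected, since the character is neither whitespace nor part of any token.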
facebook-github-bot pushed a commit to facebook/flow that referenced this pull request May 7, 2019
Summary:
ES2016 updated the list of whitespace from Unicode 5.1 to Unicode 8. Between those versions, U+180E (Mongolian Vowel Separator) was moved from `Zs` to `Cf`, so it is no longer whitespace.

tc39/ecma262#300 (comment)

Reviewed By: samwgoldman

Differential Revision: D15249134

fbshipit-source-id: 228560604fbde3567e86dfd281a141f930b0e347
ljharb added a commit to ljharb/ecma262 that referenced this pull request Sep 16, 2019
 - 2016: the Unicode change affected what was considered whitespace (tc39#300 / 24dad16)
 - 2018: changed tagged template literal objects to be cached per source location rather than per realm (tc39#890)
 - 2019: Atomics.wake was renamed to Atomics.notify (tc39#1220)
 - 2019: `await` was changed to require fewer ticks (tc39#1250)
ljharb added a commit to ljharb/ecma262 that referenced this pull request Sep 17, 2019
 - 2016: the Unicode change affected what was considered whitespace (tc39#300 / 24dad16)
 - 2017: the latest version of Unicode is mandated (tc39#620)
 - 2018: changed tagged template literal objects to be cached per source location rather than per realm (tc39#890)
 - 2019: Atomics.wake was renamed to Atomics.notify (tc39#1220)
 - 2019: `await` was changed to require fewer ticks (tc39#1250)
ljharb added a commit to ljharb/ecma262 that referenced this pull request Oct 1, 2019
 - 2016: the Unicode change affected what was considered whitespace (tc39#300 / 24dad16)
 - 2017: the latest version of Unicode is mandated (tc39#620)
 - 2018: changed tagged template literal objects to be cached per source location rather than per realm (tc39#890)
 - 2019: Atomics.wake was renamed to Atomics.notify (tc39#1220)
 - 2019: `await` was changed to require fewer ticks (tc39#1250)
ljharb added a commit to ljharb/ecma262 that referenced this pull request Oct 17, 2019
…ns (tc39#1698)

 - 2016: the Unicode change affected what was considered whitespace (tc39#300 / 24dad16)
 - 2017: the latest version of Unicode is mandated (tc39#620)
 - 2018: changed tagged template literal objects to be cached per source location rather than per realm (tc39#890)
 - 2019: Atomics.wake was renamed to Atomics.notify (tc39#1220)
 - 2019: `await` was changed to require fewer ticks (tc39#1250)
ljharb added a commit to ljharb/ecma262 that referenced this pull request Oct 18, 2019
…ns (tc39#1698)

 - 2016: the Unicode change affected what was considered whitespace (tc39#300 / 24dad16)
 - 2017: the latest version of Unicode is mandated (tc39#620)
 - 2018: changed tagged template literal objects to be cached per source location rather than per realm (tc39#890)
 - 2019: Atomics.wake was renamed to Atomics.notify (tc39#1220)
 - 2019: `await` was changed to require fewer ticks (tc39#1250)
ksh8281 added a commit to ksh8281/escargot that referenced this pull request Mar 10, 2020
* Treat single `let` as Identifier in parsing ExpressionStatement
* The `let` contextual keyword must not contain Unicode escape sequences.
* U+180E had been changed to `Other, Format [Cf]` from `Separator, Space [Zs]`
  see tc39/ecma262#300, chakra-core/ChakraCore#2120
* The `let` contextual keyword must not contain Unicode escape sequences
* We need to allocate new env on right side of for in-of head when there is lexical decl on left side

Signed-off-by: Seonghyun Kim <[email protected]>
ksh8281 added a commit to ksh8281/escargot that referenced this pull request Mar 12, 2020
* Treat single `let` as Identifier in parsing ExpressionStatement
* The `let` contextual keyword must not contain Unicode escape sequences.
* U+180E had been changed to `Other, Format [Cf]` from `Separator, Space [Zs]`
  see tc39/ecma262#300, chakra-core/ChakraCore#2120
* The `let` contextual keyword must not contain Unicode escape sequences
* We need to allocate new env on right side of for in-of head when there is lexical decl on left side

Signed-off-by: Seonghyun Kim <[email protected]>
bbrto21 pushed a commit to Samsung/escargot that referenced this pull request Mar 12, 2020
* Treat single `let` as Identifier in parsing ExpressionStatement
* The `let` contextual keyword must not contain Unicode escape sequences.
* U+180E had been changed to `Other, Format [Cf]` from `Separator, Space [Zs]`
  see tc39/ecma262#300, chakra-core/ChakraCore#2120
* The `let` contextual keyword must not contain Unicode escape sequences
* We need to allocate new env on right side of for in-of head when there is lexical decl on left side

Signed-off-by: Seonghyun Kim <[email protected]>