“Invalid Unicode code points” in numeric character references #614

Open
wooorm opened this issue Oct 1, 2019 · 3 comments

@wooorm
Contributor

wooorm commented Oct 1, 2019

The section on decimal numeric character references mentions “Invalid Unicode code points”, but nowhere does the spec define what those are.

The section on hexadecimal numeric character references does not mention this limitation, but I guess it is implied (by “They too are parsed as the corresponding Unicode character”).

The HTML spec defines several limitations on numeric character references: https://html.spec.whatwg.org/multipage/parsing.html#numeric-character-reference-end-state, so I’m guessing some or all of that applies to CommonMark (CM) as well.

However, HTML defines that some “invalid” references map to other characters (the table at the bottom of the linked section).
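For illustration, here is a small sketch of the HTML behaviour described above, using Python's standard `html.unescape`, which follows the HTML5 rules for character references. It is only meant to show what HTML does with such references and says nothing about what CommonMark should do; the sample references are chosen for this example.

```python
import html

# Sample numeric character references, chosen for illustration only.
samples = ["&#35;", "&#x80;", "&#0;", "&#xD800;", "&#x110000;"]

# html.unescape applies the HTML5 rules: some "invalid" references are
# remapped to other characters, the rest become U+FFFD.
decoded = {ref: html.unescape(ref) for ref in samples}

for ref, char in decoded.items():
    print(f"{ref!r} -> U+{ord(char):04X}")
```

Under the HTML rules, `&#x80;` is remapped to U+20AC (the euro sign) rather than rejected, while NUL, surrogates, and out-of-range values come back as U+FFFD.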


Why mention code points instead of characters? Is it just surrogates?

@jgm
Member

jgm commented Oct 2, 2019

@wooorm
Contributor Author

wooorm commented Oct 2, 2019

Thanks! That’s a good read, but a) it doesn’t answer the “invalid code points” part, and b) the CM spec already defines “A character is a Unicode code point [...] all code points count as characters for purposes of this spec”, so I’m not sure why the sections on references don’t simply use the word “character”.

@nwellnhof
Contributor

The invalid code points are

  • U+0000 (this is mentioned explicitly)
  • Surrogates (U+D800 to U+DFFF)
  • Code points larger than U+10FFFF
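
The three cases above can be sketched as a single check. This is a minimal illustration, not the code of any CommonMark implementation: the function names are hypothetical, and replacing invalid code points with U+FFFD is assumed from the spec's wording on decimal references.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def is_invalid_code_point(cp: int) -> bool:
    """Return True for the three cases listed above (hypothetical helper)."""
    return (
        cp == 0                      # U+0000
        or 0xD800 <= cp <= 0xDFFF    # surrogates
        or cp > 0x10FFFF             # beyond the Unicode range
    )


def resolve_numeric_reference(digits: str, base: int = 10) -> str:
    """Map the digits of a numeric character reference to a character.

    `digits` is the part between `&#` (or `&#x`) and `;`. Invalid code
    points are assumed to become U+FFFD.
    """
    cp = int(digits, base)
    return REPLACEMENT_CHARACTER if is_invalid_code_point(cp) else chr(cp)
```

For example, `resolve_numeric_reference("35")` yields `#`, while `resolve_numeric_reference("D800", 16)` and `resolve_numeric_reference("0")` both yield U+FFFD.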
