
Code points, scalar values, and validity #778

Open
dpk opened this issue Sep 29, 2024 · 1 comment


dpk commented Sep 29, 2024

  • The spec defines a character as a ‘Unicode code point’. This means that unpaired surrogates (U+D800 to U+DFFF, which are code points but not scalar values) are allowed in input and, by implication, in output. If this is not intended (which is what I glean from the answer to “Invalid Unicode code points” in numeric character references #614), the definition should be changed to ‘Unicode scalar value’. Changing ‘invalid Unicode code points’ to ‘invalid Unicode scalar values’ would also resolve #614.

  • The spec does not explicitly state that every possible sequence of Unicode scalar values (or code points?) is valid CommonMark input for which some HTML output must be produced, although I believe this is also the intention. If so, the spec should say explicitly that a processor which fails to parse some input document is non-conforming.
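A minimal sketch of the distinction in the first point, assuming the ‘scalar value’ reading is adopted. The helper names (`is_code_point`, `is_scalar_value`, `decode_numeric_reference`) are hypothetical and not taken from any CommonMark implementation. Replacing invalid values with U+FFFD mirrors the spec's existing rule for invalid numeric character references, which also replaces U+0000 with U+FFFD.

```python
# Hypothetical helpers, for illustration only.

MAX_CODE_POINT = 0x10FFFF
SURROGATES = range(0xD800, 0xE000)  # U+D800..U+DFFF


def is_code_point(n: int) -> bool:
    """True for any Unicode code point, U+0000..U+10FFFF (surrogates included)."""
    return 0 <= n <= MAX_CODE_POINT


def is_scalar_value(n: int) -> bool:
    """True for Unicode scalar values: all code points except surrogates."""
    return is_code_point(n) and n not in SURROGATES


def decode_numeric_reference(n: int) -> str:
    """Map the number in a reference such as &#55296; to a character.

    Assumes the 'scalar value' reading: U+0000, surrogates and values above
    U+10FFFF all become U+FFFD REPLACEMENT CHARACTER.
    """
    if n == 0 or not is_scalar_value(n):
        return "\uFFFD"
    return chr(n)


# Why surrogates in output are a problem: Python strings can hold a lone
# surrogate, but it has no UTF-8 encoding, so it cannot be written to UTF-8 HTML.
lone = chr(0xD800)
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    print("U+D800 cannot be encoded as UTF-8")
```

Under the ‘code point’ reading, `&#55296;` could legitimately produce a lone surrogate that no UTF-8 output can represent, which is the inconsistency the first point describes.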

@dbuenzli

See also #369
