In the 0.30 spec we have:

> One label matches another just in case their normalized forms are equal. To normalize a label, strip off the opening and closing brackets, perform the Unicode case fold, [...]
"Perform the Unicode case fold" is a bit unclear, in the sense that I had to consult cmark to see what it was doing. If I understood correctly, this is definition R4 of the Unicode standard (p. 154), so maybe that could be referenced.
P.S. A better definition would likely have been R5, as it would correctly handle identifiers in different normal forms (like é composed in one id and é decomposed in another), but you'd need to import Unicode normalization and its associated machinery into the definition of CommonMark.
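To make the difference concrete, here is a minimal Python sketch, not taken from cmark or the spec. It assumes that Python's `str.casefold()` (full Unicode case folding) is a fair stand-in for the fold-only comparison, and it uses NFD normalization around the fold for the normalization-aware variant. The function names `matches_fold_only` and `matches_normalized` are hypothetical, and the sketch deliberately skips the rest of label normalization (brackets, whitespace).

```python
import unicodedata


def matches_fold_only(a: str, b: str) -> bool:
    """Compare two labels using case folding alone (no normalization)."""
    return a.casefold() == b.casefold()


def matches_normalized(a: str, b: str) -> bool:
    """Compare two labels using case folding wrapped in NFD normalization."""
    def fold(s: str) -> str:
        # Decompose, fold, then decompose again, because folding can
        # produce text that is no longer in normalized form.
        decomposed = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", decomposed.casefold())
    return fold(a) == fold(b)


composed = "caf\u00e9"      # é as a single precomposed code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

print(matches_fold_only("Stra\u00dfe", "STRASSE"))   # full folding maps ß to ss
print(matches_fold_only(composed, decomposed))       # differs without normalization
print(matches_normalized(composed, decomposed))      # equal once normalized
```

With folding alone, the two spellings of "café" are different labels even though they render identically, which is the case the P.S. is concerned about.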
I'm currently bringing Comrak up to speed on the changes to CommonMark since GFM was rebased on it, and I hit some difficulty here too, since "Unicode case fold" has no precise meaning.
I might end up imitating the mechanism used in cmark directly (generating code based on CaseFolding-x.0.0.txt) since every Unicode library out there supports a slightly different set of things.
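As a rough illustration of the table-generation approach, the Python sketch below parses lines in the `CaseFolding.txt` format (`<code>; <status>; <mapping>; # <name>`) into a lookup table. It is not cmark's or Comrak's actual generator. The embedded `SAMPLE` is a tiny hand-picked excerpt in that format, the helper names are hypothetical, and selecting the `C` and `F` statuses (common plus full folding) is an assumption about what "the Unicode case fold" should mean.

```python
# Tiny excerpt in the CaseFolding.txt format, for illustration only.
SAMPLE = """\
# <code>; <status>; <mapping>; # <name>
0041; C; 0061; # LATIN CAPITAL LETTER A
0049; C; 0069; # LATIN CAPITAL LETTER I
0049; T; 0131; # LATIN CAPITAL LETTER I
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def parse_case_folding(text, statuses=("C", "F")):
    """Build a {character: folded string} table from CaseFolding.txt-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, status, mapping = [field.strip() for field in line.split(";")[:3]]
        if status in statuses:
            folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            table[chr(int(code, 16))] = folded
    return table


def fold(s, table):
    """Fold a string character by character; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in s)


TABLE = parse_case_folding(SAMPLE)
print(fold("AI\u00df", TABLE))  # ß expands to "ss" under full folding
```

Choosing `("C", "S")` instead would give simple folding, where each character maps to exactly one character and ß is left alone. That choice is one place where libraries can end up disagreeing.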
kivikakk added a commit to kivikakk/comrak that referenced this issue on Jul 10, 2024
We add `caseless` to do the folding. It matches upstream enough [^1],
unlike e.g. ICU4X's `CaseMapper` (doesn't fold Eszett to "ss"), and also
unlike ICU4X, it doesn't require us to bump our MSRV. 2/2 sgtm
A separate `--gfm-quirks` CLI option is added since base tests fail if
we just turn on all of GFM for them.
The nice thing about `caseless` is that while its last release may be
6 years ago, it depends on unicode-normalization ^0.1, the latest of
which is 5 months ago. It's also [very easy to read][caseless], so I'm
all good with this.
[^1]: Not that straightforward: commonmark/commonmark-spec#695
[caseless]: https://github.com/unicode-rs/rust-caseless/blob/v0.2.1/src/lib.rs