add Unicode.julia_chartransform Julia-parser normalization #42561
Cool. Though is "charmap" the best term for this? It's often used to refer to the full list of all possible characters. Maybe something about "mapping" or "transformation" would avoid confusion?
Thank you for working on this.
109d55e to f07680f
@nalimilan, renamed.
CI failures seem unrelated (it's been a long time since I've seen a green CI!), so this PR should be ready for review.
Co-authored-by: Milan Bouchet-Valat <[email protected]>
0dbc3fb to 7601943
Some small nits. Merge when you want.
@@ -4,6 +4,50 @@ module Unicode
export graphemes, isequal_normalized
"""
    Unicode.julia_chartransform(c::Union{Char,Integer})
Thoughts about naming this something like parsertransform?
BTW, do you have an opinion on my last comment above? #42561 (comment)
I think the callback function is good.
I don't have strong feelings on the name…
Co-authored-by: Jameson Nash <[email protected]>
* Add Unicode NFC normalization of all identifiers
* Add Julia-specific normalizations of lookalikes of - ⋅ ε μ

Based on the same functionality from the Unicode stdlib in Julia 1.8 (JuliaLang/julia#42561).
Finally addresses a request by @Keno in #19464 (comment): this PR provides a Unicode.julia_chartransform function that exports the custom character normalization used by the Julia parser, which can be passed as the chartransform keyword argument to Unicode.normalize. (More generally, the user can pass any custom codepoint mapping to normalize via the chartransform keyword.)

This should be useful for any package that needs to reproduce the normalization performed by the Julia parser; such packages can now call:

Unicode.normalize(string, compose=true, stable=true, chartransform=Unicode.julia_chartransform)
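A minimal sketch of both uses described above, assuming Julia 1.8 or later (where Unicode.julia_chartransform is available). The helper names parser_normalize and dash_to_hyphen are hypothetical, introduced only for illustration. The custom mapping mirrors the julia_chartransform(c::Union{Char,Integer}) signature, so it handles both Char and integer codepoints, assuming normalize hands the callback one codepoint at a time.

```julia
using Unicode  # standard library; julia_chartransform needs Julia 1.8+

# Hypothetical helper: NFC normalization combined with the Julia parser's
# extra lookalike mappings (e.g. micro sign U+00B5 -> Greek mu U+03BC).
parser_normalize(s::AbstractString) =
    Unicode.normalize(s, compose=true, stable=true,
                      chartransform=Unicode.julia_chartransform)

# Hypothetical custom mapping: treat the en dash (U+2013) as ASCII '-'.
# Integer codepoints are returned with the same integer type they came in as.
dash_to_hyphen(c::Integer) = c == 0x2013 ? oftype(c, 0x2d) : c
dash_to_hyphen(c::Char) = Char(dash_to_hyphen(UInt32(c)))

s = "\u00b5x"                                   # micro sign followed by 'x'
println(parser_normalize(s) == "\u03bcx")       # Greek mu followed by 'x'
println(parser_normalize(s) == string(Meta.parse(s)))  # agrees with the parser
println(Unicode.normalize("a\u2013b", chartransform=dash_to_hyphen))
```

Comparing against string(Meta.parse(s)) is a quick way to check that a package's normalization stays in sync with the parser for a given identifier.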