Restrict codepoints of valid identifiers #5936

stevengj · 2014-02-24T16:04:46Z

As mentioned in #5434, separate from the question of what Unicode normalization we should use for identifiers, it would probably be a good idea to restrict the codepoints of valid identifiers. Currently, you can do crazy things like:

julia> ² = 1
1
julia> 2²
2
julia> ２= 1
1
julia> 1 + ２
2
julia> –3 = 3
3
julia> -3 + –3
0

Python 3's valid identifiers provide one possible model.
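
For comparison, here is a small check of how Python 3 itself treats the names from the session above. It is only a sketch of the model being pointed to, using the built-in `str.isidentifier`; the `candidates` table and the `python_accepts` helper are names made up for this illustration, and nothing here says how Julia should behave.

```python
# Candidate names taken from the Julia session above.
candidates = {
    "superscript two": "\u00b2",       # ²
    "fullwidth digit two": "\uff12",   # ２
    "en dash followed by 3": "\u20133",  # –3
    "plain ASCII name": "x2",
}

def python_accepts(name: str) -> bool:
    """Return True if Python 3 would accept `name` as an identifier."""
    return name.isidentifier()

for label, name in candidates.items():
    print(f"{label:24} {name!r:8} -> {python_accepts(name)}")
```

Under Python's rules only the plain ASCII name is accepted; the three names from the session are all rejected.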

stevengj · 2014-02-24T16:11:57Z

Another possible model would be the Fortress language specification, which is fairly detailed (see chapter 5, although it doesn't discuss normalization) and was unburdened by backwards compatibility (unlike Python).

stevengj · 2014-02-24T16:51:14Z

cc: @malmaud, @jiahao

JeffBezanson · 2014-02-24T16:56:27Z

It's starting to look like we need more and more of the ICU library. It would be great to rely on libc and call iswalpha, but (1) some libc implementations are quite far behind the Unicode standard, and (2) for some reason this function is locale-dependent. I don't really see how whether a character is a letter should depend on locale...

stevengj · 2014-02-24T16:59:18Z

The utf8proc library will tell us the Unicode category of a codepoint, in a locale-independent way.
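
To make the idea of a locale-independent category lookup concrete, here is a rough sketch in Python. It uses the standard `unicodedata` module as a stand-in for utf8proc (an assumption made for illustration; it is not the utf8proc API), and the `category` wrapper is a hypothetical helper name.

```python
import unicodedata

def category(ch: str) -> str:
    """Two-letter Unicode general category of a single character."""
    return unicodedata.category(ch)

# One ordinary letter plus the characters that come up in this thread.
samples = ["a", "\u00b2", "\uff12", "\u2013", "\u2032", "\u2211"]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {category(ch)}")
```

The lookup reads from the Unicode character database bundled with the interpreter, so the answer does not change with the process locale. The superscript two comes back as No, the fullwidth two as Nd, and the en dash as Pd, none of which is a letter category.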

JeffBezanson · 2014-02-24T17:02:23Z

Excellent.

stevengj · 2014-02-24T22:14:47Z

What character categories do we want to allow in identifiers?

Certainly we want Sm (symbol, math) to be allowed, unlike Python.

As another example, Python does not allow Po (punctuation, other) characters in identifiers. Currently, Julia does, so you can have e.g. x′ as an identifier using the prime character. Do we want to allow this?
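
One way to explore the question is a category-based predicate where the allowed sets are parameters. The sketch below is in Python with `unicodedata`; the `START` and `CONTINUE` sets and the `extra` argument are hypothetical choices for illustration, not a proposal for the actual rule.

```python
import unicodedata

# Hypothetical policy, for illustration only: letters, letter-numbers and
# connector punctuation (e.g. "_") may start a name; digits and combining
# marks may additionally continue it.
START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl", "Pc"}
CONTINUE = START | {"Nd", "Mn", "Mc"}

def is_identifier(name: str, extra: frozenset = frozenset()) -> bool:
    """Check `name` against the category sets; `extra` widens both sets."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    if unicodedata.category(first) not in START | extra:
        return False
    return all(unicodedata.category(c) in CONTINUE | extra for c in rest)

print(is_identifier("x\u2032"))                       # x′ under the base policy
print(is_identifier("x\u2032", frozenset({"Po"})))    # x′ with Po allowed
print(is_identifier("\u2211", frozenset({"Sm"})))     # ∑ with Sm allowed
print(is_identifier("a.b", frozenset({"Po"})))        # side effect of allowing Po
```

Admitting a whole category is blunt: with Po allowed, `a.b` also passes, because the ASCII full stop is in Po as well, and Sm contains `+` and `=`.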

StefanKarpinski · 2014-02-24T22:50:05Z

I really like using prime in variable names. However, we probably want to use other mathematical operators as, well, operators. So, I suspect we'll have to go through the math pages and decide on a case-by-case basis whether they should be allowed in identifiers or become operators (and how they should parse).
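
A case-by-case decision could be layered on top of the categories as explicit per-character tables that are consulted first. The following Python sketch shows the shape of that; the contents of `OPERATORS` and `IDENTIFIER_OK` are invented examples and are not the choices Julia actually made.

```python
import unicodedata

# Hypothetical per-character decisions; these are NOT Julia's actual choices.
OPERATORS = {"+", "\u00d7", "\u2264"}     # +, ×, ≤ are treated as operators
IDENTIFIER_OK = {"\u2032", "\u2207"}      # ′ and ∇ are allowed inside names

def classify(ch: str) -> str:
    """Return 'operator', 'identifier' or 'other' for one character."""
    # Explicit tables win over the general category.
    if ch in OPERATORS:
        return "operator"
    if ch in IDENTIFIER_OK:
        return "identifier"
    # Fallback: letters, digits, letter-numbers and connectors.
    cat = unicodedata.category(ch)
    if cat[0] == "L" or cat in {"Nd", "Nl", "Pc"}:
        return "identifier"
    return "other"

for ch in ["x", "+", "\u00d7", "\u2032", "\u2207", "\u2013"]:
    print(f"U+{ord(ch):04X} -> {classify(ch)}")
```

Here ×, ≤ and ∇ are all in category Sm, yet they end up in different classes, which is the kind of split a category-only rule cannot express.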

stevengj added the decision label Feb 24, 2014

JeffBezanson added the breaking label Feb 24, 2014

jiahao added the unicode label Feb 24, 2014

JeffBezanson added a commit that referenced this issue May 10, 2014

use unicode character categories to classify identifier characters

569fdb7

fixes #6797, fixes #5936

JeffBezanson mentioned this issue May 10, 2014

RFC: use unicode character categories to classify identifier characters #6805

Merged

JeffBezanson added a commit that referenced this issue May 13, 2014

use unicode character categories to classify identifier characters

e039950

fixes #6797, fixes #5936

JeffBezanson added a commit that referenced this issue May 13, 2014

use unicode character categories to classify identifier characters

5aa2267

fixes #6797, fixes #5936

JeffBezanson added a commit that referenced this issue May 13, 2014

use unicode character categories to classify identifier characters

82e34b6

fixes #6797, fixes #5936

JeffBezanson closed this as completed in #6805 May 13, 2014

jiahao mentioned this issue Sep 11, 2015

Get dict value as a nullable #13055

Open

digital-carver mentioned this issue Apr 14, 2018

Make 𝟏, 𝟎, 𝟙, 𝟘 into valid identifiers for DSLs #26808

Closed
