RFC: allow operator suffixes — combining characters and primes #22089
It would also be possible to support a whitelist of superscripts and subscripts. For example, if we wanted … Unicode already has a couple of oddball examples of such operators, e.g. U+2a27 is …
Great!
NEWS.md
@@ -4,6 +4,14 @@ Julia v0.7.0 Release Notes
New language features
---------------------

* `getpeername` on a `TCPSocket` returns the address and port of the remote
endpoint of the TCP connection ([#21825]).
bad merge resolution
(Still working on some fixes to this PR. In particular, I'm updating it to blacklist many syntactic "operators" like …)
(The annoyance with supporting superscripts and subscripts is that these codepoints are scattered all over Unicode; the only way to detect them is to make a manual table.)
This seems to be the list of the 93 Latin/Greek/math super/subscripts in Unicode, sorted by codepoint:
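Since the codepoints are scattered, one hedged way to draft such a manual table is to scan Unicode character names and review the result by hand. The Python sketch below uses only the standard `unicodedata` module; the name prefixes in `PREFIXES` are an assumption for illustration, not the table this PR uses, and the scan deliberately misses modifier letters such as `ᵃ` (MODIFIER LETTER SMALL A), which would have to be added manually. Results also depend on the Unicode version bundled with Python.

```python
import sys
import unicodedata

# Hypothetical name prefixes for super/subscript digits, letters and signs.
# This is an assumption for the sketch, not the PR's actual table.
PREFIXES = (
    "SUPERSCRIPT ",
    "SUBSCRIPT ",
    "LATIN SUBSCRIPT ",
    "GREEK SUBSCRIPT ",
)


def superscript_subscript_candidates():
    """Return (codepoint, char, name) for every named character whose
    Unicode name starts with one of PREFIXES, in codepoint order."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # name() returns the default ("") for unassigned or unnamed codepoints.
        name = unicodedata.name(ch, "")
        if name.startswith(PREFIXES):
            found.append((cp, ch, name))
    return found
```

Printing `f"U+{cp:04X} {ch} {name}"` for each entry gives a list that can be checked by eye and trimmed into a hand-maintained table.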
Rebased. Tests were green before the rebase, so it should be good to squash+merge once others approve.
@StefanKarpinski, any chance of a decision on this?
Seems like everyone agrees with this change in principle, and it is just a matter of approving the implementation. @JeffBezanson?
Rebased and fixed whitespace problem introduced by last merge.
@@ -54,9 +58,25 @@
(lambda (x)
(has? t x))))))

; only allow/strip suffixes for some operators
(define no-suffix? (Set (append prec-assignment prec-conditional prec-lazy-or prec-lazy-and
Could this be a whitelist instead?
It was a bit easier as a blacklist because otherwise I'd have to split e.g. `prec-arrow` into a couple of separate lists rather than just explicitly listing `--` `-->` `->` here.
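The trade-off can be illustrated with a small Python sketch. The lists are abbreviated, hypothetical stand-ins for the flisp `prec-*` lists (the real ones are much longer); the point is only the shape of the two constructions: the blacklist appends whole syntactic classes plus three hand-named arrows, while a whitelist has to enumerate every suffixable class and split `prec-arrow`.

```python
# Abbreviated, hypothetical stand-ins for the flisp precedence lists.
PREC_ASSIGNMENT = ["=", "+=", "-=", ":="]
PREC_CONDITIONAL = ["?"]
PREC_LAZY_OR = ["||"]
PREC_LAZY_AND = ["&&"]
PREC_ARROW = ["--", "-->", "->", "\u2192", "\u2194"]  # ... plus → and ↔
PREC_PLUS = ["+", "-", "\u2295"]  # ... plus ⊕

# Blacklist: append whole syntactic classes, then name the three ASCII arrows.
NO_SUFFIX = frozenset(
    PREC_ASSIGNMENT + PREC_CONDITIONAL + PREC_LAZY_OR + PREC_LAZY_AND
    + ["--", "-->", "->"]
)

# Whitelist alternative: every class that allows suffixes must be listed
# (only PREC_PLUS is shown here), and PREC_ARROW has to be split in two.
SUFFIX_OK = frozenset(
    PREC_PLUS + [op for op in PREC_ARROW if op not in ("--", "-->", "->")]
)
```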
@@ -68,7 +88,9 @@
(pushprec (cdr L) (+ prec 1)))))
(pushprec (map eval prec-names) 1)
t))
(define (operator-precedence op) (get prec-table op 0))
(define (operator-precedence op) (get prec-table
(maybe-strip-op-suffix op)
Testing for operators can be pretty important for performance. It would be nice to only call `maybe-strip-op-suffix` for precedence levels that support it.
I was hoping that `maybe-strip-op-suffix` would be fast enough, because in the common case `strip-op-suffix` (implemented in C) is a no-op and `no-suffix?` isn't even called.
Is there any way to benchmark the impact of this?
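A minimal Python sketch of the lookup under discussion, assuming that a suffix character is a combining mark (general category Mn, Mc or Me), a prime, or a character from the Superscripts and Subscripts block plus `¹²³`. The function names mirror the flisp ones, but the character classes, `NO_SUFFIX` and `PREC_TABLE` are hypothetical abbreviations, and the real `strip-op-suffix` is written in C. It shows the fast path described above: when nothing is stripped, the blacklist is never consulted.

```python
import unicodedata

# Hypothetical suffix classes for this sketch.
PRIMES = frozenset("\u2032\u2033\u2034\u2035\u2036\u2037\u2057")  # ′ ″ ‴ ‵ ‶ ‷ ⁗
# Superscripts and Subscripts block (U+2070-U+209F) plus Latin-1 ¹ ² ³.
SUPER_SUB = frozenset(
    "\u00b9\u00b2\u00b3" + "".join(chr(cp) for cp in range(0x2070, 0x20A0))
)

# Abbreviated, hypothetical blacklist (cf. no-suffix? in the PR).
NO_SUFFIX = frozenset({"=", "+=", ":=", "?", "||", "&&", "--", "-->", "->"})

# Hypothetical precedence levels; only their relative order matters here.
PREC_TABLE = {"=": 1, "||": 5, "&&": 6, "+": 11, "-": 11, "*": 12}


def is_suffix_char(ch: str) -> bool:
    """True for a combining mark, a prime, or a super/subscript."""
    return (
        unicodedata.category(ch) in ("Mn", "Mc", "Me")
        or ch in PRIMES
        or ch in SUPER_SUB
    )


def strip_op_suffix(op: str) -> str:
    """Cut op at its first suffix character (never at position 0).
    Returns the very same object when there is nothing to strip."""
    for i in range(1, len(op)):
        if is_suffix_char(op[i]):
            return op[:i]
    return op


def maybe_strip_op_suffix(op: str) -> str:
    base = strip_op_suffix(op)
    # Fast path: nothing was stripped, so NO_SUFFIX is never looked at.
    # The identity test mirrors eq? on interned symbols.
    if base is op or base in NO_SUFFIX:
        return op
    return base


def operator_precedence(op: str) -> int:
    return PREC_TABLE.get(maybe_strip_op_suffix(op), 0)
```

For example, `+̂ₐ″` (`"+\u0302\u2090\u2033"`) gets the same level as `+`, while `=′` keeps its suffix and is not treated as `=`, which is the kind of case requested for the tests.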
Try parsing e.g. `string(:[$((:(a+b) for i=1:10000)...)])`
I tried benchmarking `parse(s)` for `s = string(:[$((:(a+b) for i=1:10000)...)])`, and removing the `maybe-strip-op-suffix` call from `operator-precedence` makes no detectable difference on my machine, so `operator-precedence` doesn't seem to be a problem.
However, there seems to be about an 8% slowdown overall in that benchmark compared to before this PR, so something else in this PR must be the culprit.
test/parse.jl
@test parse("3 +⁽¹⁾ 4") == Expr(:call, :+⁽¹⁾, 3, 4)
@test parse("3 +₍₀₎ 4") == Expr(:call, :+₍₀₎, 3, 4)
@test Base.operator_precedence(:+̂) == Base.operator_precedence(:+)
Should add some cases of suffixes on operators that don't allow them.
Will do. Done.
This PR causes a slight slowdown in the parser. In particular, replacing … Is this a concern? Any suggestions?
Yes, I think we should try using …
Ah, just saw your optimization. That looks good. How much does it help?
@JeffBezanson, unfortunately, that optimization hardly makes a difference (7% slowdown instead of 8%).
I don't really have a good handle on performance optimization for flisp.
Any thoughts on how I can further speed up parsing? Or should we just swallow the 5% parsing slowdown on realistic code and worry about parser optimization later?
I'm all for swallowing the 5% slowdown for this.
Should be ready to merge if we decide we want it.
src/julia-parser.scm
(define (maybe-strip-op-suffix op)
  (if (symbol? op)
      (let ((op_ (strip-op-suffix op)))
        (if (or (eqv? op op_) (no-suffix? op_))
Use `eq?` here?
Sure. (Makes < 1% difference in the benchmark.)
(let ((S (Set l)))
  (if (every no-suffix? l)
      S ; suffixes not allowed for anything in l
      (lambda (op) (S (maybe-strip-op-suffix op))))))
Maybe try splitting `l` into operators that do/don't support suffixes, and testing `(or (no-suff-set op) (suff-set (maybe-strip-op-suffix op)))` (depending on which of the sets are non-empty, of course).
Tried this; it makes < 1% difference in the benchmark.
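A Python sketch of the reviewer's suggestion, with hypothetical names: split the operator list once, so operators that never take suffixes are matched exactly and only the rest go through stripping. The stripping function is passed in as a parameter, so the sketch does not depend on any particular suffix rules.

```python
from typing import Callable, Iterable


def make_op_predicate(
    ops: Iterable[str],
    no_suffix: Callable[[str], bool],
    maybe_strip: Callable[[str], str],
) -> Callable[[str], bool]:
    """Build a membership test for ops that only strips suffixes when needed."""
    ops = list(ops)
    plain = frozenset(op for op in ops if no_suffix(op))  # exact match only
    suffixable = frozenset(op for op in ops if not no_suffix(op))
    if not suffixable:
        # Nothing here takes a suffix: plain set membership, no stripping.
        return lambda op: op in plain
    if not plain:
        return lambda op: maybe_strip(op) in suffixable
    # Cheap exact test first; strip only if that fails.
    return lambda op: op in plain or maybe_strip(op) in suffixable
```

As the reply notes, the flisp version of this made < 1% difference in the benchmark, so it clarifies the trade-off rather than promising a speedup.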
Whoever merges, please remember to squash!
Looks like an unrelated stalled build on Travis.
doc/src/manual/variables.md
@@ -96,7 +96,7 @@ Operators like `+` are also valid identifiers, but are parsed specially. In some
can be used just like variables; for example `(+)` refers to the addition function, and `(+) = f`
will reassign it. Most of the Unicode infix operators (in category Sm), such as `⊕`, are parsed
as infix operators and are available for user-defined methods (e.g. you can use `const ⊗ = kron`
to define `⊗` as an infix Kronecker product).
to define `⊗` as an infix Kronecker product). Operators can also be suffixed with modifying marks, primes, and sub/superscripts, e.g. `+̂ₐ″` is parsed as an infix operator with the same precedence as `+`.
would be good to conform to the line length convention of the rest of the file
ok
Bump. Okay to squash/merge?
This PR implements something that I had been hoping to do for a while (see #6929 (comment)): custom operators can be defined by appending Unicode combining characters, primes, and sub/superscripts to other operators.
For example, `+̂` and `+″` are now parsed as binary operators with the same precedence as `+`.
Rationale: this allows you to define an operator that is clearly a "modified +" (etc.) without having to dig through Unicode for some vaguely appropriate symbol, and without overriding `+` itself. Also, it is pretty inconceivable that `+̂` could be anything other than an infix operator, so the choice is between supporting it and giving an error, and I don't see why an error would be useful.
(Note: combining characters with operators, e.g. `+̂`, don't show up properly in some fonts. It should look like …)