Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AST Format Cleanup #21774

Closed
MikeInnes opened this issue May 10, 2017 · 14 comments
Closed

AST Format Cleanup #21774

MikeInnes opened this issue May 10, 2017 · 14 comments
Assignees
Labels
breaking This change will break code parser Language parsing and surface syntax
Milestone

Comments

@MikeInnes
Copy link
Member

Let's discuss cleaning up some kinks in Julia's AST format to make things easier for macro authors.

Specific Issues

In many cases, the parser does too much work, losing information:

  • :(1(2)):(1 * 2)
  • a ? b : cif a ...
  • if a b elseif c d endif a b else (if c d end) end
  • f() do ...f(() -> ...)

I think we should aim for round-trip-ability here and have this work be done only at lowering time. Short function definitions and broadcasting syntax are examples of this being done right.

This is more of a parser issue than an AST issue, but it's not currently possible to write for $cond body end as the parser checks for an x = y iteration specification; instead you must use $(Expr(:for, cond, :body). For the same reason, it's not possible to write @capture(ex, for cond_ body_ end) with MacroTools.

let and for have the opposite ordering for declarations and body.

The existence of both kw and = is strange as it gives us the opposite problem to that above: we have two ways to represent the same surface syntax. Perhaps this is necessary; MacroTools normalises kw to = to avoid surprising failures.

We currently have two ways to represent keyword arguments in calls: f(a, b = c) which has little processing, and f(a; b = c) which becomes f($(Expr(:parameters, Expr(:kw, :a, :b)), a). The latter is one of the few places (let is another) where the ordering/structure of the expression mistmatches that of the surface syntax. I suspect a nicer design is possible but I don't have an exact proposal. A matching pattern like f(a, xs__) currently matches kwargs style and not the other, so at least putting the params at the end of the call consistently would help with that.

Motivation

Losing information about the AST limits what macros can do, and/or adds strange edge cases to what they can do. Consider @static if for example:

@static if foo
  f()
elseif bar
  g()
else
  h()
end

@static if foo
  f()
else
  bar ? g() : h()
end

The fact that we can't tell these two examples apart means that we have to choose a surprising behaviour for one of them.

As another example, due to the parsing of 1(2) you can't use DataFlow.jl's syntax for graphs if the graph contains literal numbers. This isn't a crucial use case, but it illustrates the kind of surprising edge cases this leads to, even for innocuous-looking transformations.

Concerns

I'm not sure to what extent this can be done after 1.0. What levels of AST are part of Julia's public API? People will be relying on it in practice, although the users should be few enough and expert enough to alleviate the impact.

This adds some work for macro authors similar to that caused by short/long-form function definitions, in cases where the difference doesn't matter. This should be very manageable, as it's easy to provide utility functions that do various parts of normalisation (as MacroTools does already for longdef and shortdef).

Let me know if I've missed anything. Input from other macro writers welcome.

@JeffBezanson JeffBezanson added breaking This change will break code parser Language parsing and surface syntax labels May 10, 2017
@JeffBezanson JeffBezanson added this to the 1.0 milestone May 10, 2017
@JeffBezanson
Copy link
Sponsor Member

I had assumed it would be easier to write macros if parsing normalized things to some extent, but if people feel that hasn't been the case then ok.

I agree we should try to get rid of the kw head and only use =.

elseif, ?, and do should be pretty easy to change.

I assume the alternative parsing for 1(2) would be as a call. However given how common call nodes are, it seems extremely taxing to check every call to see if the first argument is a numeric literal, in which case you have to know that it will be a multiply instead. We could eliminate implicit multiplication with parens, but then we'd be left with 2(...) as wasted syntax.

@tkelman
Copy link
Contributor

tkelman commented May 10, 2017

I like these being pure sugar and not having to deal with them as separate cases in macros. A tokenizer should be able to round trip these things, but macros shouldn't need to worry about what are semantically just formatting differences in their input.

@MikeInnes
Copy link
Member Author

@JeffBezanson glad to hear that you're on board.

it seems extremely taxing to check every call to see if the first argument is a numeric literal

Assuming you're talking about the burden on macro writers, it shouldn't be that bad. It's a trivial utility function in fact (and this generalises to all the discussed changes):

lower_muls(ex) = postwalk(x -> @capture(x, n_Number(y_)) ? :($n*$y) : x, ex)

Then just call lower_muls once and you get the same AST you would've before. Of course, most macros won't have to do this, in the same way that most macros don't currently have to normalise function definitions or broadcast syntax.

I think it's worth emphasising that macros have to deal with these kinds of things already; this just reduces the number of awkward special cases. I don't expect this to increase the burden on macros (or I wouldn't propose it) so if anyone has concerns about that I'm happy to see examples and work it through.

macros shouldn't need to worry about what are semantically just formatting differences

They're only "semantically just formatting differences" if you aren't changing the semantics, which is exactly what things like @static if do.

@JeffBezanson
Copy link
Sponsor Member

Yichao had a good point about kw: #20000 (comment)
It avoids changing the meaning of an expression based on the context it's spliced into.

@tkelman
Copy link
Contributor

tkelman commented May 11, 2017

if you aren't changing the semantics, which is exactly what things like @static if do.

@static if doesn't change that the semantics of if and the ternary operator are completely interchangeable - the fact that they're treated exactly the same in the AST means @static was easier to write in a way that works on ternary expressions with no extra effort.

@StefanKarpinski
Copy link
Sponsor Member

I would point out that the round-tripping issue could be addressed by choosing which form to print with based on properties of the expressions. A conditional with multiple statements has to be an if/else; a conditional with single-expression branches can be either one, if the expressions are short enough, we can format them as a ternary operator.

@ZacLN
Copy link
Contributor

ZacLN commented May 12, 2017

There are quite a few more ambiguous cases , e.g.
import A: b, c vs import A.b, A.c
or
a ? (b;c) : d
vs

if a 
    b
    c
else
    d
end

@bramtayl
Copy link
Contributor

A couple more things to consider:

(a...; b...) -> f(a...; b...)

gets parsed confusingly as

begin
    a...
    b...
end -> f(a...; b...)
type a
    b
    c
end

shouldn't get parsed with a begin block IMHO because b and c are not executed sequentially.

I've been using NumberedLines.jl for working with numbered lines but I'd argue that at least some of its features should be included in base. Having a NumberedLine type (or AST node) which includes both a line number and expression would be I think both much safer and much more intuitive.

@yuyichao
Copy link
Contributor

Parsing code blocks as Expr(:block) should not be a problem here.

  1. It doesn't affect round trip at all.

  2. The printing of the (a..., b...) -> ... could be improved, but not necessarily the parsing

  3. Expr(:block) does not imply evaluation so not executed sequentially is not a good argument for parsing differently as long as there's no ambiguity from the context. By this argument

    • begin end in quote shouldn't be parsed as Expr(:block) since it's not evaluted, in fact nothing should parse in quote since none of them are evaluated as usual.
    • The first arg of function f() end shouldn't be parsed as a call either since it's not evaluated.
    • (a, b) -> 2 the argument list shouldn't be parsed as tuple since it's not evaluated.
    • let a = 1, b = 2, c::Int end should not use Expr(:block) for the var list since they are not evaluated with the same rule as a normal block (in fact, this block uses a similar rule to a struct definition).

    None of the cases above has any ambiguity and there's no reason to create many more expression types for people to handle in any of these cases.

@bramtayl
Copy link
Contributor

One more is the arguments to for, which hopefully could be similar to the arguments to :generator

e = quote
    result = 0
    for i = 1:10, j = 11:20
        result = result + i + j
    end
    (i + j for i = 1:10, j = 11:20)
end
e.args[4].args
e.args[6].args

JeffBezanson added a commit that referenced this issue Aug 3, 2017
JeffBezanson added a commit that referenced this issue Aug 4, 2017
pkofod pushed a commit to pkofod/julia that referenced this issue Aug 5, 2017
JeffBezanson added a commit that referenced this issue Sep 1, 2017
JeffBezanson added a commit that referenced this issue Sep 1, 2017
JeffBezanson added a commit that referenced this issue Sep 2, 2017
JeffBezanson added a commit that referenced this issue Sep 4, 2017
JeffBezanson added a commit that referenced this issue Sep 5, 2017
parse `let` the same as `for`. part of #21774
@JeffBezanson
Copy link
Sponsor Member

Status of this:

  • elseif and let have been fixed
  • Seems we are leaning against changing kw vs. =
  • I don't really want to parse 1(2) as a call, but perhaps there should be an expression head for juxtaposition?
  • ? and do are still up for debate

Actual round-trip-ability is not on the table here; CSTParser handles that very well and moving fully to that is way out of scope for 1.0.

@JeffBezanson JeffBezanson added the triage This should be discussed on a triage call label Sep 14, 2017
@JeffBezanson
Copy link
Sponsor Member

From triage: preserving do sounds good, that's next on the list.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Sep 14, 2017
@Keno Keno added the triage This should be discussed on a triage call label Sep 20, 2017
@JeffBezanson
Copy link
Sponsor Member

Please mention more specific examples of weird ASTs if anybody finds any, but otherwise I think we're done here.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Sep 21, 2017
@MikeInnes
Copy link
Member Author

Awesome. Do I take it we are not in favour of preserving ?, or of delaying for $cond end errors later in the pipeline?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code parser Language parsing and surface syntax
Projects
None yet
Development

No branches or pull requests

8 participants