AST Format Cleanup #21774

MikeInnes · 2017-05-10T17:18:30Z

Let's discuss cleaning up some kinks in Julia's AST format to make things easier for macro authors.

Specific Issues

In many cases, the parser does too much work, losing information:

:(1(2)) → :(1 * 2)
a ? b : c → if a ...
if a b elseif c d end → if a b else (if c d end) end
f() do ... → f(() -> ...)

I think we should aim for round-trip-ability here and have this work be done only at lowering time. Short function definitions and broadcasting syntax are examples of this being done right.

This is more of a parser issue than an AST issue, but it's not currently possible to write for $cond body end as the parser checks for an x = y iteration specification; instead you must use $(Expr(:for, cond, :body). For the same reason, it's not possible to write @capture(ex, for cond_ body_ end) with MacroTools.

let and for have the opposite ordering for declarations and body.

The existence of both kw and = is strange as it gives us the opposite problem to that above: we have two ways to represent the same surface syntax. Perhaps this is necessary; MacroTools normalises kw to = to avoid surprising failures.

We currently have two ways to represent keyword arguments in calls: f(a, b = c) which has little processing, and f(a; b = c) which becomes f($(Expr(:parameters, Expr(:kw, :a, :b)), a). The latter is one of the few places (let is another) where the ordering/structure of the expression mistmatches that of the surface syntax. I suspect a nicer design is possible but I don't have an exact proposal. A matching pattern like f(a, xs__) currently matches kwargs style and not the other, so at least putting the params at the end of the call consistently would help with that.

Motivation

Losing information about the AST limits what macros can do, and/or adds strange edge cases to what they can do. Consider @static if for example:

@static if foo
  f()
elseif bar
  g()
else
  h()
end

@static if foo
  f()
else
  bar ? g() : h()
end

The fact that we can't tell these two examples apart means that we have to choose a surprising behaviour for one of them.

As another example, due to the parsing of 1(2) you can't use DataFlow.jl's syntax for graphs if the graph contains literal numbers. This isn't a crucial use case, but it illustrates the kind of surprising edge cases this leads to, even for innocuous-looking transformations.

Concerns

I'm not sure to what extent this can be done after 1.0. What levels of AST are part of Julia's public API? People will be relying on it in practice, although the users should be few enough and expert enough to alleviate the impact.

This adds some work for macro authors similar to that caused by short/long-form function definitions, in cases where the difference doesn't matter. This should be very manageable, as it's easy to provide utility functions that do various parts of normalisation (as MacroTools does already for longdef and shortdef).

Let me know if I've missed anything. Input from other macro writers welcome.

The text was updated successfully, but these errors were encountered:

JeffBezanson · 2017-05-10T18:14:45Z

I had assumed it would be easier to write macros if parsing normalized things to some extent, but if people feel that hasn't been the case then ok.

I agree we should try to get rid of the kw head and only use =.

elseif, ?, and do should be pretty easy to change.

I assume the alternative parsing for 1(2) would be as a call. However given how common call nodes are, it seems extremely taxing to check every call to see if the first argument is a numeric literal, in which case you have to know that it will be a multiply instead. We could eliminate implicit multiplication with parens, but then we'd be left with 2(...) as wasted syntax.

tkelman · 2017-05-10T19:45:02Z

I like these being pure sugar and not having to deal with them as separate cases in macros. A tokenizer should be able to round trip these things, but macros shouldn't need to worry about what are semantically just formatting differences in their input.

MikeInnes · 2017-05-10T20:28:18Z

@JeffBezanson glad to hear that you're on board.

it seems extremely taxing to check every call to see if the first argument is a numeric literal

Assuming you're talking about the burden on macro writers, it shouldn't be that bad. It's a trivial utility function in fact (and this generalises to all the discussed changes):

lower_muls(ex) = postwalk(x -> @capture(x, n_Number(y_)) ? :($n*$y) : x, ex)

Then just call lower_muls once and you get the same AST you would've before. Of course, most macros won't have to do this, in the same way that most macros don't currently have to normalise function definitions or broadcast syntax.

I think it's worth emphasising that macros have to deal with these kinds of things already; this just reduces the number of awkward special cases. I don't expect this to increase the burden on macros (or I wouldn't propose it) so if anyone has concerns about that I'm happy to see examples and work it through.

macros shouldn't need to worry about what are semantically just formatting differences

They're only "semantically just formatting differences" if you aren't changing the semantics, which is exactly what things like @static if do.

JeffBezanson · 2017-05-10T20:36:35Z

Yichao had a good point about kw: #20000 (comment)
It avoids changing the meaning of an expression based on the context it's spliced into.

tkelman · 2017-05-11T10:02:34Z

if you aren't changing the semantics, which is exactly what things like @static if do.

@static if doesn't change that the semantics of if and the ternary operator are completely interchangeable - the fact that they're treated exactly the same in the AST means @static was easier to write in a way that works on ternary expressions with no extra effort.

StefanKarpinski · 2017-05-12T04:53:46Z

I would point out that the round-tripping issue could be addressed by choosing which form to print with based on properties of the expressions. A conditional with multiple statements has to be an if/else; a conditional with single-expression branches can be either one, if the expressions are short enough, we can format them as a ternary operator.

ZacLN · 2017-05-12T07:40:21Z

There are quite a few more ambiguous cases , e.g.
import A: b, c vs import A.b, A.c
or
a ? (b;c) : d
vs

if a 
    b
    c
else
    d
end

bramtayl · 2017-05-23T04:13:19Z

A couple more things to consider:

(a...; b...) -> f(a...; b...)

gets parsed confusingly as

begin
    a...
    b...
end -> f(a...; b...)

type a
    b
    c
end

shouldn't get parsed with a begin block IMHO because b and c are not executed sequentially.

I've been using NumberedLines.jl for working with numbered lines but I'd argue that at least some of its features should be included in base. Having a NumberedLine type (or AST node) which includes both a line number and expression would be I think both much safer and much more intuitive.

yuyichao · 2017-05-23T04:40:33Z

Parsing code blocks as Expr(:block) should not be a problem here.

It doesn't affect round trip at all.
The printing of the (a..., b...) -> ... could be improved, but not necessarily the parsing
Expr(:block) does not imply evaluation so not executed sequentially is not a good argument for parsing differently as long as there's no ambiguity from the context. By this argument
- begin end in quote shouldn't be parsed as Expr(:block) since it's not evaluted, in fact nothing should parse in quote since none of them are evaluated as usual.
- The first arg of function f() end shouldn't be parsed as a call either since it's not evaluated.
- (a, b) -> 2 the argument list shouldn't be parsed as tuple since it's not evaluated.
- let a = 1, b = 2, c::Int end should not use Expr(:block) for the var list since they are not evaluated with the same rule as a normal block (in fact, this block uses a similar rule to a struct definition).
None of the cases above has any ambiguity and there's no reason to create many more expression types for people to handle in any of these cases.

bramtayl · 2017-06-22T19:17:57Z

One more is the arguments to for, which hopefully could be similar to the arguments to :generator

e = quote
    result = 0
    for i = 1:10, j = 11:20
        result = result + i + j
    end
    (i + j for i = 1:10, j = 11:20)
end
e.args[4].args
e.args[6].args

part of #21774

part of JuliaLang#21774

parse `let` the same as `for`. part of #21774

JeffBezanson · 2017-09-14T15:50:00Z

Status of this:

elseif and let have been fixed
Seems we are leaning against changing kw vs. =
I don't really want to parse 1(2) as a call, but perhaps there should be an expression head for juxtaposition?
? and do are still up for debate

Actual round-trip-ability is not on the table here; CSTParser handles that very well and moving fully to that is way out of scope for 1.0.

JeffBezanson · 2017-09-14T16:31:55Z

From triage: preserving do sounds good, that's next on the list.

JeffBezanson · 2017-09-21T18:05:45Z

Please mention more specific examples of weird ASTs if anybody finds any, but otherwise I think we're done here.

MikeInnes · 2017-09-22T10:34:08Z

Awesome. Do I take it we are not in favour of preserving ?, or of delaying for $cond end errors later in the pipeline?

JeffBezanson added breaking This change will break code parser Language parsing and surface syntax labels May 10, 2017

JeffBezanson added this to the 1.0 milestone May 10, 2017

ihnorton mentioned this issue Jun 27, 2017

Round-trippable representation of source code JuliaLang/JuliaParser.jl#22

Closed

StefanKarpinski assigned JeffBezanson Jul 6, 2017

JeffBezanson added a commit that referenced this issue Jul 31, 2017

preserve more syntax in parsing ?, do, and elseif

16cf6d8

part of #21774

JeffBezanson mentioned this issue Jul 31, 2017

RFC: preserve more syntax in parsing ?, do, and elseif #23061

Closed

JeffBezanson added a commit that referenced this issue Aug 3, 2017

preserve elseif syntax in parser

89fc930

part of #21774

JeffBezanson mentioned this issue Aug 3, 2017

preserve elseif syntax in parser #23122

Merged

JeffBezanson added a commit that referenced this issue Aug 4, 2017

preserve elseif syntax in parser

e13ac1f

part of #21774

pkofod pushed a commit to pkofod/julia that referenced this issue Aug 5, 2017

preserve elseif syntax in parser

1fdb84f

part of JuliaLang#21774

JeffBezanson added a commit that referenced this issue Sep 1, 2017

parse let the same as for. part of #21774

8f60df5

JeffBezanson added a commit that referenced this issue Sep 1, 2017

parse let the same as for. part of #21774

29370a1

JeffBezanson added a commit that referenced this issue Sep 2, 2017

parse let the same as for. part of #21774

a7890f6

JeffBezanson added a commit that referenced this issue Sep 4, 2017

parse let the same as for. part of #21774

dbd1574

JeffBezanson added a commit that referenced this issue Sep 5, 2017

Merge pull request #23553 from JuliaLang/jb/parselet

2bdcd5f

parse `let` the same as `for`. part of #21774

JeffBezanson added the triage This should be discussed on a triage call label Sep 14, 2017

JeffBezanson removed the triage This should be discussed on a triage call label Sep 14, 2017

JeffBezanson added a commit that referenced this issue Sep 14, 2017

preserve do syntax in AST. part of #21774

f0d8480

JeffBezanson mentioned this issue Sep 14, 2017

preserve do syntax in AST #23718

Merged

Keno added the triage This should be discussed on a triage call label Sep 20, 2017

JeffBezanson closed this as completed Sep 21, 2017

JeffBezanson removed the triage This should be discussed on a triage call label Sep 21, 2017

rdeits mentioned this issue Sep 21, 2017

fix @manipulate hygiene and add a test JuliaGizmos/Interact.jl#185

Merged

JeffBezanson added a commit that referenced this issue Dec 19, 2017

preserve do syntax in AST. part of #21774

4439f6c

JeffBezanson added a commit that referenced this issue Dec 20, 2017

preserve do syntax in AST. part of #21774 (#23718)

2d45ef5

JeffBezanson mentioned this issue Dec 23, 2017

preserve structure of using and import statements in parser #25256

Merged

JeffBezanson mentioned this issue Feb 22, 2018

RFC: fix #26137, change parsing of unary ops on parenthesized expressions #26154

Merged

epatters mentioned this issue Aug 11, 2018

Broken in Julia v0.7 JuliaServices/Match.jl#47

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AST Format Cleanup #21774

AST Format Cleanup #21774

MikeInnes commented May 10, 2017

JeffBezanson commented May 10, 2017

tkelman commented May 10, 2017

MikeInnes commented May 10, 2017

JeffBezanson commented May 10, 2017

tkelman commented May 11, 2017 •

edited

Loading

StefanKarpinski commented May 12, 2017

ZacLN commented May 12, 2017

bramtayl commented May 23, 2017

yuyichao commented May 23, 2017

bramtayl commented Jun 22, 2017

JeffBezanson commented Sep 14, 2017

JeffBezanson commented Sep 14, 2017

JeffBezanson commented Sep 21, 2017

MikeInnes commented Sep 22, 2017

AST Format Cleanup #21774

AST Format Cleanup #21774

Comments

MikeInnes commented May 10, 2017

Specific Issues

Motivation

Concerns

JeffBezanson commented May 10, 2017

tkelman commented May 10, 2017

MikeInnes commented May 10, 2017

JeffBezanson commented May 10, 2017

tkelman commented May 11, 2017 • edited Loading

StefanKarpinski commented May 12, 2017

ZacLN commented May 12, 2017

bramtayl commented May 23, 2017

yuyichao commented May 23, 2017

bramtayl commented Jun 22, 2017

JeffBezanson commented Sep 14, 2017

JeffBezanson commented Sep 14, 2017

JeffBezanson commented Sep 21, 2017

MikeInnes commented Sep 22, 2017

tkelman commented May 11, 2017 •

edited

Loading