Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: generator expressions #14848

Merged
merged 3 commits into from
Feb 19, 2016
Merged

RFC: generator expressions #14848

merged 3 commits into from
Feb 19, 2016

Conversation

JeffBezanson
Copy link
Member

I realized a good approach to comprehensions would be to add the generalized syntax (#4470) first, and once we are happy with how it works then use it to implement comprehensions.

Example:

julia> sqrt(6sum(1/i^2 for i=1:1000000))
3.1415916986605086

@JeffBezanson JeffBezanson added needs tests Unit tests are required for this change needs docs Documentation for this change is required labels Jan 29, 2016
@StefanKarpinski
Copy link
Member

+1 to the approach – allows getting the feature onto master quickly and with minimal disruption.

@mschauer
Copy link
Contributor

I see how the product only needs a hidden _size now. Both generators could probably have length and the 1-d also size.

@hayd
Copy link
Member

hayd commented Jan 30, 2016

This is fantastic ...but greedy for predicates! 🚶 :)

@JeffBezanson
Copy link
Member Author

I'm thinking about how N-d generators should work, in particular whether the generator or the inner iterator should determine the result shape, and how this affects syntax.

There are 4 cases, based on whether the desired result is 1-d or n-d, and whether you want to use N variables for N dimensions, or refer to the index tuple as a whole.

Here's what the 4 cases look like if the type of generator determines the result shape. I use the following definitions:

spread(f) = (arg)->f(arg...)
gather(f) = (arg...)->f(arg)
d # v syntax structure
1d 1 _ for I in product(X,Y) Generator(I->_, Product(X,Y))
1d n _ for (i,j) in product(X,Y) Generator(spread((i,j)->_), Product(X,Y))
nd 1 _ for I... in product(X,Y) GeneratorND(gather(I->_), Product(X,Y))
nd n _ for i in X, j in Y GeneratorND((i,j)->_, Product(X,Y))

We don't have the syntax in the 3rd row, and it seems a bit doubtful. Here's how the last 2 rows change if iterator type determines shape:

d # v syntax structure
nd 1 _ for I in cartesian(X,Y) Generator(I->_, Cartesian(X,Y))
nd n _ for i in X, j in Y Generator(spread((i,j)->_), Cartesian(X,Y))

This seems to be much more symmetrical, so I conclude that the inside iterator needs to determine the result shape, and that we want only one type of Generator. Naturally, that's not what I implemented here :)

cc @timholy

@JeffBezanson
Copy link
Member Author

Also, that analysis implies that something like [ 2x for x in rand(5,5) ] should give a 2-d array, which is a breaking change. It seems like an improvement to me though.

@mschauer
Copy link
Contributor

mschauer commented Feb 9, 2016

iterator type determines shape

Yes, that seems to be the natural choice for a language in which arrays are first class citizens.

@timholy
Copy link
Member

timholy commented Feb 9, 2016

I also think the iterator type should determine the shape, and I like the proposed (breaking) change.

I haven't followed recent changes as carefully as I'd like, but what's the product/cartesian distinction? Is product a "vector-producing" cartesian iterator and cartesian a "shape-preserving iterator"? These seem like extremes of a general principle, that you should be able to call reshape on an iterator and have it determine the output shape. That is, cartesian(X,Y,...) expresses "input arguments" to your generator, but the output shape is still configurable. Analogous to this (EDITed to use the nice product syntax in 0.5):

julia> reshape(collect(Base.product(1:3,1:4,1:3)), (12,3))
12x3 Array{Tuple{Int64,Int64,Int64},2}:
 (1,1,1)  (1,1,2)  (1,1,3)
 (2,1,1)  (2,1,2)  (2,1,3)
 (3,1,1)  (3,1,2)  (3,1,3)
 (1,2,1)  (1,2,2)  (1,2,3)
 (2,2,1)  (2,2,2)  (2,2,3)
 (3,2,1)  (3,2,2)  (3,2,3)
 (1,3,1)  (1,3,2)  (1,3,3)
 (2,3,1)  (2,3,2)  (2,3,3)
 (3,3,1)  (3,3,2)  (3,3,3)
 (1,4,1)  (1,4,2)  (1,4,3)
 (2,4,1)  (2,4,2)  (2,4,3)
 (3,4,1)  (3,4,2)  (3,4,3)

With this line of thinking, your first two cases could be written

_ for I in vec(cartesian(X,Y))
_ for (i,j) in vec(cartesian(X,Y))

@mschauer
Copy link
Contributor

mschauer commented Feb 9, 2016

The two perspectives can be unified if the shape of the iterator only materializes within [ ]-expressions for comprehensions and f[ ] for maps which should return an array and not a flattened object.

@mbauman
Copy link
Member

mbauman commented Feb 9, 2016

If Generators are iterators with a shape… and that shape is respected by concatenations and comprehensions, etc., then it makes sense that an iterators shape should be respected by generators themselves, too.

Are there other functions or syntaxes that should respect iterator shape? collect?

Also… in a somewhat frustrating twist… the pending concatenation change makes the generator comprehension syntax not quite as clear cut.

g = (i for i in 1:10) # a ten element generator
v = [i for i in 1:10] # a ten element vector
x = [g] # a one element vector of generators?

@iamed2
Copy link
Contributor

iamed2 commented Feb 9, 2016

@mbauman If we take a clue from Python, [g] would indeed be a one-element vector of generators. That's probably what people will expect.

@mschauer
Copy link
Contributor

mschauer commented Feb 9, 2016

I talked earlier about a sister function to collect which could be named tabulate which is collect+shape, but if shape has to do with brackets, then I think [g;] indicates shape-preserving collection of an generator in a clear way.
Maybe some of you want to have a look at https://gist.github.com/mschauer/b04e000e9d0963e40058 at this point which I wrote a while ago and which elaborates the idea that things within brackets have a shape (if, please read that as an "essay", not as a specific proposal).

@JeffBezanson
Copy link
Member Author

@timholy Yes, I was thinking of Cartesian as wrapping a product iterator with a shape the same way Array wraps a linear memory vector with a shape. This could indeed be expressed with reshape.

@JeffBezanson JeffBezanson removed needs docs Documentation for this change is required needs tests Unit tests are required for this change labels Feb 11, 2016
@JeffBezanson
Copy link
Member Author

Have added NEWS and docs.

So far, performance is a bit sub-par and we seem to need more inlining.

@JeffBezanson
Copy link
Member Author

I propose merging this. Any final suggestions or objections?

The syntax `f(x) for x in iter` is syntax for constructing an instance of this
type.
"""
immutable Generator{I,F}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason you chose to go with {I,F}, but ::F, ::I (for the order)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think Generator(func, range) is a bit more natural since it matches the order of comprehensions, but I suspect dispatching on the kind of iterator will be much more common than dispatching on the type of function. For example below I dispatch on Generator{IteratorND}.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generator(func, range) also allows Generator(range) do ... end syntax.

@timholy
Copy link
Member

timholy commented Feb 12, 2016

This is going to be so nice. Big 👍

enclosed in parentheses to avoid ambiguity::

julia> collect(1/(i+j) for i=1:2, j=1:2)
ERROR: function collect does not accept keyword arguments
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unfortunate. How about parsing this as a generator instead and using ; to give keywords in such cases:

collect(1/(i+j) for i=1:2, j=1:2) # 2d generator
collect(1/(i+j) for i=1:2; j=1:2) # 1d generator and keyword argument `j`
collect(j=1:2, 1/(i+j) for i=1:2) # ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider giving a syntax error since this is so ambiguous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the presence of the for makes this quite unambigous in the programmer's mind. You wouldn't write a generator and then a keyword argument without adding parentheses around the former.

It's ambiguous for the parser, but as @StefanKarpinski noted ; can be used to pass keyword arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also write this:

collect((1/(i+j) for i=1:2), j=1:2) # 1d generator and keyword argument `j`

Given that there are so many other ways to express the 1d generator + keyword arg and this particular one looks so much like it should be a 2d generator, it seems to me that it's a bit of a shame not to parse it as such. If you're not convinced by that, at least make it an error so that we can change our minds later without breaking anyone's code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Julia's parser is "eager" at many occasions: As long as the next token can be part of the current expression, it is. Given all the ambiguity with operators, macro calls, one-line if statements etc., that's an important rule.

Thus this should be a generator, not a keyword. (Or an error, of course.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of after a for i=1:2 in parens having both , j=1:2 and , j in 1:2 be interpreted as more of generator. If not that, then an error. Another note is that if we eliminated the comma from the comprehension syntax, then I believe this issue would really go away:

[1/(i+j) for i=1:2 j=1:2]
collect(1/(i+j) for i=1:2 j=1:2)

This is currently a syntax error.

Copy link
Contributor

@toivoh toivoh Feb 12, 2016 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @toivoh said. It might be briefly painful but I think we could consider being more picky about semicolons vs commas for kwargs at call sites.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha. Fair enough. I retract the space-separated proposal :-P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's worth pointing out that kwargs aren't the only ambiguous case for ,. other comma-sensitive contexts include tuples and array literals as well as assignment/let/return expressions (although assignment in those other contexts is mostly a syntax error)

@toivoh
Copy link
Contributor

toivoh commented Feb 13, 2016

How about allowing to specify several indices parenthesized after for?:

[1/(i+j) for (i=1:2, j=1:2)]
collect(1/(i+j) for (i=1:2, j=1:2))

Or perhaps you might as well just put the parentheses around the whole generator expression.

@JeffBezanson
Copy link
Member Author

Ok, I have changed it to make all trailing comma-separated expressions part of the generator.

@StefanKarpinski
Copy link
Member

Nice.

@quinnj
Copy link
Member

quinnj commented Feb 16, 2016

If we don't want to include the parsing of for x in iter if cond to create a Filter around the Generator right now, I can open a separate issue.......but I think it'd be awesome to have out of the box :)

@JeffBezanson
Copy link
Member Author

doctests added.

@tkelman
Copy link
Contributor

tkelman commented Feb 18, 2016

I vote for merging, but only after Tim's inline comments are addressed. They look like pretty minor details.

function (::Type{IteratorND}){I,N}(iter::I, shape::NTuple{N,Integer})
if length(iter) != prod(shape)
throw(DimensionMismatch("dimensions $shape must be consistent with iterator length $(iter(a))"))
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ERROR: UndefVarError: a not defined

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks.

This introduces the types `Generator`, which maps a function over
an iterator, and `IteratorND`, which wraps an iterator with a
shape tuple.
JeffBezanson added a commit that referenced this pull request Feb 19, 2016
@JeffBezanson JeffBezanson merged commit d8b5679 into master Feb 19, 2016
@yuyichao yuyichao deleted the jb/generators branch February 19, 2016 05:08
@pkofod
Copy link
Contributor

pkofod commented Feb 19, 2016

I am going to work on my thesis today...
... building master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.