Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge collect() into Vector()? #16029

Open
nalimilan opened this issue Apr 24, 2016 · 45 comments
Open

Merge collect() into Vector()? #16029

nalimilan opened this issue Apr 24, 2016 · 45 comments
Labels
design Design of APIs or of the language itself domain:arrays [a, r, r, a, y, s] domain:collections Data structures holding multiple items, e.g. sets

Comments

@nalimilan
Copy link
Member

nalimilan commented Apr 24, 2016

I wonder whether collect() isn't always equivalent to Vector(), and could therefore be merged into it:

julia> methods(collect)
#3 methods for generic function "collect":
collect(r::Range{T<:Any}) at range.jl:753
collect{T}(::Type{T}, itr) at array.jl:217
collect(itr) at array.jl:224

The method for Range returns a vector, so it could be replaced with Array(), as both call vcat() in the end. convert(Vector, ...) currently fails and should be fixed.

The methods for general iterables are more complex, but they essentially always create an Array too. Even when they call similar(), it's on a 1:1 range, so they actually create a Vector. So the collect() machinery could be moved to Array()/Vector() to make them support conversion from any iterable.

This issue is related to discussions regarding similar() (#13731). Indeed, Julia provides a few functions which appear to behave differently, but in practice behave like Array, and callers end up depending on this assumption. This makes the API more complex and less explicit.

See also #12251 about merging full into Array.

@tkelman
Copy link
Contributor

tkelman commented Apr 25, 2016

duplicate of #12251?

@nalimilan
Copy link
Member Author

Indeed. Let's keep this issue to discuss the case of collect then, to avoid polluting the other thread.

@nalimilan nalimilan changed the title Merge collect() and full() into Array()? Merge collect() into Vector()? Apr 25, 2016
@pkofod
Copy link
Contributor

pkofod commented Apr 25, 2016

It would be a lot cleaner, as that is actually what people want when using "our" collect method. I only know the method name collect from Ruby where it is identical to our map (and their own map,...), just to add to the confusion.

@mschauer
Copy link
Contributor

collect returns an Array, not (always) a Vector. To usecollect rather than convert(Array,...) indicates better that values/objects ) are to be gathered which possibly do not yet exists and which possibly will not exist anymore as iterable after the collection (for example when collecting iterables of random numbers, user inputs or unique ids). My impression is thatconvert should work on things which exist and should not wait an indefinite time for the generation of a new value/event.

@nalimilan
Copy link
Member Author

convert(Array, itr) might indeed look a bit surprising when doing an irreversible operation, so we could only support the Array(itr) form. I don't think it would really be less explicit than collect. The fact that the operation might block is dependent on/indicated by the type of the iterator, not be the function. Currently collect(1:3) will never block, but other types may do so.

@mschauer
Copy link
Contributor

mschauer commented Apr 26, 2016

By the way convert(Array, 1:3) is defined anyway because isa(1:3,AbstractArray) is true.

@davidanthoff
Copy link
Contributor

I really like this suggestion, lets get rid of collect!

@JeffBezanson
Copy link
Sponsor Member

I like the idea too, but it is ambiguous with constructors that take dimensions, e.g. Array{T}((m,n)). That will need to be resolved.

@nalimilan
Copy link
Member Author

Yes, but how? Can we deprecate Array{T}(::Tuple) in favor of Array{T}(::Integer...)? Or would that introduce a performance penalty?

@StefanKarpinski
Copy link
Sponsor Member

It's not about the performance penalty so much as many people prefer to use the dimension tuple form, and in some sense it's a better, more consistent way to specify this.

@JeffBezanson
Copy link
Sponsor Member

It might be too late for this, but I'd almost say arguments like reshape would be ideal: Array(iter, dims). A zeros matrix would be Array(repeated(0), (m,n)). We could allow passing an empty iterator to get an uninitialized array.

@nalimilan
Copy link
Member Author

So if we don't want to get rid of Array{T}(::Tuple), we can't use Array{T}(itr) as a replacement for collect(T, itr). That's why I wonder whether it's really needed: yes it's consistent, but varargs are just another way of passing a tuple, so it's also consistent IMHO.

Anyway, we can still use convert(Array{T}, itr) to replace collect(T, itr), though it's more verbose. Opinions?

@JeffBezanson I'm not sure I get your point. That indeed sounds interesting, but it doesn't replace collect, and there's still an ambiguity with Array{T}(iter, dims).

@carlobaldassi
Copy link
Member

Just to add a data point, the recently introduced constructors from iterables of BitArrays (#19018) also inevitably ended up having the slightly annoying feature that tuples of Ints are special-cased:

julia> BitArray([0,1,0])
3-element BitArray{1}:
 false
  true
 false

julia> BitArray((0,1,0))
0×1×0 BitArray{3}

julia> BitArray((false,true,false))
3-element BitArray{1}:
 false
  true
 false

It's probably less severe than the Array case. But perhaps if a general solution is not found and collect is ultimately kept, we should have bitcollect, by analogy with other methods like bitrand and bitbroadcast, instead of BitArray(iter)? (I don't like it at all, but it would be more consistent.)

@nalimilan
Copy link
Member Author

It's probably less severe than the Array case. But perhaps if a general solution is not found and collect is ultimately kept, we should have bitcollect, by analogy with other methods like bitrand and bitbroadcast, instead of BitArray(iter)? (I don't like it at all, but it would be more consistent.)

I thought we were rather trying to get rid of the bit* functions in favor or more generic constructs. We should either support collect(arraytype, itr) or recommend using BitArray. Unfortunately, the former in ambiguous if you want to collect an iterator of arrays, and the second is ambiguous if you want to collect a tuple of integers...

@carlobaldassi
Copy link
Member

I thought we were rather trying to get rid of the bit* functions in favor or more generic constructs.

In principle yes, except that good generic constructs to specify the output container type have not yet emerged to my knowledge (other than the constructors, of course).

@Sacha0
Copy link
Member

Sacha0 commented Oct 22, 2016

good generic constructs to specify the output container type

Ref. #16740, work in that direction. Best!

@JeffBezanson JeffBezanson added this to the 1.0 milestone Aug 10, 2017
@JeffBezanson JeffBezanson added domain:collections Data structures holding multiple items, e.g. sets design Design of APIs or of the language itself labels Aug 10, 2017
@JeffBezanson
Copy link
Sponsor Member

Now that Array( ... ) has been deprecated, it would be totally possible just to rename collect to Array. Sadly, that would include Array(T, iter), but that's no worse than collect itself.

For Array{T} and Array{T,N} the only unambiguous signature available is Array{T}(iter, dims::Tuple). That might be ok sometimes, but unfortunately Array{T}(iter) is what we'd want most.

One option is to deprecate e.g. Array{T}(1,2,3) to Array{T,3}(1,2,3). Or we could require something like Array{T}(uninitialized, dims) for the current behavior, explicitly requesting uninitialized memory (uninitialized would be a special constant).

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Aug 11, 2017

Since collect produces a vector, what about spelling this as Vector(itr) and Vector{T}(itr)?

Edit: I realized the problem with this as soon as I posted it – Vector(n) and Vector{T}(n) alread mean something and integers are iterable. Sad trombone...

@mbauman mbauman added the domain:arrays [a, r, r, a, y, s] label Aug 30, 2017
@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 5, 2017

ref #15120

@StefanKarpinski
Copy link
Sponsor Member

Arrays are at this point one of the only collections that doesn't have constructors that take arbitrary iterators of contents to construct with, making them quite the odd one out. Perhaps we should consider deprecating Vector(dims...) and Vector(dims) instead and have blank(T, dims...) and blank(T, dims) (or maybe uninitialized?) functions similar to zeros or ones instead. Then Vector(itr) would be available for this kind of construction.

@timholy
Copy link
Sponsor Member

timholy commented Sep 6, 2017

A blank what? Why isn't Dict() also blank? Should it be blank(Array{T}, dims)? Then I can have blank(MyAbstractArray{T}, args...)?

@fredrikekre
Copy link
Member

Then I can have blank(MyAbstractArray{T}, args...)?

That's essentially what #16740 implements, for ones, zeros etc.

@Sacha0
Copy link
Member

Sacha0 commented Nov 19, 2017

(If the following doesn't make sense, please check out #24595.) Deprecation process conundrum: We can deprecate Array(shape...)-like constructors in favor of Array(uninitialized, shape...)-like constructors in 0.7, allowing Array(...) to take on collect(...)'s behaviors in 1.0. Great. Sticking point: That sequence of changes allows deprecation of collect(...) to Array(...) equivalents only in 1.0 (or requires a hard break somewhere). Thoughts on how best to go about this process? Thanks!

@Sacha0 Sacha0 added the status:triage This should be discussed on a triage call label Nov 20, 2017
@Sacha0
Copy link
Member

Sacha0 commented Nov 20, 2017

Working plan from triage: Deprecate collect in 1.0. Best!

@StefanKarpinski StefanKarpinski removed the status:triage This should be discussed on a triage call label Nov 20, 2017
@nalimilan
Copy link
Member Author

What the status of this issue? Triage just decided to use collect at #25217 (comment), which sounds contradictory with last comment.

@StefanKarpinski
Copy link
Sponsor Member

Right, that's a good point. I'm not sure we need to fully deprecate collect but it would be nice if Vector(itr) worked at least. Then collect(itr) or collect(lazy) can be the function that says "go evaluate all of these things" and collect them for me.

@Sacha0
Copy link
Member

Sacha0 commented Dec 21, 2017

Everything necessary to give e.g. Arrays the relevant subset of collect's behaviors between 0.7 and 1.0 is complete IIRC. So happily the better constructor behavior can come about independent of collect's evolution :).

@JeffBezanson
Copy link
Sponsor Member

Yes, I agree we seem to need collect. Most (all?) current uses can be replaced by Array(itr), which is great, but we need a function that picks an appropriate array type (e.g. for materializing a lazy view of an immutable or distributed array, etc.). AbstractArray(itr) or DenseArray(itr) might be able to do that, but (1) that's pretty long, and (2) lazy views might still be subtypes of AbstractArray.

@davidanthoff
Copy link
Contributor

but we need a function that picks an appropriate array type (e.g. for materializing a lazy view of an immutable or distributed array, etc.). AbstractArray(itr) or DenseArray(itr) might be able to do that, but (1) that's pretty long

The very unpopular #22630 proposal would solve that. But obviously not (2) and I also realize that at this stage in the release process it is probably not even worth considering that option anymore :) So treat this comment as noise.

@o314
Copy link
Contributor

o314 commented Dec 29, 2017

The very unpopular #22630 proposal would solve that. But obviously not (2) and I also realize that at this stage in the release process it is probably not even worth considering that option anymore :) So treat this comment as noise.

IMHO Factory pattern was too entangled in OOP land to succeed.
A Service Locator pattern escapes this hell and provides opportunies regarding cloud/cluster ctx that we face regularly today
https://stackoverflow.com/questions/5698026/is-the-service-locator-pattern-any-different-from-the-abstract-factory-pattern

Anyway having a simple wrapper - a function, let's call it collect - to handle creation with correct type info is a good - if not essential - step in the good direction.
So +1 with

Yes, I agree we seem to need collect. Most (all?) current uses can be replaced by Array(itr), which is great, but we need a function that picks an appropriate array type (e.g. for materializing a lazy view of an immutable or distributed array, etc.). AbstractArray(itr) or DenseArray(itr) might be able to do that, but (1) that's pretty long, and (2) lazy views might still be subtypes of AbstractArray.

And move forward to 1.0 :)

@JeffBezanson
Copy link
Sponsor Member

Looks like we are set for 1.0 here.

@JeffBezanson JeffBezanson modified the milestones: 1.0, 1.x Dec 30, 2017
@Sacha0
Copy link
Member

Sacha0 commented Jan 7, 2018

Given last week's triage's decision to extend copy for materialization of Adjoint/Transpose rather than collect, and the accompanying extensive and inconclusive discussion regarding semantics and naming of materialization-like operations, deprecating collect (at least in its present form) once again seems the way forward?

Operating under that assumption, I prepared #25450 as a head start on the potential deprecation between 0.7 and 1.0. Best!

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jan 7, 2018

Yes (note that the title is a bit misleading, since the goal is to merge it into Array, not Vector; possibly by using copyto)

@jlperla
Copy link
Contributor

jlperla commented May 2, 2018

Discussing this with @mschauer

I just wanted to check in on the thinking here. I understand completely the motivation to take more control over things with teh constructors in generic library code.

But collect is extremely useful for users - in part because it doesn't require explicitly defining the templated types that are easy for package developers.

For example, start with the simplest example:

collect(1:10);

#Would this now require?
Array{Int64,1}(1:10);

#Tricky to train users on this... In fact, this was my first try
Array{Int, 1)(1:10) #Not an array of concrete types

Next, Imagine the following function which uses the current collect

f(a,b) = collect(a:b)

#I could write the following, but it would be wrong.
f(a,b) = Array{Int64, 1)(a:b)

The issue, of course, is that typeof(a) is not necessarily Int64

So I think the only way to do it correctly (without worrying about promotion... another big issue we didn't need to worry about in the f(a,b) = collect(a:b)

function f(a::T, b::T) where T
    Array{T,1)(a:b)
end

This is all very confusing for a new user, when all they want to do is get an array from a call to linspace and don't really want to worry about generic programming.

Right now the following works great

grid = collect(linspace(0.0, 1.0, 10))

and I would hate to see this require a bunch of generic programming just to teach something similar.

@mbauman
Copy link
Sponsor Member

mbauman commented May 2, 2018

No, the proposal here is to use the un-parameterized Array as the replacement for collect. Your example would be Array(a:b). You could still specify an element type if you wanted, but it wouldn't be required. You could also use Vector or Matrix to specify just a particular dimensional if you wanted, but again, not required.

@jlperla
Copy link
Contributor

jlperla commented May 2, 2018

OK, then I think I agree with the change. As long as I don't have to teach my students about generic programming before they can do the basics!

@nsajko
Copy link
Contributor

nsajko commented Apr 27, 2024

Is the proposal here to define Array constructors that take

  1. a Base.Generator, or
  2. Any value, which is assumed to be an iterator

The downside of (1) is that it wouldn't work for arbitrary iterators, while (2) seems prone to accidental misuse and bugs.

Perhaps it would be better to introduce a new function (perhaps called from_iterator, or generic_collect), or just extend collect so it could support collection types other than Array? EDIT: that's just #36288

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design of APIs or of the language itself domain:arrays [a, r, r, a, y, s] domain:collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

No branches or pull requests