RFC: concatenation interface #10338

Closed
Jutho opened this issue Feb 26, 2015 · 8 comments

@Jutho
Contributor

Jutho commented Feb 26, 2015

Some days (by now weeks) ago, I started working on cleaning up and improving the efficiency of the concatenation code (hcat, vcat, cat, ...). A first attempt was #10155, but I closed it in favor of a fresh restart. However, in the meantime several questions and comments have arisen (see e.g. https://groups.google.com/forum/#!topic/julia-users/E3G686bg9lE, #10204, ...). So I thought it would be good to have a poll about a number of decisions.

  • I think most people agree on WIP: Make [a, b] non-concatenating #8599, which will take effect after the deprecation period: [a, b, c] should just construct a vector of its elements and not concatenate. T[a, b, c] can be used to specify the element type of the vector and convert all elements a,b,c to type T.
  • [a; b; c], [a b c] and [a b; c d] currently concatenate, i.e. they expand arguments a, b, .. of type AbstractArray into their individual elements. There also is a typed version T[a; b; c] etc to force the element type of the resulting output array. There are the following critiques (not necessarily my personal opinion):
    1. The syntax [a b; c d] is also useful to construct a Matrix by just specifying simple arguments (e.g. a,b,c,d of type Number). This is not really concatenation, just filling a matrix. Also, there is no way to construct an (n,1) or (1,1) Matrix. See also https://groups.google.com/forum/m/?fromgroups#!topic/julia-users/E4kX3XGzao4 .
    2. It is impossible to build a Matrix with elements of type AbstractArray using this syntax, since they will always concatenate. In fact, trying to use the typed syntax such as Vector{Int}[a b c] just fails (cat-error in constructing Array-of-Arrays #10204).
    3. This led to the proposal to get rid of the typed concatenation syntax altogether (cat-error in constructing Array-of-Arrays #10204) and disallow the construction of arrays of arrays using the concatenation interface.
    4. Other questions: how should the return array type of concatenation depend on the input? Currently most cases use Array, but some cases return e.g. BitArray or some sparse matrix. It is hard to alter this behavior in the case of mixed inputs (e.g. vcat between Arrays and DataArrays JuliaStats/DataArrays.jl#130), especially if you still want to maintain high efficiency for the simpler case and don't want to write many different method definitions of the same functions.

I could see two different proposals to continue:

  1. Basically the current proposal:
    • [a,b,c] never concatenates; it can only be used to construct Vector objects.
    • [a b c] and [a; b; c] always concatenate, so you could not use this syntax to build arrays of arrays. There would still need to be a decision regarding the fate of the typed concatenation syntax and question 4 above.
  2. A new proposal:
    • [a, b, c], [a b; c d] etc. never concatenate. [a, b, c] is a constructor for type Vector and thus creates a vector of its elements (size (3,)). [a b c], [a; b; c] and [a b; c d] are Matrix constructors and thus create matrices of sizes (1,3), (3,1) and (2,2) respectively. In particular, [a;] can be used to create a Matrix of size (1,1). They can all be combined with a type T specification, and it is possible to have elements which are themselves arrays.
    • A new syntax (e.g. using [| ... |] brackets) is used for concatenation. There might still be a use case to distinguish between , and ; although I don't immediately see one.
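
For reference, the distinction between constructing and concatenating can be sketched as follows. This is only an illustration, assuming a Julia 1.x session in which [a, b] no longer concatenates; the variable names are made up for the example, and the [| ... |] syntax from proposal 2 is not shown because it does not exist.

```julia
# Illustrative sketch; assumes Julia 1.x semantics. All names are hypothetical.
a = [1, 2]
b = [3, 4]

v = [a, b]   # construction: a 2-element vector whose elements are the vectors a and b
c = [a; b]   # vertical concatenation: the four numbers in one flat vector
h = [a b]    # horizontal concatenation: a 2x2 matrix with a and b as columns

# The concatenating forms correspond to plain function calls.
@assert c == vcat(a, b)
@assert h == hcat(a, b)

# With scalar arguments the same syntax simply fills a matrix (critique 1).
m = [1 2; 3 4]
r = [1 2 3]

# In this sketch a (1,1) or (n,1) Matrix is obtained with an explicit reshape.
one_by_one = reshape([1], 1, 1)
col = reshape([1, 2, 3], :, 1)
```

The reshape calls are just one way to get the (1,1) and (n,1) shapes that the bracket syntax discussed here cannot express directly.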
@Jutho
Contributor Author

Jutho commented Feb 26, 2015

As for my personal opinion: I liked proposal one, but I can also see merit in the second proposal. Creating vectors and matrices from arbitrary elements is one of the most basic functionalities of a scientific language and should maybe not be conflated with concatenation (that's also not the case with proposal one, but there is just no dedicated syntax for constructing and filling a Matrix). The second proposal further separates the different goals of array construction and of array concatenation. The first set of functions (related to the [ ... ] brackets) can be written to have fast creation and filling of arrays, whereas the second set of functions (the [| ... |] brackets) can be written for flexibility in concatenation and doesn't have to care about the efficiency of 'concatenating' just plain numbers.

@jdlangs
Contributor

jdlangs commented Feb 26, 2015

For proposal 2, wouldn't the new syntax have to be used for the nonconcat-ing construction? It would be a massive breaking change over the entire ecosystem if something like: [rand(3,3) zeros(3); [0 0 0 1]] no longer created a 4x4 matrix.

I think it's awesome that Julia has a unified Array type for all dimensions, but it needs to be remembered that different dimensions do have different use cases. I think in general it's far more common for people to construct matrices by combining other matrices and vectors, etc. Constructing vectors is different because 1D arrays are ubiquitous even for non-numerical work, so nonconcat-ing construction has a lot more use cases.
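
The block-matrix pattern mentioned above can be spelled out as a small sketch. It assumes a Julia 1.x session, where this syntax still concatenates; R, t and M are names chosen only for the illustration.

```julia
# Illustrative sketch; assumes Julia 1.x. R, t and M are hypothetical names.
R = rand(3, 3)   # a 3x3 block
t = zeros(3)     # a length-3 column

# The first row of blocks is [R t] (3x4), the second row is the 1x4 matrix [0 0 0 1].
M = [R t; [0 0 0 1]]

# The bracket syntax is shorthand for these concatenation calls.
@assert size(M) == (4, 4)
@assert M == vcat(hcat(R, t), [0 0 0 1])
@assert M == hvcat((2, 1), R, t, [0 0 0 1])
```

Under proposal 2 as written, code of this shape is what would have to move to the new concatenation brackets.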

@Jutho
Contributor Author

Jutho commented Feb 26, 2015

I think that would be the same kind of massive breaking change as [1:3, 1:3] no longer concatenating with #8599 (which I absolutely support). So that cannot be the argument right now, it's only an argument for considering this thoughtfully right now, rather than having to go through this again at some later point. I have never said that proposal 2 would be pleasant, and I am fine with both proposals.

Note that going from [ , , ,] to [ ; ; ; ] requires the number of symbols to be changed proportional to the number of elements, whereas going from [ ... ] to [| ... |] requires a constant number of symbols to be changed. Although that's probably the weakest of all arguments :-).

@Jutho
Contributor Author

Jutho commented Feb 26, 2015

In addition, I also thought that there was no real use for a non-concatenating matrix constructor, but the discussion in https://groups.google.com/forum/#!topic/julia-users/E3G686bg9lE made me consider otherwise.

@SimonDanisch
Contributor

Thank you for this issue @Jutho !
This is a nice summary, and when I read it, I feel even more strongly about option 2, now that you mention that the code could be even cleaner! (which is not a big surprise, if you better encapsulate 2 different concepts)
Also, I didn't realize that in the case of concatenation you don't immediately need , and ;, which makes it even more counterintuitive to have more shape-defining symbols only for the concatenating behavior.

If non-concatenating construction is done by [|...|], there wouldn't even be much breakage, or am I missing something? I personally don't have an opinion about which one should be which. They're pretty neutral in terms of semantics, so it should probably be decided only by convenience and by which is used more.

Here are some more related weird behaviors:

immutable Vec3{T} <: DenseArray{T, 1}
    x::T
    y::T
    z::T
end
Vec3(a)                                                 = Vec3(a,a,a)
Base.zero{T}(a::Type{Vec3{T}})                          = Vec3(zero(T));
Base.length(a::Vec3)                                    = 3;
Base.getindex(A::Vec3, i::Integer)                      = A.(i);
Base.size(A::Vec3)                                      = (3,);
Base.size(A::Vec3, i::Integer)                          = (3,)[i];
Base.ndims(A::Vec3)                                     = 1;
Base.eltype{T}(A::Vec3{T})                              = T;

Base.start(A::Vec3)                                     = 1;
Base.next(A::Vec3, state::Integer)                      = (A[state], state+1);
Base.done(A::Vec3, state::Integer)                      = length(A) < state;

[Vec3(1,3,4); Vec3(1,2,3)] # #MethodError(similar,([1,3,4],Int64,(6,)))
[Vec3(1,3,4), Vec3(1,2,3)] #[a,b] concatenation is deprecated; use [a;b] instead  
# +MethodError(similar,([1,3,4],Int64,(6,)))
[Vec3(1,3,4)  Vec3(1,2,3)] #MethodError(similar,([1,3,4],Int64,(3,2)))
Vec3[Vec3(1,3,4); Vec3(1,2,3)] #MethodError(similar,([1,3,4],Vec3{T},(6,)))
Vec3[Vec3(1,3,4), Vec3(1,2,3)] #Vec3[[1,3,4],[1,2,3]]
Vec3[Vec3(1,3,4) Vec3(1,2,3)] #MethodError(similar,([1,3,4],Vec3{T},(3,2)))

#with
Base.similar{T}(a::Vec3, t::Type{T}, z::(Integer...))   = zero(Vec3{T}) #can only be really implemented with some FixedSizeArray implementation

[Vec3(1,3,4); Vec3(1,2,3)] #MethodError(setindex!,([0,0,0],[1,3,4],1:3))
[Vec3(1,3,4), Vec3(1,2,3)] #WARNING: [a,b] concatenation is deprecated; use [a;b] instead
#MethodError(setindex!,([0,0,0],[1,3,4],1:3))
[Vec3(1,3,4)  Vec3(1,2,3)] #BoundsError(#undef,#undef)
Vec3[Vec3(1,3,4); Vec3(1,2,3)] #MethodError(zero,(Vec3{T},))
Vec3[Vec3(1,3,4),Vec3(1,2,3)] #Vec3[[1,3,4],[1,2,3]]
Vec3[Vec3(1,3,4) Vec3(1,2,3)] #MethodError(zero,(Vec3{T},))

#with cheated similar
Base.similar{T}(a::Vec3, t::Type{T}, z::(Integer...))   = Array(t,z)
[Vec3(1,3,4); Vec3(1,2,3)] # [1,3,4,1,2,3]
[Vec3(1,3,4), Vec3(1,2,3)] # [1,3,4,1,2,3] + 
#WARNING: [a,b] concatenation is deprecated; use [a;b] instead
[Vec3(1,3,4) Vec3(1,2,3)] # [1 1
# 3 2
# 4 3]
Vec3[Vec3(1,3,4); Vec3(1,2,3)] # MethodError(convert,(Vec3{T},1))
Vec3[Vec3(1,3,4),Vec3(1,2,3)] # Vec3[[1,3,4],[1,2,3]]
Vec3[Vec3(1,3,4) Vec3(1,2,3)] # MethodError(convert,(Vec3{T},1))

I'm pretty sure that this should be doable without copying and similar... (probably related to your fast vs general concat)
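
For comparison, here is a minimal sketch of the same Vec3 experiment under later Julia semantics. It assumes Julia 1.x, where struct replaces immutable and the generic fallback for similar returns a plain Array; it is not the code that produced the errors listed above, and the variable names are made up.

```julia
# Illustrative sketch; assumes Julia 1.x. Not the original 2015 code.
struct Vec3{T} <: AbstractVector{T}
    x::T
    y::T
    z::T
end

# Minimal array interface: a size and scalar indexing.
Base.size(::Vec3) = (3,)
Base.IndexStyle(::Type{<:Vec3}) = IndexLinear()
Base.getindex(v::Vec3, i::Int) = getfield(v, i)

a = Vec3(1, 3, 4)
b = Vec3(1, 2, 3)

stacked      = [a; b]   # concatenates into a flat 6-element vector
side_by_side = [a b]    # concatenates into a 3x2 matrix
pair         = [a, b]   # constructs a 2-element vector of Vec3 values

@assert stacked == [1, 3, 4, 1, 2, 3]
@assert side_by_side == [1 1; 3 2; 4 3]
@assert length(pair) == 2
```

In this sketch the concatenated results are ordinary Arrays, so the elements are still copied; it only shows that no custom similar method is needed.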

@timholy
Member

timholy commented Feb 27, 2015

It is an interesting proposal to make some new operator be the non-concatenating operator; it would be less breakage.

@dhoegh
Contributor

dhoegh commented Feb 28, 2015

I like the second proposal because it makes it easy to make an array of vectors/ranges, which is useful. I have always found that the concatenation feature should be separated from array construction. To request the concatenation feature it seems reasonable to write [|1:10|] instead of [1:10;]; it is only one more character, and in my mind it would be easier to tell newcomers to use [| ... |] for Matlab-like concatenation instead of [1:10;]. Using [| ... |] for concatenation just seems less subtle than looking at the kind of separator.
Then there is one Array constructor for folks doing linear algebra and one for the generalists who would like to have arrays of different iterable content.

@kmsquire
Member

kmsquire commented Mar 8, 2015

(closed in lieu of #7128)
