Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Introduce promotion mechanism for concatenation #20815

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nalimilan
Copy link
Member

@nalimilan nalimilan commented Feb 26, 2017

This is a first attempt to explore a fix for #2326, i.e. stop relying on the type of the first array to decide the type of the result of cat functions. The idea is to introduce a promotion system similar to promote but for concatenation.

Remarks/issues to resolve:

  • I first wanted to use the standard promote mechanism, but it turns out we need a separate one. For example, concatenating Diagonal matrices should give a sparse matrix, not a Diagonal matrix. IOW, promoting a series of identical types for concatenation is not the same as promoting them for conversion (which is the goal of promote).
  • The promotion system deals with both the element type (replacing uses of promote_eltype) and the container type. That could allow e.g. returning a BitArray when concatenating an Array{Bool} with a BitArray`. The number of dimensions isn't involved since it depends on the kind of concatenation performed in a way that would make the system too complex.
  • By default, concatenations return standard Array objects unless a promotion rule has been defined. This could be changed so that concatenations involving only arrays of a custom type default to that type (though I'm not sure it's easy to write using dispatch). Thoughts? This is also an issue due to the next bullet.
  • This relies on similar(::Type{A{T}}, dims) methods being defined for all types which define promotion rules, i.e. which want to opt-out from concatenation returning standard Array objects. AFAICT this is required since there is no generic method to take an array type and change just its element type (cf. this discussion about fixate_eltype); indeed that's not possible e.g. for BitArray. I think that method could be made one of the fundamental requirements of AbstractArray, possibly redefining other similar methods to call it by default: with an instance, you also have the type, but the contrary doesn't work.
  • We could use the same approach as promote_rule to avoid the need to define methods twice for each ordering of arguments.
  • I temporarily changed the typed_*cat lowering to eltyped_*cat as it was simpler to prevent confusion. A less disruptive solution would be to preserve the existing lowering, and use another name for the new methods which take the array type instead of the element type.
  • The promote_type_cat name should probably be improved.
  • More existing code needs to be moved to the new system (notably special matrix types). This can probably allow for some simplification.

@ararslan ararslan added domain:arrays [a, r, r, a, y, s] domain:types and dispatch Types, subtyping and method dispatch labels Feb 26, 2017
@cjprybol
Copy link
Sponsor Contributor

cjprybol commented Mar 4, 2017

Could you show an example of how this can be used to address JuliaStats/NullableArrays.jl#167?

@nalimilan
Copy link
Member Author

Haven't tested yet, but something like this should work:

promote_type_cat(::Type{S}, ::Type{T}) where {S<:AbstractArray, T<:NullableArray} =
    NullableArray{promote_type(eltype(S), eltype(T))}
promote_type_cat(::Type{S}, ::Type{T}) where {S<:NullableArray, T<:AbstractArray} =
    NullableArray{promote_type(eltype(S), eltype(T))}
similar{T,A<:NullableArray{T},N}(::Type{A}, dims::Dims{N}) = NullableArray{T, N}(dims)

@andyferris
Copy link
Member

This is very interesting @nalimilan.

From your example, it seems that promote_type_cat is not commutative by default, unlike promote_rule. I'm curious if this was a deliberate decision?

The final thing is - if we need a specific promotion mechanism for concatenation - do we in general need a promotion mechanism for +, or for - or for *, or a variety of other operations? Or is automated promotion mostly suitable only to the idea of putting things together into the one container?

@nalimilan
Copy link
Member Author

From your example, it seems that promote_type_cat is not commutative by default, unlike promote_rule. I'm curious if this was a deliberate decision?

That's just for simplicity for a first approach. I've listed it in the TODOs (when I mentioned promote_rule).

The final thing is - if we need a specific promotion mechanism for concatenation - do we in general need a promotion mechanism for +, or for - or for *, or a variety of other operations? Or is automated promotion mostly suitable only to the idea of putting things together into the one container?

That's an interesting question. Currently the fallbacks for +, -, * and / all call promote. We could imagine having separate methods so that e.g. promote_type(+, Int, Int) == Int but promote_type(/, Int, Int) == Float64. Not sure how useful it would be in practice, since defining these promotion rules takes as much code as defining the operations themselves. Anyway I'd rather discuss this in a separate issue, as the mechanism developed here could easily be changed to integrate in a more general framework (just need to rename a few methods).

@nalimilan
Copy link
Member Author

For triage: This clearly won't make the feature freeze, but in the perspective of introducing this mechanism in the 1.x series I wonder whether we shouldn't change the rule that the type of the first array is used. Indeed, it would not fit very well in a general promotion mechanism: it's as if broadcast used the container type of the first argument as a fallback.

Maybe it would be more appropriate to always return an Array when no specific method is defined, just like for broadcast? Custom array types would still be able to override this when all concatenated arrays are of the same type. In the future, a promotion mechanism would allow customizing what happens for mixed types, but the fallback would remain the same (i.e. Array) so that nothing breaks.

@JeffBezanson
Copy link
Sponsor Member

That might be a bit less breaking but not by much. Triage would rather punt on this for 1.0.

@MikeInnes
Copy link
Member

FWIW, this would be really useful for operator-overloading-based AD systems like Flux and Knet currently have. It's current hard for us to intercept calls like vcat(x, y, z::TrackedArray).

@nalimilan
Copy link
Member Author

Interestingly, dplyr is developing a system for concatenation which is very similar to this PR: https://stat.ethz.ch/pipermail/r-devel/2018-August/076550.html For this to work, they even have to introduce a promotion mechanism (which doesn't exist in R).

vtjnash added a commit that referenced this pull request Apr 11, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this pull request Apr 11, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this pull request Apr 20, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this pull request Jul 12, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this pull request Jul 12, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this pull request Jul 12, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this pull request Jul 12, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this pull request Jul 13, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this pull request Jul 13, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.
KristofferC pushed a commit that referenced this pull request Jul 18, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a
non-sensical dispatch arrangement, and contribute to a slow loading
experience do to the nonsense Unions that must be checked by subtyping.

(cherry picked from commit 5a922fa)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:arrays [a, r, r, a, y, s]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants