Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What does T mean in AbstractArray{T}? #9586

Closed
timholy opened this issue Jan 3, 2015 · 11 comments
Closed

What does T mean in AbstractArray{T}? #9586

timholy opened this issue Jan 3, 2015 · 11 comments
Labels
domain:docs This change adds or pertains to documentation

Comments

@timholy
Copy link
Sponsor Member

timholy commented Jan 3, 2015

This is inspired by our head-scratching over in JuliaMath/Interpolations.jl#17. I couldn't find an open issue on this, but apologies if this is a dupe. It is surely related to #8974, #8027.

The meaning of eltype is fairly clear for a stored array, but it's less clear for an array type that involves computation. For example, here's something that sure looks like an AbstractArray, for which it seems to make sense to define a getindex that involves computation:

type InterpArray1D{T} <: AbstractArray{T, 1}
    a::Array{T,1}
    shift::Float64
end

function getindex(A::InterpArray1D, i)
    ix = floor(Int, real(i)+A.shift)
    dx = i - ix
    convert(eltype(A), (1-dx)*A.a[ix] + dx*A.a[ix+1])
end

Note that if you omit the convert, you are not guaranteed to return an "element" of type T---the return type would depend not only on the concrete type of A, but also that of i.

There are situations, however, when multiple-dispatch type-dependency might be desirable. For example, if one omitted the call to convert, one could use DualNumbers.jl to also compute the gradient (e.g., A[dual(5,1)] would compute both the value and slope at i=5). Skipping the convert does not automatically make it type-unstable, of course, as long as the return type can be inferred. In that sense, one can make the argument that getindex is just a normal function and our usual preference for duck-typing should apply.

So this raises the question: do we have a clear definition of what AbstractArray{T} really means? This documentation page says it means "the type of the elements contained in A", which would appear to make it OK to skip the convert in getindex. However, I'll wager that we have a lot of code that assumes that A[1] returns an object of type eltype(A).

FYI over in Interpolations we've also thought about not making this kind of thing a subtype of AbstractArray; the biggest loss would seem to be the ability to create a SubArray out of an InterpArray1D, which is definitely something that comes up in practice.

@timholy timholy changed the title What does T mean in AbstractArray{T} What does T mean in AbstractArray{T}? Jan 3, 2015
@jiahao
Copy link
Member

jiahao commented Jan 3, 2015

do we have a clear definition of what AbstractArray{T} really means?

Ah, #987, the undead issue that continues to haunt us despite being closed.

Last I heard over there, the consensus was that an AbstractArray{T,N} was a type with the following interface:

  • size is defined and returns a tuple of length N,
  • getindex is defined and returns data of type T "cheaply", where cheaply means "in O(1) time preferably but we will allow some small growth bounded very weakly above by O(n) where n=length(N)=prod(size(N)) to accommodate sparse matrices".

The complexity guarantee inherent in the interface of getindex precludes more exotic objects like QRCompactWYQ and Cholesky factors from the AbstractArray{T,2} hierarchy, since getindex(X, 1, 1) would in principle take O(n^2) or even O(n^3) time to compute.

Note that the getindex part of the interface is defined very nebulously, and I think is worth scrutinizing, especially since one can have indexing operations that are quite nontrivial.

@jiahao
Copy link
Member

jiahao commented Jan 3, 2015

The part of the getindex interface question that drags in #4744 is whether we hold these properties to be self-evident:

  • The output of getindex(A, 1) is of type T,
  • The output of getindex(A, :) is of type AbstractArray{T} (possibly implementing the flattening indexing behavior so that the resultant rank is N=1),
  • For rank N=2, the output of getindex(A, :, 1) is of type AbstractArray{T, 1},

and more complicated versions thereof.

In #4744, the possibility of M[:, 1] returning a CoVector (representing a row vector) was raised, which came with the question of whether CoVector{T} <: AbstractArray{T, 1} should be true.

@garborg
Copy link
Contributor

garborg commented Jan 3, 2015

Not sure if relevant, but a couple other AbstractArray-ish types where 'the output of getindex(A, 1) is of type T' would not hold:

  • NullableArray
  • [Cardinal/Ordinal/Categorical][Array]

@JeffBezanson
Copy link
Sponsor Member

It seems to me we should maybe have NullableArray{T} <: AbstractArray{Nullable{T}}.

@JeffBezanson
Copy link
Sponsor Member

I'd also say the T in AbstractArray{T} should have something to do with the interface, i.e. what elements you actually get from the array, and not just the representation. If there's a type where the representation and element types are different, and the user might care about both, then my first impression is you'd need an extra type parameter.

@johnmyleswhite
Copy link
Member

It seems to me we should maybe have NullableArray{T} <: AbstractArray{Nullable{T}}.

Completely agree. I think of NullableArray as a hand written application of the array-of-structs to struct-of-arrays transform operating on Array{Nullable{T}}.

@tomasaschan
Copy link
Member

I'd also say the T in AbstractArray{T} should have something to do with the interface, i.e. what elements you actually get from the array, and not just the representation.

The problem for Interpolations that spawned this discussion, is that you get some really nice things for free if the return type of getindex may depend on the type of the indexing variable. That is, strictly, having to do something with the interface, but it's not as simple as for most array types.

Take @timholy's example above, remove the convert call in the return statement, and consider the following:

julia> A = InterpArray1D(rand(10),1)
InterpArray1D{Float64}
julia> A[3.4]
0.36
julia> A[dual(3.4, 1)]
0.36 - 0.14du

The T in InterpArray1D{T} definitely has "something to do with the interface", but it is just one of several types that are involved in defining the return type of getindex, via promotion rules with the type of the index. Thus, it's not really known until afterwards what typeof(A[theindex]) is - and the user might do something completely unpredictable by introducing a new type for indexing, which follows its own semantics for + and * (the only operations otherwise needed).

The current WIP implementation in JuliaMath/Interpolations.jl#19 gets around this by enforcing that all indices inherit Real, but that very efficiently rules out black-magic stuff like dual numbers, rendering the library much less useful. If implementing AbstractArray{T}, with all the goodies that come with that, means guaranteeing that getindex returns a T, then Interpolations.jl has to choose between inheriting AbstractArray, and having really awesome indexing properties - and as a Julia programmer I'm greedy; of course I want both.

@JeffBezanson
Copy link
Sponsor Member

AbstractArray{T} doesn't mean all calls to getindex on that array must return a T. After all, we already do things like A[[1,2]], which will return a vector and not a T. T should describe elements iterated by the array, and probably also getindex with integer indexes.

@tomasaschan
Copy link
Member

That is good news! Interpolation objects (probably) won't support linear indexing, so the default iteration won't work for us, but we can already guarantee that getindex(itp::Interpolation{T}, i::Integer)::T. Am I correct in thinking that if we implement our own iteration methods (start, next, done), then we fulfill the "contract" of AbstractArrays?

@JeffBezanson
Copy link
Sponsor Member

Yes, that all sounds fine to me.

@timholy timholy closed this as completed in 9451e4a Jan 4, 2015
@tkelman tkelman added domain:docs This change adds or pertains to documentation backport pending labels Jan 4, 2015
timholy added a commit that referenced this issue Jan 6, 2015
Closes #9586.
[av skip]

(cherry picked from commit 9451e4a)
@tkelman
Copy link
Contributor

tkelman commented Jan 6, 2015

backported the clarification in b22d42a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:docs This change adds or pertains to documentation
Projects
None yet
Development

No branches or pull requests

7 participants