RFC: Add ArrayLike #34196

bramtayl · 2019-12-25T16:38:30Z

This is going to need a bit of explanation.

There is a meaningful idea of an array without a predefined eltype. The most notable examples are generators and broadcasts over arrays. If they could be <: ArrayLike, then they could automatically inherit much of the machinery already constructed for AbstractArrays.

The next step would be to audit all uses of AbstractArray in Base and the stdlibs and decide whether they could be widened to ArrayLike instead.

One way to do this would be simply to replace every occurrence unless the method dispatches on the eltype, i.e. AbstractArray{T} where T. There are two reasons why I didn't do this:

It introduced a lot of hard to fix ambiguity errors
I don't understand a lot of the code, so I'm not certain whether many of these methods could be meaningfully extended to arrays without a predefined eltype.

The good news is that this is barely breaking, and I only had to change one test (show_supertypes).

Closes #32940

bramtayl · 2019-12-25T16:40:05Z

src/jltypes.c

    tv = jl_svec2(tvar("T"), tvar("N"));
    jl_abstractarray_type = (jl_unionall_t*)
        jl_new_abstracttype((jl_value_t*)jl_symbol("AbstractArray"), core,
-                            jl_any_type, tv)->name->wrapper;
+                            (jl_datatype_t*)jl_apply_type((jl_value_t*)jl_arraylike_type, jl_svec_data(jl_svec1(tv_N)), 1),
+                            tv)->name->wrapper;



I wrote this code by guess and check because I don't know C, but it seemed to work

bramtayl · 2019-12-25T18:44:29Z

src/dump.c

@@ -3396,7 +3396,7 @@ void jl_init_serializer(void)
        deser_tag[LAST_TAG+1+i] = (jl_value_t*)vals[i];
        i += 1;
    }
-    assert(LAST_TAG+1+i < 256);
+    assert(LAST_TAG+1+i < 257);


Not sure it this is valid???

timholy · 2019-12-28T13:20:24Z

Can you explain why this is better than

const AnyArray{N} = AbstractArray{<:Any,N}

bramtayl · 2019-12-28T13:37:20Z

That could also be a workable option. I think that what makes this special is that this is a meaningful subtype relationship: arrays with an eltype are a subtype of arrays

timholy · 2019-12-28T13:50:48Z

Understood, but are there methods that would be defined for ArrayLike{N} that couldn't reasonably be defined for AnyArray{N}? If not, then I'm still not sure I understand the need for the subtype.

bramtayl · 2019-12-28T14:02:00Z

I think the main difference is:

AnyArray <: AbstractArray vs.
AbstractArray <: ArrayLike

There are methods defined for AbstractArray that make use of an eltype that we wouldn't want to be inherited by ArrayLike objects.

timholy · 2019-12-28T14:41:36Z

I understand the difference in the type hierarchy. I am just asking about methods. I am not opposed to inserting something into the type hierarchy, but I'm trying to understand what advantages accrue if we decide to do so. #32940 asserts that this would be useful without really making it clear what problem gets solved that can't be solved by alternative means. I see Jeff wondered the same thing, #32940 (comment).

For example, there's always the option to define

Base.eltype(::Type{<:MyArrayLikeType}) = error("eltype is undefined for MyArrayLikeType")

That would automatically invalidate all methods that call eltype, without having to manually go through and decide which can be defined for ArrayLike and which require the "real" AbstractArray. Since one is easy and complete, and the other is harder and likely incomplete, I am trying to understand why you want to pursue the more difficult option.

bramtayl · 2019-12-28T14:45:18Z

I guess that could work. Bottom line, this just seems like a more elegant solution to me.

bramtayl · 2019-12-28T14:56:22Z

Or I guess, to be more specific, that solution would work, but it would require encoding an eltype of Any into the signature of an AbstractArray, but then error when an eltype is requested, which seems paradoxical.

tkf · 2019-12-28T21:53:37Z

That would automatically invalidate all methods that call eltype, without having to manually go through and decide which can be defined for ArrayLike and which require the "real" AbstractArray. Since one is easy and complete, and the other is harder and likely incomplete, I am trying to understand why you want to pursue the more difficult option.

@timholy I think it depends on what do you mean by "complete"; i.e., if you prefer false-positive dispatch (error when a method should not be called) or false-negative dispatch (the method really needs eltype). The latter scenario may arise in code like

f(array::AbstractArray) = # code using `eltype`
f(itr) = # code using mutate-or-widen approach

(I guess I'm repeating what @bramtayl just said.)

As Julia language usually encourages duck-typing, I can see that false-positive dispatch might be preferable. You can also "fix" the code above by using f(array::AbstractArray{T}) where {T} = .... But now that Julia is in the post-v1 era I think it also makes sense to play on the safe side. For example, the "fix" similar to the above example for IteratorEltype

IteratorEltype(::Type{<:AbstractArray}) = EltypeUnknown()
IteratorEltype(::Type{<:AbstractArray{T}}) where {T} = HasEltype()

is technically breaking (ATM, IteratorEltype(AbstractArray) returns HasEltype()).

Having said that, using a trait may be a better option. For example, skipmissing(::AbstractArray) is array-like but skipmissing(::Generator) is not. See: #31020 (comment)

bramtayl · 2019-12-29T17:14:06Z

Well anyway, if people are interested in this I can take the next step of auditing the use of AbstractArray in Base

timholy · 2019-12-29T19:10:43Z

Certainly if there are cases that need to be intercepted prior to failing on eltype, that's an advantage with this approach. I'd say that getting to the stage of seeing a practical use case would be helpful.

tkf · 2019-12-29T19:56:35Z

@bramtayl It would be nice if you clarify your view on some challenges with ArrayLike{N} approach:

An useful case may be to declare Broadcasted to be a subtype of ArrayLike{N} Faster mapreduce for Broadcasted #31020. However, it is difficult to do ATM as the dimension is not known until the Broadcasted object is instantiated. (~~One possible solution is to define (say) InstantiatedBroadcasted{N, ...} <: ArrayLike{N}.~~ This is a breaking change.)
Defining view(::ArrayLike, ...) and reshape(::ArrayLike, ...)is somewhat tricky. Returning a SubArray from view(::ArrayLike, ...) is not an option as the element type is unknown. (One option may be to add SubArrayLike wrapper struct and then re-define all methods in terms of a union (say) SubArrayLikeImpl = Union{SubArray, SubArrayLike}. But AbstractArray{<:Any,N} approach would be much nicer here.)

martinholters · 2019-12-29T20:20:54Z

Can someone explain the conceptual difference between leaving the eltype undefined and defining it as Any? Doesn't it both mean that the type(s) of the elements can not be known without actually looking at them individually? What I'm getting at is: What would be an ArrayLike that couldn't be an AbstractArray{Any} and how do consumers act differently on those two?

tkf · 2019-12-29T22:26:00Z

@martinholters My understanding is that the common expectation is collect(x) isa Array{eltype(x)} if IteratorEltype(x) isa HasEltype, rather than collect(x) isa Array{<:eltype(x)}. That is to say, collect-like functions are "allowed" to minimize the element types of the output container if the input IteratorEltype is EltypeUnknown; but it's expected to preserve the element type if it is HasEltype.

bramtayl · 2019-12-30T01:03:18Z

@tkf I agree. On the two challenges you mention:

Yes, it would be nice. I haven't looked at the broadcast internals, but a simple (maybe non-performant answer) would to be call instantiate as part of the size method.
I like the idea of adding SubArray <: SubArrayLike and ReshappedArray <: ReshapedArrayLike and again auditing base to replace, for example, SubArray with SubArrayLike where possible.

I haven't really gotten far enough along in my thinking to plan out an interface for ArrayLike objects, but it seems to me the biggest problem will be similar, because it's a fundamental part of the AbstractArray machinery that implicitly requires an eltype.

tkf · 2019-12-30T05:14:55Z

size actually already works for Broadcasted (in the way you mentioned). The problem is rather the type parameter N which would be required if we want Broadcasted <: ArrayLike. Adding this type parameter is very tricky because it is not determined at construction time (by design, I suppose).

I don't think we can do SubArray <: SubArrayLike and ReshapedArray <: ReshapedArrayLike as we don't have multiple inheritance. That's why I proposed the Union-based approach. To be clearer, something like

SubArrayLike <: ArrayLike
ReshapedArrayLike <: ArrayLike
AbstractArray <: ArrayLike
SubArray <: AbstractArray
ReshapedArray <: AbstractArray
SubArrayLikeImpl = Union{SubArray, SubArrayLike}
ReshapedArrayLikeImpl = Union{ReshapedArray, ReshapedArrayLike}

where SubArrayLikeImpl and ReshapedArrayLikeImpl are implementation detail used only for implementation sharing.

bramtayl · 2019-12-30T16:27:20Z

Darn you're right. I wonder how much would be lost if all views and reshapes were ArrayLike?

tkf · 2019-12-30T18:33:13Z

I don't think that's possible as linear algebra functions often need to dispatch on the element type of these arrays.

bramtayl · 2019-12-30T20:54:01Z

I don't think that's possible as linear algebra functions often need to dispatch on the element type of these arrays.

:( oh that's too bad

The problem is rather the type parameter N which would be required if we want Broadcasted <: ArrayLike

Would it be possible to compute this at compile time?

tkf · 2020-01-01T05:58:28Z

The problem is rather the type parameter N which would be required if we want Broadcasted <: ArrayLike

Would it be possible to compute this at compile time?

If we forget backward compatibility, I think it's possible but it'd still be hard to do it without increase compilation time. I think you need to ask @mbauman to be sure. But, as it'd be a breaking change, I think it's impossible to change this in Julia 1.x.

bramtayl · 2020-01-01T20:54:11Z

I thought that widening type parameters from AbstractArray to ArrayLike would be non-breaking, but it turns out that it can create ambiguity errors. Is there a go-to solution for these kind of problems?

tkf · 2020-01-03T05:55:27Z

I think the go-to solution is trait. One idea I had was to use the IndexStyle by defining it for all types:

struct NonIndexable <: IndexStyle end
IndexStyle(::Type) = NonIndexable()

As you can safely call IndexStyle for any type after this, indexability of a type can be queried just by calling IndexStyle(T). Users would then define something like

f(x::MyArrayLike) = _f(IndexStyle(x), x)
_f(::IndexCartesian, x) = ...
_f(::NonIndexable, x) = ...

This way, defining MyArrayLike <: ArrayLike is equivalent to defining IndexStyle(::Type{<:MyArrayLike}) = IndexCartesian() or IndexLinear().

bramtayl · 2020-01-06T19:31:46Z

Ok, I've audited Base, test, and the standard library for AbstractArray, AbstractVector, and AbstractMatrix. If the {T} parameter was extracted and used, I left them alone, and otherwise, I replaced them with ArrayLike, ArrayLike{1}, and ArrayLike{2} respectively. There are a few exceptions which I noted with the # specific comment required to resolve ambiguity.

I'm managed to convince myself that this should be non-breaking (again); the only way it could introduce ambiguities in a package is if that package is conducting piracy. Is that correct?

I used this overly aggressive strategy not because I think that all these methods should be extended to ArrayLike, but only because I think that someone more knowledgeable than me should be the one to decide whether or not it makes sense.

The next step is to see if I can't get Generator{Array} and Broadcasted to be ArrayLike.

mbauman · 2020-01-06T19:59:53Z

I definitely get the need for improvements surrounding non-AbstractArray indexability, but I'm not sure this is it. I've been following this thread closely but I remain skeptical. My reservations include:

Why is dimensionality included in ArrayLike but not the element type? I think it's also likely to have an indexable object that doesn't know its dimensionality in the type domain.
I'm still not clear what the definition of ArrayLike is. Is it exactly an AbstractArray but without the type parameter? Is the documented AbstractArray interface actually an ArrayLike interface? If subtyping AbstractArray{Any} gives you additional functionality over subtyping ArrayLike — what is that additional functionality? Edit: Or is the more specific subtype actually a detriment in functionality?

bramtayl · 2020-01-06T20:33:48Z

I think it's also likely to have an indexable object that doesn't know its dimensionality in the type domain as it is to not know the element type.

I guess I assumed without thinking about it too carefully. Can you give some examples?

Is it exactly an AbstractArray but without the type parameter?

That's how I'm thinking of it.

Is the documented AbstractArray interface actually an ArrayLike interface?

I need to add documentation, but yes, that's how I'm thinking of it, with two exceptions: eltype and similar(A, T) to replace similar(A).

If subtyping AbstractArray gives you additional functionality over subtyping ArrayLike — what is that additional functionality?

I'm not sure about the "should" answer (what do you think?). Currently the "is" answer is that the extra functionality provided by AbstractArrays is simply everything without an ArrayLike method in the PR, that is, every function that dispatches on the eltype. But there are two reasons this could change:

I think that some of the methods in this PR currently extended to ArrayLike should not be.
We might add new generic methods for ArrayLike objects.

mbauman · 2020-01-06T20:46:21Z

It just feels far too fiddly If the only difference between <: ArrayLike{N} and <: AbstractArray{Any,N} is that the latter preserves the Any in new (similar) outputs whereas the former does concrete widening.

When to preserve eltypes is a hard problem — especially in the context of Union{Missing, T}.

If we're inserting a new supertype here, I'd rather it represent a bigger departure from the requirements of AbstractArray.

bramtayl · 2020-01-06T21:24:12Z

That's fair. I'm not particularly wedded to this solution, just looking for a way to get index-able iterators within multiple dispatch.

bramtayl · 2020-01-06T22:23:30Z

If you want to see the kind of things that are possible with this PR, see my fork of MappedArrays https://github.com/bramtayl/MappedArrays.jl which I updated to be ArrayLike

bramtayl · 2020-01-08T06:25:08Z

@mbauman I started to look at the broadcast.jl code to see if I could get it to be <:ArrayLike{N}. It turned out to be remarkably easy:

struct Broadcasted{N, Style<:Union{Nothing,BroadcastStyle}, Axes, F, Args<:Tuple} <: ArrayLike{N}
    f::F
    args::Args
    axes::Axes
end

Broadcasted(f::F, args::Args, axes=nothing) where {F, Args<:Tuple} =
    Broadcasted{length(axes), typeof(combine_styles(args...))}(f, args, axes)
function Broadcasted{Style}(f::F, args::Args, axes=nothing) where {Style, F, Args<:Tuple}
    Broadcasted{length(axes), Style, typeof(axes), Core.Typeof(f), Args}(f, args, axes)
end

And because ArrayLikes automatically come with (most) of the AbstractArray machinery, I suspect that much of the broadcast machinery wouldn't be necessary, or, perhaps, the methods in the broadcast file could be used as generic methods for all ArrayLike objects. It seems like even if we don't export ArrayLike, this could be a win from the standpoint of code reduction?

timholy · 2020-01-08T17:09:57Z

If you want to see the kind of things that are possible with this PR, see my fork of MappedArrays https://github.com/bramtayl/MappedArrays.jl which I updated to be ArrayLike

To me it looks like you had to delete some nice functionality. What's the benefit?

bramtayl · 2020-01-08T17:21:04Z

I did have to delete some stuff, but hopefully some of it can come back:

Offset arrays: had to delete because the OffsetArrays package has a pirate-y similar method that turns into an ambiguity error. This could be fixed by adding a disambiguation method
Reinterpret arrays: I'm not sure how to change it to support ArrayLike objects, but maybe it could be done?
setindex! automatic type conversion: without an eltype, I don't know how we could support this.

Are those the things you were referring to?

The benefit is that it removes the need for the maybe a bit brittle testvalue function:

function testvalue(data)
    if !isempty(data)
        first(data)
    else
        zero(eltype(data))
    end::eltype(data)
end

If a type don't have a zero defined, and if the array contains a small union of eltypes (e.g. Union{Thing ,Missing}, or if the array is empty, then I don't think this method would work?

timholy · 2020-01-08T19:21:14Z

Yeah, that function is brittle. See JuliaArrays/MappedArrays.jl#29 for a potential workaround.

To me this mostly illustrates why we want traits. There's no doubt that it's nice to be able to wrap a more general object, but when it's wrapping an AbstractArray I want to be able to call eltype and have it be treated like an AbstractArray; I also want to have all the printing niceties that I get to inherit/support with AbstractArray. Since MappedArray has to be a subtype of something, the type system has to choose one or the other, but with traits we can have both.

I should get back to andyferris/Traitor.jl#8 someday.

bramtayl · 2020-01-08T19:24:40Z

I would be very happy with a trait based solution too!

bramtayl · 2020-01-09T16:42:57Z

Well, I guess I'll close this in favor of a trait based solution, and hope that it comes sooner rather than later.

bramtayl added 2 commits December 25, 2019 11:31

added ArrayLike

85b1005

Merge https://github.com/bramtayl/julia

4fa41e1

bramtayl commented Dec 25, 2019

View reviewed changes

bramtayl changed the title ~~Added ArrayLike~~ RFC: Added ArrayLike Dec 25, 2019

bramtayl changed the title ~~RFC: Added ArrayLike~~ RFC: Add ArrayLike Dec 25, 2019

serialization fix?

468a005

bramtayl commented Dec 25, 2019

View reviewed changes

bramtayl added 2 commits December 25, 2019 14:38

add docs

3b1845f

fix

c7a2daf

bramtayl added 2 commits January 3, 2020 21:37

Merge https://github.com/JuliaLang/julia

7dc508d

widen methods

d4734e2

fix

9b1b7c4

bramtayl closed this Jan 9, 2020

bramtayl mentioned this pull request Jan 11, 2020

Revisiting fast mapslices #28431

Open

tkf mentioned this pull request Jan 14, 2020

generators should lower to Iterators.map, not Base.Generator #34368

Open

bramtayl mentioned this pull request Jan 21, 2020

Arraylike: AbstractArrays without an eltype #32940

Closed

bramtayl mentioned this pull request Feb 24, 2020

Method inheritance #30408

Closed

bramtayl mentioned this pull request May 19, 2020

Towards true duck-typing #35940

Closed

bramtayl mentioned this pull request Feb 11, 2021

Remove eltype parameter from AbstractArray and use IteratorEltype instead #39626

Open

RFC: Add ArrayLike #34196

RFC: Add ArrayLike #34196

Conversation

bramtayl commented Dec 25, 2019 • edited Loading

bramtayl Dec 25, 2019

Choose a reason for hiding this comment

bramtayl Dec 25, 2019

Choose a reason for hiding this comment

timholy commented Dec 28, 2019 • edited Loading

bramtayl commented Dec 28, 2019

timholy commented Dec 28, 2019

bramtayl commented Dec 28, 2019

timholy commented Dec 28, 2019

bramtayl commented Dec 28, 2019

bramtayl commented Dec 28, 2019

tkf commented Dec 28, 2019

bramtayl commented Dec 29, 2019

timholy commented Dec 29, 2019

tkf commented Dec 29, 2019 • edited Loading

martinholters commented Dec 29, 2019

tkf commented Dec 29, 2019

bramtayl commented Dec 30, 2019

tkf commented Dec 30, 2019

bramtayl commented Dec 30, 2019

tkf commented Dec 30, 2019

bramtayl commented Dec 30, 2019

tkf commented Jan 1, 2020

bramtayl commented Jan 1, 2020

tkf commented Jan 3, 2020

bramtayl commented Jan 6, 2020

mbauman commented Jan 6, 2020 • edited Loading

bramtayl commented Jan 6, 2020

mbauman commented Jan 6, 2020

bramtayl commented Jan 6, 2020

bramtayl commented Jan 6, 2020

bramtayl commented Jan 8, 2020 • edited Loading

timholy commented Jan 8, 2020

bramtayl commented Jan 8, 2020 • edited Loading

timholy commented Jan 8, 2020 • edited Loading

bramtayl commented Jan 8, 2020

bramtayl commented Jan 9, 2020

bramtayl commented Dec 25, 2019 •

edited

Loading

timholy commented Dec 28, 2019 •

edited

Loading

tkf commented Dec 29, 2019 •

edited

Loading

mbauman commented Jan 6, 2020 •

edited

Loading

bramtayl commented Jan 8, 2020 •

edited

Loading

bramtayl commented Jan 8, 2020 •

edited

Loading

timholy commented Jan 8, 2020 •

edited

Loading