Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interfaces for immutable arrays #11610

Closed
SimonDanisch opened this issue Jun 7, 2015 · 11 comments
Closed

Interfaces for immutable arrays #11610

SimonDanisch opened this issue Jun 7, 2015 · 11 comments

Comments

@SimonDanisch
Copy link
Contributor

I hope this is not too redundant. I have the feeling that this was discussed before, but I couldn't find the issues.
Now that #10525 is coming around, I was digging around in array.jl to get an idea of how far we're away of having all array functions of FixedSizeArrays defined by simply inheriting from AbstractArray.
(SimonDanisch/FixedSizeArrays.jl#6)
The similar function seems to be the biggest problem, as this pattern is more or less impossible for immutable types. I think the new getindex operations in #10525 also relies on similar :(
I created quite a few mapping operations in order to avoid similar, which helped to reduce most of the indexing and other operations to a call to map.

So this :

function ($f){S,T}(A::AbstractArray{S}, B::AbstractArray{T})
            F = similar(A, promote_type(S,T), promote_shape(size(A),size(B)))
            for i in eachindex(A,B)
                @inbounds F[i] = ($f)(A[i], B[i])
            end
            return F
end

Would need to change.
We could just use map($(generatefunctor(f)), A, B) in this place and we would be all set, I think.
If we do this elegantly, we could just use map and map! to also generate in-place versions without much effort.
Are there reasons to not have it this way?
The only explanation I came up with is, that the current map implementation seems to guard against functions which are not type stable, which is not necessary in this well-defined case.

I think all this is another call for having return types and purity encoded into the function type.
With that, we could even default to a parallel map implementations for pure functions.
Or define an OpenCL kernel for map, which seems to generate superior assembly for mapping operations. (and all functions like +,.* etc would immediately be made faster)
Even GPU arrays could just inherit from abstract array and would have all base functions defined for it (Especially if we enable SPIR as an LLVM target. But even now this would be possible with a well-defined set of functors, for which we create a lookup table with OpenCL kernel source).

To sum this up, I just want to ask if we can use map here and if not, discuss what other options we have.

Here are some benchmark from Commit 2fda426 (1 day old master) (windows):

function parallelmap{S,T}(f, A::AbstractArray{S}, B::AbstractArray{T})
    F = similar(A, promote_type(S,T), promote_shape(size(A),size(B)))
    @parallel for i in eachindex(A,B)
        @inbounds F[i] = f(A[i], B[i])
    end
    return F
end
function simplemap{S,T}(f, A::AbstractArray{S}, B::AbstractArray{T})
    F = similar(A, promote_type(S,T), promote_shape(size(A),size(B)))
    for i in eachindex(A,B)
        @inbounds F[i] = f(A[i], B[i])
    end
    return F
end

const a = rand(10^7)
const b = rand(10^7)


@time c = parallelmap(+, a, b)
@time parallelmap(+, a, b)
@time d = parallelmap(Base.AddFun(), a, b)
@time parallelmap(Base.AddFun(), a, b)


@time e = simplemap(+, a, b)
@time simplemap(+, a, b)
@time f = simplemap(Base.AddFun(), a, b)
@time simplemap(Base.AddFun(), a, b)

@time g = map(+, a, b)
@time map(+, a, b)
@time h = map(Base.AddFun(), a, b)
@time map(Base.AddFun(), a, b)

@assert c==d && c==e && c==f && c==g && c==h

parallelmap: 1.181 milliseconds (35 allocations: 78127 KB, 90.36% gc time)
functor: 1.146 milliseconds (35 allocations: 78127 KB, 84.34% gc time)

simplemap: 1.177 seconds      (49999 k allocations: 839 MB, 3.62% gc time)
functor: 36.936 milliseconds (8 allocations: 78125 KB, 1.21% gc time)

map: 512.129 milliseconds (30029 k allocations: 535 MB, 13.29% gc time)
functor:  53.398 milliseconds (8 allocations: 78125 KB, 1.06% gc time)

Best,
Simon

@JeffBezanson
Copy link
Sponsor Member

functions which are not type stable, which is not necessary in this well-defined case.

What exactly is the well-defined case?

I think the term "type stable" is getting a bit out of hand. Consider

A = Any["x", 1.0, [1]]
map(identity, A)

Is the identity function type stable? Don't ask me; the term does not occur in my thesis :P

@JeffBezanson
Copy link
Sponsor Member

Also #8450

@SimonDanisch
Copy link
Contributor Author

Haha, yeah... Sorry, for contributing to further dilute this term.
But it's not really the point here ;)
I was just trying to understand, why anyone would decide against the use of map in that case.
It's pretty annoying, that a lot of code connected to AbstractArrays uses similar, which completely blocks any immutable array from using that interface.
Especially, as I'm getting a lot of warnings when I do this:

import Base: sum
abstract FixedArray{T, NDim} <: AbstractArray{T, NDim}
sum{T <: FixedArray}(x::T) = ...
Warning: New definition 
    sum(#T<:Main.FixedArray) at C:\Users\Sim\.julia\v0.4\GLVisualize\test\runtests.jl:4
is ambiguous with: 
    sum(AbstractArray{Bool, N<:Any}) at reduce.jl:191.
To fix, define 
    sum(_<:Main.FixedArray{Bool, N<:Any})
before the new definition.
Warning: New definition 
    sum(#T<:Main.FixedArray) at C:\Users\Sim\.julia\v0.4\GLVisualize\test\runtests.jl:4
is ambiguous with: 
    sum(AbstractArray{Base.GMP.BigInt, N<:Any}) at gmp.jl:455.
To fix, define 
    sum(_<:Main.FixedArray{Base.GMP.BigInt, N<:Any})
before the new definition.
Warning: New definition 
    sum(#T<:Main.FixedArray) at C:\Users\Sim\.julia\v0.4\GLVisualize\test\runtests.jl:4
is ambiguous with: 
    sum(AbstractArray{Base.MPFR.BigFloat, N<:Any}) at mpfr.jl:537.
To fix, define 
    sum(_<:Main.FixedArray{Base.MPFR.BigFloat, N<:Any})
before the new definition.

I'm not even sure if these warnings are even justified.

@SimonDanisch
Copy link
Contributor Author

The benchmark results are a little weird though. I think the @time macro does not work with @parallel
correctly, as it should yield the slowest timings. I looked up @parallel and it said it actually fetches the values before returning, and it also copies the values used. Which explains why it's not used in base :D

@timholy
Copy link
Sponsor Member

timholy commented Jun 8, 2015

You need @sync around @parallel blocks if you want to make sure the computation is actually done---otherwise the time returned by @time only reflects the time required to "launch" rather than "complete" the task.

Is your concern about similar simply that it's relatively useless for an array type with immutable elements? You can, of course, override similar(F::FixedArray, ...) to return another FixedArray, so if FixedArray has mutable elements then this will be sufficient. For arrays with immutable elements, I suspect we have to rethink more than just similar, and a more map-like interface will indeed be necessary.

Unfortunately, I fear those ambiguity warnings are justified (at least, in their current form).

@SimonDanisch
Copy link
Contributor Author

@async

Good to know!

Unfortunately, I fear those ambiguity warnings are justified (at least, in their current form).

Makes sense now. If you have FSA{BigFloat, N} and the signatures AbstractArray{BigFloat, N} and FSA{T, N}, it is not clear what type fits better.

Yes, immutable arrays with immutable elements are the issue (like 90% of the types in FixedSizeArrays)

@mbauman
Copy link
Sponsor Member

mbauman commented Jun 8, 2015

There are currently some performance issues with indexing into SubArrays, but it'd be very interesting to see what would happen if you simply defined:

getindex(FA::FixedArray, I::Integer...) =  # scalar lookup
getindex(FA::FixedArray, I...) = sub(FA, I...) # nonscalar indexing returns a SubArray

It's definitely worth considering how we can improve the interfaces for arrays with immutable elements. Actually, would you mind changing the issue title? "Interfaces for arrays with immutable elements" seems like a good description here. I was a little confused what this was about at first.

@SimonDanisch SimonDanisch changed the title AbstractArray should use map Interfaces for arrays with immutable elements Jun 8, 2015
@tkelman
Copy link
Contributor

tkelman commented Jun 8, 2015

Kind of a counterpoint to #11574, now that similar is an important part of the AbstractArray interface post-#10525.

@JeffBezanson JeffBezanson changed the title Interfaces for arrays with immutable elements Interfaces for immutable arrays Jun 9, 2015
@JeffBezanson
Copy link
Sponsor Member

related: #6960 #10001

I agree that it would be better to move to using map more where appropriate.

@KristofferC
Copy link
Sponsor Member

Anything here not included in something like #11902?

@SimonDanisch
Copy link
Contributor Author

This can be closed, now that almost everything is a broadcast ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants