Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster mapreduce for Broadcasted #31020

Merged
merged 12 commits into from
Apr 30, 2020
Merged

Faster mapreduce for Broadcasted #31020

merged 12 commits into from
Apr 30, 2020

Conversation

tkf
Copy link
Member

@tkf tkf commented Feb 9, 2019

I suggest to add indexing-based specializations for Broadcasted in mapreduce. As Broadcasted is almost like an array, it is mostly about "routing" Broadcasted to the right method.

I need to fill some more details, but it already works well for simple example:

julia> a = rand(10_000); b = rand(10_000);

julia> @btime sum(Broadcast.instantiate(Broadcast.broadcasted(*, $a, $b)));
  1.935 μs (2 allocations: 48 bytes)

which is somewhat comparable to LinearAlgebra.dot:

julia> @btime dot($a, $b);
  1.693 μs (0 allocations: 0 bytes)

Before it was 5x slower:

julia> @btime sum(Broadcast.instantiate(Broadcast.broadcasted(*, $a, $b)));
  10.067 μs (2 allocations: 48 bytes)

I'm opening a PR to get some feedback for polishing the implementation. I'll ask specific questions below.

(Edit: now we require Broadcast.instantiate for fast reduce)

@@ -12,6 +12,9 @@ else
const SmallUnsigned = Union{UInt8,UInt16,UInt32}
end

abstract type AbstractBroadcasted end
const AbstractArrayOrBroadcasted = Union{AbstractArray, AbstractBroadcasted}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to define this AbstractBroadcasted as broadcast.jl is loaded later than reduce.jl. ATM it's just a hack to make this work, but I wonder if it makes sense to define

abstract type AbstractIndexable end
const Indexable = Union{AbstractArray, AbstractIndexable}

or even

abstract type Indexable{N} end
abstract type AbstractArray{T,N} <: Indexable{N} end

so that it's much easier for downstream projects to use indexing-based mapreduce implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an abstract type or a trait like Indexable would also be useful to support mapreduce over dimensions with skipmissing (#28027). It would avoid duplicating some methods.

Though a trait would probably be better than an abstract type, since it would be more flexible. In particular, an object returned by skipmissing could be marked as Indexable or not depending on whether it wraps an Indexable object (like an array) or another iterable. It would also allow objects that already inherit from a non-indexable abstract type to implement Indexable if they want, which wouldn't be possible with an abstract type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with trait direction, one option is to use IndexStyle. Something like this:

struct NonIndexable <: IndexStyle end
IndexStyle(::Type) = NonIndexable()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a clever idea. I think the union here is sensible for now — that trait could possibly come later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Differing the whole designing work for the trait makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash mentioned there may be another way around this bootstrapping issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the 1.4 feature freeze is approaching, I support merging the PR with the Union for now, and try to use a more general trait later.

FWIW, this PR is giving us incredibly good performance to be able to reuse the pairwise summation algorithm with weights at JuliaStats/StatsBase.jl#518. Great work!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for testing this PR in a real-world scenario!

_combined_indexstyle(Args)
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()

Base.LinearIndices(bc::Broadcasted) = LinearIndices(eachindex(bc))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to look into how indexing is defined for Broadcasted (especially for how it interacts with Axes) and if this IndexStyle is correct.

However, it seems to me that it'd be much simpler if we define IndexStyle(::Broadcasted) than IndexStyle(::Type{<:Broadcasted}) as it would let us use eachindex(::Broadcasted) in IndexStyle. Does it makes sense to define IndexStyle(::Broadcasted) directly than the type trait?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK IndexStyle needs to take a type rather than an instance so that you can know statically what kind of indices to expect (that's more or less the definition of a trait). At least that's how its documented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is based on the implementation detail in reduce.jl that IndexStyle is called on instance

_mapreduce(f, op, A::AbstractArray) = _mapreduce(f, op, IndexStyle(A), A)

which is valid since IndexStyle is callable on AbstractArray:

IndexStyle(A::AbstractArray) = IndexStyle(typeof(A))

However, to make IndexStyle a proper trait for Broadcasted, I defined methods for types and added the method for instance on top of them, just like how it's done for AbstractArray:

Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc))

Note that direct definition of IndexStyle(::Broadcasted) would be type-stable if eachindex is so. If eachindex is not type-stable we don't have the performance gain anyway. Initially, I wasn't aiming for adding public interface for Broadcasted in this PR so I thought relying on the implementation detail that IndexStyle is only called on instance was OK. However, I also understand that contaminating "signature space" of IndexStyle is not particularly a clean solution.

If we say that IndexStyle for Broadcasted has to be a proper trait, we need to note that IndexStyle must be properly defined whenever axes is customized since axes is a public interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but as long as you define IndexStyle it should be a proper trait, independent from how it's used internally. Otherwise, you'd better define a custom internal function.

Anyway, what's the problem with the current implementation in this PR? Just that's it's complicated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. So, reading the documentation and the code more, it looks like Base.axes(::Broadcasted{MyStyle}) is not an overloadable method? From the documentation, Base.axes(x) is overloadable if x participate in the broadcasting. This method is then called via combine_axes in axes(::Broadcasted). I was initially worried that IndexStyle(::Type{<:Broadcasted}) and Base.axes(::Broadcasted) can become incompatible if downstream package authors are not careful but it looks like I don't need to worry about it.

@tkf tkf changed the title WIP: Use indexing-based mapreduce for Broadcasted WIP: Faster mapreduce for Broadcasted Feb 9, 2019
@nalimilan nalimilan requested a review from mbauman February 11, 2019 08:46
@@ -219,6 +219,12 @@ argtype(bc::Broadcasted) = argtype(typeof(bc))
_eachindex(t::Tuple{Any}) = t[1]
_eachindex(t::Tuple) = CartesianIndices(t)

Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc))
Base.IndexStyle(::Type{<:Broadcasted{<:Any,Tuple{Axis}}}) where {Axis} = IndexStyle(Axis)
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out my previous approach (combine IndexStyle of each argument) was wrong since it didn't work with, e.g., broadcasted(+, zeros(5), zeros(1, 4)). Each argument can be IndexLinear even though the broadcasted indexing is not (that's the main job of broadcasted).

So now the new approach is to return an equivalent of IndexStyle(bc.axes[1]) if length(bc.axes) == 1 and then return IndexCartesian() otherwise. As I need Axes type parameter now, this implementation requires the Broadcasted to be instantiated first. I guess this is OK since that's the usual broadcasting pipeline.

Is this implementation fine? Is IndexStyle(bc.axes[1]) == IndexLinear() && length(bc.axes) == 1 the only possible IndexLinear() case that is detectable in a type-stable manner? That is to say, I'm ignoring the case like broadcasted(+, zeros(1, 4), zeros(1, 4)) as I need to look at the value to check that it supports linear indexing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really require the Broadcasted to be instantiated to get an IndexStyle — it just means that if it's not instantiated that it falls back to a cartesian implementation. All Broadcasteds should support cartesian indexing; only one-dimensional ones will allow straight integers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's much more accurate way to put it. I wanted to say that "for the broadcasted to be dispatched to the optimized method", it has to be instantiated first.

base/broadcast.jl Outdated Show resolved Hide resolved
@@ -219,6 +219,12 @@ argtype(bc::Broadcasted) = argtype(typeof(bc))
_eachindex(t::Tuple{Any}) = t[1]
_eachindex(t::Tuple) = CartesianIndices(t)

Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc))
Base.IndexStyle(::Type{<:Broadcasted{<:Any,Tuple{Axis}}}) where {Axis} = IndexStyle(Axis)
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really require the Broadcasted to be instantiated to get an IndexStyle — it just means that if it's not instantiated that it falls back to a cartesian implementation. All Broadcasteds should support cartesian indexing; only one-dimensional ones will allow straight integers.

base/broadcast.jl Outdated Show resolved Hide resolved
@@ -12,6 +12,9 @@ else
const SmallUnsigned = Union{UInt8,UInt16,UInt32}
end

abstract type AbstractBroadcasted end
const AbstractArrayOrBroadcasted = Union{AbstractArray, AbstractBroadcasted}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a clever idea. I think the union here is sensible for now — that trait could possibly come later.

@chethega
Copy link
Contributor

Can we also speed up all, any and count?

These reductions currently don't use the mapreduce framework and therefore have to be done separately.

@tkf
Copy link
Member Author

tkf commented Feb 14, 2019

I think all and any use mapreduce:

julia/base/reducedim.jl

Lines 664 to 677 in b1acb3c

for (fname, op) in [(:sum, :add_sum), (:prod, :mul_prod),
(:maximum, :max), (:minimum, :min),
(:all, :&), (:any, :|)]
fname! = Symbol(fname, '!')
_fname = Symbol('_', fname)
@eval begin
$(fname!)(f::Function, r::AbstractArray, A::AbstractArray; init::Bool=true) =
mapreducedim!(f, $(op), initarray!(r, $(op), init, A), A)
$(fname!)(r::AbstractArray, A::AbstractArray; init::Bool=true) = $(fname!)(identity, r, A; init=init)
$(_fname)(A, dims) = $(_fname)(identity, A, dims)
$(_fname)(f, A, dims) = mapreduce(f, $(op), A, dims=dims)
end
end

count doesn't, but it should be easy to add (Edit: it's added and tested):

julia/base/reduce.jl

Lines 749 to 755 in 111b385

function count(pred, a::AbstractArray)
n = 0
for i in eachindex(a)
@inbounds n += pred(a[i])::Bool
end
return n
end

@tkf
Copy link
Member Author

tkf commented Feb 15, 2019

@mbauman I added some tests for this feature. This is all I wanted to do in this PR. Please review and merge or let me know if there are anything I need to fix.

@tkf tkf changed the title WIP: Faster mapreduce for Broadcasted Faster mapreduce for Broadcasted Feb 16, 2019
@tkf tkf mentioned this pull request Feb 16, 2019
@mbauman
Copy link
Member

mbauman commented Feb 20, 2019

This patch looks good to me and seems worth doing. I'd like to hold off just a bit to see what the triage reactions are to #19198 (and its implementation).

@StefanKarpinski StefanKarpinski added the triage This should be discussed on a triage call label Feb 20, 2019
@StefanKarpinski
Copy link
Member

Marked for triage so that it actually gets discussed.

@mbauman
Copy link
Member

mbauman commented Feb 20, 2019

Yeah, I just put #31088 on the triage list earlier this afternoon — that's more the thing to talk about.

@mbauman mbauman added broadcast Applying a function over a collection and removed triage This should be discussed on a triage call labels Mar 14, 2019
@mbauman
Copy link
Member

mbauman commented Mar 14, 2019

Triage agrees this this is worth doing once we finalize #31088.

@tkf
Copy link
Member Author

tkf commented Jan 7, 2020

friendly bump

@mbauman Can you review this?

@oschulz
Copy link
Contributor

oschulz commented Mar 11, 2020

Just curious, any news on this? I saw some nice performance improvements with this.

@tkf
Copy link
Member Author

tkf commented Mar 11, 2020

I think there are two questions:

(Q1) Can this PR be merged without a short-hand syntax/macro to construct Broadcasted objects?

(Q2) Is my implementation OK?

While it is better to wait for @mbauman to look into it to answer Q2, I think other core devs can answer Q1 (or maybe it can be briefly discussed in triage?). Maybe @StefanKarpinski can help to clarify Q1?

I'm bringing this up as it seems that triage wanted to implement the syntax at the same time #31020 (comment) and if the answer to Q1 is no, we need to do some more discussion in #19198 while waiting for Q2. It also gives people some idea of how long they need to wait to get this. If this requires a syntax, it's imaginable that this takes way longer than just adding an overload.

@PhilipVinc
Copy link

PhilipVinc commented Mar 15, 2020

Out of curiosity: I just noticed that in 1.4-rc2
Juno.@Enter sum(Broadcast.broadcasted(*, rand(10,20), rand(10,20)), dims=1) fails with a no method error.

Since this PR is relevant, does it fix this issue?

(it's a shame it was not included in v1.4)

@tkf
Copy link
Member Author

tkf commented Mar 15, 2020

I guess sum with dims is implemented only for arrays before this PR. So, it's relevant in the sense that this PR makes it work.

But I'm not sure about Juno.@enter part. If it's something debugger-specific, maybe report it to Juno or Debugger.jl?

@tkf
Copy link
Member Author

tkf commented Mar 15, 2020

Actually, no, this PR does not fix it (yet). It should be easy to add, though.

@PhilipVinc
Copy link

PhilipVinc commented Mar 15, 2020

Sorry, my bad. The Juno.@enter part I left after copy pasting. Of course it's completely irrelevant to my question.

But after looking through your this PR, I noticed you implement Base.mapreducedim! which is what sum with dims calls under the hood, so I thought it should work. But I'm probably wrong.

@tkf
Copy link
Member Author

tkf commented Mar 15, 2020

Yeah, we have all the internals we need. But I think we still need to define the dispatch. For sum etc., this is just replacing AbstractArray with AbstractArrayOrBroadcasted in

julia/base/reducedim.jl

Lines 648 to 653 in 94b29d5

for (fname, _fname, op) in [(:sum, :_sum, :add_sum), (:prod, :_prod, :mul_prod),
(:maximum, :_maximum, :max), (:minimum, :_minimum, :min)]
@eval begin
# User-facing methods with keyword arguments
@inline ($fname)(a::AbstractArray; dims=:) = ($_fname)(a, dims)
@inline ($fname)(f, a::AbstractArray; dims=:) = ($_fname)(f, a, dims)

I'm suspecting there is a somewhat long list of functions receiving dims. So I think I'll do it just before or after this PR is merged (as otherwise, it can cause conflict with other PRs).

@mbauman
Copy link
Member

mbauman commented Apr 28, 2020

So to (finally) answer your two questions:

  • Q1 (Can this PR be merged without a short-hand syntax/macro to construct Broadcasted objects?): Absofreakenlutely
  • Q2 (Is my implementation OK?): Yes, looks great to me.

@tkf
Copy link
Member Author

tkf commented Apr 28, 2020

Great! Thanks a lot for reviewing this.

@oschulz
Copy link
Contributor

oschulz commented Apr 28, 2020

Now it's approved - Is there a chance for this to make it into 1.5?

@KristofferC
Copy link
Member

It would be good re-rebase this on master so that CI reruns on top of that (since it was a while CI ran).

@tkf
Copy link
Member Author

tkf commented Apr 28, 2020

I suppose merging master to this branch should be fine (as the commits are going to be squashed)? I can rebase, of course, if it's better.

Base.IndexStyle(::Type{<:Broadcasted{<:Any,<:Tuple{Any}}}) = IndexLinear()
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()

Base.LinearIndices(bc::Broadcasted{<:Any,<:Tuple{Any}}) = axes(bc)[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am right that this usually won't return a LinearIndices? Should this rather be LinearIndices(axes(bc))?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that'd probably be better. This sort of specialization should probably be on eachindex(::IndexLinear, ...) or LinearIndices itself if it's even needed at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants