Faster mapreduce for Broadcasted #31020

tkf · 2019-02-09T08:43:52Z

I suggest adding indexing-based specializations for Broadcasted in mapreduce. Since Broadcasted behaves almost like an array, this is mostly a matter of routing Broadcasted to the right method.

I still need to fill in some details, but it already works well for a simple example:

julia> a = rand(10_000); b = rand(10_000);

julia> @btime sum(Broadcast.instantiate(Broadcast.broadcasted(*, $a, $b)));
  1.935 μs (2 allocations: 48 bytes)

which is somewhat comparable to LinearAlgebra.dot:

julia> @btime dot($a, $b);
  1.693 μs (0 allocations: 0 bytes)

Before this change it was about 5x slower:

julia> @btime sum(Broadcast.instantiate(Broadcast.broadcasted(*, $a, $b)));
  10.067 μs (2 allocations: 48 bytes)

I'm opening a PR to get some feedback for polishing the implementation. I'll ask specific questions below.

(Edit: now we require Broadcast.instantiate for fast reduce)
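As a minimal sketch of the usage described above (not part of the PR itself), the following compares a reduction over an instantiated Broadcasted object with the eager equivalent and with `dot`. The variable names are illustrative only, and no timings are claimed here.

```julia
using LinearAlgebra  # only for dot, used as a reference value

a = rand(10_000)
b = rand(10_000)

# Build the lazy broadcast object; the array a .* b is never materialized.
bc = Broadcast.instantiate(Broadcast.broadcasted(*, a, b))

s_lazy  = sum(bc)        # reduction over the lazy Broadcasted
s_eager = sum(a .* b)    # allocates a temporary array first
s_dot   = dot(a, b)      # reference value

# All three agree up to floating-point rounding.
println(s_lazy ≈ s_eager && s_lazy ≈ s_dot)
```

Wrapping the calls in `@btime` from BenchmarkTools, as in the benchmarks above, is one way to check whether the fast path is taken on a given Julia version.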

tkf · 2019-02-09T08:44:39Z

base/reduce.jl

@@ -12,6 +12,9 @@ else
    const SmallUnsigned = Union{UInt8,UInt16,UInt32}
 end

+abstract type AbstractBroadcasted end
+const AbstractArrayOrBroadcasted = Union{AbstractArray, AbstractBroadcasted}


I needed to define this AbstractBroadcasted as broadcast.jl is loaded later than reduce.jl. ATM it's just a hack to make this work, but I wonder if it makes sense to define

abstract type AbstractIndexable end
const Indexable = Union{AbstractArray, AbstractIndexable}

or even

abstract type Indexable{N} end
abstract type AbstractArray{T,N} <: Indexable{N} end

so that it's much easier for downstream projects to use indexing-based mapreduce implementation.

Having an abstract type or a trait like Indexable would also be useful to support mapreduce over dimensions with skipmissing (#28027). It would avoid duplicating some methods.

Though a trait would probably be better than an abstract type, since it would be more flexible. In particular, an object returned by skipmissing could be marked as Indexable or not depending on whether it wraps an Indexable object (like an array) or another iterable. It would also allow objects that already inherit from a non-indexable abstract type to implement Indexable if they want, which wouldn't be possible with an abstract type.

If we go with trait direction, one option is to use IndexStyle. Something like this:

struct NonIndexable <: IndexStyle end
IndexStyle(::Type) = NonIndexable()
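To make the trait idea discussed above concrete, here is a rough sketch of trait-based dispatch. Everything in it is hypothetical: `Indexability`, `IsIndexable`, `NotIndexable`, `indexability`, and `mysum` are illustration-only names that do not exist in Base, and the sketch deliberately avoids adding methods to `IndexStyle`.

```julia
# Hypothetical trait; none of these names exist in Base.
abstract type Indexability end
struct IsIndexable  <: Indexability end
struct NotIndexable <: Indexability end

# Default: treat unknown types as plain iterables.
indexability(::Type) = NotIndexable()
indexability(::Type{<:AbstractArray}) = IsIndexable()
# A skipmissing wrapper inherits the trait of whatever it wraps.
indexability(::Type{Base.SkipMissing{T}}) where {T} = indexability(T)

# Entry point: compute the trait from the type, then dispatch on it.
mysum(x) = mysum(indexability(typeof(x)), x)

# Indexing-based path and iteration-based fallback.
# Each returns a tag so the chosen path is visible.
mysum(::IsIndexable, x)  = (:indexing,  sum(x[i] for i in eachindex(x)))
mysum(::NotIndexable, x) = (:iteration, sum(v for v in x))

println(mysum([1, 2, 3]))
println(mysum(skipmissing([1, missing, 3])))
println(mysum(skipmissing(x for x in [1, missing, 3])))
```

The `SkipMissing` method shows the flexibility mentioned above: the same wrapper type is indexable or not depending on its type parameter, which a single abstract supertype could not express.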

That's a clever idea. I think the union here is sensible for now — that trait could possibly come later.

Deferring the whole design work for the trait makes sense.

@vtjnash mentioned there may be another way around this bootstrapping issue.

Given that the 1.4 feature freeze is approaching, I support merging the PR with the Union for now, and try to use a more general trait later.

FWIW, this PR is giving us incredibly good performance to be able to reuse the pairwise summation algorithm with weights at JuliaStats/StatsBase.jl#518. Great work!

Thanks a lot for testing this PR in a real-world scenario!

tkf · 2019-02-09T08:45:21Z

base/broadcast.jl

+    _combined_indexstyle(Args)
+Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()
+
+Base.LinearIndices(bc::Broadcasted) = LinearIndices(eachindex(bc))


I need to look into how indexing is defined for Broadcasted (especially for how it interacts with Axes) and if this IndexStyle is correct.

However, it seems to me that it'd be much simpler to define IndexStyle(::Broadcasted) rather than IndexStyle(::Type{<:Broadcasted}), as it would let us use eachindex(::Broadcasted) in IndexStyle. Does it make sense to define IndexStyle(::Broadcasted) directly instead of the type trait?

AFAIK IndexStyle needs to take a type rather than an instance so that you can know statically what kind of indices to expect (that's more or less the definition of a trait). At least that's how it's documented.

My comment is based on the implementation detail in reduce.jl that IndexStyle is called on an instance:

julia/base/reduce.jl

Line 299 in 236c69b

_mapreduce(f, op, A::AbstractArray) = _mapreduce(f, op, IndexStyle(A), A)

which is valid since IndexStyle is callable on AbstractArray:

julia/base/indices.jl

Line 70 in 236c69b

IndexStyle(A::AbstractArray) = IndexStyle(typeof(A))

However, to make IndexStyle a proper trait for Broadcasted, I defined methods for types and added the method for instance on top of them, just like how it's done for AbstractArray:

julia/base/broadcast.jl

Line 227 in f4c68ee

Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc))

Note that a direct definition of IndexStyle(::Broadcasted) would be type-stable if eachindex is. If eachindex is not type-stable, we don't get the performance gain anyway. Initially, I wasn't aiming to add a public interface for Broadcasted in this PR, so I thought relying on the implementation detail that IndexStyle is only called on instances was OK. However, I also understand that contaminating the "signature space" of IndexStyle is not a particularly clean solution.

If we say that IndexStyle for Broadcasted has to be a proper trait, we need to note that IndexStyle must be properly defined whenever axes is customized since axes is a public interface.
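The type-versus-instance pattern discussed above can be sketched with a small array type. `Squares` is a hypothetical type invented for this illustration; only the `IndexStyle`, `size`, and `getindex` methods are real Base interfaces.

```julia
# Hypothetical read-only vector whose i-th element is i^2.
struct Squares <: AbstractVector{Int}
    n::Int
end
Base.size(s::Squares) = (s.n,)
Base.getindex(s::Squares, i::Int) = i^2

# The trait is defined on the type...
Base.IndexStyle(::Type{Squares}) = IndexLinear()

s = Squares(4)

# ...and the generic fallback IndexStyle(A::AbstractArray) = IndexStyle(typeof(A))
# makes the instance call work without a separate method.
println(IndexStyle(s) === IndexStyle(typeof(s)))
println(sum(s))
```

Because the answer depends only on the type, the compiler can resolve the trait statically, which is the property the reviewers ask for above.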

Yeah, but as long as you define IndexStyle it should be a proper trait, independent from how it's used internally. Otherwise, you'd better define a custom internal function.

Anyway, what's the problem with the current implementation in this PR? Just that it's complicated?

OK. So, reading the documentation and the code more, it looks like Base.axes(::Broadcasted{MyStyle}) is not an overloadable method? From the documentation, Base.axes(x) is overloadable if x participates in the broadcasting. This method is then called via combine_axes in axes(::Broadcasted). I was initially worried that IndexStyle(::Type{<:Broadcasted}) and Base.axes(::Broadcasted) could become incompatible if downstream package authors are not careful, but it looks like I don't need to worry about it.

tkf · 2019-02-13T06:52:20Z

base/broadcast.jl

@@ -219,6 +219,12 @@ argtype(bc::Broadcasted) = argtype(typeof(bc))
 _eachindex(t::Tuple{Any}) = t[1]
 _eachindex(t::Tuple) = CartesianIndices(t)

+Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc))
+Base.IndexStyle(::Type{<:Broadcasted{<:Any,Tuple{Axis}}}) where {Axis} = IndexStyle(Axis)
+Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()


It turned out my previous approach (combine IndexStyle of each argument) was wrong since it didn't work with, e.g., broadcasted(+, zeros(5), zeros(1, 4)). Each argument can be IndexLinear even though the broadcasted indexing is not (that's the main job of broadcasted).

So now the new approach is to return an equivalent of IndexStyle(bc.axes[1]) if length(bc.axes) == 1 and then return IndexCartesian() otherwise. As I need Axes type parameter now, this implementation requires the Broadcasted to be instantiated first. I guess this is OK since that's the usual broadcasting pipeline.

Is this implementation fine? Is IndexStyle(bc.axes[1]) == IndexLinear() && length(bc.axes) == 1 the only possible IndexLinear() case that is detectable in a type-stable manner? That is to say, I'm ignoring cases like broadcasted(+, zeros(1, 4), zeros(1, 4)), as I would need to look at the values to check that they support linear indexing.

This doesn't really require the Broadcasted to be instantiated to get an IndexStyle — it just means that if it's not instantiated that it falls back to a cartesian implementation. All Broadcasteds should support cartesian indexing; only one-dimensional ones will allow straight integers.

Yes, that's a much more accurate way to put it. I wanted to say that "for the broadcasted to be dispatched to the optimized method", it has to be instantiated first.
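The behavior described in this thread can be checked with a short sketch. It assumes a Julia version that includes this PR (so that `IndexStyle` has methods for Broadcasted); the input arrays are illustrative.

```julia
using Base.Broadcast: broadcasted, instantiate

# One axis after instantiation: linear (integer) indexing is available.
bc1 = instantiate(broadcasted(+, [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))

# A column combined with a row: the result is 5×4, although each argument
# on its own supports linear indexing.
bc2 = instantiate(broadcasted(+, [1, 2, 3, 4, 5], [10 20 30 40]))

println(IndexStyle(bc1))
println(IndexStyle(bc2))
println(axes(bc2))

# Cartesian indexing works for any instantiated Broadcasted.
println(bc2[CartesianIndex(2, 3)])   # element 2 + 30
# The one-dimensional case also accepts a plain integer.
println(bc1[3])                      # element 3 + 30
```

The second object is the case that broke the earlier approach of combining the `IndexStyle` of each argument.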


mbauman · 2019-02-13T21:40:28Z

base/broadcast.jl

@@ -219,6 +219,12 @@ argtype(bc::Broadcasted) = argtype(typeof(bc))
 _eachindex(t::Tuple{Any}) = t[1]
 _eachindex(t::Tuple) = CartesianIndices(t)

+Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc))
+Base.IndexStyle(::Type{<:Broadcasted{<:Any,Tuple{Axis}}}) where {Axis} = IndexStyle(Axis)
+Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()


This doesn't really require the Broadcasted to be instantiated to get an IndexStyle — it just means that if it's not instantiated that it falls back to a cartesian implementation. All Broadcasteds should support cartesian indexing; only one-dimensional ones will allow straight integers.


mbauman · 2019-02-13T21:46:55Z

base/reduce.jl

@@ -12,6 +12,9 @@ else
    const SmallUnsigned = Union{UInt8,UInt16,UInt32}
 end

+abstract type AbstractBroadcasted end
+const AbstractArrayOrBroadcasted = Union{AbstractArray, AbstractBroadcasted}


That's a clever idea. I think the union here is sensible for now — that trait could possibly come later.

Co-Authored-By: tkf <[email protected]>

chethega · 2019-02-14T11:48:37Z

Can we also speed up all, any and count?

These reductions currently don't use the mapreduce framework and therefore have to be done separately.

tkf · 2019-02-14T21:31:34Z

I think all and any use mapreduce:

julia/base/reducedim.jl

Lines 664 to 677 in b1acb3c

    
for (fname, op) in [(:sum, :add_sum), (:prod, :mul_prod),
                    (:maximum, :max), (:minimum, :min),
                    (:all, :&),       (:any, :|)]
    fname! = Symbol(fname, '!')
    _fname = Symbol('_', fname)
    @eval begin
        $(fname!)(f::Function, r::AbstractArray, A::AbstractArray; init::Bool=true) =
            mapreducedim!(f, $(op), initarray!(r, $(op), init, A), A)
        $(fname!)(r::AbstractArray, A::AbstractArray; init::Bool=true) = $(fname!)(identity, r, A; init=init)
        $(_fname)(A, dims)    = $(_fname)(identity, A, dims)
        $(_fname)(f, A, dims) = mapreduce(f, $(op), A, dims=dims)
    end
end

count doesn't, but it should be easy to add (Edit: it's added and tested):

julia/base/reduce.jl

Lines 749 to 755 in 111b385

    
function count(pred, a::AbstractArray)
    n = 0
    for i in eachindex(a)
        @inbounds n += pred(a[i])::Bool
    end
    return n
end
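For illustration, the reductions mentioned here can be applied to a lazy Broadcasted as sketched below. The inputs are made up, and the sketch only shows the results, not which internal method is selected.

```julia
using Base.Broadcast: broadcasted, instantiate

a = [1, 5, 3, 8]
b = [2, 4, 3, 9]

# Lazy equivalent of a .> b, i.e. (false, true, false, false).
gt = instantiate(broadcasted(>, a, b))

n_true   = count(gt)   # number of true elements
any_true = any(gt)
all_true = all(gt)

# count with a predicate over a lazy a .+ b, i.e. (3, 9, 6, 17).
n_even = count(iseven, instantiate(broadcasted(+, a, b)))

println((n_true, any_true, all_true, n_even))
```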

tkf · 2019-02-15T07:28:46Z

@mbauman I added some tests for this feature. This is all I wanted to do in this PR. Please review and merge, or let me know if there is anything I need to fix.

mbauman · 2019-02-20T20:02:39Z

This patch looks good to me and seems worth doing. I'd like to hold off just a bit to see what the triage reactions are to #19198 (and its implementation).

StefanKarpinski · 2019-02-20T20:05:33Z

Marked for triage so that it actually gets discussed.

mbauman · 2019-02-20T20:24:55Z

Yeah, I just put #31088 on the triage list earlier this afternoon — that's more the thing to talk about.

mbauman · 2019-03-14T19:51:05Z

Triage agrees that this is worth doing once we finalize #31088.

tkf · 2020-01-07T21:33:09Z

friendly bump

@mbauman Can you review this?

oschulz · 2020-03-11T14:37:51Z

Just curious, any news on this? I saw some nice performance improvements with this.

tkf · 2020-03-11T15:24:07Z

I think there are two questions:

(Q1) Can this PR be merged without a short-hand syntax/macro to construct Broadcasted objects?

(Q2) Is my implementation OK?

While it is better to wait for @mbauman to look into it to answer Q2, I think other core devs can answer Q1 (or maybe it can be briefly discussed in triage?). Maybe @StefanKarpinski can help to clarify Q1?

I'm bringing this up because it seems that triage wanted to implement the syntax at the same time (see #31020 (comment)), and if the answer to Q1 is no, we need to do some more discussion in #19198 while waiting for Q2. It also gives people some idea of how long they need to wait to get this. If this requires a syntax, it's imaginable that this takes way longer than just adding an overload.

PhilipVinc · 2020-03-15T12:16:12Z

Out of curiosity: I just noticed that in 1.4-rc2, ~~Juno.@enter~~ sum(Broadcast.broadcasted(*, rand(10,20), rand(10,20)), dims=1) fails with a no-method error.

Since this PR is relevant, does it fix this issue?

(it's a shame it was not included in v1.4)

tkf · 2020-03-15T19:26:37Z

I guess sum with dims was implemented only for arrays before this PR. So, it's relevant in the sense that this PR makes it work.

But I'm not sure about the Juno.@enter part. If it's something debugger-specific, maybe report it to Juno or Debugger.jl?

tkf · 2020-03-15T19:28:37Z

Actually, no, this PR does not fix it (yet). It should be easy to add, though.

PhilipVinc · 2020-03-15T19:33:23Z

Sorry, my bad. I left the Juno.@enter part in after copy-pasting. Of course it's completely irrelevant to my question.

But after looking through this PR, I noticed you implement Base.mapreducedim!, which is what sum with dims calls under the hood, so I thought it should work. But I'm probably wrong.

tkf · 2020-03-15T19:42:21Z

Yeah, we have all the internals we need. But I think we still need to define the dispatch. For sum etc., this is just replacing AbstractArray with AbstractArrayOrBroadcasted in

julia/base/reducedim.jl

Lines 648 to 653 in 94b29d5

    
for (fname, _fname, op) in [(:sum,     :_sum,     :add_sum), (:prod,    :_prod,    :mul_prod),
                            (:maximum, :_maximum, :max),     (:minimum, :_minimum, :min)]
    @eval begin
        # User-facing methods with keyword arguments
        @inline ($fname)(a::AbstractArray; dims=:) = ($_fname)(a, dims)
        @inline ($fname)(f, a::AbstractArray; dims=:) = ($_fname)(f, a, dims)

I'm suspecting there is a somewhat long list of functions receiving dims. So I think I'll do it just before or after this PR is merged (as otherwise, it can cause conflict with other PRs).
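On a Julia version where the dims keyword is not accepted for Broadcasted (as reported above for 1.4-rc2), one simple workaround is to materialize the broadcast before reducing along a dimension. This is only a sketch of that workaround, not the dispatch change described in this comment, and it gives up the laziness.

```julia
using Base.Broadcast: broadcasted, instantiate, materialize

A = [1 2; 3 4]
B = [10 20; 30 40]

bc = instantiate(broadcasted(*, A, B))   # lazy A .* B = [10 40; 90 160]

# Full reduction works directly on the lazy object.
total = sum(bc)

# Reduction along a dimension: materialize first, then reduce the array.
colsums = sum(materialize(bc); dims=1)

println(total)
println(colsums)
```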

mbauman · 2020-04-28T03:26:39Z

So to (finally) answer your two questions:

Q1 (Can this PR be merged without a short-hand syntax/macro to construct Broadcasted objects?): Absofreakenlutely
Q2 (Is my implementation OK?): Yes, looks great to me.

tkf · 2020-04-28T04:28:43Z

Great! Thanks a lot for reviewing this.

oschulz · 2020-04-28T06:22:51Z

Now it's approved - Is there a chance for this to make it into 1.5?

KristofferC · 2020-04-28T07:08:32Z

It would be good to rebase this on master so that CI reruns on top of that (it has been a while since CI last ran).

tkf · 2020-04-28T07:36:14Z

I suppose merging master to this branch should be fine (as the commits are going to be squashed)? I can rebase, of course, if it's better.

martinholters · 2020-08-24T13:00:40Z

base/broadcast.jl

+Base.IndexStyle(::Type{<:Broadcasted{<:Any,<:Tuple{Any}}}) = IndexLinear()
+Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian()
+
+Base.LinearIndices(bc::Broadcasted{<:Any,<:Tuple{Any}}) = axes(bc)[1]


Am I right that this usually won't return a LinearIndices? Should this rather be LinearIndices(axes(bc))?

Yeah, that'd probably be better. This sort of specialization should probably be on eachindex(::IndexLinear, ...) or LinearIndices itself if it's even needed at all.
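The difference raised in this review comment can be seen in a small sketch: the first axis of a one-dimensional Broadcasted is a range, whereas `LinearIndices(axes(bc))` is an actual `LinearIndices` object. The example values are illustrative.

```julia
using Base.Broadcast: broadcasted, instantiate

bc = instantiate(broadcasted(+, zeros(3), 1))
ax = axes(bc)                 # a one-element tuple of axes

first_axis = ax[1]            # what the reviewed method returns
lin = LinearIndices(ax)       # what the reviewer suggests instead

println(typeof(first_axis))
println(typeof(lin))

# Both enumerate the same indices; only the types differ.
println(collect(lin) == collect(first_axis))
```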

Better mapreduce for Broadcasted

f4c68ee

tkf commented Feb 9, 2019


tkf mentioned this pull request Feb 9, 2019

Lazy broadcasting macro JuliaArrays/LazyArrays.jl#21

Merged

tkf changed the title ~~WIP: Use indexing-based mapreduce for Broadcasted~~ WIP: Faster mapreduce for Broadcasted Feb 9, 2019

nalimilan requested a review from mbauman February 11, 2019 08:46

Use Axes in IndexStyle for Broadcasted

85f6405

tkf commented Feb 13, 2019


mbauman reviewed Feb 13, 2019


mbauman and others added 5 commits February 13, 2019 15:16

Apply suggestions from code review

cc7c05a

Co-Authored-By: tkf <[email protected]>

Update base/broadcast.jl

1a60367

Co-Authored-By: tkf <[email protected]>

Fix IndexStyle for IndexLinear case

ae55dc2

Fix LinearIndices for Broadcasted

ece81cf

Test that pairwise mapreduce is used

ee68a45

tkf force-pushed the bcreduce branch from b4b956a to ee68a45 Compare February 14, 2019 07:52

Test count(::Broadcasted)

758e003

tkf changed the title ~~WIP: Faster mapreduce for Broadcasted~~ Faster mapreduce for Broadcasted Feb 16, 2019

tkf mentioned this pull request Feb 16, 2019

Notation for lazy map #19198

Open

StefanKarpinski added the status:triage This should be discussed on a triage call label Feb 20, 2019

mbauman mentioned this pull request Feb 28, 2019

RFC: Use @: to construct a broadcasted object #31088

Open

nalimilan mentioned this pull request Mar 3, 2019

Allow Base.filter to work with all iterators #31188

Open

mbauman added domain:broadcast Applying a function over a collection and removed status:triage This should be discussed on a triage call labels Mar 14, 2019

tkf mentioned this pull request Dec 28, 2019

RFC: Add ArrayLike #34196

Closed

tkf mentioned this pull request Jan 7, 2020

Transducer as an optimization: map, filter and flatten #33526

Merged

tkf mentioned this pull request Feb 8, 2020

keys(::Generator) for find* and arg* #34674

Closed

This was referenced Mar 5, 2020

Add function + two-argument method to reducers #35017

Closed

Non-propogation, skipmissing-related improvements to Missing handling. #35050

Open

mbauman approved these changes Apr 28, 2020


Merge branch 'master' into bcreduce

cedeec6

mbauman merged commit 2f90dde into JuliaLang:master Apr 30, 2020

tkf deleted the bcreduce branch April 30, 2020 22:17

nalimilan mentioned this pull request May 2, 2020

Make covariance and correlation work for iterators, skipmissing in particular. JuliaStats/Statistics.jl#34

Open

maleadt mentioned this pull request May 7, 2020

Support and use broadcast with mapreduce. JuliaGPU/GPUArrays.jl#270

Merged

martinholters reviewed Aug 24, 2020


mcabbott mentioned this pull request Jun 2, 2021

reduce_init undefined for Broadcasted #41054

Open

devmotion mentioned this pull request Oct 22, 2021

Use pairwise summation in loglikelihood JuliaStats/Distributions.jl#1409

Closed

devmotion mentioned this pull request Dec 1, 2021

Benchmarking expected_loglik JuliaGaussianProcesses/ApproximateGPs.jl#82

Open

devmotion mentioned this pull request Jan 15, 2023

Addressing performance issues with broadcasting TuringLang/DistributionsAD.jl#230

Closed

devmotion mentioned this pull request Nov 18, 2023

Weighted mean with function JuliaStats/StatsBase.jl#886

Open
