-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster mapreduce for Broadcasted #31020
Conversation
@@ -12,6 +12,9 @@ else | |||
const SmallUnsigned = Union{UInt8,UInt16,UInt32} | |||
end | |||
|
|||
abstract type AbstractBroadcasted end | |||
const AbstractArrayOrBroadcasted = Union{AbstractArray, AbstractBroadcasted} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to define this AbstractBroadcasted
as broadcast.jl is loaded later than reduce.jl. ATM it's just a hack to make this work, but I wonder if it makes sense to define
abstract type AbstractIndexable end
const Indexable = Union{AbstractArray, AbstractIndexable}
or even
abstract type Indexable{N} end
abstract type AbstractArray{T,N} <: Indexable{N} end
so that it's much easier for downstream projects to use indexing-based mapreduce
implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having an abstract type or a trait like Indexable
would also be useful to support mapreduce
over dimensions with skipmissing
(#28027). It would avoid duplicating some methods.
Though a trait would probably be better than an abstract type, since it would be more flexible. In particular, an object returned by skipmissing
could be marked as Indexable
or not depending on whether it wraps an Indexable
object (like an array) or another iterable. It would also allow objects that already inherit from a non-indexable abstract type to implement Indexable
if they want, which wouldn't be possible with an abstract type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go with trait direction, one option is to use IndexStyle
. Something like this:
struct NonIndexable <: IndexStyle end
IndexStyle(::Type) = NonIndexable()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a clever idea. I think the union here is sensible for now — that trait could possibly come later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Differing the whole designing work for the trait makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtjnash mentioned there may be another way around this bootstrapping issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the 1.4 feature freeze is approaching, I support merging the PR with the Union
for now, and try to use a more general trait later.
FWIW, this PR is giving us incredibly good performance to be able to reuse the pairwise summation algorithm with weights at JuliaStats/StatsBase.jl#518. Great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for testing this PR in a real-world scenario!
base/broadcast.jl
Outdated
_combined_indexstyle(Args) | ||
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian() | ||
|
||
Base.LinearIndices(bc::Broadcasted) = LinearIndices(eachindex(bc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to look into how indexing is defined for Broadcasted
(especially for how it interacts with Axes
) and if this IndexStyle
is correct.
However, it seems to me that it'd be much simpler if we define IndexStyle(::Broadcasted)
than IndexStyle(::Type{<:Broadcasted})
as it would let us use eachindex(::Broadcasted)
in IndexStyle
. Does it makes sense to define IndexStyle(::Broadcasted)
directly than the type trait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK IndexStyle
needs to take a type rather than an instance so that you can know statically what kind of indices to expect (that's more or less the definition of a trait). At least that's how its documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment is based on the implementation detail in reduce.jl that IndexStyle
is called on instance
Line 299 in 236c69b
_mapreduce(f, op, A::AbstractArray) = _mapreduce(f, op, IndexStyle(A), A) |
which is valid since IndexStyle
is callable on AbstractArray
:
Line 70 in 236c69b
IndexStyle(A::AbstractArray) = IndexStyle(typeof(A)) |
However, to make IndexStyle
a proper trait for Broadcasted
, I defined methods for types and added the method for instance on top of them, just like how it's done for AbstractArray
:
Line 227 in f4c68ee
Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc)) |
Note that direct definition of IndexStyle(::Broadcasted)
would be type-stable if eachindex
is so. If eachindex
is not type-stable we don't have the performance gain anyway. Initially, I wasn't aiming for adding public interface for Broadcasted
in this PR so I thought relying on the implementation detail that IndexStyle
is only called on instance was OK. However, I also understand that contaminating "signature space" of IndexStyle
is not particularly a clean solution.
If we say that IndexStyle
for Broadcasted
has to be a proper trait, we need to note that IndexStyle
must be properly defined whenever axes
is customized since axes
is a public interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but as long as you define IndexStyle
it should be a proper trait, independent from how it's used internally. Otherwise, you'd better define a custom internal function.
Anyway, what's the problem with the current implementation in this PR? Just that's it's complicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. So, reading the documentation and the code more, it looks like Base.axes(::Broadcasted{MyStyle})
is not an overloadable method? From the documentation, Base.axes(x)
is overloadable if x
participate in the broadcasting. This method is then called via combine_axes
in axes(::Broadcasted)
. I was initially worried that IndexStyle(::Type{<:Broadcasted})
and Base.axes(::Broadcasted)
can become incompatible if downstream package authors are not careful but it looks like I don't need to worry about it.
@@ -219,6 +219,12 @@ argtype(bc::Broadcasted) = argtype(typeof(bc)) | |||
_eachindex(t::Tuple{Any}) = t[1] | |||
_eachindex(t::Tuple) = CartesianIndices(t) | |||
|
|||
Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc)) | |||
Base.IndexStyle(::Type{<:Broadcasted{<:Any,Tuple{Axis}}}) where {Axis} = IndexStyle(Axis) | |||
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turned out my previous approach (combine IndexStyle
of each argument) was wrong since it didn't work with, e.g., broadcasted(+, zeros(5), zeros(1, 4))
. Each argument can be IndexLinear
even though the broadcasted indexing is not (that's the main job of broadcasted).
So now the new approach is to return an equivalent of IndexStyle(bc.axes[1])
if length(bc.axes) == 1
and then return IndexCartesian()
otherwise. As I need Axes
type parameter now, this implementation requires the Broadcasted
to be instantiate
d first. I guess this is OK since that's the usual broadcasting pipeline.
Is this implementation fine? Is IndexStyle(bc.axes[1]) == IndexLinear() && length(bc.axes) == 1
the only possible IndexLinear()
case that is detectable in a type-stable manner? That is to say, I'm ignoring the case like broadcasted(+, zeros(1, 4), zeros(1, 4))
as I need to look at the value to check that it supports linear indexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really require the Broadcasted
to be instantiate
d to get an IndexStyle
— it just means that if it's not instantiated that it falls back to a cartesian implementation. All Broadcasted
s should support cartesian indexing; only one-dimensional ones will allow straight integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's much more accurate way to put it. I wanted to say that "for the broadcasted to be dispatched to the optimized method", it has to be instantiate
d first.
@@ -219,6 +219,12 @@ argtype(bc::Broadcasted) = argtype(typeof(bc)) | |||
_eachindex(t::Tuple{Any}) = t[1] | |||
_eachindex(t::Tuple) = CartesianIndices(t) | |||
|
|||
Base.IndexStyle(bc::Broadcasted) = IndexStyle(typeof(bc)) | |||
Base.IndexStyle(::Type{<:Broadcasted{<:Any,Tuple{Axis}}}) where {Axis} = IndexStyle(Axis) | |||
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really require the Broadcasted
to be instantiate
d to get an IndexStyle
— it just means that if it's not instantiated that it falls back to a cartesian implementation. All Broadcasted
s should support cartesian indexing; only one-dimensional ones will allow straight integers.
@@ -12,6 +12,9 @@ else | |||
const SmallUnsigned = Union{UInt8,UInt16,UInt32} | |||
end | |||
|
|||
abstract type AbstractBroadcasted end | |||
const AbstractArrayOrBroadcasted = Union{AbstractArray, AbstractBroadcasted} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a clever idea. I think the union here is sensible for now — that trait could possibly come later.
Can we also speed up These reductions currently don't use the mapreduce framework and therefore have to be done separately. |
I think Lines 664 to 677 in b1acb3c
Lines 749 to 755 in 111b385
|
@mbauman I added some tests for this feature. This is all I wanted to do in this PR. Please review and merge or let me know if there are anything I need to fix. |
This patch looks good to me and seems worth doing. I'd like to hold off just a bit to see what the triage reactions are to #19198 (and its implementation). |
Marked for triage so that it actually gets discussed. |
Yeah, I just put #31088 on the triage list earlier this afternoon — that's more the thing to talk about. |
Triage agrees this this is worth doing once we finalize #31088. |
friendly bump @mbauman Can you review this? |
Just curious, any news on this? I saw some nice performance improvements with this. |
I think there are two questions: (Q1) Can this PR be merged without a short-hand syntax/macro to construct (Q2) Is my implementation OK? While it is better to wait for @mbauman to look into it to answer Q2, I think other core devs can answer Q1 (or maybe it can be briefly discussed in triage?). Maybe @StefanKarpinski can help to clarify Q1? I'm bringing this up as it seems that triage wanted to implement the syntax at the same time #31020 (comment) and if the answer to Q1 is no, we need to do some more discussion in #19198 while waiting for Q2. It also gives people some idea of how long they need to wait to get this. If this requires a syntax, it's imaginable that this takes way longer than just adding an overload. |
Out of curiosity: I just noticed that in 1.4-rc2 Since this PR is relevant, does it fix this issue? (it's a shame it was not included in v1.4) |
I guess But I'm not sure about |
Actually, no, this PR does not fix it (yet). It should be easy to add, though. |
Sorry, my bad. The But after looking through your this PR, I noticed you implement |
Yeah, we have all the internals we need. But I think we still need to define the dispatch. For Lines 648 to 653 in 94b29d5
I'm suspecting there is a somewhat long list of functions receiving |
So to (finally) answer your two questions:
|
Great! Thanks a lot for reviewing this. |
Now it's approved - Is there a chance for this to make it into 1.5? |
It would be good re-rebase this on master so that CI reruns on top of that (since it was a while CI ran). |
I suppose merging |
Base.IndexStyle(::Type{<:Broadcasted{<:Any,<:Tuple{Any}}}) = IndexLinear() | ||
Base.IndexStyle(::Type{<:Broadcasted{<:Any}}) = IndexCartesian() | ||
|
||
Base.LinearIndices(bc::Broadcasted{<:Any,<:Tuple{Any}}) = axes(bc)[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am right that this usually won't return a LinearIndices
? Should this rather be LinearIndices(axes(bc))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that'd probably be better. This sort of specialization should probably be on eachindex(::IndexLinear, ...)
or LinearIndices
itself if it's even needed at all.
I suggest to add indexing-based specializations for
Broadcasted
inmapreduce
. AsBroadcasted
is almost like an array, it is mostly about "routing"Broadcasted
to the right method.I need to fill some more details, but it already works well for simple example:
which is somewhat comparable to
LinearAlgebra.dot
:Before it was 5x slower:
I'm opening a PR to get some feedback for polishing the implementation. I'll ask specific questions below.
(Edit: now we require
Broadcast.instantiate
for fast reduce)