-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize foldl/foreach for zip(arrays...), CartesianIndices, etc. #35036
base: master
Are you sure you want to change the base?
Conversation
Very nice addition! I also agree with your choice of language: it "addresses" #9080 but is not a fix for it. We want to be able to support both paradigms efficiently. My guess is the same transformation, unwrapping the loops, needs to be applied to the iterator version, but as you say it's a much harder transformation to do automatically and might require some compiler magic. So it's great to have this. |
base/multidimensional.jl
Outdated
@@ -381,6 +381,16 @@ module IteratorsMD | |||
first(iter::CartesianIndices) = CartesianIndex(map(first, iter.indices)) | |||
last(iter::CartesianIndices) = CartesianIndex(map(last, iter.indices)) | |||
|
|||
# Use nested for-loop in `foldl` as it is much faster than `iterate`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... as it preserves LLVM's ability to vectorize.
E.g. don't just state that it is faster, but why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 37e5971 enough?
base/reduce.jl
Outdated
struct _InitialValue end | ||
@inline function _foldl_impl(op::OP, init, array::AbstractArray) where {OP} | ||
if IndexStyle(array) isa IndexLinear | ||
return invoke(_foldl_impl, Tuple{Any,Any,Any}, op, init, array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That we have to use invoke
here is a codesmell for me. Why can't this be done with a dispatch on the IndexStyle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just simply renamed the default implementation a56c371. I can wrap it with a new function with IndexStyle
argument but I think this is simpler. What do you think?
The reason why I don't think IndexStyle
is a good approach here is that this trait is insufficient for supporting more complex collections such as sparse arrays.
bump, I think this shouldn't need to wait until we figure out the generic compiler work. |
Is this still needed? I see the linked issues are closed now, not a few months after Tim sounded that he despaired of seeing #9080 be solved. |
This PR implements performance an optimization for
foldl
onCartesianIndices
andproduct
by executing them as nested loops rather than invoking their customiterate
function on a single loop. From this optimization, we can easily add performance optimizations of other functions such asfoldl(_, zip(arrays...))
andforeach(_, arrays...)
. For more contexts, see #9080 (comment), #9080 (comment), and #15648 (comment).As we already have iterators-to-transducers automatic conversions #33526, iterator comprehensions wrapping
product
, i.e., anything of the formcan automatically get some performance boost.
I think this PR also addresses issue #9080.
Benchmarks (issue #9080)
Some more benchmarks
Before (1.5.0-DEV.416):
After: