Use size based MultiplicativeInverse to speedup sequential access of ReshapedArray
#43518
Conversation
Update reshapedarray.jl
This appears to still be a decent performance improvement (and non-conflicted); on my machine it goes from 2 ms to 1.6 ms. Any interest in picking it back up?
I believe it is because of the sequence of operations involved. Note that the improvement can get quite large when dimensionality is high: on a high-dimensional example this PR is 3x faster than master.
Seems reasonable to me!
The test error is real, but weird. I reduced it to a small snippet which works on master but segfaults on this PR. I doubt this is technically this PR's fault, but I will keep trying to understand why it happens anyway.
Fixed after #59525.
I hope it is ok that I pushed directly here. I also changed the
Thanks so much for picking this up and exploring the root cause of the improvements, @adienes!
This performance difference was found when working on #42736.
Currently, our `ReshapedArray` uses stride-based `MultiplicativeInverse`s to speed up index transformation. For example, for `a::AbstractArray{T,3}` and `b = vec(a)`, the index transformation is equivalent to a chain of `divrem`s against the strides of `a` (with every `stride` replaced by a `MultiplicativeInverse` to accelerate the `divrem`). This PR replaces that stride-based machinery with a size-based transformation, decomposing the linear index by successive `divrem`s against the dimension sizes instead.
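The inline code snippets in the description did not survive extraction. As a rough, hypothetical illustration (Python with 0-based indices and column-major layout, not the PR's Julia code), the two transformations can be sketched as:

```python
# Hypothetical sketch of turning a linear index into a cartesian index,
# 0-based and column-major as in Julia's memory layout. The real Julia code
# replaces each division below with a precomputed MultiplicativeInverse.

def decode_strides(i, dims):
    # Stride-based: divide by each precomputed stride, largest first.
    strides, s = [], 1
    for d in dims:
        strides.append(s)
        s *= d
    out = [0] * len(dims)
    for k in reversed(range(len(dims))):
        out[k], i = divmod(i, strides[k])
    return tuple(out)

def decode_sizes(i, dims):
    # Size-based: peel off one dimension at a time with divrem by its size.
    out = []
    for d in dims:
        i, r = divmod(i, d)
        out.append(r)
    return tuple(out)

# Both decompositions agree on every linear index.
dims = (3, 4, 5)
assert all(decode_strides(i, dims) == decode_sizes(i, dims)
           for i in range(3 * 4 * 5))
```

Both variants do the same number of `divrem`s per index, which matches the claim below that random access should cost the same; the observed difference only shows up in sequential traversal.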
For random access, the two should have the same computational cost. But for sequential access, like `sum(b)`, the size-based transformation seems faster. To avoid a bottleneck from IO, I used `reshape(::CartesianIndices, x...)` to benchmark. I haven't looked into the reason for this performance difference.
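To make the `sum(b)`-style sequential access concrete, here is a hypothetical Python analogue (names and data invented for illustration, not the PR's benchmark): summing a 3-d array through its flat view, decoding every linear index with the size-based `divmod` chain:

```python
import math

# Hypothetical demo of sequential access through a flat "vec" view of a
# 3-d array, decoding each linear index with the size-based divmod chain.
def sum_flat_view(a, dims):
    total = 0
    for i in range(math.prod(dims)):
        j, idx = i, []
        for d in dims:               # size-based decode, fastest axis first
            j, r = divmod(j, d)
            idx.append(r)
        i0, i1, i2 = idx
        total += a[i0][i1][i2]
    return total

dims = (2, 3, 4)
a = [[[i0 + 10 * i1 + 100 * i2 for i2 in range(dims[2])]
      for i1 in range(dims[1])]
     for i0 in range(dims[0])]

# Matches a direct sum over all elements.
direct = sum(v for plane in a for row in plane for v in row)
assert sum_flat_view(a, dims) == direct
```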
Besides the acceleration, this also makes it possible to reuse the `MultiplicativeInverse` in some cases (like #42736). So I think it might be useful?
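For background on why `MultiplicativeInverse` helps at all: it replaces integer division by a known constant with a precomputed multiply and shift, which is much cheaper than a hardware divide. A minimal Python sketch of the idea (the actual implementation in Julia's `Base.MultiplicativeInverses` is more refined; the bound-selection rule here is my own simplified choice):

```python
# Minimal sketch of division-by-constant via multiply-and-shift, the trick
# behind MultiplicativeInverse. Simplified: works for a known input range.

def magic(d, N):
    """Precompute (m, s) so that (x * m) >> s == x // d for all 0 <= x < N."""
    # Choosing 2^s > d * (N + d) keeps the rounding error from ceil(2^s / d)
    # too small to ever change the quotient on [0, N).
    s = (d * (N + d)).bit_length()
    m = -(-(1 << s) // d)            # ceil(2^s / d)
    return m, s

def fast_divrem(x, d, m, s):
    q = (x * m) >> s                 # replaces the hardware division
    return q, x - q * d

# Sanity check against Python's divmod over the whole valid range.
N = 10_000
for d in (3, 7, 12, 60):
    m, s = magic(d, N)
    for x in range(N):
        assert fast_divrem(x, d, m, s) == divmod(x, d)
```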