
Conversation

@N5N3 (Member) commented Dec 22, 2021

This performance difference was found when working on #42736.
Currently, our ReshapedArray uses stride-based MultiplicativeInverses to speed up index transformation.
For example, for a::AbstractArray{T,3} and b = vec(a), the index transformation is equivalent to:

offset = i - 1                        # b[i]
d1, r1 = divrem(offset, stride(a, 3)) # stride(a, 3) = size(a, 1) * size(a, 2)
d2, r2 = divrem(r1, stride(a, 2))     # stride(a, 2) = size(a, 1)
CartesianIndex(r2 + 1, d2 + 1, d1 + 1) # a has one-based axes

(Each stride is replaced with a MultiplicativeInverse to accelerate the divrem.)

This PR replaces the above machinery with:

offset = i - 1               
d1, r1 = divrem(offset, size(a, 1))
d2, r2 = divrem(d1, size(a, 2))
CartesianIndex(r1 + 1, r2 + 1, d2 + 1)
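
As a quick sanity check (a standalone sketch, not code from this PR; the array A and the index choices below are just for illustration), both schemes recover the same index as CartesianIndices:

A = rand(99, 100, 101)
n1, n2, _ = size(A)
s2, s3 = n1, n1 * n2                          # stride(A, 2), stride(A, 3)
for i in (1, 12345, length(A))                # a few linear indices into vec(A)
    offset = i - 1
    d1, r1 = divrem(offset, s3)               # stride-based (current)
    d2, r2 = divrem(r1, s2)
    old_ci = CartesianIndex(r2 + 1, d2 + 1, d1 + 1)
    q1, t1 = divrem(offset, n1)               # size-based (this PR)
    q2, t2 = divrem(q1, n2)
    new_ci = CartesianIndex(t1 + 1, t2 + 1, q2 + 1)
    @assert old_ci == new_ci == CartesianIndices(A)[i]
end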

For random access the two should have the same computational cost, but for sequential access, like sum(b), the size-based transformation seems faster.
To avoid a bottleneck from memory/IO, use reshape(::CartesianIndices, x...) to benchmark:

using BenchmarkTools   # for @btime

f(x) = let r = 0
           for i in eachindex(x)
               @inbounds r |= +(x[i].I...)
           end
           r
       end
a = CartesianIndices((99,100,101));
@btime f(vec($a));                 # 2.766 ms  -->  2.591 ms
@btime f(reshape($a,990,1010));    # 3.412 ms  -->  2.626 ms
@btime f(reshape($a,33,300,101));  # 3.422 ms  -->  2.342 ms

I haven't looked into the reason for this performance difference.

Besides the acceleration, this also makes it possible to reuse the MultiplicativeInverse in some cases (like #42736).
So I think it might be useful?

@N5N3 changed the title from "Use sized based mi to speedup sequential access of ReshapedArray" to "Use size based MultiplicativeInverse to speedup sequential access of ReshapedArray" on Dec 22, 2021
@dkarrasch added the arrays label on Dec 22, 2021
@adienes (Member) commented Jun 21, 2025

This still appears to be a decent performance improvement (& non-conflicting); on my machine it's 2 ms --> 1.6 ms. Any interest in picking it back up?

@adienes added the performance label on Jun 21, 2025
@adienes (Member) commented Sep 9, 2025

> I haven't looked into the reason for this performance difference.

I believe it is because the size-based sequence of divrems has a more favorable data dependency. In q, r = divrem(x, d), the quotient q is computed by a multiplication + shift (à la MultiplicativeInverse), and the remainder r then requires an additional multiplication + subtraction. If r is required for the next divrem call, the CPU has to wait, but if only q is required it can do these out of order.
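
A minimal sketch of the two dependency chains for the 3-d case from the PR description (the helper names here are mine, not Base's):

# Stride-based (old): the *remainder* feeds the next divrem, so each step
# must finish both q and r before the next division can start.
function ind2sub_stride(i, s3, s2)       # s3 = size1*size2, s2 = size1
    offset = i - 1
    d1, r1 = divrem(offset, s3)          # r1 = offset - d1*s3 needed below
    d2, r2 = divrem(r1, s2)              # waits on r1
    (r2 + 1, d2 + 1, d1 + 1)
end

# Size-based (new): only the *quotient* feeds the next divrem; the remainders
# (one extra multiply + subtract each) can retire out of order.
function ind2sub_size(i, n1, n2)         # n1 = size(a, 1), n2 = size(a, 2)
    offset = i - 1
    d1, r1 = divrem(offset, n1)          # d1 is ready after mul + shift
    d2, r2 = divrem(d1, n2)              # does not wait on r1
    (r1 + 1, r2 + 1, d2 + 1)
end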

Note also that the improvement can get quite large when the dimensionality is high; e.g. on

const a = CartesianIndices((2,3,4,5,1,1,1,2,3,4,1,1,1,2,1,2,1,1,3))
const va = vec(a)
@btime f($va);

this PR is 3x faster than master

@oscardssmith (Member) commented:

seems reasonable to me!

@adienes (Member) commented Sep 9, 2025

The test error is real, but weird. I reduced it to this:

using LinearAlgebra

function test_triangular(elty1)
    X = randn(elty1, 3, 3)
    psd_matrix = X' * X                  # symmetric positive-definite matrix
    data = cholesky(psd_matrix).U        # upper-triangular Cholesky factor

    A1 = UpperTriangular(data)
    exp(Matrix(log(A1))) ≈ A1            # round-trip through log/exp
end

test_triangular(Float32)

which works on master but segfaults on this PR

I doubt this is technically this PR's fault... but I will keep trying to understand why it happens anyway

@adienes (Member) commented Sep 9, 2025

fixed after #59525

@adienes (Member) commented Sep 10, 2025

I hope it is OK that I pushed directly here. I also changed the _ind2sub method for AbstractArray to match a similar form. I don't think there will be much performance improvement in that method since it's not using MultiplicativeInverse, but it seems like a more consistent and clearer implementation anyway.
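
For readers who want the flavor of a size-based _ind2sub, a recursion in that spirit might look roughly like this (a sketch with my own naming, not the exact code pushed to the PR):

# Sketch: peel off one size at a time; each remainder becomes that dimension's
# (one-based) index and the quotient is threaded forward to the next dimension.
ind2sub_sizes(::Tuple{}, offset) = ()
function ind2sub_sizes(dims::Tuple, offset)
    d, r = divrem(offset, dims[1])
    (r + 1, ind2sub_sizes(Base.tail(dims), d)...)
end

ind2sub_sizes((99, 100, 101), 12344) == Tuple(CartesianIndices((99, 100, 101))[12345])  # true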

@N5N3 merged commit 7de5585 into JuliaLang:master on Sep 11, 2025 (7 checks passed)
@N5N3 deleted the size_reshape branch on Sep 11, 2025
@N5N3 (Member, Author) commented Sep 11, 2025

Thanks so much for picking this up and exploring the root cause of the improvements, @adienes!
