Add nospecialize and precompiles to reduce latency #97

Merged: 1 commit merged into JuliaSIMD:master on Apr 14, 2020

timholy (Contributor) commented Apr 14, 2020

In my testing this shaved about 6 s off the time to run the test suite (which might be within the noise; I'm not sure, since the total was around 350 s). For a focused test like

using LoopVectorization, OffsetArrays

function avxgeneric!(out, A, kern, R=CartesianIndices(out), z=zero(eltype(out)))
    Rk = CartesianIndices(kern)
    @avx for I in R
        tmp = z
        for J in Rk
            tmp += A[I+J]*kern[J]
        end
        out[I] = tmp
    end
    out
end

A = rand(Float32, 100, 100);
kern = OffsetArray(rand(Float32, 3, 3), -1:1, -1:1);
out1 = OffsetArray(similar(A, size(A).-2), 1, 1);   # stay away from the edges of A
avxgeneric!(out1, A, kern)

it reduced the total time (measured from the Linux command line) from about 10.8 s to 10.0 s. Excluding the time to load LoopVectorization itself, it's about 9.8 s vs 8.8 s.

So, it's a modest improvement. Feel free to take it or leave it, not a big deal either way for me.
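The diff itself isn't shown in this conversation. As a rough illustration of the two techniques named in the PR title, here is a minimal sketch; the functions describe and scale are hypothetical and are not taken from LoopVectorization.

```julia
# Hypothetical example (not LoopVectorization code).
# `@nospecialize` hints to the compiler that it need not generate a new
# specialization of `describe` for every concrete argument type it sees,
# trading some runtime speed for less compilation work.
function describe(@nospecialize(x))
    return string(typeof(x))
end

# A hypothetical numeric kernel that is specialized as usual.
scale(v::Vector{Float64}, a::Float64) = a .* v

# `precompile(f, argtypes)` compiles `f` for the given argument types
# without calling it, and returns `true` on success. Placed inside a
# package module, such directives let some of this work be saved when the
# package is precompiled instead of being repeated at first use.
ok = precompile(scale, (Vector{Float64}, Float64))

println(ok)
println(describe(1.0))
println(scale([1.0, 2.0], 3.0))
```

The trade-off is that @nospecialize reduces how many method instances get compiled, while precompile shifts compilation earlier. Which functions benefit from which is something to measure (as done above with wall-clock timings), not assume.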

chriselrod (Member) commented:

Looking at the precompile statements, I see solve_tilesize, which is being renamed in #95.
What's the best way to handle that, i.e., which PR do you want me to merge first?

timholy (Contributor, Author) commented Apr 14, 2020

If you want this, you can merge it first; I'll fix it in the other PR.

chriselrod merged commit 307d6b5 into JuliaSIMD:master on Apr 14, 2020.

chriselrod (Member) commented:

Thanks, I appreciate a free 0.8-1s!
