Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[email protected] and up warns about check_args for the same code #438

Closed
cuihantao opened this issue Oct 7, 2022 · 4 comments
Closed

Comments

@cuihantao
Copy link

cuihantao commented Oct 7, 2022

I just updated LoopVectorization to the latest version and got a warning about LoopVectorization.check_args.

@turbo is applied to a a function that take a long list of arguments:

function g_update_kernel_Line!(a1, a2, bh, bhk, gh, ghk, itap, itap2, phi, u, v1, v2, a1_rhs, a2_rhs, v1_rhs, v2_rhs)
    @turbo @. a1_rhs = u*(-itap*v1*v2*(-bhk*sin(-a1 + a2 + phi) + ghk*cos(-a1 + a2 + phi)) + itap2*v1^2 *(gh + ghk))
    @turbo @. a2_rhs = u*(-itap*v1*v2*(bhk*sin(-a1 + a2 + phi) + ghk*cos(-a1 + a2 + phi)) + v2^2 *(gh + ghk))
    @turbo @. v1_rhs = u*(-itap*v1*v2*(-bhk*cos(-a1 + a2 + phi) - ghk*sin(-a1 + a2 + phi)) - itap2*v1^2 *(bh + bhk))
    @turbo @. v2_rhs = u*(itap*v1*v2*(bhk*cos(-a1 + a2 + phi) - ghk*sin(-a1 + a2 + phi)) - v2^2 *(bh + bhk))
    nothing
end

I checked every variable in argument list to LoopVectorization.check_args and all got true (they are all Vector{Float64}). Time went from 2 ms in 0.12.129 to 5.5 ms in 0.12.130.

Version 0.12.129 is the last version without the warning (and the speed is obviously faster). Looking at the release notes, I'm not sure which function can_turbo determines to be false.

Your help is appreciated!

Edit: Changing it to @turbo safe=false helps. I keep getting confused about safe=true by using it to denote a safe expression. It instead means "be safe".

@cuihantao
Copy link
Author

And using LoopVectorization 0.12.125, the same code on the same data only takes 1.206 ms. I will post an MWE later today.

@chriselrod
Copy link
Member

Version 0.12.129 is the last version without the warning (and the speed is obviously faster). Looking at the release notes, I'm not sure which function can_turbo determines to be false.

That would be good to know. It obviously has a false positive.
It'd be helpful to get a message notifying why it didn't work...

@cuihantao
Copy link
Author

Thanks for the reply! Below is the code to reproduce the warning:

using LoopVectorization
using BenchmarkTools

symbols = [:a1, :a2, :bh, :bhk, :gh, :ghk, :itap, :itap2, :phi, :u, :v1, :v2, :a1_rhs, :a2_rhs, :v1_rhs, :v2_rhs];

Nelem = 88_900;

for sym in symbols
    @eval $sym = rand(Nelem);
end


function g_update_kernel_Line!(a1, a2, bh, bhk, gh, ghk, itap, itap2, phi, u, v1, v2, a1_rhs, a2_rhs, v1_rhs, v2_rhs)
    @turbo @. a1_rhs = u*(-itap*v1*v2*(-bhk*sin(-a1 + a2 + phi) + ghk*cos(-a1 + a2 + phi)) + itap2*v1^2 *(gh + ghk))
    @turbo @. a2_rhs = u*(-itap*v1*v2*(bhk*sin(-a1 + a2 + phi) + ghk*cos(-a1 + a2 + phi)) + v2^2 *(gh + ghk))
    @turbo @. v1_rhs = u*(-itap*v1*v2*(-bhk*cos(-a1 + a2 + phi) - ghk*sin(-a1 + a2 + phi)) - itap2*v1^2 *(bh + bhk))
    @turbo @. v2_rhs = u*(itap*v1*v2*(bhk*cos(-a1 + a2 + phi) - ghk*sin(-a1 + a2 + phi)) - v2^2 *(bh + bhk))
    nothing
end

@btime g_update_kernel_Line!($a1, $a2, $bh, $bhk, $gh, $ghk, $itap, $itap2,
                             $phi, $u, $v1, $v2, $a1_rhs, $a2_rhs, $v1_rhs, $v2_rhs)

On my device, it gets

  • 0.12.129: 1.2 ms
  • 0.12.130: 6.9 ms (with warning)

But this code does not reproduce the slow down reported in my second post. I will wait for a possible fix and see if the slow down goes away.

chriselrod added a commit that referenced this issue Oct 8, 2022
@cuihantao
Copy link
Author

Thanks for the new release! Please disregard the issue reported in my reply. The performance is now back with 0.12.133.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants