-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[email protected] and up warns about check_args
for the same code
#438
Comments
And using LoopVectorization 0.12.125, the same code on the same data only takes 1.206 ms. I will post an MWE later today. |
That would be good to know. It obviously has a false positive. |
Thanks for the reply! Below is the code to reproduce the warning: using LoopVectorization
using BenchmarkTools
symbols = [:a1, :a2, :bh, :bhk, :gh, :ghk, :itap, :itap2, :phi, :u, :v1, :v2, :a1_rhs, :a2_rhs, :v1_rhs, :v2_rhs];
Nelem = 88_900;
for sym in symbols
@eval $sym = rand(Nelem);
end
function g_update_kernel_Line!(a1, a2, bh, bhk, gh, ghk, itap, itap2, phi, u, v1, v2, a1_rhs, a2_rhs, v1_rhs, v2_rhs)
@turbo @. a1_rhs = u*(-itap*v1*v2*(-bhk*sin(-a1 + a2 + phi) + ghk*cos(-a1 + a2 + phi)) + itap2*v1^2 *(gh + ghk))
@turbo @. a2_rhs = u*(-itap*v1*v2*(bhk*sin(-a1 + a2 + phi) + ghk*cos(-a1 + a2 + phi)) + v2^2 *(gh + ghk))
@turbo @. v1_rhs = u*(-itap*v1*v2*(-bhk*cos(-a1 + a2 + phi) - ghk*sin(-a1 + a2 + phi)) - itap2*v1^2 *(bh + bhk))
@turbo @. v2_rhs = u*(itap*v1*v2*(bhk*cos(-a1 + a2 + phi) - ghk*sin(-a1 + a2 + phi)) - v2^2 *(bh + bhk))
nothing
end
@btime g_update_kernel_Line!($a1, $a2, $bh, $bhk, $gh, $ghk, $itap, $itap2,
$phi, $u, $v1, $v2, $a1_rhs, $a2_rhs, $v1_rhs, $v2_rhs) On my device, it gets
But this code does not reproduce the slow down reported in my second post. I will wait for a possible fix and see if the slow down goes away. |
Thanks for the new release! Please disregard the issue reported in my reply. The performance is now back with 0.12.133. |
I just updated LoopVectorization to the latest version and got a warning about
LoopVectorization.check_args
.@turbo is applied to a a function that take a long list of arguments:
I checked every variable in argument list to
LoopVectorization.check_args
and all gottrue
(they are all Vector{Float64}). Time went from 2 ms in 0.12.129 to 5.5 ms in 0.12.130.Version 0.12.129 is the last version without the warning (and the speed is obviously faster). Looking at the release notes, I'm not sure which function
can_turbo
determines to befalse.
Your help is appreciated!
Edit: Changing it to
@turbo safe=false
helps. I keep getting confused aboutsafe=true
by using it to denote a safe expression. It instead means "be safe".The text was updated successfully, but these errors were encountered: