ERROR: UndefVarError: ####temporary#425_ not defined #18
Thanks again for these reports. This commit fixed your
But to get the example to work, I'll also have to decide a strategy to deal with peeling.
    s = B[1,d1]*B[1,κ]
    v[1:U*W] .= 0 # U is the number of SIMD vectors, W their width
    for d2=2:U*W:d # handle remainder appropriately, as written it will often go out of bounds
        @. v[1:U*W] += B[d2:d2+U*W-1,d1]*B[d2:d2+U*W-1,κ]
    end
    G[d1,κ] = s + sum(v)
which faithfully follows what the user wrote. I think a much better transformation, matching the intention, would be more along the lines of:
    v[1:U*W] .= B[1:U*W,d1] .* B[1:U*W,κ]
    for d2=1+U*W:U*W:d # handle remainder appropriately, as written it will often go out of bounds
        @. v[1:U*W] += B[d2:d2+U*W-1,d1]*B[d2:d2+U*W-1,κ]
    end
    G[d1,κ] = sum(v)
This is more difficult: it'll have to recognize the pattern of N loop iterations getting peeled off. But it is what it ought to do, and therefore what I'll go for. I'll leave this issue open until I or someone else implements that. An easier means of getting that behavior would be an
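The two transformations discussed above can be compared with a small plain-Julia sketch. It does not use LoopVectorization and is not how the library implements anything; the function names `colprod_reference` and `colprod_chunked` are hypothetical, and `chunk` stands in for `U*W`. The point is only to show the peeled first iteration being folded back into the accumulator vector, with a scalar loop for the remainder so that nothing goes out of bounds.

```julia
# Hypothetical helper names; `chunk` plays the role of U*W from the discussion.

# Reference: the loop as the user wrote it, with the first iteration peeled off.
function colprod_reference(B, d1, κ)
    d = size(B, 1)
    s = B[1, d1] * B[1, κ]
    for d2 in 2:d
        s += B[d2, d1] * B[d2, κ]
    end
    return s
end

# Chunked form: the peeled iteration is folded back into the accumulator `v`,
# and the rows that do not fill a whole chunk are handled by a scalar loop.
function colprod_chunked(B, d1, κ, chunk)
    d = size(B, 1)
    v = zeros(eltype(B), chunk)
    nfull = (d ÷ chunk) * chunk        # number of rows covered by full chunks
    for d2 in 1:chunk:nfull
        for i in 1:chunk
            v[i] += B[d2 + i - 1, d1] * B[d2 + i - 1, κ]
        end
    end
    s = sum(v)
    for d2 in (nfull + 1):d            # scalar remainder
        s += B[d2, d1] * B[d2, κ]
    end
    return s
end
```

Because the chunked form sums in a different order, the two results should be compared with `≈` rather than `==` for floating-point input.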
Thanks! You asked how often things work when I use LoopVectorization; this morning was the first time that I was able to use the
Thanks for your comments on
Thanks for looking at this!
This morning was the first time without a compilation error?
On LoopVectorization 0.3.6:
julia> using LoopVectorization
julia> using BenchmarkTools
julia> function toy1!(G, B,κ)
           d = size(G,1)
           @inbounds for d1=1:d
               G[d1,κ] = B[1,d1]*B[1,κ]
               for d2=2:d
                   G[d1,κ] += B[d2,d1]*B[d2,κ]
               end
           end
       end
toy1! (generic function with 1 method)
julia> function toy2!(G, B,κ)
           d = size(G,1)
           @avx for d1=1:d
               G[d1,κ] = B[1,d1]*B[1,κ]
               for d2=2:d
                   G[d1,κ] += B[d2,d1]*B[d2,κ]
               end
           end
       end
toy2! (generic function with 1 method)
julia> function toy3!(G, B,κ)
           d = size(G,1)
           z = zero(eltype(G))
           @avx for d1=1:d
               G[d1,κ] = z
               for d2=1:d
                   G[d1,κ] += B[d2,d1]*B[d2,κ]
               end
           end
       end
toy3! (generic function with 1 method)
julia> N = 8
8
julia> B = randn(N, N);
julia> G1 = zeros(N, N).*NaN;
julia> G2 = similar(G1);
julia> G3 = similar(G1);
julia> toy1!(G1,B,1)
julia> toy2!(G2,B,1)
julia> toy3!(G3,B,1)
julia> @assert @views G1[:,1] ≈ G2[:,1]
julia> @assert @views G1[:,1] ≈ G3[:,1]
julia> @benchmark toy1!($G1,$B,1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 33.883 ns (0.00% GC)
median time: 34.104 ns (0.00% GC)
mean time: 34.862 ns (0.00% GC)
maximum time: 69.998 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 993
julia> @benchmark toy2!($G1,$B,1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 25.958 ns (0.00% GC)
median time: 26.100 ns (0.00% GC)
mean time: 26.140 ns (0.00% GC)
maximum time: 82.856 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 996
julia> @benchmark toy3!($G1,$B,1)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 19.281 ns (0.00% GC)
median time: 19.478 ns (0.00% GC)
mean time: 19.480 ns (0.00% GC)
maximum time: 21.995 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
I fixed the remaining problem by (a) allowing loops that start at values other than
For now, I will close the issue because the examples work. I'm hoping more loops will work without errors!
Thanks! This helps quite a bit. For some reason the code I'm trying to optimize requires something closer to the
Thanks for your work on this
Possibly related to #15, following are a couple of compilation errors.