ERROR: UndefVarError: ####temporary#425_ not defined #18

chrisvwx · 2020-01-12T01:10:37Z

Possibly related to #15. Following are a couple of compilation errors.

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.3.1 (2019-12-30)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using LoopVectorization
julia> using BenchmarkTools
julia> function toy1!(G, B,κ)
           d = size(G,1)
           @inbounds for d1=1:d
               G[d1,κ] = B[1,d1]*B[1,κ]
               for d2=2:d
                   G[d1,κ] += B[d2,d1]*B[d2,κ]
               end
           end
       end
toy1! (generic function with 1 method)
julia> function toy2!(G, B,κ)
           d = size(G,1)
           @avx for d1=1:d
               G[d1,κ] = B[1,d1]*B[1,κ]
               for d2=2:d
                   G[d1,κ] += B[d2,d1]*B[d2,κ]
               end
           end
       end
toy2! (generic function with 1 method)
julia> function toy3!(G, B,κ)
           d = size(G,1)
           z = zero(eltype(G))
           @avx for d1=1:d
               G[d1,κ] = z
               for d2=1:d
                   G[d1,κ] += B[d2,d1]*B[d2,κ]
               end
           end
       end
toy3! (generic function with 1 method)

julia> N = 8
8
julia> B = randn(N, N);
julia> G1 = zeros(N, N).*NaN;
julia> G2 = similar(G1);
julia> G3 = similar(G1);
julia> toy1!(G1,B,1)
julia> toy2!(G2,B,1)
ERROR: UndefVarError: ####temporary#425_ not defined
Stacktrace:
 [1] toy2!(::Array{Float64,2}, ::Array{Float64,2}, ::Int64) at ./gcutils.jl:91
 [2] top-level scope at REPL[12]:1

julia> toy3!(G3,B,1)
ERROR: UndefVarError: ##G_0 not defined
Stacktrace:
 [1] macro expansion at ./gcutils.jl:91 [inlined]
 [2] toy3!(::Array{Float64,2}, ::Array{Float64,2}, ::Int64) at ./REPL[5]:4
 [3] top-level scope at REPL[13]:1


chriselrod · 2020-01-13T14:18:05Z

Thanks again for these reports.
Out of curiosity, roughly what percentage of the time do things work when you play around with the library?

This commit fixed your toy3!.

toy2! will be more difficult.
For one thing, I realized the library had been assuming that loops start at the beginning of an array, regardless of the index (so offset arrays should have been supported). For now, however, it will throw an error if a loop doesn't.
It shouldn't require too much change to fix this.

But to get the example to work, I'll also have to decide a strategy to deal with peeling.
One idea would be to basically transform it into...

s = B[1,d1]*B[1,κ]
v[1:U*W] .= 0 # U is the number of SIMD vectors, W their width
for d2=2:U*W:d # handle remainder appropriately, as written it will often go out of bounds
    @. v[1:U*W] += B[d2:d2+U*W-1,d1]*B[d2:d2+U*W-1,κ]
end
G[d1,κ] = s + sum(v)

which faithfully follows what the user wrote.
This will also generally be slower than not peeling, except perhaps when d % W == 1.
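To make the remainder handling concrete, here is a minimal plain-Julia sketch of the faithful, peeled transformation for a single (d1, κ) pair. The function name `peeled_dot` and the defaults `U = 2`, `W = 4` are hypothetical, chosen only for illustration, and an ordinary `Vector` stands in for the SIMD registers; this is not the code the library generates.

```julia
# Sketch only: mimics the peeled transformation with ordinary arrays.
# `peeled_dot`, U and W are illustrative assumptions, not library API.
function peeled_dot(B::AbstractMatrix{T}, d1::Int, κ::Int; U::Int = 2, W::Int = 4) where {T}
    d = size(B, 1)
    chunk = U * W                   # elements handled per "vectorized" iteration
    s = B[1, d1] * B[1, κ]          # the peeled first iteration stays scalar
    v = zeros(T, chunk)             # stands in for U SIMD vectors of width W
    d2 = 2
    while d2 + chunk - 1 <= d       # full chunks only, so indexing stays in bounds
        @views v .+= B[d2:d2+chunk-1, d1] .* B[d2:d2+chunk-1, κ]
        d2 += chunk
    end
    for i in d2:d                   # scalar remainder loop
        s += B[i, d1] * B[i, κ]
    end
    return s + sum(v)
end
```

The chunked loop starts at row 2, so it only consumes the remaining rows exactly when `d - 1` is a multiple of the chunk size; otherwise the scalar remainder loop picks up the rest.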

I think a much better transformation, matching the intention, would be more along the lines of:

v[1:U*W] .= B[1:U*W,d1] .* B[1:U*W,κ]
for d2=1+U*W:U*W:d # handle remainder appropriately, as written it will often go out of bounds
    @. v[1:U*W] += B[d2:d2+U*W-1,d1]*B[d2:d2+U*W-1,κ]
end
G[d1,κ] = sum(v)

This is more difficult: it'll have to recognize the pattern of N-loop iterations getting peeled off. But it is what it ought to do, and therefore what I'll go for.
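As a rough illustration of that target, the sketch below writes the non-peeled form in plain Julia for a single (d1, κ) pair, with the remainder handled explicitly. The name `unpeeled_dot` and the defaults `U = 2`, `W = 4` are hypothetical, and an ordinary `Vector` stands in for the SIMD registers; it says nothing about how the library would detect the pattern.

```julia
# Sketch only: the accumulators are initialised from the first chunk
# instead of from zero plus a peeled scalar iteration.
# `unpeeled_dot`, U and W are illustrative assumptions, not library API.
function unpeeled_dot(B::AbstractMatrix{T}, d1::Int, κ::Int; U::Int = 2, W::Int = 4) where {T}
    d = size(B, 1)
    chunk = U * W
    if d < chunk                    # too short for one full chunk: scalar loop
        s = zero(T)
        for i in 1:d
            s += B[i, d1] * B[i, κ]
        end
        return s
    end
    v = B[1:chunk, d1] .* B[1:chunk, κ]   # first chunk initialises the accumulators
    d2 = chunk + 1
    while d2 + chunk - 1 <= d             # remaining full chunks
        @views v .+= B[d2:d2+chunk-1, d1] .* B[d2:d2+chunk-1, κ]
        d2 += chunk
    end
    s = sum(v)
    for i in d2:d                         # scalar remainder loop
        s += B[i, d1] * B[i, κ]
    end
    return s
end
```

Here the chunked work starts at row 1, so no scalar iteration sits in front of the vectorized loop.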

I'll leave this issue open until I/someone else implements that.

An easier means of getting that behavior would be an @peel macro that lets you write a loop like in toy3!, which will have the @avx macro perform the transformation to toy2!.
That would save it from having to look for and identify the pattern.

chrisvwx · 2020-01-13T15:42:04Z

Thanks! You asked how often things work when I use LoopVectorization; this morning was the first time that I was able to use the @avx macro in a function that I am trying to optimize without a compilation error. With your latest fix I get a consistent 5% speed improvement in a specific non-trivial call to the function. I'll look at the function again later today in more detail.

Thanks for your comments on toy2! and loops that start at an offset. In the code I am trying to optimize, I can often rewrite to not require the offset, yet it does seem like a useful thing to have.

Thanks for looking at this!

chriselrod · 2020-01-13T15:59:25Z

This morning was the first time without a compilation error?
Sounds like it has a long way to go before it provides a good user experience.

chriselrod · 2020-01-16T13:11:47Z

On LoopVectorization 0.3.6:

julia> using LoopVectorization

julia> using BenchmarkTools

julia> function toy1!(G, B,κ)
           d = size(G,1)
           @inbounds for d1=1:d
               G[d1,κ] = B[1,d1]*B[1,κ]
               for d2=2:d
                   G[d1,κ] += B[d2,d1]*B[d2,κ]
               end
           end
       end
toy1! (generic function with 1 method)

julia> function toy2!(G, B,κ)
           d = size(G,1)
           @avx for d1=1:d
               G[d1,κ] = B[1,d1]*B[1,κ]
               for d2=2:d
                   G[d1,κ] += B[d2,d1]*B[d2,κ]
               end
           end
       end
toy2! (generic function with 1 method)

julia> function toy3!(G, B,κ)
           d = size(G,1)
           z = zero(eltype(G))
           @avx for d1=1:d
               G[d1,κ] = z
               for d2=1:d
                   G[d1,κ] += B[d2,d1]*B[d2,κ]
               end
           end
       end
toy3! (generic function with 1 method)

julia> N = 8
8

julia> B = randn(N, N);

julia> G1 = zeros(N, N).*NaN;

julia> G2 = similar(G1);

julia> G3 = similar(G1);

julia> toy1!(G1,B,1)

julia> toy2!(G2,B,1)

julia> toy3!(G3,B,1)

julia> @assert @views G1[:,1] ≈ G2[:,1]

julia> @assert @views G1[:,1] ≈ G3[:,1]

julia> @benchmark toy1!($G1,$B,1)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     33.883 ns (0.00% GC)
  median time:      34.104 ns (0.00% GC)
  mean time:        34.862 ns (0.00% GC)
  maximum time:     69.998 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     993

julia> @benchmark toy2!($G1,$B,1)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     25.958 ns (0.00% GC)
  median time:      26.100 ns (0.00% GC)
  mean time:        26.140 ns (0.00% GC)
  maximum time:     82.856 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     996

julia> @benchmark toy3!($G1,$B,1)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     19.281 ns (0.00% GC)
  median time:      19.478 ns (0.00% GC)
  mean time:        19.480 ns (0.00% GC)
  maximum time:     21.995 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

I fixed the remaining problem by (a) allowing loops that start at values other than 1, and (b) making the handling of reductions more robust.

For now, I will close the issue because the examples work.
The peeled code is of course slower. If you'd like the library to recognize peeling, optimize it specially, and attempt to make it as fast as not peeling, feel free to file a new issue for that.

I'm hoping more loops will work without errors!

chrisvwx · 2020-01-16T20:53:51Z

Thanks!

This helps quite a bit. For some reason the code I'm trying to optimize requires something closer to toy2!; @avx is now giving around a 10% time gain. I'd like to use it in even more places; I submitted another issue.

Thanks for your work on this.

chriselrod closed this as completed Jan 16, 2020
