CartesianIndex and @turbo loop #359
Yes. If someone is interested in fixing this, I can provide instructions/guidance. Otherwise, this will have to wait for the LoopVectorization rewrite, which I'm trying to prioritize at the moment.
Without `@turbo`:

```julia
using LoopVectorization, OffsetArrays, BenchmarkTools
using Images  # provides Kernel.gaussian (via ImageFiltering)

x = rand(100, 100)
kern = Images.Kernel.gaussian((1, 1), (3, 3))
```
```julia
function noturbosum(x, kern)
    ks = zero(eltype(x))
    @inbounds for i = 2:size(x, 1)-1
        for j in axes(kern, 1), m in axes(kern, 2)
            ks += x[i+j, i+m] * kern[j, m]
        end
    end
    ks
end

function outersum(x, kern)
    ks = zero(eltype(x))
    @turbo for i = 2:size(x, 1)-1
        for j in axes(kern, 1), m in axes(kern, 2)
            ks += x[i+j, i+m] * kern[j, m]
        end
    end
    ks
end

function insum(x, kern)
    ks = zero(eltype(x))
    @turbo for i = 2:size(x, 1)-1
        s1 = zero(eltype(x))
        for j in axes(kern, 1), m in axes(kern, 2)
            s1 += x[i+j, i+m] * kern[j, m]
        end
        ks += s1
    end
    ks
end

function in2sum(x, kern)
    ks = zero(eltype(x))
    @turbo for i = 2:size(x, 1)-1
        for j in axes(kern, 1)
            s0 = zero(eltype(x))
            for m in axes(kern, 2)
                s0 += x[i+j, i+m] * kern[j, m]
            end
            ks += s0
        end
    end
    ks
end

function ininsum(x, kern)
    ks = zero(eltype(x))
    @turbo for i = 2:size(x, 1)-1
        s1 = zero(eltype(x))
        for j in axes(kern, 1)
            s0 = zero(eltype(x))
            for m in axes(kern, 2)
                s0 += x[i+j, i+m] * kern[j, m]
            end
            s1 += s0
        end
        ks += s1
    end
    ks
end

println("noturbo ", noturbosum(x, kern))
println("outer ", outersum(x, kern))
println("in ", insum(x, kern))
println("in2 ", in2sum(x, kern))
println("inin ", ininsum(x, kern))
display(@btime outersum($x, $kern))
display(@btime insum($x, $kern))
display(@btime in2sum($x, $kern))
display(@btime ininsum($x, $kern))
```
```
noturbo 49.06428455119616
outer 46.612923559459894
in 46.447873809237635
in2 46.447873809237635
inin 0.0
3.379 μs (0 allocations: 0 bytes)
46.612923559459894
302.369 ns (0 allocations: 0 bytes)
46.447873809237635
168.696 ns (0 allocations: 0 bytes)
46.447873809237635
23.303 ns (0 allocations: 0 bytes)
0.0
```

I probably can't dedicate time to fixing this.
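For reference, the rearrangements being benchmarked (one flat accumulator vs. per-iteration partial sums) are mathematically equivalent, so all variants should agree; the `0.0` from `ininsum` is the anomaly. A minimal, package-free sketch of that equivalence (hypothetical data and a plain 1-based stand-in kernel, not the code from this issue):

```julia
# Hypothetical data and kernel; indices are shifted by -2 so the 3x3
# window mirrors the -1:1 axes of an OffsetArray kernel.
x = [Float64(i + 2j) for i in 1:8, j in 1:8]
kern = [0.05 0.1 0.05; 0.1 0.4 0.1; 0.05 0.1 0.05]

# Single flat accumulator, like `outersum`.
function flat(x, kern)
    ks = 0.0
    for i in 2:size(x, 1)-2, j in 1:3, m in 1:3
        ks += x[i+j-2, i+m-2] * kern[j, m]
    end
    ks
end

# Nested partial sums, like `ininsum`.
function nested(x, kern)
    ks = 0.0
    for i in 2:size(x, 1)-2
        s1 = 0.0
        for j in 1:3
            s0 = 0.0
            for m in 1:3
                s0 += x[i+j-2, i+m-2] * kern[j, m]
            end
            s1 += s0
        end
        ks += s1
    end
    ks
end

@assert flat(x, kern) ≈ nested(x, kern)
```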
Differences that big are definitely bugs.

```julia
julia> display(@btime noturbosum_nofm($x, $kern))  # no @fastmath, no @turbo
965.667 ns (0 allocations: 0 bytes)
225.36596301915785

julia> display(@btime noturbosum($x, $kern))  # @fastmath, no @turbo
991.838 ns (0 allocations: 0 bytes)
225.36596301915785

julia> display(@btime outersum($x, $kern))
224.695 ns (0 allocations: 0 bytes)
225.365963019158

julia> display(@btime insum($x, $kern))
378.662 ns (0 allocations: 0 bytes)
225.36596301915802

julia> display(@btime in2sum($x, $kern))
273.545 ns (0 allocations: 0 bytes)
225.36596301915804

julia> display(@btime ininsum($x, $kern))
125.409 ns (6 allocations: 304 bytes)
0.0
```
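One lightweight way to catch this class of miscompilation early (a sketch, not a pattern from this thread) is to assert each `@turbo` variant against a plain-Julia reference before trusting the benchmarks:

```julia
using LoopVectorization

# Plain-Julia reference implementation.
function sum_plain(v)
    s = zero(eltype(v))
    for i in eachindex(v)
        s += v[i]
    end
    s
end

# Same reduction under @turbo; should agree up to floating-point
# reassociation.
function sum_turbo(v)
    s = zero(eltype(v))
    @turbo for i in eachindex(v)
        s += v[i]
    end
    s
end

v = rand(1000)
@assert isapprox(sum_plain(v), sum_turbo(v); rtol = 1e-10)
```

A result like the `0.0` above would fail such a check immediately, rather than surfacing as a silently wrong answer downstream.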
You should be able to solve multiple of these cases this way. Something I'd like to get around to eventually is supporting uses like this.
Sorry to revive this, but is it possible to give a hint as to what is causing the first error message?
I have a function with a single for loop which I want to call in a nested way:
then I want to write my 2D function based on the 1D, namely:
This gives me:

You have a …
Thanks for your answer!
As one aside,

```julia
function integrate_avx(f::Function, xmin::SVector{2,T}, xmax::SVector{2,T},
                       x::AbstractVector{T}, w::AbstractVector{T}, h::T) where {T<:Real}
    function f1(x1::T) where {T<:Real}
        g1(y::T) where {T} = f(x1, y)
        res = integrate_avx(g1, xmin[2], xmax[2], x, w, h)
        return res
    end
    g2(x1::T) where {T} = f1(x1)
    res = integrate_avx(g2, xmin[1], xmax[1], x, w, h)
    return res
end
```

looks like it is going to be putting the results into intermediate `res` variables. You could write

```julia
function integrate_avx(f::Function, xmin::SVector{2,T}, xmax::SVector{2,T},
                       x::AbstractVector{T}, w::AbstractVector{T}, h::T) where {T<:Real}
    function f1(x1::T) where {T<:Real}
        g1(y::T) where {T} = f(x1, y)
        return integrate_avx(g1, xmin[2], xmax[2], x, w, h)
    end
    g2(x1::T) where {T} = f1(x1)
    return integrate_avx(g2, xmin[1], xmax[1], x, w, h)
end
```

to avoid that. What are the types? Do you have a reproducer I can copy/paste? You could try LoopVectorization.jl/src/condense_loopset.jl, lines 933 to 977 (at 3fbe248).
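For context, a self-contained sketch of Julia's captured-variable boxing, which is the usual reason intermediate assignments inside closures cost performance (assumption: this is the issue being alluded to above; the names are hypothetical):

```julia
# A variable that is reassigned after a closure captures it must be
# stored in a Core.Box, defeating type inference.
function make_boxed()
    res = 0.0
    g() = res      # closure captures `res`
    res = 1.0      # reassignment after capture forces a Box
    return g
end

# Assigned exactly once before capture: no Box, the closure field is a
# concrete Float64.
function make_unboxed()
    res = 1.0
    g() = res
    return g
end

@assert any(T -> T === Core.Box, fieldtypes(typeof(make_boxed())))
@assert !any(T -> T === Core.Box, fieldtypes(typeof(make_unboxed())))
```

Returning the call result directly, as in the second `integrate_avx` version above, sidesteps the intermediate binding entirely.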
The code is from here, but here is an MWE:
It works for me. I don't get the warning, and I see …
Did you change the second call to …?
No. Doing that, I do see the warning.
Thanks a lot for the help!
LV has been awesome for effortlessly speeding up code! Thanks!

I've recently run into some errors I don't know enough about to fix, reproduced below with version v0.12.96. They're modifications of the image-filtering example code. Currently the errors seem a bit too specific for me to address, and I was hoping for help, or to report a bug if there is one.

In short, my project involves kernels, but over sparse, specific points rather than every index of a matrix, so only convolving the kernel at specific `CartesianIndex`es seems elegant. The `outerr` issue seems to take offence at making the `CartesianIndex` itself, while the `inerr` issue is beyond me but may be related; maybe `@turbo` doesn't like the `CartesianIndex` being dynamically created? The nominal image-filtering example works without problems. In a previous version of LV, I had a block of code similar to the `inerr` function below working.
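A hedged sketch of the sparse use case described above, evaluated without `@turbo` (hypothetical names; a plain 1-based kernel rather than the zero-centred OffsetArray kernels from ImageFiltering):

```julia
# Apply a kernel only at selected CartesianIndex positions.
function sparse_filter(x::AbstractMatrix{T}, kern::AbstractMatrix{T},
                       points::AbstractVector{CartesianIndex{2}}) where {T}
    out = Vector{T}(undef, length(points))
    @inbounds for (n, I) in enumerate(points)
        s = zero(T)
        for J in CartesianIndices(kern)
            # With a 1-based kern, I + J sweeps the window just
            # below/right of I; shift as needed for a centred stencil.
            s += x[I+J] * kern[J]
        end
        out[n] = s
    end
    return out
end

x = reshape(collect(1.0:100.0), 10, 10)
kern = fill(1.0 / 9, 3, 3)                       # 3x3 mean filter
pts = [CartesianIndex(2, 2), CartesianIndex(5, 6)]
res = sparse_filter(x, kern, pts)
@assert res[1] ≈ sum(x[3:5, 3:5]) / 9
```

Since the index arithmetic here is just `I + J` on `CartesianIndex`es materialized outside any `@turbo` region, it avoids the dynamic `CartesianIndex` construction that the errors above point at.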