Error with accumulation in for loop #486

Closed
RGerzaguet opened this issue Apr 19, 2023 · 4 comments

Comments

@RGerzaguet

Hello @chriselrod

First of all thanks for this wonderful package.
I have encountered a weird issue with a for loop accelerated by @turbo: a "simple" for loop gives very different results when the macro is activated.

In this loop, we only increment entries of a matrix (e.g. a confusion matrix) based on the values in two input arrays.
A MWE is presented below:

using LoopVectorization 

""" Create a confusion matrix based on the 2 inputs. Here this only counts the occurence of each element 
"""
function create_confusionMatrix(l̂::AbstractArray,l::AbstractArray,nbRadios::Number)
    confMatrix = zeros(nbRadios,nbRadios)
    for n ∈ eachindex(l)
        confMatrix[l[n],l̂[n]] += 1
    end
    return confMatrix
end 


""" Exactly same as before with @turbo power 
"""
function turbo_create_confusionMatrix(l̂::AbstractArray,l::AbstractArray,nbRadios::Number)
    confMatrix = zeros(nbRadios,nbRadios)
    @turbo for n ∈ eachindex(l)
        confMatrix[l[n],l̂[n]] += 1
    end
    return confMatrix
end 

# True labels 
nbRadios = 4
l = repeat(1:4,100) # l = [1;2;3;4;1;2;3;4;....;1;2;3;4]

# Create the 2 confusion matrices, should be the same 
m       = create_confusionMatrix(l,l,nbRadios)
m_turbo = turbo_create_confusionMatrix(l,l,nbRadios)
# Same result right ? 
@assert m == m_turbo "Matrices are not identical"

And we have

julia> m
4×4 Matrix{Float64}:
 100    0    0    0
   0  100    0    0
   0    0  100    0
   0    0    0  100

julia> m_turbo
4×4 Matrix{Float64}:
 50   0   0   0
  0  50   0   0
  0   0  50   0
  0   0   0  50

This code has been used on Julia 1.8 with

[bdcacae8] LoopVectorization v0.12.157

We should get a diagonal matrix (the same array is used as both the true labels and the estimated labels), and the two functions should produce exactly the same numbers on the diagonal (100 in this case). This is not the case: the turbo version has only half of the elements (50).

It seems that not every index is updated in the turbo version, as if multiple loads lead to only one store? I tried to look at the @code_llvm output, but god, this is hard 😅

Is it something we are doing very badly on our side? A corner case of the package? Expected behavior?

@chriselrod (Member)

This issue, or a few like it, come up often.

It should probably be documented more clearly, but LoopVectorization.jl assumes indices don't alias whenever you have A[f(i...)], and f isn't + or - (it'll recognize aliasing in that case).

In plain English, that means if you have A[l[i]] it'll assume that l[i] != l[j] for all i != j.
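That assumption can be illustrated with a small sketch (hypothetical arrays, my own example rather than one from the thread; behavior observed against the LoopVectorization v0.12 API):

```julia
using LoopVectorization

idx_unique = [1, 2, 3, 4]  # l[i] != l[j] for all i != j: the assumption holds
idx_dup    = [1, 1, 2, 2]  # duplicate targets: the assumption is violated

A = zeros(4)
@turbo for i in eachindex(idx_unique)
    A[idx_unique[i]] += 1  # each slot is read-modified-written exactly once
end
# A == [1.0, 1.0, 1.0, 1.0] -- correct here, because no two iterations alias

B = zeros(4)
@turbo for i in eachindex(idx_dup)
    B[idx_dup[i]] += 1  # B[1] and B[2] are each targeted twice; the vectorized
end                     # gather/add/scatter may drop updates
# B is NOT guaranteed to equal [2.0, 2.0, 0.0, 0.0]
```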

@chriselrod (Member)

For example

julia> """ Create a confusion matrix based on the 2 inputs. Here this only counts the occurence of each element 
       """
       function create_confusionMatrix(l::AbstractArray,l::AbstractArray,nbRadios::Number)
           confMatrix = zeros(nbRadios,nbRadios)
           for n  eachindex(l)
               confMatrix[l[n],l[n]] += 1
           end
           return confMatrix
       end
create_confusionMatrix

julia> """ Create a confusion matrix based on the 2 inputs. Here this only counts the occurence of each element 
       """
       function simd_create_confusionMatrix(l::AbstractArray,l::AbstractArray,nbRadios::Number)
           confMatrix = zeros(nbRadios,nbRadios)
           @inbounds @simd ivdep for n  eachindex(l)
               confMatrix[l[n],l[n]] += 1
           end
           return confMatrix
       end
simd_create_confusionMatrix

julia> using LoopVectorization

julia> """ Exactly same as before with @turbo power 
       """
       function turbo_create_confusionMatrix(l::AbstractArray,l::AbstractArray,nbRadios::Number)
           confMatrix = zeros(nbRadios,nbRadios)
           @turbo for n  eachindex(l)
               confMatrix[l[n],l[n]] += 1
           end
           return confMatrix
       end
turbo_create_confusionMatrix

julia> # True labels 
       nbRadios = 4;

julia> l = repeat(1:4,100); # l = [1;2;3;4;1;2;3;4;....;1;2;3;4]

julia> # Create the 2 confusion matrices, should be the same 
       m             = create_confusionMatrix(l,l,nbRadios)
4×4 Matrix{Float64}:
 100.0    0.0    0.0    0.0
   0.0  100.0    0.0    0.0
   0.0    0.0  100.0    0.0
   0.0    0.0    0.0  100.0

julia> m_turbo  = turbo_create_confusionMatrix(l,l,nbRadios)
4×4 Matrix{Float64}:
 13.0   0.0   0.0   0.0
  0.0  13.0   0.0   0.0
  0.0   0.0  13.0   0.0
  0.0   0.0   0.0  13.0

julia> # Same result right ? 
       @assert m == m_turbo "Matrices are not identical"
ERROR: AssertionError: Matrices are not identical
Stacktrace:
 [1] top-level scope
   @ REPL[11]:2

julia> m_simd = simd_create_confusionMatrix(l,l,nbRadios)
4×4 Matrix{Float64}:
 50.0   0.0   0.0   0.0
  0.0  50.0   0.0   0.0
  0.0   0.0  50.0   0.0
  0.0   0.0   0.0  50.0

I got a different wrong answer from @turbo than you did, because the order in which it loads, accumulates, and stores isn't guaranteed and will vary from one platform to another based on what it thinks is fastest. If the answer depends on the order, different orders produce different answers.

I can also lie to the base Julia compiler with @simd ivdep to promise that the loop is order independent, and get a different wrong answer (in this case, it happens to be the same one you got with @turbo).

Obviously don't do this, just saying it's expected.
The answer is to not promise order independence when that promise doesn't actually hold.
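In other words, for this histogram-style accumulation the safe option is a plain scalar loop. A minimal sketch (the label-validation step and the `eachindex(l, l̂)` axis check are my additions, not from the thread):

```julia
function safe_confusionMatrix(l̂::AbstractVector{<:Integer},
                              l::AbstractVector{<:Integer},
                              nbRadios::Integer)
    confMatrix = zeros(nbRadios, nbRadios)
    # Validate labels once up front, so the @inbounds below is actually safe
    all(x -> 1 <= x <= nbRadios, l) && all(x -> 1 <= x <= nbRadios, l̂) ||
        throw(ArgumentError("labels must lie in 1:$nbRadios"))
    @inbounds for n in eachindex(l, l̂)  # also checks both arrays share axes
        confMatrix[l[n], l̂[n]] += 1
    end
    return confMatrix
end
```

On the `l = repeat(1:4, 100)` input from the MWE, `safe_confusionMatrix(l, l, 4)` produces the expected matrix with 100 on each diagonal entry.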

@chriselrod (Member)

I am working on a rewrite of LoopVectorization that does much more extensive analysis. It'll only reorder when legal.
LoopVectorization.jl can't do any sophisticated dependence analysis. It knows how to optimize a few kinds of loops correctly, and for everything else it assumes the iterations are independent.

It's easy to ignore problems. Especially if I know exactly which kinds it can and can't handle -- I'm not going to be caught off guard, but I guess the documentation should make that more clear for everyone else.

The rewrite (LoopModels) does a lot of modeling, so it can correctly optimize a ton of loops without needing to make those sorts of assumptions, and giving up when it can't prove independence (e.g. here) is reasonable.
It's of course impossible to prove that here, so I'll eventually have it read parallel loop iteration metadata (which @simd ivdep adds) to let users still make this promise, for when they have code that looks like yours but they've guaranteed length(unique(l)) == length(l), i.e. when they know a priori that all elements are unique.
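A case where that promise is legitimate is a permutation-style scatter, where every target index is distinct (hypothetical example, not from the thread):

```julia
perm = [3, 1, 4, 2]               # length(unique(perm)) == length(perm)
src  = [10.0, 20.0, 30.0, 40.0]
dst  = zeros(4)
@inbounds @simd ivdep for i in eachindex(perm)
    dst[perm[i]] += src[i]        # safe: each dst slot is written exactly once
end
# dst == [20.0, 40.0, 10.0, 30.0]
```

Here the ivdep promise is true by construction, so reordering or vectorizing the iterations cannot change the result.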

@RGerzaguet (Author)

Hello Chris,
Many thanks for the precious insights you have provided here. This is indeed a very subtle issue, and I did not realize that it was due to some kind of index dependency.
Let's hope that it will be addressed by the rewrite of LoopVectorization. In the meantime, I would like to express my gratitude for all the support you have provided to the language and for all the vectorization assets.
Thank you once again!
Kudos!!

This issue was closed.