Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to handle user defined functions? #55

Open
zsoerenm opened this issue Feb 18, 2020 · 2 comments
Open

How to handle user defined functions? #55

zsoerenm opened this issue Feb 18, 2020 · 2 comments

Comments

@zsoerenm
Copy link

zsoerenm commented Feb 18, 2020

I get the following error

ERROR: MethodError: no method matching calc_A(::VectorizationBase.SVec{16,Int16})

where calc_A is a user defined function. Do I need to implement calc_A(::VectorizationBase.SVec{16,Int16}) myself? Why isn't there an error for the other user defined functions? My first guess was because they are inlined, but calc_A should be inlined, too.

Here is the code for that:

@inline function calc_A(x::Int16)
    p = 15; r = 1; A = Int16(23170); C = Int16(-425)
    n = 7
    a = 7= (x * x) >> n
    rounding = one(x) << (p - a - 1)
    (A + rounding + (x² * C) >> r) >> (p - a)
end
@inline function calc_B(x::Int16)
    p = 14; r = 3; B = Int16(-17790); D = Int16(351)
    n = 7
    a = 7= (x * x) >> n
    rounding = one(x) << (p - a - 1)
    (rounding + x * (B + (x² * D) >> r) >> n) >> (p - a)
end
@inline function get_first_bit_sign(x)
    n = 7
    mysign(x << (sizeof(x) * 8 - n - 1))
end
@inline function get_second_bit_sign(x)
    n = 7
    mysign(x << (sizeof(x) * 8 - n - 2))
end
@inline function get_quarter_angle(x)
    n = 7
    x & (one(x) << n - one(x)) - one(x) << (n - 1)
end
@inline mysign(x) = 2 * signbit(x) - 1

function gen_sincos!(sins, coss, phases)
    @avx for i = 1:2500
        first_bit_sign = get_first_bit_sign(phases[i])
        second_bit_sign = get_second_bit_sign(phases[i])
        quarter_angle = get_quarter_angle(phases[i])
        A = calc_A(quarter_angle)
        B = calc_B(quarter_angle)

        coss[i] = second_bit_sign * (first_bit_sign * A + B)
        sins[i] = second_bit_sign * (A - first_bit_sign * B)
    end
end
sins = Vector{Int16}(undef, 2500)
coss = Vector{Int16}(undef, 2500)
phases = Vector{Int16}(undef, 2500)
gen_sincos!(sins, coss, phases)

EDIT: Alright I found the solution myself: If I write function calc_A(x) instead of function calc_A(x::Int16), it will work. So should I omit the declaration of x? What if I have a different function for x::Int32?

@chriselrod
Copy link
Member

chriselrod commented Feb 18, 2020

LoopVectorization evaluates your code using SVecs. An SVec{W,T} contains W elements of type T. Most operations on SVec are defined using SIMD instructions. It's basically how LoopVectorization says "apply single instructions to operate on all of these numbers simultaneously". It puts the Vectorization in LoopVectorization.

But, as a limitation, that means any functions you call will have to accept SVec arguments.
I don't know at the moment how I can work around that. Maybe some trick with Cassette is possible, maybe it can make SVec{<:Any,T} dispatch on ::T methods.

Another limitation is that LoopVectorization's optimizer only understands a few functions. If anything reasonably generic is missing, please let me know and I can add it.
While functions you define will work, LoopVectorization cannot look inside them while it makes decisions about how it will evaluate your code. Currently, it assumes those functions are expensive, and that thus doing anything creative will probably not be profitable. Functions like exp, log, and sin are expensive enough for that to be the case (even though I did include them).

That said, you can manually tell it what to do, eg, unroll by 2 (anything 1-4 will probably be pretty good, above that probably wont be worth it; above 8 will probably be bad):

function gen_sincos!(sins, coss, phases)
    @avx unroll=2 for i = 1:2500
        first_bit_sign = get_first_bit_sign(phases[i])
        second_bit_sign = get_second_bit_sign(phases[i])
        quarter_angle = get_quarter_angle(phases[i])
        A = calc_A(quarter_angle)
        B = calc_B(quarter_angle)

        coss[i] = second_bit_sign * (first_bit_sign * A + B)
        sins[i] = second_bit_sign * (A - first_bit_sign * B)
    end
end

Because it is assuming each of these functions is expensive, it should be deciding on unroll=1.
So it'll probably be worth experimenting with at least unroll=2 and unroll=4, to see if they improve performance.
I saw a roughly 15% performance increase from unroll=4.

It would also definitely be worth looking into if there is some way to figure out the cost of user-defined functions.

If manually specifying the unroll (or tile=(3,4) when there's more than 1 loop) in code without any user defined functions gives you a performance boost, feel free to file an issue, and I'll see if I can improve the algorithm. Currently, it's pretty naive for simple loops without reductions (like this one), and I'm sure it could be made a lot better.

EDIT: Alright I found the solution myself: If I write function calc_A(x) instead of function calc_A(x::Int16), it will work. So should I omit the declaration of x? What if I have a different function for x::Int32?

You can write:

using LoopVectorization: SVec
function calc_A(x::Union{Int16,SVec{<:Any,Int16}})
   ...
end
function calc_A(x::Union{Int32,SVec{<:Any,Int32}})
   ...
end

or, if you're making a lot of definitions, it may be cleaner to define type aliases (const unions):

using LoopVectorization: SVec
const VInt16 = Union{Int16,SVec{<:Any,Int16}}
const VInt32 = Union{Int32,SVec{<:Any,Int32}}
const VInt64 = Union{Int64,SVec{<:Any,Int64}}
function calc_A(x::VInt32)
   ...
end
function calc_B(x::VInt32)
   ...
end
function calc_C(x::VInt32)
   ...
end

Perhaps I should define these type aliases myself? What would be the best name? V[scalar type]?

@zsoerenm
Copy link
Author

Thank you for detailed insights!

Perhaps I should define these type aliases myself? What would be the best name? V[scalar type]?

Sounds like a good option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants