
post-op shift operations #2600

Open
ArielPom opened this issue Feb 5, 2025 · 4 comments
ArielPom commented Feb 5, 2025

Summary

I did not find a post-op for shift left/right in oneDNN.
Why is that?

Current usage

The workarounds I am using now are:

  1. Apply the post-op outside of the oneDNN kernel, in my own code.

  2. Convert the shift value to a float multiplier and add a post-op with algorithm::binary_mul.
     For example, x >> 15 turns into x * 2^(-15),
     and x << 15 turns into x * 2^15.
     But this solution redirects the kernel to the reference implementation instead of an avx512/vnni/some other optimized implementation.
     Is this standard behaviour?

Thanks.

ArielPom added the enhancement label Feb 5, 2025
yehudaorel commented

Hi @ArielPom,

Currently, oneDNN does not provide a dedicated bitwise shift left/right API, but, as you mentioned, it can be achieved mathematically using the Binary primitive.

As mentioned in #1679, although multiplying by a power of 2 is mathematically equivalent to performing a bit-shift, the optimized kernels in oneDNN are not selected based solely on arithmetic equivalence. Instead, they are dispatched based on strict matching of the expected op patterns (data types, memory layouts, broadcasting, ...).

If you could provide a verbose log with ONEDNN_VERBOSE=dispatch and a snippet of your oneDNN code, that would be great!


ArielPom commented Feb 5, 2025

Thanks for the quick answer.

I cannot provide a log after compiling with ONEDNN_VERBOSE=dispatch,
but I guess the un-optimized kernel is chosen because of the post-op data shape I give it.

Can you clarify: let's say the output size is [4,8,16],
what are the possible post-op shapes that would lead to an optimized kernel?

yehudaorel commented

> Thanks for the quick answer.
>
> I cannot provide a log after compiling with ONEDNN_VERBOSE=dispatch, but I guess the un-optimized kernel is chosen because of the post-op data shape I give it.

No need to recompile for verbose mode; simply set the environment variable before running:

  • export ONEDNN_VERBOSE=dispatch
    ./test.o

    or

  • ONEDNN_VERBOSE=dispatch ./test.o

> Can you clarify: let's say the output size is [4,8,16], what are the possible post-op shapes that would lead to an optimized kernel?

In your case, since you are shifting (scaling) by a single factor, I believe you should be fine using per-tensor broadcast ({1,1,1}).
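
For illustration, a minimal sketch of the post-op route with a {1,1,1} second input (assuming the oneDNN v3.x C++ API; the main primitive the attribute attaches to is omitted here):

    // build a binary_mul post-op whose second input broadcasts per-tensor
    post_ops po;
    memory::desc multiplier_md({1, 1, 1}, memory::data_type::f32,
                               memory::format_tag::abc);
    po.append_binary(algorithm::binary_mul, multiplier_md);

    primitive_attr attr;
    attr.set_post_ops(po);
    // pass `attr` when creating the primitive descriptor of the main kernel,
    // then supply the multiplier memory at execution time under:
    //   DNNL_ARG_ATTR_MULTIPLE_POST_OP(0) | DNNL_ARG_SRC_1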

Also, perhaps try breaking the computation down and using a standalone Binary primitive instead of a post-op; something like this should work:

    // assumes in scope: dnnl::engine engine, dnnl::stream stream,
    // using tag = memory::format_tag; using dt = memory::data_type;
    memory::dim N = 4, C = 8, W = 16;
    memory::dims tensor_dims = {N, C, W};

    float shift_scaler = std::pow(2.0f, -15.0f); // x >> 15 becomes x * 2^(-15)
    memory::dims multiplier_dims = {1, 1, 1};    // per-tensor broadcast
    std::vector<float> multiplier_data = {shift_scaler};
    std::vector<float> src_data(N * C * W, 1.0f); // example input
    std::vector<float> dst_data(N * C * W);

    auto src_md = memory::desc(tensor_dims, dt::f32, tag::abc);
    auto dst_md = memory::desc(tensor_dims, dt::f32, tag::abc);
    auto multiplier_md = memory::desc(multiplier_dims, dt::f32, tag::abc);

    auto src_mem = memory(src_md, engine, src_data.data());
    auto dst_mem = memory(dst_md, engine, dst_data.data());
    auto multiplier_mem = memory(multiplier_md, engine, multiplier_data.data());

    primitive_attr attr;
    auto binary_pd = binary::primitive_desc(engine, algorithm::binary_mul,
                                            src_md, multiplier_md, dst_md, attr);
    auto binary_prim = binary(binary_pd);

    binary_prim.execute(stream, {{DNNL_ARG_SRC_0, src_mem},
                                 {DNNL_ARG_SRC_1, multiplier_mem},
                                 {DNNL_ARG_DST, dst_mem}});
    stream.wait();

Hope this helps!

yehudaorel self-assigned this Feb 5, 2025
mgouicem (Contributor) commented

@ArielPom If the shift factor is constant, common to the whole output tensor, and known at primitive creation time, you can try using the eltwise post-op with the eltwise_linear algorithm.

Elementwise post-ops generally reduce your chances of falling back to the reference implementation compared to binary post-ops.
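
For illustration, a minimal sketch of that suggestion (assuming the oneDNN v3.x C++ API, where eltwise_linear computes alpha * x + beta):

    // constant right shift by 15: alpha = 2^(-15), beta = 0
    post_ops po;
    po.append_eltwise(algorithm::eltwise_linear, std::pow(2.0f, -15.0f), 0.0f);

    primitive_attr attr;
    attr.set_post_ops(po);
    // pass `attr` to the primitive descriptor of the main kernel; unlike the
    // binary post-op, no extra memory argument is needed at execution time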
