post-op shift operations #2600
Comments
Hi @ArielPom, currently oneDNN does not provide a dedicated bitwise shift left/right API, but as you mentioned, mathematically it can be achieved using the binary primitive. As mentioned in #1679, although multiplying by a power of 2 is mathematically equivalent to performing a bit shift, the optimized kernels in oneDNN are not selected based solely on arithmetic equivalence. Instead, they are dispatched based on strict matching of the expected op patterns (data types, memory layouts, broadcasting, ...). Could you please provide a verbose log with `ONEDNN_VERBOSE=dispatch`? |
Thanks for the quick answer. I cannot provide a log after compiling with ONEDNN_VERBOSE=dispatch. Can you clarify what shape the second input should have? Let's say the output size is [4,8,16]. |
No need to recompile for verbose mode; simply set the environment variable before the run:
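For example (the application name here is a placeholder, not from the thread):

```shell
# ONEDNN_VERBOSE is read at runtime, so no rebuild is needed.
# The `dispatch` level prints why a given implementation was or was not picked.
export ONEDNN_VERBOSE=dispatch
# ./my_app   <- placeholder for your actual binary
echo "ONEDNN_VERBOSE=$ONEDNN_VERBOSE"
```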
In your case, as you are shifting (scaling) by a single factor, I believe you should be fine using per-tensor broadcast ({1,1,1}). Also, perhaps try breaking the computation down and using a standalone binary primitive instead of a post-op; something like this should work:
Hope this helps!
@ArielPom If the shift factor is constant, common to the whole output tensor, and known at primitive creation time, you can try to use the eltwise post-op. Elementwise post-ops generally reduce your chances of falling back to the reference implementation compared to a binary post-op. |
Summary
I did not find a post-op for shift left/right in oneDNN.
Why is that?
Current usage
The solutions I am using now are:
1. Perform the post-op outside of the oneDNN kernel, in my own code.
2. Convert the shift value to a float multiplier and add the post-op algorithm::binary_mul.
For example, x >> 15 turns into x * (2^(-15)),
and x << 15 turns into x * (2^15).
But the second solution redirects the kernel to the reference implementation instead of an AVX-512/VNNI or some other optimized implementation.
Is this standard behaviour?
Thanks.