opencl: transposed gemm/gemv moe kernel with mxfp4,f32 #16602
Nice bump in perf on Gen5 devices!
@shawngu-quic can you please fix the EditorConfig checker.
@shawngu-quic There are also some compilation warnings that have to be fixed.
* opencl: transposed gemm/gemv moe kernel with mxfp4,f32
* add restore kernel for moe transpose
* fix trailing whitespaces
* resolve compilation warnings
Added redesigned moe-mxfp4 kernels optimized for Adreno:
Achieved a large perf uplift for prefill, especially for long prompts.