[ROCm][Quantization][3/N] Refactor quark_moe w4a4 w/ oracle #41436
BowenBao wants to merge 4 commits into vllm-project:main
Code Review
This pull request refactors and extends ROCm MXFP4 MoE support by introducing specialized AITER backends for W4A16, W4A8, and W4A4 quantization schemes. Key changes include the implementation of AiterW4A8ExpertsMonolithic using Triton kernels, the addition of AiterMxfp4Experts for W4A4, and a significant refactor of the Quark MoE method to utilize centralized oracle-based backend selection. The PR also enhances the testing suite with new oracle-based execution tests and GFX950-specific validation. Feedback is provided regarding code duplication in the reference MoE implementation and weight dequantization logic within the test suite, suggesting the extraction of these routines into helper functions to improve maintainability.
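As a rough illustration of the suggested extraction, the duplicated weight dequantization in the tests could be collected into one shared helper along the lines below. This is a minimal sketch, not code from this PR: the names `decode_e2m1` and `dequant_mxfp4` are hypothetical, and the layout (two E2M1 codes per byte with the low nibble first, one E8M0 scale byte per block of 32 elements) is an assumption that would need to be checked against the actual packing used by the tests.

```python
# Hypothetical test helper; names and packing order are assumptions, not taken from the PR.

# Magnitudes representable by the 3 low bits of an FP4 E2M1 code.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
MX_BLOCK_SIZE = 32  # elements that share one E8M0 scale


def decode_e2m1(nibble: int) -> float:
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 0-2 select the magnitude."""
    magnitude = E2M1_VALUES[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def dequant_mxfp4(packed: bytes, scales: bytes) -> list[float]:
    """Dequantize packed MXFP4 values to Python floats.

    Assumes two codes per byte (low nibble first) and one E8M0 scale byte per
    block of 32 elements, where scale = 2 ** (byte - 127). The E8M0 NaN
    encoding (0xFF) is not handled in this sketch.
    """
    num_elements = 2 * len(packed)
    if num_elements != len(scales) * MX_BLOCK_SIZE:
        raise ValueError("expected one scale byte per block of 32 elements")

    out: list[float] = []
    for i, byte in enumerate(packed):
        # Both nibbles of a byte fall in the same block because 32 is even.
        scale = 2.0 ** (scales[(2 * i) // MX_BLOCK_SIZE] - 127)
        out.append(decode_e2m1(byte & 0xF) * scale)
        out.append(decode_e2m1(byte >> 4) * scale)
    return out
```

Both the reference MoE path and the per-backend checks could then call the same helper, so a change to the assumed layout would only need to be made in one place.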
Signed-off-by: Bowen Bao <bowenbao@amd.com>
c40cbeb to 0397d4e
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Includes changes from #39136