
[ROCm][Quantization][3/N] Refactor quark_moe w4a4 w/ oracle#41436

Open
BowenBao wants to merge 4 commits into vllm-project:main from BowenBao:bowenbao/oracle_w4a4_final


BowenBao commented May 1, 2026

Includes changes from #39136

  • Refactor quark_moe w4a4 to use the oracle.
  • All quark_moe mxfp4 configs now use the oracle.
  • Add a qwen3.5 mxfp4 w4a4 eval test that covers both the aiter and emulation backends.
  • Make the error messages for oracle and aiter experts more actionable when no backend is available.
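
For readers unfamiliar with the pattern, here is a minimal sketch of what oracle-style backend selection with an actionable error could look like. It is not the code in this PR: `Backend`, `select_backend`, the availability checks, and the hint strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Backend:
    # Hypothetical descriptor for illustration; not vLLM's actual class.
    name: str
    is_available: Callable[[], bool]
    hint: str  # what the user could do to make this backend usable


def select_backend(scheme: str, candidates: list[Backend]) -> Backend:
    """Return the first available backend, in priority order."""
    for backend in candidates:
        if backend.is_available():
            return backend
    # Nothing matched: list every candidate together with a concrete next step.
    hints = "\n".join(f"  - {b.name}: {b.hint}" for b in candidates)
    raise RuntimeError(
        f"No MoE backend available for scheme '{scheme}'. Tried:\n{hints}"
    )


# Placeholder availability checks; a real oracle would probe hardware and packages.
candidates = [
    Backend("aiter", lambda: False, "placeholder hint: enable the native kernels"),
    Backend("emulation", lambda: True, "placeholder hint: slower reference path"),
]
chosen = select_backend("w4a4", candidates)
print(chosen.name)
```

Keeping the priority list and the hints in one place means every quantization config can share the same selection path and the same failure message.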

mergify Bot added the gpt-oss (Related to GPT-OSS models) and rocm (Related to AMD ROCm) labels May 1, 2026
github-project-automation Bot moved this to Todo in AMD May 1, 2026

gemini-code-assist Bot left a comment


Code Review

This pull request refactors and extends ROCm MXFP4 MoE support by introducing specialized AITER backends for W4A16, W4A8, and W4A4 quantization schemes. Key changes include the implementation of AiterW4A8ExpertsMonolithic using Triton kernels, the addition of AiterMxfp4Experts for W4A4, and a significant refactor of the Quark MoE method to utilize centralized oracle-based backend selection. The PR also enhances the testing suite with new oracle-based execution tests and GFX950-specific validation. Feedback is provided regarding code duplication in the reference MoE implementation and weight dequantization logic within the test suite, suggesting the extraction of these routines into helper functions to improve maintainability.
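
As an illustration of the suggested extraction, the sketch below shows what a shared MXFP4 dequantization helper might look like. It is not the code in `test_ocp_mx_moe.py`: the function names are hypothetical, it uses plain Python rather than tensors, and it assumes the low nibble of each byte holds the earlier element. The E2M1 value table, the block size of 32, and the E8M0 scale bias of 127 follow the OCP MX format; the reserved NaN scale encoding (255) is not handled.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code (OCP MX FP4).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # number of elements sharing one E8M0 scale in MXFP4


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 0-2 the magnitude."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def dequantize_mxfp4(packed: bytes, scales: bytes) -> list[float]:
    """Hypothetical helper: unpack two FP4 values per byte and apply block scales.

    Assumption: the low nibble holds the earlier element of each pair.
    """
    if len(packed) * 2 != len(scales) * BLOCK_SIZE:
        raise ValueError("expected one scale byte per 32 packed elements")
    out: list[float] = []
    for i, byte in enumerate(packed):
        # Elements 2*i and 2*i+1 always fall in the same block (BLOCK_SIZE is even).
        scale = 2.0 ** (scales[(2 * i) // BLOCK_SIZE] - 127)
        out.append(decode_e2m1(byte & 0xF) * scale)
        out.append(decode_e2m1(byte >> 4) * scale)
    return out


# One block: every byte is 0x21 (codes 1 and 2), scale exponent 128 -> 2.0.
values = dequantize_mxfp4(bytes([0x21]) * 16, bytes([128]))
print(values[:2])
```

With a helper like this, both the reference MoE path and the weight checks could call one routine instead of repeating the unpack-and-scale logic inline.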

Comment thread tests/kernels/moe/test_ocp_mx_moe.py Outdated
Comment thread tests/kernels/moe/test_ocp_mx_moe.py Outdated
BowenBao added 2 commits May 5, 2026 21:25
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Signed-off-by: Bowen Bao <bowenbao@amd.com>
BowenBao force-pushed the bowenbao/oracle_w4a4_final branch from c40cbeb to 0397d4e May 5, 2026 21:29
BowenBao added 2 commits May 5, 2026 21:31
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Signed-off-by: Bowen Bao <bowenbao@amd.com>
BowenBao marked this pull request as ready for review May 5, 2026 22:34

claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.


Labels

gpt-oss: Related to GPT-OSS models
rocm: Related to AMD ROCm

Projects

Status: Todo
Status: To Triage


1 participant