[ROCm][Quantization][3/N] Refactor quark_moe w4a4 w/ oracle by BowenBao · Pull Request #41436 · vllm-project/vllm

BowenBao · 2026-05-01T01:04:46Z

Includes changes from #39136

Refactor quark_moe w4a4 to use the oracle.
All quark_moe MXFP4 configs now use the oracle.
Add a Qwen3.5 MXFP4 W4A4 eval test that covers both the AITER and emulation backends.
Update error messages for the oracle and AITER experts so they are more actionable when no backend is available.
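Here is a minimal sketch of the oracle pattern described above. It assumes the oracle walks an ordered list of candidate backends, returns the first one that reports support, and otherwise raises an error listing why each candidate was rejected. All names here (`BackendCandidate`, `select_backend`, `aiter_supported`, `emulation_supported`) and the gfx950-only rule for the AITER candidate are hypothetical and illustrative. They are not vLLM's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical: a support check returns (is_supported, reason_if_not).
SupportCheck = Callable[[str], Tuple[bool, str]]


@dataclass(frozen=True)
class BackendCandidate:
    name: str
    is_supported: SupportCheck


def select_backend(candidates: Sequence[BackendCandidate], arch: str) -> str:
    """Return the first supported backend name, in priority order."""
    rejections = []
    for candidate in candidates:
        ok, reason = candidate.is_supported(arch)
        if ok:
            return candidate.name
        rejections.append(f"{candidate.name}: {reason}")
    # Actionable error: report what was tried and why each option was rejected.
    raise RuntimeError(
        f"No MXFP4 W4A4 MoE backend is available for arch '{arch}'. "
        "Tried:\n  " + "\n  ".join(rejections)
    )


def aiter_supported(arch: str) -> Tuple[bool, str]:
    # Assumption for illustration only: native MXFP4 kernels require gfx950.
    if arch == "gfx950":
        return True, ""
    return False, f"requires gfx950, found {arch}"


def emulation_supported(arch: str) -> Tuple[bool, str]:
    # Assumption: emulation dequantizes weights and runs on any arch.
    return True, ""


# Priority order: prefer the native backend, fall back to emulation.
CANDIDATES = [
    BackendCandidate("aiter", aiter_supported),
    BackendCandidate("emulation", emulation_supported),
]
```

With these candidates, `select_backend(CANDIDATES, "gfx950")` returns `"aiter"` and `select_backend(CANDIDATES, "gfx942")` falls back to `"emulation"`. Keeping the rejection reasons and surfacing them in the error is what makes a "no backend available" failure actionable.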

gemini-code-assist

Code Review

This pull request refactors and extends ROCm MXFP4 MoE support by introducing specialized AITER backends for W4A16, W4A8, and W4A4 quantization schemes. Key changes include the implementation of AiterW4A8ExpertsMonolithic using Triton kernels, the addition of AiterMxfp4Experts for W4A4, and a significant refactor of the Quark MoE method to utilize centralized oracle-based backend selection. The PR also enhances the testing suite with new oracle-based execution tests and GFX950-specific validation. Feedback is provided regarding code duplication in the reference MoE implementation and weight dequantization logic within the test suite, suggesting the extraction of these routines into helper functions to improve maintainability.
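The review suggests moving the duplicated weight dequantization in the test suite into a helper. The following is a rough sketch of what such a helper computes. It assumes the OCP MX layout, in which FP4 E2M1 elements share one E8M0 power-of-two scale per 32-element block. It also assumes two elements are packed per byte with the low nibble first. The function names are hypothetical and not taken from this PR, and a real test helper would operate on tensors rather than Python lists.

```python
from typing import List

# FP4 E2M1 magnitudes indexed by the low 3 bits; bit 3 is the sign bit.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
MX_BLOCK_SIZE = 32  # number of elements that share one E8M0 scale


def decode_e2m1(code: int) -> float:
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_e8m0(exponent: int) -> float:
    # E8M0 is an unsigned power-of-two exponent with bias 127; 0xFF encodes NaN.
    if exponent == 0xFF:
        return float("nan")
    return 2.0 ** (exponent - 127)


def dequant_mxfp4_row(packed: bytes, scales: bytes) -> List[float]:
    """Dequantize one row of packed MXFP4 values (hypothetical helper)."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # assumption: low nibble is the first element
        codes.append(byte >> 4)
    if len(codes) != len(scales) * MX_BLOCK_SIZE:
        raise ValueError(
            f"expected {len(scales) * MX_BLOCK_SIZE} elements, got {len(codes)}"
        )
    # Each element is multiplied by the scale of the block it belongs to.
    return [
        decode_e2m1(code) * decode_e8m0(scales[i // MX_BLOCK_SIZE])
        for i, code in enumerate(codes)
    ]
```

For example, 16 bytes of `0x21` with a single scale byte of 128 (a scale of 2.0) decode to the alternating values 1.0 and 2.0. Keeping this logic in one helper would let the reference MoE and the weight checks share it instead of each re-implementing it.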

Signed-off-by: Bowen Bao <bowenbao@amd.com>

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Signed-off-by: Bowen Bao <bowenbao@amd.com>

claude

Claude Code Review

This pull request is from a fork, so automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

mergify Bot added the gpt-oss (Related to GPT-OSS models) and rocm (Related to AMD ROCm) labels May 1, 2026

github-project-automation Bot added this to gpt-oss Issues & Enhancements and AMD May 1, 2026

github-project-automation Bot moved this to Todo in AMD May 1, 2026

github-project-automation Bot moved this to To Triage in gpt-oss Issues & Enhancements May 1, 2026

gemini-code-assist Bot reviewed May 1, 2026


Two comment threads on tests/kernels/moe/test_ocp_mx_moe.py (outdated)

BowenBao mentioned this pull request May 1, 2026

[ROCm][Quantization][2/N] Refactor quark_moe w4a8 w/ oracle #39136

Merged

BowenBao added 2 commits May 5, 2026 21:25

w4a4

7c6d408

Signed-off-by: Bowen Bao <bowenbao@amd.com>

tests; dead code cleanup

0397d4e

Signed-off-by: Bowen Bao <bowenbao@amd.com>

BowenBao force-pushed the bowenbao/oracle_w4a4_final branch from c40cbeb to 0397d4e May 5, 2026 21:29

BowenBao added 2 commits May 5, 2026 21:31

new mxfp4 eval configs

358b556

Signed-off-by: Bowen Bao <bowenbao@amd.com>

Revert changes to test_ocp_mx_moe.py

0890ebf

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Signed-off-by: Bowen Bao <bowenbao@amd.com>

BowenBao marked this pull request as ready for review May 5, 2026 22:34

BowenBao requested review from mgoin, pavanimajety, robertgshaw2-redhat, tjtanaa, vadiklyutiy and yewentao256 as code owners May 5, 2026 22:34

claude Bot reviewed May 5, 2026


BowenBao wants to merge 4 commits into vllm-project:main from BowenBao:bowenbao/oracle_w4a4_final