
[ROCm][CI] Extended Fused MoE and FP8 MoE test support #41100

Draft
AndreasKaratzas wants to merge 1 commit into vllm-project:main from ROCm:akaratza_ci_fusedmoe_modelopt

Conversation

@AndreasKaratzas
Collaborator

This PR makes the fused MoE layer test matrix usable on ROCm/MI355 by fixing the real ModelOpt FP8/FP4 failures it exposes and by making distributed subcase failures visible to pytest.

Key changes:

  • Propagate fused MoE distributed subcase failures back to the parent pytest process, instead of letting child-rank failures print as failed subcases while the parent test reports PASSED (sketched below).
  • Avoid collecting invalid no-parallel feature combinations where routed_input_transform or gate is requested without shared_experts (sketched below).
  • Allow modelopt_fp4 MoE test configs on ROCm and SM90+ paths, where native or emulated NVFP4 execution is available.
  • Use the existing NVFP4 reference quantization path to create packed FP4 test weights on ROCm, since ops.scaled_fp4_quant is not available there.
  • Keep NVFP4 emulation lookup tensors on the same device as the packed FP4 input during dequantization (sketched below).
  • Keep ModelOpt FP8 tensor-wise MoE activation scales as rank-1 tensors after reduction, so Triton receives loadable scale pointers rather than constexpr scalar values (sketched below).
  • Gate the experimental MoRI fused MoE layer matrix behind VLLM_TEST_ENABLE_MORI_MOE_LAYER=1; when enabled, the test sets the AITER fused MoE env requirements and disables AITER shared expert fusion (sketched below).
  • Enable the modular OAI Triton MoE test on CUDA-like platforms, and pad MXFP4 test weights/inputs to the CDNA4 scale-layout alignment on ROCm while slicing outputs back to the original test shape (sketched below).
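
The sketches below are illustrative only; helper names, constants, and test bodies that are not mentioned in this PR are hypothetical stand-ins, not the actual vLLM test code. First, the failure-propagation idea: each worker rank records failing subcases in a temporary report file, and the parent pytest process asserts that no reports exist after all ranks join, so a child-rank failure can no longer coexist with a PASSED parent test.

```python
# Minimal sketch: surface child-rank subcase failures to the parent pytest
# process via temporary failure reports. `run_subcase` is a stand-in for the
# real fused MoE layer subcase check.
import json
import os
import tempfile

import torch.multiprocessing as mp


def run_subcase(rank: int, world_size: int, case_id: int) -> None:
    # Stand-in for a real fused MoE subcase; raises AssertionError on mismatch.
    assert case_id >= 0


def _worker(rank: int, world_size: int, report_dir: str) -> None:
    failures = []
    for case_id in range(4):  # stand-in for the subcase matrix
        try:
            run_subcase(rank, world_size, case_id)
        except AssertionError as exc:
            failures.append({"rank": rank, "case": case_id, "error": str(exc)})
    if failures:
        with open(os.path.join(report_dir, f"rank{rank}.json"), "w") as f:
            json.dump(failures, f)


def test_fused_moe_distributed_subcases():
    world_size = 2
    with tempfile.TemporaryDirectory() as report_dir:
        mp.spawn(_worker, args=(world_size, report_dir), nprocs=world_size)
        reports = sorted(os.listdir(report_dir))
        # Fail the parent test if any rank recorded a failing subcase.
        assert not reports, f"distributed subcase failures in: {reports}"
```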
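Next, a sketch of dropping the invalid no-parallel combinations at collection time. The boolean flags are a simplification of the real feature configs; the point is that routed_input_transform or gate is only ever paired with shared_experts.

```python
# Minimal sketch: only collect feature combinations where a routed-input
# transform or gate is accompanied by shared experts.
import itertools

import pytest

FEATURE_COMBOS = [
    (shared_experts, routed_input_transform, gate)
    for shared_experts, routed_input_transform, gate in itertools.product(
        [False, True], repeat=3
    )
    if shared_experts or not (routed_input_transform or gate)
]


@pytest.mark.parametrize(
    "shared_experts,routed_input_transform,gate", FEATURE_COMBOS
)
def test_fused_moe_feature_combo(shared_experts, routed_input_transform, gate):
    ...  # invalid combinations are never collected
```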
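The device fix for NVFP4 emulation, shown here with a plain E2M1 lookup table (the actual dequantization helper is more involved): the lookup tensor must follow the packed FP4 input's device before indexing.

```python
# Minimal sketch: keep the FP4 (E2M1) code-to-float lookup table on the same
# device as the packed FP4 codes it indexes during emulated dequantization.
import torch

# The 16 representable E2M1 values.
FP4_LUT = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)


def dequant_fp4_codes(codes: torch.Tensor) -> torch.Tensor:
    lut = FP4_LUT.to(codes.device)  # LUT must live where the codes live
    return lut[codes.long()]
```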
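The scale-shape issue in one line: reducing a per-token FP8 activation scale with .max() produces a 0-d tensor, which Triton can end up treating as a constexpr scalar rather than a pointer it can load from, while reshaping to rank-1 keeps a real device pointer. A minimal illustration:

```python
# Minimal sketch of the 0-d vs. rank-1 scale distinction after reduction.
import torch

per_token_scale = torch.rand(128, dtype=torch.float32)

scalar_scale = per_token_scale.max()            # 0-d: torch.Size([])
rank1_scale = per_token_scale.max().reshape(1)  # rank-1: torch.Size([1])

assert scalar_scale.ndim == 0
assert rank1_scale.ndim == 1  # passed to Triton as a loadable pointer
```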
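A sketch of the opt-in gate for the MoRI layer matrix; the marker name and skip reason are illustrative, only the environment variable comes from this PR.

```python
# Minimal sketch: gate the MoRI fused MoE layer matrix behind an opt-in env var.
import os

import pytest

requires_mori = pytest.mark.skipif(
    os.environ.get("VLLM_TEST_ENABLE_MORI_MOE_LAYER") != "1",
    reason="MoRI fused MoE layer tests are opt-in via VLLM_TEST_ENABLE_MORI_MOE_LAYER=1",
)


@requires_mori
def test_mori_fused_moe_layer():
    ...  # when enabled, the real test also sets the AITER env requirements
```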
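Finally, the pad-then-slice pattern used for the MXFP4 test on ROCm, with a made-up alignment constant (the real CDNA4 scale-layout alignment differs): inputs and weights are zero-padded up to the boundary the kernel expects, and the output is sliced back to the original test shape.

```python
# Minimal sketch: pad a dimension up to a hardware alignment, then slice the
# kernel output back to the original size. ALIGN is a placeholder, not the
# actual CDNA4 scale-layout constant.
import torch

ALIGN = 256


def pad_dim(x: torch.Tensor, dim: int, align: int = ALIGN) -> torch.Tensor:
    pad = (-x.shape[dim]) % align
    if pad == 0:
        return x
    pad_shape = list(x.shape)
    pad_shape[dim] = pad
    return torch.cat([x, x.new_zeros(pad_shape)], dim=dim)


hidden = 1000                  # deliberately unaligned test size
x = torch.randn(4, hidden)
x_padded = pad_dim(x, dim=-1)  # shape: (4, 1024)

# out = mxfp4_kernel(x_padded, w_padded)  # hypothetical kernel call
# out = out[..., :hidden]                 # slice back to the test shape
```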

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

@gemini-code-assist (Bot) left a comment


Code Review

This pull request updates the MoE (Mixture of Experts) testing infrastructure and kernel implementations. Key changes include adding new Buildkite test configurations for AMD hardware, implementing FP4 emulation for ROCm, and improving the MoE layer test runner to handle distributed failures more robustly with temporary failure reports. Additionally, minor adjustments were made to Triton kernel inputs and quantization utilities to ensure compatibility across different platforms. I have no feedback to provide as there were no review comments.


Labels

ci/build, rocm (Related to AMD ROCm)

Projects

Status: Todo
