
[ROCm][CI] Upgrade ROCm quantized MoE coverage#40943

Draft
AndreasKaratzas wants to merge 3 commits into vllm-project:main from ROCm:akaratza_rocm_quantized_moe

Conversation

@AndreasKaratzas
Collaborator

This PR replaces the old MI3xx MoE placeholder with a real ROCm quantized-MoE initialization matrix, restores the broken ROCm Quark MXFP4 path, and wires the matching Quark eval lanes into test-amd.yaml. On the test side, test_gfx950_moe.py covers the supported ROCm backends and model-initialization matrix, while test_quark.py keeps ROCm Quark correctness and the Wikitext/GSM8K evals aligned with the product fix. The Quark product change itself is intentionally narrow: it only redirects the proven-bad ROCm native MXFP4 linear path onto the safe fallback.
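For orientation, a minimal, hedged sketch of what a parametrized quantized-MoE initialization matrix can look like; the model IDs below are placeholders and this is not the PR's actual test_gfx950_moe.py content:

```python
# Hedged sketch, not the PR's test file: a parametrized init matrix in the
# spirit of test_gfx950_moe.py. Model IDs are hypothetical placeholders.
import pytest

from vllm import LLM

QUANTIZED_MOE_MODELS = [
    "some-org/moe-model-mxfp4",  # hypothetical entry
    "some-org/moe-model-fp8",    # hypothetical entry
]


@pytest.mark.parametrize("model_id", QUANTIZED_MOE_MODELS)
def test_quantized_moe_init(model_id: str):
    # load_format="dummy" initializes random weights instead of downloading
    # checkpoints, so the matrix only exercises config parsing, quant-method
    # selection, and MoE layer construction.
    llm = LLM(model=model_id, load_format="dummy", enforce_eager=True)
    assert llm is not None
```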

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@mergify bot added the ci/build and rocm (Related to AMD ROCm) labels on Apr 26, 2026
@github-project-automation bot moved this to Todo in AMD on Apr 26, 2026
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request significantly expands the test suite for Quark quantization and MoE models, specifically targeting ROCm platforms like gfx950/MI355. Key changes include the addition of comprehensive initialization tests for various quantized MoE models, new CI pipeline steps for AMD hardware, and a major expansion of the Quark unit and accuracy tests. Feedback was provided regarding an excessively high timeout value in the remote server initialization for tests, which could lead to CI blockage.

model,
server_args,
env_dict=env,
max_wait_seconds=1500,
Contributor


high

The max_wait_seconds is set to 1500 (25 minutes), which is exceptionally high for a test using the dummy load format. While large models can take time to initialize, dummy loading (which skips disk I/O for weights) should typically complete within a few minutes. Such a long timeout can lead to significant CI blockage if a regression causes the server to hang or fail silently. Consider reducing this to a more reasonable value (e.g., 300-600 seconds).
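A hedged sketch of the suggested tightening, keeping the call shape from the snippet under review; the enclosing RemoteOpenAIServer helper is an assumption, since it is not visible in the quoted lines:

```python
# Sketch only: same arguments as the snippet above, with the timeout lowered
# into the reviewer's suggested 300-600 second range.
with RemoteOpenAIServer(
    model,
    server_args,
    env_dict=env,
    max_wait_seconds=600,  # was 1500; dummy load should come up well within 10 min
) as remote_server:
    ...  # run the eval requests against remote_server
```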

Collaborator Author


This directly mirrors the Blackwell test. However, we might indeed not need that high a value here. At the same time, it is only a timeout, so I don't think there is any real check being enforced by it.

@AndreasKaratzas
Collaborator Author

Dependent on: #39801

Comment on lines 214 to +248
@@ -228,7 +238,14 @@ def __init__(
                 "https://github.com/ROCm/aiter for installation details."
             )

-        if not current_platform.supports_mx():
+        if self.force_rocm_mxfp4_emulation:
+            logger.warning_once(
+                "ROCm native Quark OCP MX dynamic GEMM for w_mxfp4_a_mxfp4 "
+                "is temporarily disabled due to correctness issues. Falling "
+                "back to simulated weight dequantization and activation QDQ "
+                "with high-precision linear layers."
+            )
+        elif not current_platform.supports_mx():
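For readers without the full file, a minimal sketch of the resulting three-way decision; the attribute that records the choice and the final else arm are assumptions, not shown in the hunk:

```python
# Sketch (assumed names): how the branch introduced by this hunk could resolve.
if self.force_rocm_mxfp4_emulation:
    # Proven-bad ROCm native MXFP4 GEMM: warn once and take the emulated path
    # (simulated weight dequantization + activation QDQ on high-precision linears).
    self.emulate = True   # hypothetical attribute
elif not current_platform.supports_mx():
    # Platforms without native MX support also take the emulated path.
    self.emulate = True   # hypothetical attribute
else:
    # Native MX path remains available where supported and not force-disabled.
    self.emulate = False  # hypothetical attribute
```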

@AndreasKaratzas
Collaborator Author

tests/kernels/moe/test_modular_oai_triton_moe.py is addressed in #41100


Labels

ci/build, rocm (Related to AMD ROCm)

Projects

Status: Todo
