
[ROCm][CI] Upgrade ROCm quantized MoE coverage#40943

Draft
AndreasKaratzas wants to merge 3 commits into vllm-project:main from ROCm:akaratza_rocm_quantized_moe

Conversation

@AndreasKaratzas
Collaborator

This PR replaces the old MI3xx MoE placeholder with a real ROCm quantized-MoE initialization matrix, restores the broken ROCm Quark MXFP4 path, and wires the matching Quark eval lanes into test-amd.yaml. On the test side, test_gfx950_moe.py covers the supported ROCm backends and model-initialization matrix, while test_quark.py keeps ROCm Quark correctness and the Wikitext/GSM8K evals aligned with the product fix. The Quark product change itself is intentionally narrow: it only redirects the proven-bad ROCm native MXFP4 linear path onto the safe fallback.
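For orientation, a minimal, hedged sketch of what a parametrized quantized-MoE initialization matrix can look like; the model IDs below are placeholders and this is not the PR's actual test_gfx950_moe.py content:

```python
# Hedged sketch, not the PR's test file: a parametrized init matrix in the
# spirit of test_gfx950_moe.py. Model IDs are hypothetical placeholders.
import pytest

from vllm import LLM

QUANTIZED_MOE_MODELS = [
    "some-org/moe-model-mxfp4",  # hypothetical entry
    "some-org/moe-model-fp8",    # hypothetical entry
]


@pytest.mark.parametrize("model_id", QUANTIZED_MOE_MODELS)
def test_quantized_moe_init(model_id: str):
    # load_format="dummy" initializes random weights instead of downloading
    # checkpoints, so the matrix only exercises config parsing, quant-method
    # selection, and MoE layer construction.
    llm = LLM(model=model_id, load_format="dummy", enforce_eager=True)
    assert llm is not None
```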

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@mergify bot added the ci/build and rocm (Related to AMD ROCm) labels on Apr 26, 2026
@github-project-automation bot moved this to Todo in AMD on Apr 26, 2026
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request significantly expands the test suite for Quark quantization and MoE models, specifically targeting ROCm platforms like gfx950/MI355. Key changes include the addition of comprehensive initialization tests for various quantized MoE models, new CI pipeline steps for AMD hardware, and a major expansion of the Quark unit and accuracy tests. Feedback was provided regarding an excessively high timeout value in the remote server initialization for tests, which could lead to CI blockage.

model,
server_args,
env_dict=env,
max_wait_seconds=1500,
Contributor


high

The max_wait_seconds is set to 1500 (25 minutes), which is exceptionally high for a test using the dummy load format. While large models can take time to initialize, dummy loading (which skips disk I/O for weights) should typically complete within a few minutes. Such a long timeout can lead to significant CI blockage if a regression causes the server to hang or fail silently. Consider reducing this to a more reasonable value (e.g., 300-600 seconds).
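A hedged sketch of the suggested tightening, keeping the call shape from the snippet under review; the enclosing RemoteOpenAIServer helper is an assumption, since it is not visible in the quoted lines:

```python
# Sketch only: same arguments as the snippet above, with the timeout lowered
# into the reviewer's suggested 300-600 second range.
with RemoteOpenAIServer(
    model,
    server_args,
    env_dict=env,
    max_wait_seconds=600,  # was 1500; dummy load should come up well within 10 min
) as remote_server:
    ...  # run the eval requests against remote_server
```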

Collaborator Author


This directly mirrors the Blackwell test. However, we might indeed not need that high a value here. At the same time, it is only a timeout, so I don't think there is any real check being enforced by it.

@AndreasKaratzas
Collaborator Author

Dependent on: #39801

Comment on lines 214 to +248
@@ -228,7 +238,14 @@ def __init__(
                 "https://github.com/ROCm/aiter for installation details."
             )

-        if not current_platform.supports_mx():
+        if self.force_rocm_mxfp4_emulation:
+            logger.warning_once(
+                "ROCm native Quark OCP MX dynamic GEMM for w_mxfp4_a_mxfp4 "
+                "is temporarily disabled due to correctness issues. Falling "
+                "back to simulated weight dequantization and activation QDQ "
+                "with high-precision linear layers."
+            )
+        elif not current_platform.supports_mx():
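For readers without the full file, a minimal sketch of the resulting three-way decision; the attribute that records the choice and the final else arm are assumptions, not shown in the hunk:

```python
# Sketch (assumed names): how the branch introduced by this hunk could resolve.
if self.force_rocm_mxfp4_emulation:
    # Proven-bad ROCm native MXFP4 GEMM: warn once and take the emulated path
    # (simulated weight dequantization + activation QDQ on high-precision linears).
    self.emulate = True   # hypothetical attribute
elif not current_platform.supports_mx():
    # Platforms without native MX support also take the emulated path.
    self.emulate = True   # hypothetical attribute
else:
    # Native MX path remains available where supported and not force-disabled.
    self.emulate = False  # hypothetical attribute
```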

@AndreasKaratzas
Collaborator Author

tests/kernels/moe/test_modular_oai_triton_moe.py is addressed in #41100


Labels

ci/build, rocm (Related to AMD ROCm)

Projects

Status: Todo
