[ROCm][CI] Upgrade ROCm quantized MoE coverage #40943
AndreasKaratzas wants to merge 3 commits into vllm-project:main
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Code Review
This pull request significantly expands the test suite for Quark quantization and MoE models, specifically targeting ROCm platforms like gfx950/MI355. Key changes include the addition of comprehensive initialization tests for various quantized MoE models, new CI pipeline steps for AMD hardware, and a major expansion of the Quark unit and accuracy tests. Feedback was provided regarding an excessively high timeout value in the remote server initialization for tests, which could lead to CI blockage.
    model,
    server_args,
    env_dict=env,
    max_wait_seconds=1500,
The max_wait_seconds is set to 1500 (25 minutes), which is exceptionally high for a test using the dummy load format. While large models can take time to initialize, dummy loading (which skips disk I/O for weights) should typically complete within a few minutes. Such a long timeout can lead to significant CI blockage if a regression causes the server to hang or fail silently. Consider reducing this to a more reasonable value (e.g., 300-600 seconds).
This directly mirrors the Blackwell test. However, we might indeed not need that value there. At the same time, it is only a timeout, so I think there is no real check here.
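To make the trade-off in this thread concrete, here is a minimal sketch of a readiness poll with an upper bound. It is not the test helper used in this PR: `wait_until_ready`, `probe`, `poll_interval`, `clock`, and `sleep` are hypothetical names introduced for illustration. It shows that a passing run returns as soon as the server answers, so the bound only determines how long a hung server can hold a CI slot.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    max_wait_seconds: float,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `probe` until it returns True or the deadline passes.

    Hypothetical helper, not the vLLM test utility. Returns the elapsed
    time on success and raises TimeoutError once the deadline is reached.
    """
    start = clock()
    deadline = start + max_wait_seconds
    while True:
        if probe():
            # Healthy case: return immediately, regardless of the bound.
            return clock() - start
        if clock() >= deadline:
            # Failure case: this is the only path where the bound matters.
            raise TimeoutError(
                f"server not ready after {max_wait_seconds} seconds"
            )
        sleep(poll_interval)
```

Under this model, lowering the bound from 1500 to 300-600 seconds changes nothing for a healthy run; it only shortens the wait when startup hangs.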
Dependent on: #39801
@@ -228,7 +238,14 @@ def __init__(
                 "https://github.com/ROCm/aiter for installation details."
             )

-        if not current_platform.supports_mx():
+        if self.force_rocm_mxfp4_emulation:
+            logger.warning_once(
+                "ROCm native Quark OCP MX dynamic GEMM for w_mxfp4_a_mxfp4 "
+                "is temporarily disabled due to correctness issues. Falling "
+                "back to simulated weight dequantization and activation QDQ "
+                "with high-precision linear layers."
+            )
+        elif not current_platform.supports_mx():
NOTE: Test without this patch after:
is merged.
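For readers unfamiliar with the "activation QDQ" mentioned in the warning message, the following is a minimal pure-Python sketch of quantize-dequantize for a single MXFP4 block. It is an illustration of the OCP MX scheme under stated simplifications, not the Quark or vLLM implementation: `qdq_mxfp4_block` is a hypothetical name, the shared scale is not clamped to the E8M0 range, and ties are broken toward the smaller magnitude instead of round-to-nearest-even.

```python
import math

# Magnitudes representable by an FP4 E2M1 element.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
# Exponent of the largest E2M1 normal value (6.0 == 1.5 * 2**2).
E2M1_EMAX = 2


def qdq_mxfp4_block(block):
    """Quantize one block to MXFP4 and immediately dequantize it.

    Hypothetical helper for illustration. MX formats use blocks of 32
    elements sharing one power-of-two scale; any length works here.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Shared power-of-two scale chosen from the largest magnitude.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    out = []
    for v in block:
        # Saturate to the largest E2M1 magnitude, then snap to the grid.
        mag = min(abs(v) / scale, E2M1_GRID[-1])
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(q * scale, v))
    return out
```

The output stays in high precision but only takes values that MXFP4 can represent, which is why an ordinary high-precision linear layer fed with such tensors can stand in for a native MX GEMM.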
This PR replaces the old MI3xx MoE placeholder with a real ROCm quantized-MoE initialization matrix, restores the broken ROCm Quark MXFP4 path, and wires the matching Quark eval lanes into test-amd.yaml. The test_gfx950_moe.py side focuses on supported ROCm backend and model-init coverage, while test_quark.py keeps the ROCm Quark correctness and Wikitext/GSM8K story aligned with the product fix. The Quark product change is intentionally narrow and only redirects the proven-bad ROCm native MXFP4 linear path onto the safe fallback.

cc @kenroche
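The narrow redirect described above can be sketched as a small selection function. This is an illustration of the branch ordering in the hunk, not the code in this PR: `select_mxfp4_linear_path` and `Mxfp4Path` are hypothetical names, and the assumption that the `elif not current_platform.supports_mx()` branch also ends in emulation is mine, since the hunk does not show that branch's body.

```python
from enum import Enum


class Mxfp4Path(Enum):
    """Hypothetical labels for the two linear-layer paths."""

    NATIVE = "native"
    EMULATION = "emulation"


def select_mxfp4_linear_path(force_emulation: bool, platform_supports_mx: bool):
    """Return the chosen path and the reason for it.

    Mirrors the ordering in the diff: the force flag is checked first,
    so it overrides native support even on MX-capable hardware.
    """
    if force_emulation:
        return Mxfp4Path.EMULATION, "native GEMM disabled for correctness"
    elif not platform_supports_mx:
        # Assumption: the unshown branch body also falls back to emulation.
        return Mxfp4Path.EMULATION, "platform lacks MX support"
    return Mxfp4Path.NATIVE, "native MX GEMM"


# Usage: enumerate the full decision table.
for force in (True, False):
    for supported in (True, False):
        path, reason = select_mxfp4_linear_path(force, supported)
        print(f"force={force} supports_mx={supported} -> {path.value} ({reason})")
```

Only the combination of no force flag and MX support reaches the native path, which matches the stated intent of leaving every other code path untouched.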