[ROCm][AITER][Bugfix] Disable emulation for MoE #41226
heachary wants to merge 2 commits into vllm-project:main
Conversation
Signed-off-by: Hemanth Acharya <heachary@amd.com>
Code Review
This pull request adds support for the w_mxfp4_a_mxfp4 scheme to the native AITER CK path in Quark MoE, reducing reliance on emulation. A new test case verifies this behavior. However, the logic for setting the emulate flag is incomplete as it does not check if AITER is actually enabled, which could lead to runtime errors. Additionally, the new test should be parametrized to cover more hardware configurations, and the use of enums for scheme names is suggested for better type safety.
```python
# for `w_mxfp4` (w4a16) and `w_mxfp4_a_mxfp4`; mixed schemes like
# `w_mxfp4_a_mxfp6_*` fall through to QuantMethod.NO and raise
# "Unsupported kernel config for moe heuristic dispatch".
_AITER_NATIVE_OCP_MX_SCHEMES = ("w_mxfp4", "w_mxfp4_a_mxfp4")
```
The addition of w_mxfp4_a_mxfp4 to _AITER_NATIVE_OCP_MX_SCHEMES highlights a significant bug in the self.emulate logic on lines 1034-1039. Currently, if current_platform.supports_mx() is True and the scheme is in _AITER_NATIVE_OCP_MX_SCHEMES, self.emulate will be False even if self.use_rocm_aiter_moe is False. This is because the and condition on line 1037 evaluates to True (since self.mxfp4_backend is initialized to NONE at line 992 and not yet updated for w_mxfp4), making the result dependent only on the first part of the expression. This will lead to a crash when apply() attempts to use the AITER path while it is disabled. The logic should be corrected to ensure emulation is used whenever AITER is unavailable and no other native backend exists. Additionally, consider using the OCP_MX_Scheme enum members for better type safety.
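To make the requested fix concrete, here is a minimal sketch of the emulation decision the comment describes. The helper `should_emulate` and its signature are hypothetical, not vLLM's actual code; the real logic lives inside `QuarkOCP_MX_MoEMethod.__init__`:

```python
def should_emulate(
    supports_mx: bool,
    aiter_moe_enabled: bool,
    ocp_mx_scheme: str,
    native_schemes: tuple[str, ...] = ("w_mxfp4", "w_mxfp4_a_mxfp4"),
) -> bool:
    """Emulate unless the platform supports MX, AITER fused MoE is enabled,
    and the scheme has a native AITER CK kernel (hypothetical helper)."""
    return not (
        supports_mx
        and aiter_moe_enabled
        and ocp_mx_scheme in native_schemes
    )


# AITER disabled on MX-capable hardware: emulation must stay on.
assert should_emulate(True, False, "w_mxfp4_a_mxfp4") is True
# Native path available: emulation can be turned off.
assert should_emulate(True, True, "w_mxfp4_a_mxfp4") is False
```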
Suggested change:
```python
_AITER_NATIVE_OCP_MX_SCHEMES = (OCP_MX_Scheme.w_mxfp4,
                                OCP_MX_Scheme.w_mxfp4_a_mxfp4)
```

The new unit test added in this PR:
```python
def test_moe_emulation_w_mxfp4_a_mxfp4():
    """w_mxfp4_a_mxfp4 on gfx950 (supports_mx + aiter) must not fall back
    to emulation — it should use the native AITER CK path."""
    from unittest.mock import MagicMock, patch

    import torch

    from vllm.model_executor.layers.fused_moe import FusedMoEConfig, MoEActivation
    from vllm.model_executor.layers.fused_moe.config import (
        FusedMoEParallelConfig,
        RoutingMethodType,
    )
    from vllm.model_executor.layers.quantization.quark.quark_moe import (
        QuarkOCP_MX_MoEMethod,
    )

    weight_config = {"dtype": "fp4", "qscheme": "per_group", "is_dynamic": False}
    input_config = {"dtype": "fp4", "qscheme": "per_group", "is_dynamic": True}

    parallel_config = FusedMoEParallelConfig(
        tp_size=1,
        pcp_size=1,
        dp_size=1,
        ep_size=1,
        tp_rank=0,
        pcp_rank=0,
        dp_rank=0,
        ep_rank=0,
        sp_size=1,
        use_ep=False,
        all2all_backend="",
        enable_eplb=False,
    )
    moe = FusedMoEConfig(
        num_experts=8,
        experts_per_token=2,
        hidden_dim=256,
        intermediate_size_per_partition=512,
        num_local_experts=8,
        num_logical_experts=8,
        activation=MoEActivation.SILU,
        device="gpu",
        routing_method=RoutingMethodType.Default,
        moe_parallel_config=parallel_config,
        in_dtype=torch.bfloat16,
    )

    mock_vllm_config = MagicMock()

    with (
        patch(
            "vllm.model_executor.layers.quantization.quark.quark_moe.current_platform"
        ) as mock_platform,
        patch(
            "vllm.model_executor.layers.quantization.quark.quark_moe.rocm_aiter_ops"
        ) as mock_aiter,
        patch(
            "vllm.model_executor.layers.quantization.quark.quark_moe"
            ".get_current_vllm_config",
            return_value=mock_vllm_config,
        ),
    ):
        mock_platform.supports_mx.return_value = True
        mock_aiter.is_fused_moe_enabled.return_value = True

        method = QuarkOCP_MX_MoEMethod(
            weight_config=weight_config,
            input_config=input_config,
            moe=moe,
        )

        assert method.ocp_mx_scheme == "w_mxfp4_a_mxfp4"
        assert method.emulate is False, (
            "w_mxfp4_a_mxfp4 on gfx950 (supports_mx + aiter) must not emulate"
        )
```
The unit test should be parametrized to cover cases where MX support or AITER is disabled. This would have caught the logic bug in self.emulate where emulation is incorrectly disabled when AITER is unavailable on MX-supporting hardware.
Suggested change:
```python
import pytest
import torch


@pytest.mark.parametrize("supports_mx", [True, False])
@pytest.mark.parametrize("is_aiter_enabled", [True, False])
def test_moe_emulation_w_mxfp4_a_mxfp4(supports_mx, is_aiter_enabled):
    """w_mxfp4_a_mxfp4 must use the native AITER CK path (emulate=False)
    only when both MX support and AITER are available; in all other cases
    it must fall back to emulation."""
    from unittest.mock import MagicMock, patch

    from vllm.model_executor.layers.fused_moe import FusedMoEConfig, MoEActivation
    from vllm.model_executor.layers.fused_moe.config import (
        FusedMoEParallelConfig,
        RoutingMethodType,
    )
    from vllm.model_executor.layers.quantization.quark.quark_moe import (
        QuarkOCP_MX_MoEMethod,
    )

    weight_config = {"dtype": "fp4", "qscheme": "per_group", "is_dynamic": False}
    input_config = {"dtype": "fp4", "qscheme": "per_group", "is_dynamic": True}

    parallel_config = FusedMoEParallelConfig(
        tp_size=1,
        pcp_size=1,
        dp_size=1,
        ep_size=1,
        tp_rank=0,
        pcp_rank=0,
        dp_rank=0,
        ep_rank=0,
        sp_size=1,
        use_ep=False,
        all2all_backend="",
        enable_eplb=False,
    )
    moe = FusedMoEConfig(
        num_experts=8,
        experts_per_token=2,
        hidden_dim=256,
        intermediate_size_per_partition=512,
        num_local_experts=8,
        num_logical_experts=8,
        activation=MoEActivation.SILU,
        device="gpu",
        routing_method=RoutingMethodType.Default,
        moe_parallel_config=parallel_config,
        in_dtype=torch.bfloat16,
    )

    mock_vllm_config = MagicMock()

    with (
        patch(
            "vllm.model_executor.layers.quantization.quark.quark_moe.current_platform"
        ) as mock_platform,
        patch(
            "vllm.model_executor.layers.quantization.quark.quark_moe.rocm_aiter_ops"
        ) as mock_aiter,
        patch(
            "vllm.model_executor.layers.quantization.quark.quark_moe"
            ".get_current_vllm_config",
            return_value=mock_vllm_config,
        ),
    ):
        mock_platform.supports_mx.return_value = supports_mx
        mock_aiter.is_fused_moe_enabled.return_value = is_aiter_enabled

        method = QuarkOCP_MX_MoEMethod(
            weight_config=weight_config,
            input_config=input_config,
            moe=moe,
        )

        assert method.ocp_mx_scheme == "w_mxfp4_a_mxfp4"
        # Emulation should be off only when both MX is supported and AITER
        # is enabled.
        expected_emulate = not (supports_mx and is_aiter_enabled)
        assert method.emulate == expected_emulate, (
            f"Emulation mismatch for supports_mx={supports_mx}, "
            f"is_aiter_enabled={is_aiter_enabled}"
        )
```
duplicate of #41175
Purpose
PR #39801 introduced a regression for models using the w_mxfp4_a_mxfp4 scheme (e.g. Kimi-K2-Thinking-MXFP4). The _AITER_NATIVE_OCP_MX_SCHEMES tuple only included w_mxfp4, so w_mxfp4_a_mxfp4 was not recognized as a natively supported scheme and fell back to emulation even though AITER's CK MoE kernel supports it. This PR adds w_mxfp4_a_mxfp4 to the supported set, restoring native execution.
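A minimal illustration of the regression mechanism described above; the tuple values come from the diff, while the `print` lines are illustrative only:

```python
# Before this PR: only the weight-only scheme was listed as native,
# so the membership check failed for w_mxfp4_a_mxfp4 and forced emulation.
_AITER_NATIVE_OCP_MX_SCHEMES = ("w_mxfp4",)
print("w_mxfp4_a_mxfp4" in _AITER_NATIVE_OCP_MX_SCHEMES)  # False -> emulation fallback

# After this PR: the weight+activation MXFP4 scheme is recognized as native.
_AITER_NATIVE_OCP_MX_SCHEMES = ("w_mxfp4", "w_mxfp4_a_mxfp4")
print("w_mxfp4_a_mxfp4" in _AITER_NATIVE_OCP_MX_SCHEMES)  # True -> native AITER CK path
```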
Test Plan
- [x] Added a unit test to make sure `emulate` is set correctly

Test Result