[ROCm][Bugfix] Forward router-weight flag in Quark OCP MX AITER MoE #40300
Bortlesboat wants to merge 4 commits into vllm-project:main
Conversation
Code Review
This pull request introduces a fallback mechanism in the Quark OCP MX MoE implementation that uses the Triton-based fused_experts kernel when the ROCm AITER kernel encounters an unsupported configuration. However, the review identifies a critical flaw: weights are shuffled into an AITER-specific layout during loading, making them incompatible with the standard layout the fallback kernel expects, which would lead to silent data corruption. Additionally, the rocm_aiter_fused_experts call is missing the apply_router_weight_on_input argument, and the new unit tests are misleading because they do not account for the weight layout mismatch that would occur in production.
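For concreteness, the flag-forwarding part of the fix amounts to passing the argument through to the AITER kernel call. The sketch below is illustrative only: the surrounding signature is heavily simplified, and the import path reflects recent vLLM layout rather than this PR's exact diff.

```python
# Illustrative sketch, not the actual quark_moe.py code: a trimmed-down
# version of the Quark OCP MX MoE apply path showing the previously
# missing flag being forwarded. Parameter names around the flag are
# assumptions.
from vllm.model_executor.layers.fused_moe.rocm_aiter_fused_moe import (
    rocm_aiter_fused_experts,
)


def apply_moe(x, w1, w2, topk_weights, topk_ids,
              apply_router_weight_on_input: bool = False):
    return rocm_aiter_fused_experts(
        x,
        w1,
        w2,
        topk_weights=topk_weights,
        topk_ids=topk_ids,
        # Previously omitted: without this, models that apply routing
        # weights on the input silently lost that scaling on the AITER
        # path while the modular and fused fallback paths applied it.
        apply_router_weight_on_input=apply_router_weight_on_input,
    )
```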
Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
Force-pushed from bcc243f to 8de7176
Closing — #39801 (merged Apr 27) introduces an […]. If MI355 still hits the heuristic dispatch error on the allowlisted […]
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
Closing per the comment above — superseded by #39801 (merged Apr 27).
Refs #40008.
This keeps the Quark OCP MX AITER apply path from silently falling back at runtime after weights may have been prepared for AITER/CK, and forwards apply_router_weight_on_input into rocm_aiter_fused_experts so routing-weight-on-input models preserve the same semantics as the modular and fused fallback paths.

Why this is not duplicating an existing PR:
- gh issue view 40008 --repo vllm-project/vllm --comments
- gh pr list --repo vllm-project/vllm --state open --search "40008 in:body" returned no matches when this PR was opened.
- gh pr list --repo vllm-project/vllm --state open --search "Unsupported kernel config for moe heuristic dispatch" returned no matches when this PR was opened.
- gh pr list --repo vllm-project/vllm --state open --search "moe heuristic dispatch rocm" only found [ROCm] Enable dual-stream MoE shared experts, AITER sparse MLA workaround, and GLM-5-FP8 weight loading fix #38665, which is a different ROCm lane.

Tests run:
- wsl.exe --cd /mnt/c/Users/andre/.config/superpowers/worktrees/vllm/vllm-rocm-moe-heuristic-40008 bash -lc 'PYTHONPATH=$PWD uv run --no-project --python 3.12 --with torch==2.11.0 --with pytest --with-requirements requirements/common.txt --with-requirements requirements/test/cuda.in python -m pytest tests/model_executor/layers/test_quark_ocp_mx_moe.py -q' -> 2 passed
- wsl.exe --cd /mnt/c/Users/andre/.config/superpowers/worktrees/vllm/vllm-rocm-moe-heuristic-40008 bash -lc 'uv run --no-project --python 3.12 --with ruff ruff check vllm/model_executor/layers/quantization/quark/quark_moe.py tests/model_executor/layers/test_quark_ocp_mx_moe.py' -> All checks passed!
- wsl.exe --cd /mnt/c/Users/andre/.config/superpowers/worktrees/vllm/vllm-rocm-moe-heuristic-40008 bash -lc 'uv run --no-project --python 3.12 --with ruff ruff format --check vllm/model_executor/layers/quantization/quark/quark_moe.py tests/model_executor/layers/test_quark_ocp_mx_moe.py' -> 2 files already formatted
- git diff --check -> passed

AI assistance: I used Codex to help draft the patch and regression tests, then reviewed the diff and the local verification results before updating this PR.
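For context, the forwarding behaviour is easy to pin down in a unit test without ROCm hardware by stubbing the kernel entry point and asserting the kwarg arrives intact. The sketch below shows that pattern; it is an assumed shape, not the contents of tests/model_executor/layers/test_quark_ocp_mx_moe.py.

```python
# Minimal sketch (assumed test shape, not the actual test file): stub
# the kernel call and assert the router-weight flag is forwarded verbatim.
import torch


def apply_path(x, *, apply_router_weight_on_input, fused_experts):
    # Stand-in for the Quark OCP MX apply path under test; the point is
    # that the flag reaches the kernel call unchanged.
    return fused_experts(
        x, apply_router_weight_on_input=apply_router_weight_on_input)


def test_router_weight_flag_forwarded():
    seen = {}

    def fake_kernel(x, **kwargs):
        # Record every kwarg the "kernel" receives, then echo the input.
        seen.update(kwargs)
        return x

    x = torch.zeros(2, 4)
    apply_path(x, apply_router_weight_on_input=True, fused_experts=fake_kernel)
    assert seen["apply_router_weight_on_input"] is True
```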