[ROCm][Bugfix] Forward router-weight flag in Quark OCP MX AITER MoE #40300
Bortlesboat wants to merge 4 commits into vllm-project:main
Conversation
Code Review
This pull request introduces a fallback mechanism in the Quark OCP MX MoE implementation that uses the Triton-based fused_experts kernel when the ROCm AITER kernel encounters an unsupported configuration. However, the review identifies a critical flaw: weights are shuffled into an AITER-specific layout during loading, making them incompatible with the standard layout the fallback kernel expects, which would lead to silent data corruption. Additionally, the rocm_aiter_fused_experts call is missing the apply_router_weight_on_input argument, and the new unit tests are misleading because they do not account for the weight layout mismatch that would occur in production.
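For concreteness, the flag-forwarding part of the fix amounts to passing the argument through to the AITER kernel call. The sketch below is illustrative only: the surrounding signature is heavily simplified, and the import path reflects recent vLLM layout rather than this PR's exact diff.

```python
# Illustrative sketch, not the actual quark_moe.py code: a trimmed-down
# version of the Quark OCP MX MoE apply path showing the previously
# missing flag being forwarded. Parameter names around the flag are
# assumptions.
from vllm.model_executor.layers.fused_moe.rocm_aiter_fused_moe import (
    rocm_aiter_fused_experts,
)


def apply_moe(x, w1, w2, topk_weights, topk_ids,
              apply_router_weight_on_input: bool = False):
    return rocm_aiter_fused_experts(
        x,
        w1,
        w2,
        topk_weights=topk_weights,
        topk_ids=topk_ids,
        # Previously omitted: without this, models that apply routing
        # weights on the input silently lost that scaling on the AITER
        # path while the modular and fused fallback paths applied it.
        apply_router_weight_on_input=apply_router_weight_on_input,
    )
```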
Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
Force-pushed from bcc243f to 8de7176
Closing — #39801 (merged Apr 27) introduces an […]. If MI355 still hits the heuristic dispatch error on the allowlisted […]
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
Closing per the comment above — superseded by #39801 (merged Apr 27).
Refs #40008.
This keeps the Quark OCP MX AITER apply path from silently falling back at runtime after weights may have been prepared for AITER/CK, and forwards apply_router_weight_on_input into rocm_aiter_fused_experts so routing-weight-on-input models preserve the same semantics as the modular and fused fallback paths.

Why this is not duplicating an existing PR:
- gh issue view 40008 --repo vllm-project/vllm --comments
- gh pr list --repo vllm-project/vllm --state open --search "40008 in:body" returned no matches when this PR was opened.
- gh pr list --repo vllm-project/vllm --state open --search "Unsupported kernel config for moe heuristic dispatch" returned no matches when this PR was opened.
- gh pr list --repo vllm-project/vllm --state open --search "moe heuristic dispatch rocm" only found [ROCm] Enable dual-stream MoE shared experts, AITER sparse MLA workaround, and GLM-5-FP8 weight loading fix #38665, which is a different ROCm lane.

Tests run:
- wsl.exe --cd /mnt/c/Users/andre/.config/superpowers/worktrees/vllm/vllm-rocm-moe-heuristic-40008 bash -lc 'PYTHONPATH=$PWD uv run --no-project --python 3.12 --with torch==2.11.0 --with pytest --with-requirements requirements/common.txt --with-requirements requirements/test/cuda.in python -m pytest tests/model_executor/layers/test_quark_ocp_mx_moe.py -q' -> 2 passed
- wsl.exe --cd /mnt/c/Users/andre/.config/superpowers/worktrees/vllm/vllm-rocm-moe-heuristic-40008 bash -lc 'uv run --no-project --python 3.12 --with ruff ruff check vllm/model_executor/layers/quantization/quark/quark_moe.py tests/model_executor/layers/test_quark_ocp_mx_moe.py' -> All checks passed!
- wsl.exe --cd /mnt/c/Users/andre/.config/superpowers/worktrees/vllm/vllm-rocm-moe-heuristic-40008 bash -lc 'uv run --no-project --python 3.12 --with ruff ruff format --check vllm/model_executor/layers/quantization/quark/quark_moe.py tests/model_executor/layers/test_quark_ocp_mx_moe.py' -> 2 files already formatted
- git diff --check -> passed

AI assistance: I used Codex to help draft the patch and regression tests, then reviewed the diff and the local verification results before updating this PR.
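For context, the forwarding behaviour is easy to pin down in a unit test without ROCm hardware by stubbing the kernel entry point and asserting the kwarg arrives intact. The sketch below shows that pattern; it is an assumed shape, not the contents of tests/model_executor/layers/test_quark_ocp_mx_moe.py.

```python
# Minimal sketch (assumed test shape, not the actual test file): stub
# the kernel call and assert the router-weight flag is forwarded verbatim.
import torch


def apply_path(x, *, apply_router_weight_on_input, fused_experts):
    # Stand-in for the Quark OCP MX apply path under test; the point is
    # that the flag reaches the kernel call unchanged.
    return fused_experts(
        x, apply_router_weight_on_input=apply_router_weight_on_input)


def test_router_weight_flag_forwarded():
    seen = {}

    def fake_kernel(x, **kwargs):
        # Record every kwarg the "kernel" receives, then echo the input.
        seen.update(kwargs)
        return x

    x = torch.zeros(2, 4)
    apply_path(x, apply_router_weight_on_input=True, fused_experts=fake_kernel)
    assert seen["apply_router_weight_on_input"] is True
```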