
[ROCm][Bugfix] Forward router-weight flag in Quark OCP MX AITER MoE #40300

Closed

Bortlesboat wants to merge 4 commits into vllm-project:main from Bortlesboat:codex/vllm-rocm-moe-heuristic-40008

Conversation

Contributor

@Bortlesboat Bortlesboat commented Apr 19, 2026

Refs #40008.

This keeps the Quark OCP MX AITER apply path from silently falling back at runtime after weights may already have been prepared for AITER/CK. It also forwards apply_router_weight_on_input into rocm_aiter_fused_experts, so models that apply routing weights on the input keep the same semantics as the modular and fused fallback paths.
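A minimal sketch of why forwarding the flag matters. This is toy code, not the vLLM kernels; the function names are reused for illustration only. When apply_router_weight_on_input is true, the layer scales the hidden states before the kernel call, so a kernel that never receives the flag applies the routing weight a second time:

```python
def rocm_aiter_fused_experts(hidden, topk_weight, apply_router_weight_on_input=False):
    # Toy kernel: expert is identity; applies the routing weight on the
    # output unless told the caller already applied it on the input.
    out = list(hidden)
    if not apply_router_weight_on_input:
        out = [v * topk_weight for v in out]
    return out

def moe_apply(hidden, topk_weight, apply_router_weight_on_input, forward_flag):
    if apply_router_weight_on_input:
        # The layer pre-scales the input by the routing weight.
        hidden = [h * topk_weight for h in hidden]
    # The bug: if the flag is not forwarded, the kernel defaults to False
    # and scales by the routing weight again.
    flag = apply_router_weight_on_input if forward_flag else False
    return rocm_aiter_fused_experts(hidden, topk_weight, flag)

correct = moe_apply([1.0, 2.0], 0.5, True, forward_flag=True)   # [0.5, 1.0]
buggy = moe_apply([1.0, 2.0], 0.5, True, forward_flag=False)    # [0.25, 0.5]
```

The double-scaled result is the silent semantic drift this PR closes: nothing raises, the output is just wrong by a factor of the routing weight.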

Why this is not duplicating an existing PR:

  • gh issue view 40008 --repo vllm-project/vllm --comments
  • gh pr list --repo vllm-project/vllm --state open --search "40008 in:body" returned no matches when this PR was opened.
  • gh pr list --repo vllm-project/vllm --state open --search "Unsupported kernel config for moe heuristic dispatch" returned no matches when this PR was opened.
  • gh pr list --repo vllm-project/vllm --state open --search "moe heuristic dispatch rocm" only found [ROCm] Enable dual-stream MoE shared experts, AITER sparse MLA workaround, and GLM-5-FP8 weight loading fix #38665, which is a different ROCm lane.

Tests run:

  • wsl.exe --cd /mnt/c/Users/andre/.config/superpowers/worktrees/vllm/vllm-rocm-moe-heuristic-40008 bash -lc 'PYTHONPATH=$PWD uv run --no-project --python 3.12 --with torch==2.11.0 --with pytest --with-requirements requirements/common.txt --with-requirements requirements/test/cuda.in python -m pytest tests/model_executor/layers/test_quark_ocp_mx_moe.py -q' -> 2 passed
  • wsl.exe --cd /mnt/c/Users/andre/.config/superpowers/worktrees/vllm/vllm-rocm-moe-heuristic-40008 bash -lc 'uv run --no-project --python 3.12 --with ruff ruff check vllm/model_executor/layers/quantization/quark/quark_moe.py tests/model_executor/layers/test_quark_ocp_mx_moe.py' -> All checks passed!
  • wsl.exe --cd /mnt/c/Users/andre/.config/superpowers/worktrees/vllm/vllm-rocm-moe-heuristic-40008 bash -lc 'uv run --no-project --python 3.12 --with ruff ruff format --check vllm/model_executor/layers/quantization/quark/quark_moe.py tests/model_executor/layers/test_quark_ocp_mx_moe.py' -> 2 files already formatted
  • git diff --check -> passed

AI assistance: I used Codex to help draft the patch and regression tests, then reviewed the diff and local verification before updating this PR.

@mergify mergify Bot added the rocm (Related to AMD ROCm) and bug (Something isn't working) labels Apr 19, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 19, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a fallback mechanism in the Quark OCP MX MoE implementation to use the Triton-based fused_experts kernel when the ROCm AITER kernel encounters an unsupported configuration. However, the review feedback identifies a critical flaw: weights are shuffled into a specific layout for AITER during the loading process, making them incompatible with the standard layout expected by the fallback kernel, which would lead to silent data corruption. Additionally, the rocm_aiter_fused_experts call is missing the apply_router_weight_on_input argument, and the new unit tests are noted to be misleading as they do not account for the weight layout mismatch that would occur in production.
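A toy illustration of the layout hazard the review describes, assuming a stand-in shuffle and kernel (nothing here is the real AITER transform or Triton kernel): the shuffled and standard layouts have identical shapes, so passing AITER-shuffled weights to a standard-layout kernel corrupts the output without raising anything.

```python
def aiter_shuffle(w):
    # Stand-in for the AITER load-time layout transform: same shape,
    # different element order.
    return w[::-1]

def triton_matvec(w, x):
    # Toy "kernel" that assumes the standard layout: out[i] = w[i] * x[i].
    return [wi * xi for wi, xi in zip(w, x)]

w = [1.0, 2.0, 3.0]
x = [1.0, 1.0, 1.0]
good = triton_matvec(w, x)                    # [1.0, 2.0, 3.0]
silent = triton_matvec(aiter_shuffle(w), x)   # [3.0, 2.0, 1.0] -- no error raised
```

Because the shapes agree, a runtime fallback from AITER to the standard kernel cannot detect the mismatch; the decision has to be made before the weights are shuffled.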

Comment thread vllm/model_executor/layers/quantization/quark/quark_moe.py Outdated
Comment thread vllm/model_executor/layers/quantization/quark/quark_moe.py Outdated
Comment thread tests/model_executor/layers/test_quark_ocp_mx_moe.py Outdated
@Bortlesboat Bortlesboat marked this pull request as ready for review April 20, 2026 04:45
@Bortlesboat Bortlesboat requested a review from tjtanaa as a code owner April 20, 2026 04:45

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Bortlesboat and others added 2 commits April 29, 2026 14:34
Co-authored-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
@Bortlesboat Bortlesboat force-pushed the codex/vllm-rocm-moe-heuristic-40008 branch from bcc243f to 8de7176 on April 29, 2026 18:42
@Bortlesboat
Copy link
Copy Markdown
Contributor Author

Closing — #39801 (merged Apr 27) introduces an _AITER_NATIVE_OCP_MX_SCHEMES = ("w_mxfp4",) allowlist in QuarkOCP_MX_MoEMethod.__init__ that sets emulate=True for the mixed schemes (w_mxfp4_a_mxfp6_*, w_fp6_e3m2_a_fp6_e3m2, etc.) before the AITER weight shuffle. That's the cleaner fix for #40008: it avoids the silent-correctness risk @gemini-code-assist flagged here (the runtime fallback would have run fused_experts on AITER-shuffled weights).

If MI355 still hits the heuristic dispatch error on the allowlisted w_mxfp4 scheme specifically, that's worth its own narrower repro on a separate issue.
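A hedged sketch of that allowlist pattern. Only _AITER_NATIVE_OCP_MX_SCHEMES and the scheme strings come from the discussion above; choose_backend and its return fields are invented here for illustration. The point is that the native-vs-emulated decision is made once at construction time, so non-native schemes never have their weights shuffled in the first place:

```python
# Allowlist of OCP MX schemes that run natively on AITER (from #39801).
_AITER_NATIVE_OCP_MX_SCHEMES = ("w_mxfp4",)

def choose_backend(scheme: str, aiter_enabled: bool) -> dict:
    # Hypothetical helper: decide the execution path before any
    # load-time weight shuffle happens.
    native = aiter_enabled and scheme in _AITER_NATIVE_OCP_MX_SCHEMES
    return {
        "use_aiter": native,
        "emulate": not native,       # mixed schemes take the emulated path
        "shuffle_weights": native,   # only shuffle when AITER will actually run
    }
```

Under this scheme, a mixed config such as w_fp6_e3m2_a_fp6_e3m2 gets emulate=True up front, which removes the need for (and the layout risk of) a runtime fallback.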

@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD Apr 30, 2026
@Bortlesboat Bortlesboat changed the title [ROCm][Bugfix] Fall back when Quark MoE AITER dispatch is unsupported [ROCm][Bugfix] Forward router-weight flag in Quark OCP MX AITER MoE Apr 30, 2026
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
@Bortlesboat Bortlesboat reopened this Apr 30, 2026
@Bortlesboat
Contributor Author

Closing per the comment above — superseded by #39801 (merged Apr 27).


Labels

bug (Something isn't working), rocm (Related to AMD ROCm)

Projects

Status: Done

Development


1 participant