[BugFix] Fix Ascend MoE routing expert count with EPLB #8864
gcanlin wants to merge 2 commits into vllm-project:main from
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a bug in Ascend MoE routing where the dynamic EPLB configuration caused a mismatch between logical and physical expert counts. By correctly separating these counts and updating the quantization paths, the fix ensures that router logits and expert selection logic operate on the expected logical expert count, preventing assertion failures in distributed MoE scenarios.

Highlights
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
Suggested PR Title:
[Ops][Feature] Refactor MoE expert logic and unify SharedFusedMoE

Suggested PR Summary:
### What this PR does / why we need it?
This PR refactors the MoE implementation for Ascend by introducing a centralized `get_moe_num_logical_experts` utility to handle expert counts across various quantization methods (W4A16, W4A4, W8A8). It unifies `AscendSharedFusedMoE` into `AscendFusedMoE`, updates the runner to inherit from the standard `MoERunner`, and adds a consistency validation check for shared expert split computations. Feedback was provided regarding unresolved developer notes in Chinese and hardcoded logic in the `finalize` call within `fused_moe.py`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit tests were added in `tests/ut/quantization/methods/test_moe_logical_experts.py` to verify the logical expert calculation.

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
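The suggested summary above refers to a centralized `get_moe_num_logical_experts` utility. A minimal sketch of what such a helper could look like, assuming a config object exposing `num_experts` (physical) and an optional `num_redundant_experts` field; these names are illustrative, not the actual vllm-ascend API added by this PR:

```python
# Minimal sketch, not the real implementation: field names are assumptions.
def get_moe_num_logical_experts(moe_config) -> int:
    """Return the logical expert count that router logits are produced over."""
    num_redundant = getattr(moe_config, "num_redundant_experts", 0) or 0
    return moe_config.num_experts - num_redundant
```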
The CI error is unrelated to this PR. The bug comes from #8831, which hasn't been adapted to vLLM main; vllm-project/vllm#39446 broke it.
Summary
Fix Ascend MoE dynamic EPLB routing after the upstream vLLM MoE/EPLB refactor.
Upstream vLLM now distinguishes the logical expert count (what the router produces logits over) from the physical/global expert count, which can include redundant experts under EPLB.

`router_logits.shape[-1]` matches the logical expert count, but the Ascend MoE quant paths were comparing it against `moe_config.num_experts`, which counts physical experts when dynamic EPLB is enabled. This caused `AssertionError: Number of global experts mismatch (excluding redundancy)` in the Qwen3 MoE W8A8 dynamic EPLB TP2 test.
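The failing check compares the router-logits width against the physical expert count. A hedged sketch of the corrected comparison, with the `moe_config` field names assumed for illustration only:

```python
# Illustrative only: `moe_config` fields are assumptions based on the PR
# description, not the exact vLLM / vllm-ascend code.
import torch


def check_router_logits(router_logits: torch.Tensor, moe_config) -> None:
    # With dynamic EPLB, moe_config.num_experts counts physical experts
    # (including redundant replicas), so comparing against it is wrong.
    num_logical = moe_config.num_experts - getattr(
        moe_config, "num_redundant_experts", 0)
    assert router_logits.shape[-1] == num_logical, (
        "Number of global experts mismatch (excluding redundancy)")
```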
Changes
Root cause
vLLM upstream PRs such as #30623 separated router logic into dedicated router classes and made EPLB map logical expert IDs to physical expert IDs after top-k selection. Ascend code still treated
`moe_config.num_experts` as the router-logits expert count, but with dynamic EPLB it represents physical/global experts.
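To make the ordering concrete, here is a minimal sketch of the flow described above: top-k runs over logical experts first, and EPLB remaps the selected IDs to physical experts afterwards. The `logical_to_physical_map` name is hypothetical, not an upstream identifier.

```python
# Sketch of logical top-k followed by an EPLB remap to physical experts.
import torch


def route(router_logits: torch.Tensor, top_k: int,
          logical_to_physical_map: torch.Tensor) -> torch.Tensor:
    # router_logits.shape[-1] == number of *logical* experts
    _, topk_logical_ids = torch.topk(router_logits, top_k, dim=-1)
    # Map each selected logical expert ID onto a physical expert slot,
    # which may include redundant replicas under dynamic EPLB.
    return logical_to_physical_map[topk_logical_ids]
```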
Test

Test Result