[MoE Refactor] Rename FusedMoE.make_expert_params_mapping to fused_moe_make_expert_params_mapping#40671
Conversation
Code Review
This pull request introduces a standalone function fused_moe_make_expert_params_mapping in the fused_moe layer and updates numerous model implementations to use this new interface for expert parameter mapping. A critical issue was identified in the new function signature, which lacks default values for several parameters; this will cause runtime TypeError exceptions at call sites that only provide a subset of the required arguments. Providing default values for these parameters is necessary to maintain compatibility with existing model implementations.
```python
# This is a temporary forwarding method which will be removed/modified later.
def fused_moe_make_expert_params_mapping(
    model: torch.nn.Module,
    ckpt_gate_proj_name: str,
    ckpt_down_proj_name: str,
    ckpt_up_proj_name: str,
    num_experts: int,
    num_redundant_experts: int = 0,
```
The new function `fused_moe_make_expert_params_mapping` is missing default values for `ckpt_up_proj_name` and `num_experts`. Many models (e.g., AXK1, afmoe, bailing_moe, deepseek_eagle) call this function with only 3 arguments (`model`, `ckpt_gate_proj_name`, `ckpt_down_proj_name`), which will now result in a `TypeError` at runtime because the function expects 5 required arguments.

You should provide default values for these parameters to maintain compatibility with existing model implementations that do not specify them.
```python
def fused_moe_make_expert_params_mapping(
    model: torch.nn.Module,
    ckpt_gate_proj_name: str,
    ckpt_down_proj_name: str,
    ckpt_up_proj_name: Optional[str] = None,
    num_experts: int = 0,
    num_redundant_experts: int = 0,
) -> list[tuple[str, str, int, str]]:
```

Head branch was pushed to by a user without write access
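For illustration, here is a minimal sketch of how a forwarding function with the suggested defaults could keep three-argument call sites working. The fallback logic (a hard-coded `"up_proj"` name and reading `num_experts` from an HF-style `model.config`) and the delegation to the existing `FusedMoE.make_expert_params_mapping` classmethod are assumptions for the sketch, not what the PR actually implements.

```python
from typing import Optional

import torch

from vllm.model_executor.layers.fused_moe import FusedMoE


def fused_moe_make_expert_params_mapping(
    model: torch.nn.Module,
    ckpt_gate_proj_name: str,
    ckpt_down_proj_name: str,
    ckpt_up_proj_name: Optional[str] = None,
    num_experts: int = 0,
    num_redundant_experts: int = 0,
) -> list[tuple[str, str, int, str]]:
    # Hypothetical fallbacks so existing 3-argument call sites keep working.
    if ckpt_up_proj_name is None:
        ckpt_up_proj_name = "up_proj"  # assumed common checkpoint naming
    if num_experts == 0:
        # Assumes an HF-style config attribute on the model; purely illustrative.
        config = getattr(model, "config", None)
        num_experts = getattr(config, "num_experts", 0)

    # Delegate to the existing classmethod so the mapping logic stays unchanged.
    return FusedMoE.make_expert_params_mapping(
        ckpt_gate_proj_name=ckpt_gate_proj_name,
        ckpt_down_proj_name=ckpt_down_proj_name,
        ckpt_up_proj_name=ckpt_up_proj_name,
        num_experts=num_experts,
        num_redundant_experts=num_redundant_experts,
    )
```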
…e_make_expert_params_mapping (vllm-project#40671) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
…e_make_expert_params_mapping (vllm-project#40671) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Adrian <info@zzit.ch>
…e_make_expert_params_mapping (vllm-project#40671) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: hongbolv <33214277+hongbolv@users.noreply.github.com>
### What this PR does / why we need it?

Based on #8856. Sync to vLLM `4d51588e2381018348f1022dfa3a7698899805b7`.

Fix:

- MoE refactor @wxsIcey, introduced by vllm-project/vllm#35782, vllm-project/vllm#35949, vllm-project/vllm#40560, vllm-project/vllm#40671.
- `TypeError: rejection_sample() got an unexpected keyword argument 'synthetic_mode'` -> Add `synthetic_mode` and `synthetic_conditional_rates` params to ascend `rejection_sample()`.

| # | Error | Category | Upstream Commit | Affected vllm-ascend Path | Fix |
| :- | :--- | :--- | :--- | :--- | :--- |
| 1 | `encoder_compilation_time` AttributeError | Code Bug | `c08f3b2a6` ([#39240](vllm-project/vllm#39240)) | `worker/worker.py:567` | `getattr` fallback |
| 2 | `AscendRMSNormGated activation` TypeError | Code Bug | `893611813` ([#40245](vllm-project/vllm#40245)) | `ops/layernorm.py:160`, `_310p/ops/layernorm.py:43` | Accept `activation` kwarg |
| 3 | `AscendFusedMoEMethod.apply topk_weights` TypeError | Code Bug | many (e.g., `5e584ce9e` ([#35782](vllm-project/vllm#35782)), `809d83c2d` ([#40560](vllm-project/vllm#40560)), `4d51588e2` ([#40860](vllm-project/vllm#40860))) | `ops/fused_moe/fused_moe.py:107` | Major refactor — follow-up PR |
| 4 | `_all_lora_classes` is tuple | Code Bug | `a250f1bd5` ([#35077](vllm-project/vllm#35077)) | `lora/utils.py:188` | Rebuild tuple instead of `.add()` |
| 5 | `ProfilingChunkScheduler hash_block_size` TypeError | Code Bug | `7b1bc0a3e` ([#40946](vllm-project/vllm#40946)) | `core/scheduler_profiling_chunk.py:57` | Forward new kwarg |
| 6 | `_moe_C.topk_softmax` AttributeError | Code Bug | MoE router refactor | router dispatch override needed | Provide `torch_npu` topk-softmax (with Issue 4) |
| 7 | global experts shape mismatch | Code Bug | follow-on of Issue 4 | `quantization/methods/w8a8_dynamic.py:198` | Resolve once Issue 4 is fixed |

- vLLM main: vllm-project/vllm@d886c26

Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: wxsIcey <1790571317@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
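Two of the smaller fixes in the table above are simple compatibility patterns. The sketch below shows them in isolation; the helper names (`encoder_compilation_seconds`, `register_lora_class`) and the stand-in objects are hypothetical and are not the actual vllm-ascend patch.

```python
from types import SimpleNamespace

# Fix 1: `encoder_compilation_time` no longer exists on newer vLLM compilation
# counters, so read it with a getattr fallback instead of a direct attribute.
def encoder_compilation_seconds(counter: object) -> float:
    return getattr(counter, "encoder_compilation_time", 0.0)

# Fix 4: `_all_lora_classes` became a tuple upstream, so `.add()` raises
# AttributeError; rebuild the tuple instead of mutating it in place.
def register_lora_class(all_lora_classes: tuple, extra_cls: type) -> tuple:
    return (*all_lora_classes, extra_cls)

# Stand-in usage (the real objects live in vLLM / vllm-ascend):
print(encoder_compilation_seconds(SimpleNamespace()))  # -> 0.0
print(register_lora_class((int, float), complex))      # -> (int, float, complex)
```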
Purpose
This is prep work for deleting the `FusedMoE` class and replacing it with `MoERunner`. This PR is just a rename of `FusedMoE.make_expert_params_mapping` to `fused_moe_make_expert_params_mapping` for the models that use the function; a sketch of the resulting call-site change is included below.

Test Plan
CI
Test Result
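For reference, a rough sketch of the call-site change this rename implies in a model's weight-loading path. The import path and the `gate_proj`/`down_proj`/`up_proj` checkpoint names are illustrative assumptions, not copied from a specific model file.

```python
import torch

# NOTE: assumed import path; the PR adds the function to the fused_moe layer.
from vllm.model_executor.layers.fused_moe import fused_moe_make_expert_params_mapping


def build_expert_mapping(model: torch.nn.Module, num_experts: int):
    # Before this PR, models called the classmethod on the layer class:
    #   FusedMoE.make_expert_params_mapping(
    #       ckpt_gate_proj_name="gate_proj",
    #       ckpt_down_proj_name="down_proj",
    #       ckpt_up_proj_name="up_proj",
    #       num_experts=num_experts,
    #   )
    # After the rename, the same mapping comes from the standalone function,
    # which takes the model as its first argument.
    return fused_moe_make_expert_params_mapping(
        model,
        ckpt_gate_proj_name="gate_proj",
        ckpt_down_proj_name="down_proj",
        ckpt_up_proj_name="up_proj",
        num_experts=num_experts,
    )
```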
cc @robertgshaw2-redhat, @yzong-rh
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.