[Model] Apply SharedFusedMoE to glm4_moe.#24849
[Model] Apply SharedFusedMoE to glm4_moe.#24849DarkLight1337 merged 2 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors Glm4MoE to use SharedFusedMoE when shared experts are present, which is a good improvement for flexibility and performance optimization on Ascend hardware. However, I've found a few critical issues in the implementation that need to be addressed. There's a bug in the forward method that will cause a TypeError during runtime, and another issue in the __init__ method that leads to incorrect model behavior by ignoring the routed_scaling_factor. I've also suggested a refactoring to improve code clarity and maintainability by reducing duplication.
Signed-off-by: whx-sjtu <2952154980@qq.com>
1d21408 to
036509e
Compare
3305fe1 to
b1795cf
Compare
|
I only added SharedFusedMoE support for deepseek and llama4 since adding it everywhere would have been more disruptive. There's no reason it can't be applied to other models that use shared experts. |
|
Label |
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: whx-sjtu <2952154980@qq.com> Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>
Purpose
The class
SharedFusedMoEwas proposed by @bnellnm in PR #23273. The model glm4_moe has shared experts but we don't useSharedFusedMoEfor glm4_moe, I'm not sure why, please cc @bnellnm.Let glm4_moe uses SharedFusedMoE can solve two problems in vllm-ascend:
maybe_all_reduce_tensor_model_parallel, and sometimes perform all-reduce of shared experts independently. However in currently glm4_moe modeling, whether perform all-reduce of shared experts independently is determined for sure in init bymust_reduce_shared_expert_outputs, which conficts our implemention.SharedFusedMoE.In conclusion, I think we should use
SharedFusedMoEto glm4_moe. cc @bnellnm @robertgshaw2-redhat @wangxiyuan @LucasWilkinson @YikunTest Plan
No need to add new test.
Test ResultResult
all tests should pass
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.