[PluggableLayer][3/N] Apply PluggableLayer to moe-related layers. #33556
ProExpertProg merged 6 commits into vllm-project:main from
Conversation
Code Review
This pull request refactors several Mixture-of-Experts (MoE) related layers to use PluggableLayer instead of CustomOp. The changes are mostly straightforward replacements. However, I found a critical issue in FusedMoE where changing the base class removes the necessary forward method dispatching, which would lead to a runtime error. I've provided a suggestion to fix this by adding an explicit forward method. The other changes look correct.
Force-pushed a74c095 to 38fd266
```python
@PluggableLayer.register("modular_fused_moe")
class FusedMoEModularMethod(FusedMoEMethodBase, PluggableLayer):
```
It needed to be a CustomOp for type correctness (since UnquantizedFusedMoEMethod is also a CustomOp). You end up with weird errors otherwise. If other classes have been converted to Pluggable ops then maybe this can be removed.
I see, because we return an "MoE method" which could be an instance of UnquantizedFusedMoEMethod or an instance of FusedMoEModularMethod or something else and CustomOp was the common base class? Could we just make nn.Module the shared super class?
> I see, because we return an "MoE method" which could be an instance of UnquantizedFusedMoEMethod or an instance of FusedMoEModularMethod or something else and CustomOp was the common base class? Could we just make nn.Module the shared super class?
Feel free to give it a try. IIRC all FusedMoEMethodBase objects are already torch.nn.Modules though.
So the core problem here is that UnquantizedFusedMoEMethod is a CustomOp, right? Maybe after vLLM IR is ready and UnquantizedFusedMoEMethod is not a CustomOp anymore, then we can just remove FusedMoEModularMethod's original inheritance from CustomOp.
I will remove the inheritance from CustomOp and try to reproduce this type error.
@bnellnm Could you please provide the scenario that will cause type errors? I just remove the inheritance from CustomOp and fail to reproduce type-related errors.
Running a MoE model with a non-naive all2all backend should trigger the problem.
```text
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     module.maybe_init_modular_kernel()
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 691, in maybe_init_modular_kernel
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     self._replace_quant_method(
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 664, in _replace_quant_method
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     self.quant_method = mk
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3909061) Process EngineCore_DP0:
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]   File "/home/bnellnm/venvs/nm-vllm-new/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2018, in __setattr__
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     raise TypeError(
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006] TypeError: cannot assign 'vllm.model_executor.layers.fused_moe.fused_moe_modular_method.FusedMoEModularMethod' as child module 'quant_method' (torch.nn.Module or None expected)
```
These are the types of errors you get if you remove the CustomOp inheritance from FusedMoEModularMethod.
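The failure above can be reproduced in a few lines of plain torch: once an attribute name has been registered as a child module, `nn.Module.__setattr__` rejects any replacement value that is not itself an `nn.Module` (a minimal sketch; `Parent` and `NotAModule` are hypothetical stand-ins for `FusedMoE` and a `FusedMoEModularMethod` stripped of its `CustomOp` base).

```python
import torch.nn as nn

class Parent(nn.Module):
    def __init__(self):
        super().__init__()
        # This assignment registers "quant_method" in self._modules.
        self.quant_method = nn.Linear(4, 4)

class NotAModule:
    """Stand-in for a quant method that no longer inherits nn.Module."""

p = Parent()
try:
    # nn.Module.__setattr__ refuses to overwrite a registered child
    # module with a non-Module value and raises TypeError.
    p.quant_method = NotAModule()
    raised = False
except TypeError as err:
    raised = True
    print(err)  # cannot assign ... as child module 'quant_method' ...
```

This is why keeping some `nn.Module` subclass (such as `CustomOp`) in the MRO of every value assigned to `quant_method` sidesteps the error.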
Thanks a lot. I will try it.
Force-pushed 0786fd3 to 6d560c1
This pull request has merge conflicts that must be resolved before it can be merged.
```diff
 # --8<-- [start:transformers_fused_moe]
-@CustomOp.register("transformers_fused_moe")
+@PluggableLayer.register("transformers_fused_moe")
```
For the avoidance of doubt, this is a custom op to allow us to accept topk_ids in FusedMoE.forward and have it reappear as the output of custom_routing_function when torch.compile/CUDA Graphs are enabled.
I treat this class as a PluggableLayer because it doesn't have different implementations for different in-tree platforms. Do you mean that this extra functionality of transformers_fused_moe needs to be compiled by torch through CustomOp.maybe_compile? @hmellor
I was just annotating this particular change so that nobody thinks it can be removed entirely.
A quick way to check that it still works is to install transformers from main and run the following test in vLLM: `pytest tests/models/test_transformers.py -k olmoe -vsx`. If this still passes, then the change from CustomOp to PluggableLayer is OK.
@hmellor I don't understand your comment either - CustomOp class doesn't affect compilation/Dynamo/cudagraphs. Are you talking about direct_register_custom_op?
Oh I see. Yeah it sounds like I should have been referencing direct_register_custom_op.
TL;DR the Transformers modelling backend needs direct_register_custom_op to support compilation (torch and CUDA Graphs) with MoE models
@whx-sjtu can you resolve merge conflicts?

@whx-sjtu CI should run once you push
Force-pushed 6d560c1 to 75130ac
```diff
-@CustomOp.register("modular_fused_moe")
-class FusedMoEModularMethod(FusedMoEMethodBase, CustomOp):
+class FusedMoEModularMethod(FusedMoEMethodBase, torch.nn.Module):
```
Can we just make FusedMoEModularMethod inherit from nn.Module to solve the type error? cc @ProExpertProg @bnellnm
Force-pushed 186bfd1 to 4c93836
@whx-sjtu docs build failure looks related:

Oops. This is really weird and I couldn't reproduce this.
Hi @whx-sjtu, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
```diff
 # intrusive way to do this.
 def _replace_quant_method(self, mk: FusedMoEMethodBase):
-    self.quant_method = mk
+    object.__setattr__(self, "quant_method", mk)
```
I decide to remove the inheritance of nn.Module and solve the original possible TypeError by this change.
I'm fine with not inheriting from nn.Module but I'd be worried that bypassing torch using __setattr__ here might be asking for trouble? I don't know enough about torch to know what the issues are though.
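One concrete subtlety to be aware of (a minimal sketch, assuming only stock torch behavior; `Layer` and `PlainMethod` are hypothetical stand-ins): `object.__setattr__` skips torch's bookkeeping, so if the attribute name was previously registered as a child module, the old entry stays behind in `_modules` and in `state_dict()`, merely shadowed by the new instance attribute.

```python
import torch.nn as nn

class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        # Registers "quant_method" in self._modules.
        self.quant_method = nn.Linear(2, 2)

class PlainMethod:
    """Hypothetical quant method that is not an nn.Module."""

layer = Layer()

# No TypeError here, because nn.Module.__setattr__ is bypassed entirely;
# the value lands in layer.__dict__ instead of layer._modules.
object.__setattr__(layer, "quant_method", PlainMethod())

# Instance-dict lookup wins over nn.Module.__getattr__ ...
shadowed = isinstance(layer.quant_method, PlainMethod)
# ... but the stale Linear is still registered and still serialized.
stale = isinstance(layer._modules["quant_method"], nn.Linear)
stale_in_state_dict = "quant_method.weight" in layer.state_dict()
```

Whether that stale `_modules` entry matters depends on whether anything later iterates the layer's submodules or saves/loads its state dict.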
Force-pushed 732e558 to ad87a1c
Force-pushed ad87a1c to 194243f
Signed-off-by: whx-sjtu <2952154980@qq.com>
Force-pushed 194243f to c31134d
I found the root cause of the problem, and I explained the reason and provided a solution in this pull request: #35178

To ensure feasibility, I leave the
…lm-project#33556) Signed-off-by: whx-sjtu <2952154980@qq.com>
…lm-project#33556) Signed-off-by: whx-sjtu <2952154980@qq.com> Signed-off-by: zengxian <xiangdong.zeng@intel.com>
Purpose
As a task in #32676, this PR applies PluggableLayer to moe-related layers, including fused_moe, modular_fused_moe and transformers_fused_moe.
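For readers unfamiliar with the pattern being applied: a name-keyed class registry driven by a decorator might look roughly like the sketch below. All names and behavior here are illustrative assumptions, not vLLM's actual PluggableLayer implementation.

```python
# Hypothetical sketch of a decorator-based layer registry; this is NOT
# vLLM's real PluggableLayer, only an illustration of the pattern.
class PluggableLayer:
    _registry = {}  # maps layer name -> implementing class

    @classmethod
    def register(cls, name):
        def decorator(layer_cls):
            # Record the class under the given name so platform plugins
            # could later look it up and swap in their own implementation.
            cls._registry[name] = layer_cls
            return layer_cls
        return decorator

    @classmethod
    def get(cls, name):
        return cls._registry[name]

@PluggableLayer.register("modular_fused_moe")
class FusedMoEModularMethod:
    pass

resolved = PluggableLayer.get("modular_fused_moe")
```

The decorator returns the class unchanged, so registration is transparent to normal uses of the decorated layer.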
Test Plan
All CI tests should pass.
Test Result
All CI tests should pass.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.