[PluggableLayer][3/N] Apply PluggableLayer to moe-related layers.#33556

Merged
ProExpertProg merged 6 commits into vllm-project:main from whx-sjtu:apply_plug_moe
Apr 14, 2026

Conversation

Contributor

@whx-sjtu whx-sjtu commented Feb 2, 2026

Purpose

As a task in #32676, this PR applies PluggableLayer to moe-related layers, including fused_moe, modular_fused_moe and transformers_fused_moe.

Test Plan

All CI tests should pass.

Test Result

All CI tests should pass.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors several Mixture-of-Experts (MoE) related layers to use PluggableLayer instead of CustomOp. The changes are mostly straightforward replacements. However, I found a critical issue in FusedMoE where changing the base class removes the necessary forward method dispatching, which would lead to a runtime error. I've provided a suggestion to fix this by adding an explicit forward method. The other changes look correct.
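The PluggableLayer.register(...) decorators applied in this PR follow a common class-registry pattern. Below is a generic pure-Python sketch of how such a decorator can map a string key to a class so that out-of-tree plugins can override in-tree implementations — illustrative only; PluggableRegistry, FusedMoEImpl, and PluginFusedMoE are made-up names, not vLLM's actual PluggableLayer API:

```python
class PluggableRegistry:
    """Toy registry sketch: maps layer names to implementation classes."""
    _registry: dict = {}

    @classmethod
    def register(cls, name):
        def decorator(layer_cls):
            cls._registry[name] = layer_cls  # plugins may overwrite this entry
            return layer_cls
        return decorator

    @classmethod
    def get(cls, name):
        return cls._registry[name]


@PluggableRegistry.register("fused_moe")
class FusedMoEImpl:
    def forward(self, x):
        return x  # placeholder compute


# An out-of-tree plugin can later re-register the same key,
# and subsequent lookups resolve to the plugin's class:
@PluggableRegistry.register("fused_moe")
class PluginFusedMoE(FusedMoEImpl):
    pass


assert PluggableRegistry.get("fused_moe") is PluginFusedMoE
```

The key property is that lookup happens by string key at layer-construction time, so swapping the registered class changes which implementation gets instantiated without touching call sites.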

Comment thread vllm/model_executor/layers/fused_moe/layer.py
@whx-sjtu whx-sjtu force-pushed the apply_plug_moe branch 2 times, most recently from a74c095 to 38fd266 Compare February 3, 2026 07:40
Comment on lines +24 to +25
@PluggableLayer.register("modular_fused_moe")
class FusedMoEModularMethod(FusedMoEMethodBase, PluggableLayer):
Collaborator

@bnellnm why is this a CustomOp at all? It doesn't even define forward...

@whx-sjtu I think this is already pluggable via FusedMoEMethodBase and FusedMoEPrepareAndFinalize, I'm not sure this needs to be a PluggableLayer at all

Collaborator

It needed to be a CustomOp for type correctness (since UnquantizedFusedMoEMethod is also a CustomOp). You end up with weird errors otherwise. If other classes have been converted to Pluggable ops then maybe this can be removed.

Collaborator

I see, because we return an "MoE method" which could be an instance of UnquantizedFusedMoEMethod or an instance of FusedMoEModularMethod or something else and CustomOp was the common base class? Could we just make nn.Module the shared super class?

Collaborator

I see, because we return an "MoE method" which could be an instance of UnquantizedFusedMoEMethod or an instance of FusedMoEModularMethod or something else and CustomOp was the common base class? Could we just make nn.Module the shared super class?

Feel free to give it a try. IIRC all FusedMoEMethodBase objects are already torch.nn.Modules though.

Contributor Author

So the core problem here is that UnquantizedFusedMoEMethod is a CustomOp, right? Maybe after vLLM IR is ready and UnquantizedFusedMoEMethod is no longer a CustomOp, we can just remove FusedMoEModularMethod's original inheritance from CustomOp.

Contributor Author

I will remove the inheritance from CustomOp and try to reproduce this type error.

Contributor Author

@bnellnm Could you please provide the scenario that will cause type errors? I just removed the inheritance from CustomOp and failed to reproduce type-related errors.

Collaborator

@bnellnm bnellnm Feb 11, 2026

Running a MoE model with a non-naive all2all backend should trigger the problem.

Collaborator

(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     module.maybe_init_modular_kernel()
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 691, in maybe_init_modular_kernel
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     self._replace_quant_method(
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 664, in _replace_quant_method
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     self.quant_method = mk
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3909061) Process EngineCore_DP0:
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]   File "/home/bnellnm/venvs/nm-vllm-new/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2018, in __setattr__
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     raise TypeError(
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006] TypeError: cannot assign 'vllm.model_executor.layers.fused_moe.fused_moe_modular_method.FusedMoEModularMethod' as child module 'quant_method' (torch.nn.Module or None expected)

These are the types of errors you get if you remove the CustomOp inheritance from FusedMoEModularMethod.
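The error above comes from torch.nn.Module.__setattr__: once an attribute is tracked as a child module, only an nn.Module (or None) may be assigned to it. A minimal pure-Python mimic of that check (no torch required; MiniModule, QuantMethodModule, and PlainQuantMethod are illustrative stand-ins, not vLLM or torch code):

```python
class MiniModule:
    """Toy stand-in for torch.nn.Module's child-module bookkeeping."""

    def __init__(self):
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        # Once a name is tracked as a child module, only a MiniModule
        # may be assigned to it -- mirroring torch's TypeError.
        if name in self._modules and not isinstance(value, MiniModule):
            raise TypeError(
                f"cannot assign {type(value).__name__!r} as child module "
                f"{name!r} (MiniModule or None expected)"
            )
        if isinstance(value, MiniModule):
            self._modules[name] = value
        object.__setattr__(self, name, value)


class QuantMethodModule(MiniModule):  # plays the role of UnquantizedFusedMoEMethod
    pass


class PlainQuantMethod:  # plays the role of a non-Module FusedMoEModularMethod
    pass


layer = MiniModule()
layer.quant_method = QuantMethodModule()     # tracked as a child module
try:
    layer.quant_method = PlainQuantMethod()  # not a MiniModule -> rejected
except TypeError as e:
    print("rejected:", e)
```

This is why self.quant_method = mk fails as soon as FusedMoEModularMethod stops being an nn.Module subclass: the first assignment registers 'quant_method' as a child module, and every later assignment is type-checked against nn.Module.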

Contributor Author

Thanks a lot. I will try it.

@whx-sjtu whx-sjtu force-pushed the apply_plug_moe branch 2 times, most recently from 0786fd3 to 6d560c1 Compare February 5, 2026 06:42
@mergify
Contributor

mergify bot commented Feb 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @whx-sjtu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 11, 2026

# --8<-- [start:transformers_fused_moe]
@CustomOp.register("transformers_fused_moe")
@PluggableLayer.register("transformers_fused_moe")
Member

For the avoidance of doubt, this is a custom op that allows us to accept topk_ids in FusedMoE.forward and have it reappear as the output of custom_routing_function when torch.compile/CUDA Graphs are enabled.

Contributor Author

@whx-sjtu whx-sjtu Feb 13, 2026

I treated this class as a PluggableLayer because it doesn't have different implementations for different in-tree platforms. Do you mean that this extra functionality of transformers_fused_moe needs to be compiled by torch through CustomOp.maybe_compile? @hmellor

Member

I was just annotating this particular change so that nobody thinks it can be removed entirely.

A quick way to check that it still works is to install transformers from main and run the following test in vLLM: pytest tests/models/test_transformers.py -k olmoe -vsx. If this still passes, then the change from CustomOp to PluggableLayer is ok.

Collaborator

@hmellor I don't understand your comment either - CustomOp class doesn't affect compilation/Dynamo/cudagraphs. Are you talking about direct_register_custom_op?

Member

Oh I see. Yeah it sounds like I should have been referencing direct_register_custom_op.

TL;DR the Transformers modelling backend needs direct_register_custom_op to support compilation (torch and CUDA Graphs) with MoE models

@ProExpertProg
Collaborator

@whx-sjtu can you resolve merge conflicts?

@ProExpertProg ProExpertProg added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 26, 2026
@ProExpertProg
Collaborator

@whx-sjtu CI should run once you push

Comment on lines +24 to +23
@CustomOp.register("modular_fused_moe")
class FusedMoEModularMethod(FusedMoEMethodBase, CustomOp):
class FusedMoEModularMethod(FusedMoEMethodBase, torch.nn.Module):
Contributor Author

Can we just make FusedMoEModularMethod inherit from nn.Module to solve the type error? cc @ProExpertProg @bnellnm

Collaborator

This is fine with me.


@whx-sjtu whx-sjtu force-pushed the apply_plug_moe branch 3 times, most recently from 186bfd1 to 4c93836 Compare March 2, 2026 03:14
@ProExpertProg
Collaborator

@whx-sjtu docs build failure looks related:

File "/home/docs/checkouts/readthedocs.org/user_builds/vllm/checkouts/33556/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 23, in <module>
    class FusedMoEModularMethod(FusedMoEMethodBase, torch.nn.Module):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

@whx-sjtu
Contributor Author

whx-sjtu commented Mar 4, 2026

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

Oops. This is really weird and I couldn't reproduce this TypeError locally. If this error does exist, the original version inheriting from CustomOp should have the same problem, because CustomOp inherits from nn.Module. I will try to solve this problem by explicitly specifying the metaclass of FusedMoEModularMethod.
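A metaclass conflict like the one in the docs build occurs whenever a class's bases carry unrelated metaclasses — for example, if one base is an ABC (metaclass abc.ABCMeta) while another base uses a different custom metaclass. A minimal repro with stand-in names (MethodBase, RegistryMeta, etc. are made up, not the actual vLLM classes):

```python
import abc


class MethodBase(abc.ABC):  # metaclass is abc.ABCMeta
    @abc.abstractmethod
    def apply(self): ...


class RegistryMeta(type):   # some other, unrelated metaclass
    pass


class Registered(metaclass=RegistryMeta):
    pass


try:
    # Python cannot derive a single metaclass: ABCMeta and RegistryMeta
    # are unrelated, so class creation itself raises TypeError.
    class Combined(MethodBase, Registered):
        def apply(self):
            return 0
except TypeError as e:
    print(e)  # starts with "metaclass conflict: ..."


# One fix is exactly what the comment suggests: explicitly specify a
# metaclass that derives from both bases' metaclasses.
class CombinedMeta(abc.ABCMeta, RegistryMeta):
    pass


class Fixed(MethodBase, Registered, metaclass=CombinedMeta):
    def apply(self):
        return 0


assert Fixed().apply() == 0
```

This also suggests why the error may not reproduce everywhere: it only fires if FusedMoEMethodBase and the other base actually end up with incompatible metaclasses in that environment.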

@mergify
Contributor

mergify bot commented Mar 4, 2026

Hi @whx-sjtu, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

# intrusive way to do this.
def _replace_quant_method(self, mk: FusedMoEMethodBase):
self.quant_method = mk
object.__setattr__(self, "quant_method", mk)
Contributor Author

I decided to remove the inheritance from nn.Module and solve the original possible TypeError with this change.

Collaborator

I'm fine with not inheriting from nn.Module, but I'd be worried that bypassing torch using __setattr__ here might be asking for trouble. I don't know enough about torch to know what the issues are, though.

@whx-sjtu whx-sjtu force-pushed the apply_plug_moe branch 2 times, most recently from 732e558 to ad87a1c Compare March 12, 2026 02:30
Signed-off-by: whx-sjtu <2952154980@qq.com>
@wxsIcey
Contributor

wxsIcey commented Mar 28, 2026

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

Oops. This is really weird and I couldn't reproduce this TypeError locally. If this error does exist, the original version inheriting from CustomOp should have the same problem, because CustomOp inherits from nn.Module. I will try to solve this problem by explicitly specifying the metaclass of FusedMoEModularMethod.

I found the root cause of the problem, and I explained the reason and provided a solution in this pull request: #35178

@whx-sjtu
Contributor Author

whx-sjtu commented Apr 7, 2026

To ensure feasibility, I leave FusedMoEModularMethod untouched in this PR. The related issue should be solved in #35178 by @wxsIcey. cc @ProExpertProg @bnellnm

@ProExpertProg ProExpertProg merged commit f02b326 into vllm-project:main Apr 14, 2026
69 checks passed
Chinmay-Kulkarni-AMD pushed a commit to Chinmay-Kulkarni-AMD/vllm that referenced this pull request Apr 15, 2026
zxd1997066 pushed a commit to zxd1997066/vllm that referenced this pull request Apr 15, 2026
…lm-project#33556)

Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
askliar pushed a commit to askliar/vllm that referenced this pull request Apr 15, 2026
Labels: ready ONLY add when PR is ready to merge/full CI is needed