[PluggableLayer][3/N] Apply PluggableLayer to moe-related layers.#33556

Merged
ProExpertProg merged 6 commits into vllm-project:main from whx-sjtu:apply_plug_moe
Apr 14, 2026

Conversation

Contributor

@whx-sjtu whx-sjtu commented Feb 2, 2026

Purpose

As a task in #32676, this PR applies PluggableLayer to moe-related layers, including fused_moe, modular_fused_moe and transformers_fused_moe.

Test Plan

All CI tests should pass.

Test Result

All CI tests should pass.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors several Mixture-of-Experts (MoE) related layers to use PluggableLayer instead of CustomOp. The changes are mostly straightforward replacements. However, I found a critical issue in FusedMoE where changing the base class removes the necessary forward method dispatching, which would lead to a runtime error. I've provided a suggestion to fix this by adding an explicit forward method. The other changes look correct.
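The PluggableLayer.register(...) decorators applied in this PR follow a common class-registry pattern. Below is a generic pure-Python sketch of how such a decorator can map a string key to a class so that out-of-tree plugins can override in-tree implementations — illustrative only; PluggableRegistry, FusedMoEImpl, and PluginFusedMoE are made-up names, not vLLM's actual PluggableLayer API:

```python
class PluggableRegistry:
    """Toy registry sketch: maps layer names to implementation classes."""
    _registry: dict = {}

    @classmethod
    def register(cls, name):
        def decorator(layer_cls):
            cls._registry[name] = layer_cls  # plugins may overwrite this entry
            return layer_cls
        return decorator

    @classmethod
    def get(cls, name):
        return cls._registry[name]


@PluggableRegistry.register("fused_moe")
class FusedMoEImpl:
    def forward(self, x):
        return x  # placeholder compute


# An out-of-tree plugin can later re-register the same key,
# and subsequent lookups resolve to the plugin's class:
@PluggableRegistry.register("fused_moe")
class PluginFusedMoE(FusedMoEImpl):
    pass


assert PluggableRegistry.get("fused_moe") is PluginFusedMoE
```

The key property is that lookup happens by string key at layer-construction time, so swapping the registered class changes which implementation gets instantiated without touching call sites.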

Comment thread vllm/model_executor/layers/fused_moe/layer.py
@whx-sjtu whx-sjtu force-pushed the apply_plug_moe branch 2 times, most recently from a74c095 to 38fd266 Compare February 3, 2026 07:40
Comment on lines +24 to +25
@PluggableLayer.register("modular_fused_moe")
class FusedMoEModularMethod(FusedMoEMethodBase, PluggableLayer):
Collaborator

@bnellnm why is this a CustomOp at all? It doesn't even define forward...

@whx-sjtu I think this is already pluggable via FusedMoEMethodBase and FusedMoEPrepareAndFinalize, I'm not sure this needs to be a PluggableLayer at all

Collaborator

It needed to be a CustomOp for type correctness (since UnquantizedFusedMoEMethod is also a CustomOp). You end up with weird errors otherwise. If other classes have been converted to Pluggable ops then maybe this can be removed.

Collaborator

I see, because we return an "MoE method" which could be an instance of UnquantizedFusedMoEMethod or an instance of FusedMoEModularMethod or something else and CustomOp was the common base class? Could we just make nn.Module the shared super class?

Collaborator

I see, because we return an "MoE method" which could be an instance of UnquantizedFusedMoEMethod or an instance of FusedMoEModularMethod or something else and CustomOp was the common base class? Could we just make nn.Module the shared super class?

Feel free to give it a try. IIRC all FusedMoEMethodBase objects are already torch.nn.Modules though.

Contributor Author

So the core problem here is that UnquantizedFusedMoEMethod is a CustomOp, right? Maybe after vLLM IR is ready and UnquantizedFusedMoEMethod is no longer a CustomOp, we can just remove FusedMoEModularMethod's original inheritance from CustomOp.

Contributor Author

I will remove the inheritance from CustomOp and try to reproduce this type error.

Contributor Author

@bnellnm Could you please provide the scenario that will cause type errors? I just removed the inheritance from CustomOp and failed to reproduce type-related errors.

Collaborator

@bnellnm bnellnm Feb 11, 2026

Running a MoE model with a non-naive all2all backend should trigger the problem.

Collaborator

(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     module.maybe_init_modular_kernel()
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 691, in maybe_init_modular_kernel
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     self._replace_quant_method(
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 664, in _replace_quant_method
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     self.quant_method = mk
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3909061) Process EngineCore_DP0:
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]   File "/home/bnellnm/venvs/nm-vllm-new/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2018, in __setattr__
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006]     raise TypeError(
(EngineCore_DP1 pid=3909056) ERROR 02-12 17:42:49 [core.py:1006] TypeError: cannot assign 'vllm.model_executor.layers.fused_moe.fused_moe_modular_method.FusedMoEModularMethod' as child module 'quant_method' (torch.nn.Module or None expected)

These are the types of errors you get if you remove the CustomOp inheritance from FusedMoEModularMethod.
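The error above comes from torch.nn.Module.__setattr__: once an attribute is tracked as a child module, only an nn.Module (or None) may be assigned to it. A minimal pure-Python mimic of that check (no torch required; MiniModule, QuantMethodModule, and PlainQuantMethod are illustrative stand-ins, not vLLM or torch code):

```python
class MiniModule:
    """Toy stand-in for torch.nn.Module's child-module bookkeeping."""

    def __init__(self):
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        # Once a name is tracked as a child module, only a MiniModule
        # may be assigned to it -- mirroring torch's TypeError.
        if name in self._modules and not isinstance(value, MiniModule):
            raise TypeError(
                f"cannot assign {type(value).__name__!r} as child module "
                f"{name!r} (MiniModule or None expected)"
            )
        if isinstance(value, MiniModule):
            self._modules[name] = value
        object.__setattr__(self, name, value)


class QuantMethodModule(MiniModule):  # plays the role of UnquantizedFusedMoEMethod
    pass


class PlainQuantMethod:  # plays the role of a non-Module FusedMoEModularMethod
    pass


layer = MiniModule()
layer.quant_method = QuantMethodModule()     # tracked as a child module
try:
    layer.quant_method = PlainQuantMethod()  # not a MiniModule -> rejected
except TypeError as e:
    print("rejected:", e)
```

This is why self.quant_method = mk fails as soon as FusedMoEModularMethod stops being an nn.Module subclass: the first assignment registers 'quant_method' as a child module, and every later assignment is type-checked against nn.Module.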

Contributor Author

Thanks a lot. I will try it.

@whx-sjtu whx-sjtu force-pushed the apply_plug_moe branch 2 times, most recently from 0786fd3 to 6d560c1 Compare February 5, 2026 06:42
@mergify
Contributor

mergify bot commented Feb 11, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @whx-sjtu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 11, 2026

# --8<-- [start:transformers_fused_moe]
@CustomOp.register("transformers_fused_moe")
@PluggableLayer.register("transformers_fused_moe")
Member

For the avoidance of doubt, this is a custom op that allows us to accept topk_ids in FusedMoE.forward and have it reappear as the output of custom_routing_function when torch.compile/CUDA Graphs are enabled.

Contributor Author

@whx-sjtu whx-sjtu Feb 13, 2026

I treated this class as a PluggableLayer because it doesn't have different implementations for different in-tree platforms. Do you mean that this extra functionality of transformers_fused_moe needs to be compiled by torch through CustomOp.maybe_compile? @hmellor

Member

I was just annotating this particular change so that nobody thinks it can be removed entirely.

A quick way to check that it still works is to install transformers from main and run the following test in vLLM: pytest tests/models/test_transformers.py -k olmoe -vsx. If this still passes, then the change from CustomOp to PluggableLayer is ok.

Collaborator

@hmellor I don't understand your comment either - CustomOp class doesn't affect compilation/Dynamo/cudagraphs. Are you talking about direct_register_custom_op?

Member

Oh I see. Yeah it sounds like I should have been referencing direct_register_custom_op.

TL;DR the Transformers modelling backend needs direct_register_custom_op to support compilation (torch and CUDA Graphs) with MoE models

@ProExpertProg
Collaborator

@whx-sjtu can you resolve merge conflicts?

@ProExpertProg ProExpertProg added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 26, 2026
@ProExpertProg
Collaborator

@whx-sjtu CI should run once you push

Comment on lines +24 to +23
@CustomOp.register("modular_fused_moe")
class FusedMoEModularMethod(FusedMoEMethodBase, CustomOp):
class FusedMoEModularMethod(FusedMoEMethodBase, torch.nn.Module):
Contributor Author

Can we just make FusedMoEModularMethod inherit from nn.Module to solve the type error? cc @ProExpertProg @bnellnm

Collaborator

This is fine with me.


@whx-sjtu whx-sjtu force-pushed the apply_plug_moe branch 3 times, most recently from 186bfd1 to 4c93836 Compare March 2, 2026 03:14
@ProExpertProg
Collaborator

@whx-sjtu docs build failure looks related:

File "/home/docs/checkouts/readthedocs.org/user_builds/vllm/checkouts/33556/vllm/model_executor/layers/fused_moe/fused_moe_modular_method.py", line 23, in <module>
    class FusedMoEModularMethod(FusedMoEMethodBase, torch.nn.Module):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

@whx-sjtu
Contributor Author

whx-sjtu commented Mar 4, 2026

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

Oops. This is really weird and I couldn't reproduce this TypeError locally. If this error does exist, the original version inheriting from CustomOp should have the same problem, because CustomOp inherits from nn.Module. I will try to solve this problem by explicitly specifying the metaclass of FusedMoEModularMethod.
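A metaclass conflict like the one in the docs build occurs whenever a class's bases carry unrelated metaclasses — for example, if one base is an ABC (metaclass abc.ABCMeta) while another base uses a different custom metaclass. A minimal repro with stand-in names (MethodBase, RegistryMeta, etc. are made up, not the actual vLLM classes):

```python
import abc


class MethodBase(abc.ABC):  # metaclass is abc.ABCMeta
    @abc.abstractmethod
    def apply(self): ...


class RegistryMeta(type):   # some other, unrelated metaclass
    pass


class Registered(metaclass=RegistryMeta):
    pass


try:
    # Python cannot derive a single metaclass: ABCMeta and RegistryMeta
    # are unrelated, so class creation itself raises TypeError.
    class Combined(MethodBase, Registered):
        def apply(self):
            return 0
except TypeError as e:
    print(e)  # starts with "metaclass conflict: ..."


# One fix is exactly what the comment suggests: explicitly specify a
# metaclass that derives from both bases' metaclasses.
class CombinedMeta(abc.ABCMeta, RegistryMeta):
    pass


class Fixed(MethodBase, Registered, metaclass=CombinedMeta):
    def apply(self):
        return 0


assert Fixed().apply() == 0
```

This also suggests why the error may not reproduce everywhere: it only fires if FusedMoEMethodBase and the other base actually end up with incompatible metaclasses in that environment.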

@mergify
Contributor

mergify bot commented Mar 4, 2026

Hi @whx-sjtu, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

# intrusive way to do this.
def _replace_quant_method(self, mk: FusedMoEMethodBase):
self.quant_method = mk
object.__setattr__(self, "quant_method", mk)
Contributor Author

I decided to remove the inheritance from nn.Module and solve the original possible TypeError with this change.

Collaborator

I'm fine with not inheriting from nn.Module, but I'd be worried that bypassing torch using __setattr__ here might be asking for trouble. I don't know enough about torch to know what the issues are, though.

@whx-sjtu whx-sjtu force-pushed the apply_plug_moe branch 2 times, most recently from 732e558 to ad87a1c Compare March 12, 2026 02:30
Signed-off-by: whx-sjtu <2952154980@qq.com>
@wxsIcey
Contributor

wxsIcey commented Mar 28, 2026

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

Oops. This is really weird and I couldn't reproduce this TypeError locally. If this error does exist, the original version inheriting from CustomOp should have the same problem, because CustomOp inherits from nn.Module. I will try to solve this problem by explicitly specifying the metaclass of FusedMoEModularMethod.

I found the root cause of the problem, and I explained the reason and provided a solution in this pull request: #35178

@whx-sjtu
Contributor Author

whx-sjtu commented Apr 7, 2026

To ensure feasibility, I leave FusedMoEModularMethod untouched in this PR. The related issue should be solved in #35178 by @wxsIcey. cc @ProExpertProg @bnellnm

@ProExpertProg ProExpertProg merged commit f02b326 into vllm-project:main Apr 14, 2026
69 checks passed
Chinmay-Kulkarni-AMD pushed a commit to Chinmay-Kulkarni-AMD/vllm that referenced this pull request Apr 15, 2026
zxd1997066 pushed a commit to zxd1997066/vllm that referenced this pull request Apr 15, 2026
…lm-project#33556)

Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
askliar pushed a commit to askliar/vllm that referenced this pull request Apr 15, 2026
Labels: ready ONLY add when PR is ready to merge/full CI is needed