[MoE Refactor] Separate Router into OO Classes #30623
robertgshaw2-redhat merged 40 commits into vllm-project:main.
Code Review
This pull request introduces a significant and well-executed refactoring of the Mixture of Experts (MoE) routing logic. By extracting the routing functionality into separate FusedMoERouter classes, the code is now more modular, extensible, and easier to maintain. The use of a factory pattern in create_fused_moe_router to select the appropriate routing strategy is a clean design choice. The changes have been consistently applied across the FusedMoE layer and various quantization methods. Overall, this is an excellent improvement to the codebase. I have one minor suggestion to improve the error messages for consistency.
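The factory pattern the review mentions can be sketched as an ordered chain of checks. This is a minimal pure-Python illustration, not vLLM's create_fused_moe_router: the router classes below are empty stand-ins that only borrow the PR's class names, and RouterConfig and its fields are hypothetical substitutes for the arguments the real factory inspects. It assumes the selection priority simulator → grouped → bias → custom → default.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Empty stand-ins for the router classes named in this PR. The real vLLM
# classes have different constructors and actually implement routing.
class BaseRouter:
    pass


class RoutingSimulatorRouter(BaseRouter):
    pass


class GroupedTopKRouter(BaseRouter):
    pass


class FusedTopKBiasRouter(BaseRouter):
    pass


class CustomRoutingRouter(BaseRouter):
    pass


class FusedTopKRouter(BaseRouter):
    pass


@dataclass
class RouterConfig:
    # Hypothetical fields standing in for the layer arguments the factory checks.
    simulator_strategy: Optional[str] = None
    use_grouped_topk: bool = False
    e_score_correction_bias: Optional[list] = None
    custom_routing_function: Optional[Callable] = None


def create_router_sketch(cfg: RouterConfig) -> BaseRouter:
    """Return the first matching router: simulator, grouped, bias, custom, default."""
    if cfg.simulator_strategy is not None:
        return RoutingSimulatorRouter()
    if cfg.use_grouped_topk:
        return GroupedTopKRouter()
    if cfg.e_score_correction_bias is not None:
        return FusedTopKBiasRouter()
    if cfg.custom_routing_function is not None:
        return CustomRoutingRouter()
    return FusedTopKRouter()
```

Because the checks run in order, a configuration with both grouped top-k and a correction bias gets the grouped router in this sketch, and the layer only needs to hold whatever object the factory returns.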
This pull request has merge conflicts that must be resolved before it can be merged.
I saw there are a lot of PRs for the router refactor. Which one are we intending to merge?
Signed-off-by: Bill Nell <bnell@redhat.com>
        indices_type: torch.dtype | None,
    ) -> tuple[torch.Tensor, torch.Tensor]:
        """Compute routing using fused top-k with bias."""
        topk_weights, topk_ids = fused_topk_bias(
note: the indices_type is ignored here. In fact, in fused_topk_bias we always convert to int32. We could instead cast to the indices_type.
Can I do this in a follow-up?
            renormalize=self.renormalize,
        )

        return topk_weights.to(torch.float32), topk_ids
ditto re: topk_ids type
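To make the indices_type discussion concrete, here is a pure-Python reference of top-k routing with a score-correction bias. It is an illustration, not the fused_topk_bias kernel: it assumes softmax scoring and the usual convention that the bias only influences which experts are chosen, while the returned weights come from the unbiased scores. The index_cast parameter is a hypothetical stand-in for casting topk_ids to the requested indices_type instead of always producing int32.

```python
import math
from typing import Callable, Sequence


def softmax(row: Sequence[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def topk_with_bias(
    gating_output: Sequence[Sequence[float]],
    bias: Sequence[float],
    topk: int,
    renormalize: bool,
    index_cast: Callable[[int], object] = int,  # hypothetical indices_type hook
) -> tuple[list[list[float]], list[list[object]]]:
    weights_out, ids_out = [], []
    for logits in gating_output:
        scores = softmax(logits)
        # The bias is added only for choosing experts.
        biased = [s + b for s, b in zip(scores, bias)]
        ids = sorted(range(len(biased)), key=lambda e: biased[e], reverse=True)[:topk]
        # Weights are gathered from the unbiased scores.
        weights = [scores[e] for e in ids]
        if renormalize:
            total = sum(weights)
            weights = [w / total for w in weights]
        weights_out.append(weights)
        # Convert ids to the caller's requested type rather than a fixed one.
        ids_out.append([index_cast(e) for e in ids])
    return weights_out, ids_out
```

With uniform logits, a bias of [0.0, 0.0, 1.0, 0.5] and topk=2 selects experts 2 and 3, and renormalization gives each a weight of 0.5 even though the bias itself never enters the weights.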
robertgshaw2-redhat left a comment:
thanks for the hard work on this!
Merged 327a02d into vllm-project:main.
Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
This PR is part of the effort to separate the expert selection code from FusedMoEMethod.
- Move all the MoE router implementations into separate subclasses of FusedMoERouter.
- Add a factory function, create_fused_moe_router, for constructing the proper subclass of FusedMoERouter based on the input arguments.
- The FusedMoE layer no longer holds onto the router parameters and only keeps a reference to the router object itself.

See #28408
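A minimal sketch of that shape, assuming a template-method design: a base class owns the shared select_experts flow, and subclasses implement only the strategy-specific step. All names here (BaseRouterSketch, _compute_routing, GreedyTopKRouterSketch, MoELayerSketch) are hypothetical, inputs are assumed to be already-normalized scores, and the EPLB step is reduced to a one-to-one logical-to-physical lookup, far simpler than vLLM's EplbLayerState handling.

```python
from abc import ABC, abstractmethod
from typing import Optional, Sequence

Routing = tuple[list[list[float]], list[list[int]]]


class BaseRouterSketch(ABC):
    """Shared select_experts flow; subclasses supply only _compute_routing."""

    def __init__(self, top_k: int, logical_to_physical: Optional[Sequence[int]] = None):
        self.top_k = top_k
        # Hypothetical simplified EPLB map: one physical slot per logical expert.
        self.logical_to_physical = logical_to_physical

    def select_experts(self, scores: Sequence[Sequence[float]]) -> Routing:
        weights, ids = self._compute_routing(scores)
        # Common post-processing lives here, once, instead of in every strategy.
        if self.logical_to_physical is not None:
            ids = [[self.logical_to_physical[e] for e in row] for row in ids]
        return weights, ids

    @abstractmethod
    def _compute_routing(self, scores: Sequence[Sequence[float]]) -> Routing: ...


class GreedyTopKRouterSketch(BaseRouterSketch):
    def _compute_routing(self, scores: Sequence[Sequence[float]]) -> Routing:
        weights, ids = [], []
        for row in scores:
            # Pick the top_k highest-scoring experts for this token.
            chosen = sorted(range(len(row)), key=lambda e: row[e], reverse=True)[: self.top_k]
            ids.append(chosen)
            weights.append([row[e] for e in chosen])
        return weights, ids


class MoELayerSketch:
    def __init__(self, router: BaseRouterSketch):
        # The layer keeps only a reference to the router, not its parameters.
        self.router = router

    def route(self, scores: Sequence[Sequence[float]]) -> Routing:
        return self.router.select_experts(scores)
```

Adding a new routing strategy in this design means writing one more _compute_routing override; the layer and the shared post-processing stay untouched.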
Test Plan
CI + additional MoE refactoring tests.
Added unit tests for different routing methods.
Hand-tested the following models
cc @robertgshaw2-redhat, @mgoin, @zyongye
Test Result
Note
Simplifies and modularizes MoE routing, improving extensibility and testability.
- Adds BaseRouter and concrete routers: FusedTopKRouter, FusedTopKBiasRouter, GroupedTopKRouter, CustomRoutingRouter, RoutingSimulatorRouter; adds create_fused_moe_router (selection priority: simulator → grouped → bias → custom → default)
- Moves fused_topk → fused_topk_router.py and grouped-topk → grouped_topk_router.py; updates __init__ exports and test imports
- Introduces EplbLayerState and unifies EPLB validation/mapping/dtype handling in BaseRouter.select_experts; FusedMoE now stores a router and delegates routing
- Extends RoutingMethodType with Custom and Simulated; the simulator is now provided via routing_simulator_router
- Updates quantization paths (e.g., quark) to use router.select_experts and eplb_state
- Adds tests/kernels/moe/test_routing.py and refactors the simulator test; broad test import fixes for fused_topk

Written by Cursor Bugbot for commit 4699a3cb0603a6b9597ec25879d65ca5b36c506a.
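GroupedTopKRouter covers grouped top-k routing, where experts are first narrowed to the best-scoring groups and the final top-k is taken only within those groups. The pure-Python sketch below is hypothetical: it scores each group by its single best expert, which is one common variant (biased variants may score groups differently), and unlike the real router it returns only expert ids, not weights.

```python
from typing import Sequence


def grouped_topk_sketch(
    scores: Sequence[float],
    num_groups: int,
    topk_group: int,
    topk: int,
) -> list[int]:
    n = len(scores)
    assert n % num_groups == 0, "experts must split evenly into groups"
    group_size = n // num_groups
    # Score each contiguous group of experts by its best member.
    group_scores = [
        max(scores[g * group_size:(g + 1) * group_size]) for g in range(num_groups)
    ]
    # Keep only the topk_group best groups.
    kept_groups = sorted(range(num_groups), key=lambda g: group_scores[g], reverse=True)[:topk_group]
    # Experts outside the kept groups are never eligible.
    candidates = [e for g in kept_groups for e in range(g * group_size, (g + 1) * group_size)]
    return sorted(candidates, key=lambda e: scores[e], reverse=True)[:topk]
```

With scores [0.9, 0.0, 0.5, 0.5, 0.8, 0.0, 0.1, 0.1], four groups, topk_group=2 and topk=3, the result is [0, 4, 1]: expert 2 scores 0.5 but its group is not among the two best, so the third pick falls back to expert 1. With a single group the same call reduces to plain top-k and returns [0, 4, 2].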