
[MoE Refactor] Separate Router into OO Classes#30623

Merged
robertgshaw2-redhat merged 40 commits into vllm-project:main from
neuralmagic:fused-moe-router-3
Jan 18, 2026
Conversation

@bnellnm
Collaborator

@bnellnm bnellnm commented Dec 13, 2025

Purpose

This PR is part of the effort to separate the expert selection code from FusedMoEMethod.

Move all the MoE router implementations into separate subclasses of FusedMoERouter, and add a factory function, create_fused_moe_router, that constructs the appropriate subclass based on its input arguments. The FusedMoE layer no longer holds the router parameters itself; it only keeps a reference to the router object.
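The class-and-factory split described above can be sketched roughly as follows. This is a simplified illustration, not the PR's exact API: the real vLLM classes take many more arguments, and the `_route` helper and the factory's signature here are assumptions.

```python
from abc import ABC, abstractmethod

import torch


class FusedMoERouter(ABC):
    """Base router: shared post-processing around a routing kernel."""

    def select_experts(
        self, router_logits: torch.Tensor, top_k: int
    ) -> tuple[torch.Tensor, torch.Tensor]:
        # Template method: subclasses supply the actual routing; the base
        # class applies common post-processing (dtype normalization here).
        topk_weights, topk_ids = self._route(router_logits, top_k)
        return topk_weights.to(torch.float32), topk_ids

    @abstractmethod
    def _route(self, router_logits, top_k): ...


class FusedTopKRouter(FusedMoERouter):
    """Plain softmax + top-k routing."""

    def __init__(self, renormalize: bool = True):
        self.renormalize = renormalize

    def _route(self, router_logits, top_k):
        scores = torch.softmax(router_logits, dim=-1)
        topk_weights, topk_ids = torch.topk(scores, top_k, dim=-1)
        if self.renormalize:
            topk_weights = topk_weights / topk_weights.sum(-1, keepdim=True)
        return topk_weights, topk_ids


def create_fused_moe_router(renormalize: bool = True) -> FusedMoERouter:
    # Factory: the real version inspects its arguments (grouped top-k,
    # bias correction, custom routing fn, simulator) to pick a subclass.
    return FusedTopKRouter(renormalize=renormalize)
```

The point of the factory is that FusedMoE never needs to know which routing variant is in play; it just calls select_experts on whatever router it was handed.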

See #28408

Test Plan

CI + additional MoE refactoring tests.
Added unit tests for different routing methods.

Hand-tested the following models:

  • baidu/ERNIE-4.5-21B-A3B-PT
  • baidu/ERNIE-4.5-VL-28B-A3B-PT
  • microsoft/Phi-3.5-MoE-instruct

cc @robertgshaw2-redhat , @mgoin , @zyongye

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Decouples expert selection from the MoE layer into focused router classes and a factory, simplifying logic and enabling easier extensions.

  • Introduces BaseRouter and concrete routers: FusedTopKRouter, FusedTopKBiasRouter, GroupedTopKRouter, CustomRoutingRouter, and RoutingSimulatorRouter; centralizes EPLB handling via EplbLayerState and a template select_experts flow
  • Adds create_fused_moe_router factory; extends RoutingMethodType with Custom and Simulated
  • Refactors FusedMoE to hold a router instance and delegate routing; removes inline routing code and stores per-layer EPLB state in EplbLayerState
  • Adjusts quantization path (e.g., quark) to call router.select_experts
  • Adds tests for fused top-k, bias-corrected, grouped top-k, custom (Llama4), and simulator-based routing

Written by Cursor Bugbot for commit 08a8b15f808b3778225eae0bd9d256f4610aacf5.
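The EplbLayerState mentioned in the summary groups the per-layer EPLB tensors into one object that the router can consult. The sketch below is illustrative only; the field and method names are assumptions, not the PR's exact API.

```python
from dataclasses import dataclass

import torch


@dataclass
class EplbLayerState:
    """Per-layer expert-parallel load-balancing (EPLB) tensors."""

    expert_load_view: torch.Tensor          # running per-expert token counts
    logical_to_physical_map: torch.Tensor   # logical expert -> physical replicas
    logical_replica_count: torch.Tensor     # number of replicas per logical expert

    def map_logical_to_physical(self, topk_ids: torch.Tensor) -> torch.Tensor:
        # Pick the first physical replica for each selected logical expert.
        # (A real implementation would balance load across replicas.)
        return self.logical_to_physical_map[topk_ids, 0]
```

With the state consolidated like this, the common select_experts flow can validate and apply the logical-to-physical mapping in one place instead of in every routing path.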


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-executed refactoring of the Mixture of Experts (MoE) routing logic. By extracting the routing functionality into separate FusedMoERouter classes, the code is now more modular, extensible, and easier to maintain. The use of a factory pattern in create_fused_moe_router to select the appropriate routing strategy is a clean design choice. The changes have been consistently applied across the FusedMoE layer and various quantization methods. Overall, this is an excellent improvement to the codebase. I have one minor suggestion to improve the error messages for consistency.

@mergify

mergify bot commented Dec 18, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 18, 2025
@bnellnm bnellnm force-pushed the fused-moe-router-3 branch from ec01222 to b073a08 Compare January 6, 2026 17:45
@mergify mergify bot removed the needs-rebase label Jan 6, 2026
@mergify

mergify bot commented Jan 6, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify

mergify bot commented Jan 7, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 7, 2026
@bnellnm bnellnm force-pushed the fused-moe-router-3 branch from 1aa5d1f to a5a608c Compare January 8, 2026 19:25
@mergify mergify bot removed the needs-rebase label Jan 8, 2026
@zyongye
Member

zyongye commented Jan 8, 2026

I saw there are a lot of PRs for the router refactor. Which one are we intending to merge?

@bnellnm bnellnm changed the title from "[Misc][Refactor] Refactor MoE router functions into separate classes" to "[Misc][Refactor] Refactor MoE router functionality into separate classes" Jan 8, 2026
@bnellnm bnellnm force-pushed the fused-moe-router-3 branch from 0c44229 to a6a6516 Compare January 8, 2026 20:55
@bnellnm bnellnm marked this pull request as ready for review January 8, 2026 20:57
@mergify

mergify bot commented Jan 9, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 9, 2026
@github-project-automation github-project-automation bot moved this to Backlog in MoE Refactor Jan 9, 2026
@robertgshaw2-redhat robertgshaw2-redhat moved this from Backlog to In progress in MoE Refactor Jan 9, 2026
@robertgshaw2-redhat robertgshaw2-redhat moved this from In progress to In review in MoE Refactor Jan 9, 2026
@robertgshaw2-redhat robertgshaw2-redhat changed the title from "[Misc][Refactor] Refactor MoE router functionality into separate classes" to "[MoE Refactor] Separate Router into OO Classes" Jan 9, 2026
@bnellnm bnellnm requested a review from tlrmchlsmth as a code owner January 9, 2026 23:20
Signed-off-by: Bill Nell <bnell@redhat.com>
@bnellnm bnellnm force-pushed the fused-moe-router-3 branch from 3c7c1ef to 18a2b27 Compare January 17, 2026 01:03
Signed-off-by: Bill Nell <bnell@redhat.com>
indices_type: torch.dtype | None,
) -> tuple[torch.Tensor, torch.Tensor]:
"""Compute routing using fused top-k with bias."""
topk_weights, topk_ids = fused_topk_bias(
Collaborator

note: the indices_type is ignored here. In fact, fused_topk_bias always converts to int32. We could instead cast to the indices_type

Collaborator Author

Can I do this in the follow up?

renormalize=self.renormalize,
)

return topk_weights.to(torch.float32), topk_ids
Collaborator

ditto re: topk_ids type

Collaborator Author

ditto

Collaborator

@robertgshaw2-redhat robertgshaw2-redhat left a comment

thanks for the hard work on this!

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Jan 18, 2026
@robertgshaw2-redhat robertgshaw2-redhat merged commit 327a02d into vllm-project:main Jan 18, 2026
73 of 75 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in MoE Refactor Jan 18, 2026
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Jan 18, 2026
gopalsarda pushed a commit to gopalsarda/vllm that referenced this pull request Jan 20, 2026
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants