[Misc] Support routing logic simulation #21990

minosfuture · 2025-07-31T05:28:50Z

Purpose

This PR supports customizing the token-to-expert routing logic. We need this to work around imbalanced token-to-expert selection and to support balanced selection for benchmarking purposes.

Customization is implemented by extending RoutingStrategy and implementing route_tokens; the strategy is then registered with RoutingSimulator and selected through the VLLM_MOE_ROUTING_STRATEGY envvar.
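For illustration, the extension flow described above might look roughly like the sketch below. It is not the code from this PR: the route_tokens signature, the register_strategy and simulate_routing method names, and the RoundRobinRouting class are assumptions, and plain Python lists stand in for the tensors a real MoE layer would use.

```python
import os
from abc import ABC, abstractmethod


class RoutingStrategy(ABC):
    """Hypothetical base class; the real signature in the PR may differ."""

    @abstractmethod
    def route_tokens(self, num_tokens: int, num_experts: int, top_k: int):
        """Return (weights, expert_ids), each with one row of top_k entries per token."""


class RoundRobinRouting(RoutingStrategy):
    """Illustrative balanced strategy: hand out experts cyclically."""

    def route_tokens(self, num_tokens, num_experts, top_k):
        if top_k > num_experts:
            raise ValueError("top_k must not exceed num_experts")
        expert_ids = [
            [(token * top_k + slot) % num_experts for slot in range(top_k)]
            for token in range(num_tokens)
        ]
        # Equal weight for every selected expert.
        weights = [[1.0 / top_k] * top_k for _ in range(num_tokens)]
        return weights, expert_ids


class RoutingSimulator:
    """Hypothetical registry that maps strategy names to instances."""

    _strategies = {}

    @classmethod
    def register_strategy(cls, name, strategy):
        cls._strategies[name] = strategy

    @classmethod
    def simulate_routing(cls, name, num_tokens, num_experts, top_k):
        if name not in cls._strategies:
            raise ValueError(f"unknown routing strategy: {name!r}")
        return cls._strategies[name].route_tokens(num_tokens, num_experts, top_k)


def selected_strategy_name(default="round_robin"):
    """Read the strategy name from the envvar named in the PR description."""
    return os.environ.get("VLLM_MOE_ROUTING_STRATEGY", default)


RoutingSimulator.register_strategy("round_robin", RoundRobinRouting())
```

With this sketch, calling RoutingSimulator.simulate_routing(selected_strategy_name(), 4, 4, 2) would route through whichever registered strategy the envvar names, falling back to the illustrative round-robin one.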

Thanks @tlrmchlsmth for initial implementation and handover!

Next PR:

Add metrics to confirm effectiveness of customized routing
Potentially extend RoutingSimulator into a general Router class and modularize FusedMoE.select_experts implementation.

Test Plan

pytest tests/test_routing_simulator.py

Test Result

passed

(Optional) Documentation Update

github-actions · 2025-07-31T05:28:59Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request introduces a flexible framework for simulating token-to-expert routing strategies in MoE models. The changes are well-structured, adding a RoutingSimulator and an extensible RoutingStrategy base class. The integration via the VLLM_MOE_ROUTING_STRATEGY environment variable is a good approach for enabling these custom strategies.

My review focuses on improving test robustness and fixing a critical issue in the new public API. Specifically, I've pointed out:

The need to use pytest.monkeypatch for safely managing environment variables in tests to prevent flakiness.
An unused pytest fixture in a new test.
A critical bug where a new public method has a default argument that will cause a crash.

Addressing these points will enhance the correctness and maintainability of the new testing and simulation capabilities.
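As a rough illustration of the first point, a test can set and clear the variable through pytest's monkeypatch fixture, which restores the previous environment when the test ends. The helper get_routing_strategy and the value "uniform_random" are assumptions made for this sketch, not names confirmed by the PR.

```python
import os

ENV_VAR = "VLLM_MOE_ROUTING_STRATEGY"  # name taken from the PR description


def get_routing_strategy() -> str:
    """Hypothetical helper: return the configured strategy name, or "" if unset."""
    return os.environ.get(ENV_VAR, "")


def test_strategy_from_env(monkeypatch):
    # setenv is undone automatically at teardown, so other tests are unaffected.
    monkeypatch.setenv(ENV_VAR, "uniform_random")
    assert get_routing_strategy() == "uniform_random"


def test_strategy_unset(monkeypatch):
    # raising=False avoids an error when the variable is not set to begin with.
    monkeypatch.delenv(ENV_VAR, raising=False)
    assert get_routing_strategy() == ""
```

Compared with writing to os.environ directly, this keeps a failing assertion from leaking the variable into later tests.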

vllm/model_executor/layers/fused_moe/layer.py

tests/test_routing_simulator.py

tlrmchlsmth · 2025-07-31T20:47:14Z

vllm/envs.py

Are there any plans for routing strategies besides softmax that would result in correct outputs? If not, let's give it a name that indicates it should only be used for testing, e.g. VLLM_MOE_ROUTING_SIMULATOR.

I'm thinking of extending RoutingSimulator into a general Router class in follow-up PRs and breaking FusedMoE.select_experts into multiple RoutingStrategy subclasses. wdyt?

Could you elaborate a bit more on what the specific RoutingStrategy subclasses will be and why there should be an environment variable to select between them?

Are you planning on adding different strategies that have different performance characteristics?

We can organize strategies into the following categories:

1. Common strategies provided by vLLM: GroupedTopk, FusedTopk.

2. Simulated strategies for benchmarking: UniformRandomRouting, WeightedRandomRouting, ImbalancedRouting (to be added). And yes, we can add more strategies here for different performance characteristics.

3. Custom strategies registered by the model implementation. This can replace the current custom_routing_function.

Strategy selection can be finalized in the FusedMoE init function, consolidating the VLLM_MOE_ROUTING_STRATEGY envvar and the model configuration.
VLLM_MOE_ROUTING_STRATEGY can then be used to override the model config, e.g., for benchmarking.
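A minimal sketch of the override order proposed here, assuming the envvar wins over the model configuration when it is set. The function name resolve_routing_strategy, the "fused_topk" default, and the string strategy names are hypothetical; the thread later settles on keeping the envvar for simulation only.

```python
import os
from typing import Optional

ENV_VAR = "VLLM_MOE_ROUTING_STRATEGY"  # name used at this point in the thread


def resolve_routing_strategy(model_config_strategy: Optional[str],
                             default: str = "fused_topk") -> str:
    """Pick a strategy name: envvar override first, then model config, then default."""
    override = os.environ.get(ENV_VAR)
    if override:
        # A non-empty envvar overrides whatever the model configuration asks for.
        return override
    return model_config_strategy or default
```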

The name "VLLM_MOE_ROUTING_STRATEGY" feels misleading. To me, "MoE routing strategy" refers to token choice, expert choice, threshold-based routing, etc. Maybe we can rename it to something like VLLM_MOE_ROUTING_SIMULATION_STRATEGY?

sounds like it's preferred to keep this envvar solely for simulation!
updated and changed the naming. thx!

vllm/envs.py

vllm/model_executor/layers/fused_moe/routing_simulator.py

.gitignore


vllm/model_executor/layers/fused_moe/layer.py

wpc · 2025-08-04T18:01:44Z

vllm/model_executor/layers/fused_moe/routing_simulator.py

+        pass
+
+
+class UniformRandomRouting(RoutingStrategy):


Isn't this just a special case of imbalanced routing? Maybe the user could specify a function name that generates the dispatching distribution, instead of having multiple subclasses?

great point. Updated. I'll also add the imbalanced factor implementation in a followup PR.
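The suggestion above could be sketched as a single strategy class parameterized by a distribution name rather than one subclass per distribution. The constructor arguments, the supported names "uniform" and "normal", and the list-based return value are assumptions for illustration and may not match the DistributionBasedRouting class in the PR.

```python
import math
import random


class DistributionBasedRouting:
    """Hypothetical strategy: sample experts according to a named distribution."""

    def __init__(self, distribution="uniform", seed=0, **params):
        if distribution not in ("uniform", "normal"):
            raise ValueError(f"unsupported distribution: {distribution!r}")
        self.distribution = distribution
        self.params = params
        self.rng = random.Random(seed)

    def _expert_weights(self, num_experts):
        if self.distribution == "uniform":
            return [1.0] * num_experts
        # "normal": bell-shaped weights centred on the middle expert by default.
        mean = self.params.get("mean", (num_experts - 1) / 2)
        std = self.params.get("std", num_experts / 4)
        return [math.exp(-0.5 * ((e - mean) / std) ** 2)
                for e in range(num_experts)]

    def route_tokens(self, num_tokens, num_experts, top_k):
        if top_k > num_experts:
            raise ValueError("top_k must not exceed num_experts")
        weights = self._expert_weights(num_experts)
        routed = []
        for _ in range(num_tokens):
            # Draw top_k distinct experts, weighted, without replacement.
            candidates = list(range(num_experts))
            remaining = list(weights)
            chosen = []
            for _ in range(top_k):
                idx = self.rng.choices(range(len(candidates)),
                                       weights=remaining, k=1)[0]
                chosen.append(candidates.pop(idx))
                remaining.pop(idx)
            routed.append(chosen)
        return routed
```

Uniform routing then becomes the equal-weight case, and an imbalanced variant would only need another branch in _expert_weights.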

vllm/model_executor/layers/fused_moe/routing_simulator.py


minosfuture · 2025-08-06T23:18:56Z

@tlrmchlsmth the test that failed in CI passes locally. I wonder if this test is flaky. Should we bypass it?

DarkLight1337 · 2025-08-07T06:06:25Z

Merging

Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Jinzhen Lin <[email protected]>

Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Noam Gat <[email protected]>

Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Paul Pak <[email protected]>

Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Diego-Castan <[email protected]>

Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Xiao Yu <[email protected]>

Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-code-assist bot reviewed Jul 31, 2025

vllm/model_executor/layers/fused_moe/layer.py (outdated)

tests/test_routing_simulator.py (outdated)

tests/test_routing_simulator.py (outdated)

minosfuture force-pushed the tok_expert_sim branch from fbaa4a1 to f02a545 Compare July 31, 2025 05:42

minosfuture changed the title ~~[Feature] Support routing logic simulation~~ [Misc] Support routing logic simulation Jul 31, 2025

tlrmchlsmth reviewed Jul 31, 2025

tlrmchlsmth reviewed Aug 1, 2025

.gitignore (outdated)

minosfuture and others added 5 commits August 1, 2025 14:16

[Feature] Support routing logic simulation

c4af364

Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]>

Add test cases

630c073

Signed-off-by: Ming Yang <[email protected]>

Address gemini comments: environ, default arg

f1f791a

Signed-off-by: Ming Yang <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

remove doc for softmax routing strategy

02cbd66

Signed-off-by: Ming Yang <[email protected]>

remove gitignore change that's unrelated

d202b95

Signed-off-by: Ming Yang <[email protected]>

minosfuture force-pushed the tok_expert_sim branch from b813dca to d202b95 Compare August 1, 2025 21:17

minosfuture requested a review from tlrmchlsmth August 4, 2025 17:23

wpc reviewed Aug 4, 2025

vllm/model_executor/layers/fused_moe/layer.py

wpc reviewed Aug 4, 2025

vllm/model_executor/layers/fused_moe/routing_simulator.py (outdated)

minosfuture added 2 commits August 4, 2025 14:13

address comments: remove WeightedRandomRouting; extend UniformRandomRouting to DistributionBasedRouting

862ac07

Signed-off-by: Ming Yang <[email protected]>

address comments: rename envvar

4f8fc98

Signed-off-by: Ming Yang <[email protected]>

tlrmchlsmth added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 5, 2025

tlrmchlsmth approved these changes Aug 5, 2025

tlrmchlsmth enabled auto-merge (squash) August 5, 2025 23:39

vllm-bot merged commit 82216dc into vllm-project:main Aug 7, 2025
51 of 53 checks passed

5 participants
