@minosfuture
Contributor

@minosfuture minosfuture commented Jul 31, 2025

Purpose

This PR adds support for customizing token-to-expert routing logic. We need it to work around imbalanced token-to-expert selection and to support balanced selection for benchmarking purposes.

To add a custom strategy, subclass RoutingStrategy and implement route_tokens, register the subclass with RoutingSimulator, and select it with the VLLM_MOE_ROUTING_STRATEGY envvar.
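
The extend, register, and select flow described above could look roughly like the sketch below. It is a standalone illustration, not the code in this PR: the classes are stand-ins that reuse the names RoutingStrategy, route_tokens, and RoutingSimulator, while the method signatures, the register_strategy and simulate_routing helpers, the RoundRobinRouting strategy, and the "round_robin" name are all assumptions made for the example.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, List, Mapping

ENV_VAR = "VLLM_MOE_ROUTING_STRATEGY"  # envvar name taken from the PR description


class RoutingStrategy(ABC):
    """Illustrative stand-in for the base class; this signature is an assumption."""

    @abstractmethod
    def route_tokens(self, num_tokens: int, num_experts: int,
                     top_k: int) -> List[List[int]]:
        """Return the top_k expert ids chosen for each token."""


class RoundRobinRouting(RoutingStrategy):
    """Hypothetical balanced strategy: assignments cycle through the experts."""

    def route_tokens(self, num_tokens, num_experts, top_k):
        if top_k > num_experts:
            raise ValueError("top_k must not exceed num_experts")
        # Consecutive (token, slot) pairs map to consecutive expert ids, so the
        # experts picked for one token are always distinct.
        return [[(t * top_k + k) % num_experts for k in range(top_k)]
                for t in range(num_tokens)]


class RoutingSimulator:
    """Hypothetical registry that maps strategy names to instances."""

    _strategies: Dict[str, RoutingStrategy] = {}

    @classmethod
    def register_strategy(cls, name: str, strategy: RoutingStrategy) -> None:
        cls._strategies[name] = strategy

    @classmethod
    def simulate_routing(cls, name, num_tokens, num_experts, top_k):
        if name not in cls._strategies:
            raise ValueError(f"unknown routing strategy: {name!r}")
        return cls._strategies[name].route_tokens(num_tokens, num_experts, top_k)


def selected_strategy(env: Mapping[str, str] = os.environ) -> str:
    """Read the strategy name from the environment; empty string means unset."""
    return env.get(ENV_VAR, "")


RoutingSimulator.register_strategy("round_robin", RoundRobinRouting())
```

With 4 tokens, 4 experts, and top_k=2, this round-robin stand-in gives every expert exactly two assignments, which is the kind of balanced selection the description asks for.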

Thanks @tlrmchlsmth for initial implementation and handover!

Next PR:

  • Add metrics to confirm effectiveness of customized routing
  • Potentially extend RoutingSimulator into a general Router class and modularize FusedMoE.select_experts implementation.

Test Plan

pytest tests/test_routing_simulator.py

Test Result

passed

(Optional) Documentation Update

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a flexible framework for simulating token-to-expert routing strategies in MoE models. The changes are well-structured, adding a RoutingSimulator and an extensible RoutingStrategy base class. The integration via the VLLM_MOE_ROUTING_STRATEGY environment variable is a good approach for enabling these custom strategies.

My review focuses on improving test robustness and fixing a critical issue in the new public API. Specifically, I've pointed out:

  • The need to use pytest.monkeypatch for safely managing environment variables in tests to prevent flakiness.
  • An unused pytest fixture in a new test.
  • A critical bug where a new public method has a default argument that will cause a crash.

Addressing these points will enhance the correctness and maintainability of the new testing and simulation capabilities.
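
As a rough illustration of the first point, a test can set the environment variable through pytest's monkeypatch fixture so that the change is undone automatically when the test ends. The helper current_strategy and the value "uniform_random" are hypothetical names chosen for this sketch; they are not taken from the PR's test file.

```python
import os

ENV_VAR = "VLLM_MOE_ROUTING_STRATEGY"  # envvar name taken from the PR description


def current_strategy(default: str = "") -> str:
    """Hypothetical helper: read the strategy name at call time."""
    return os.environ.get(ENV_VAR, default)


def test_strategy_env_is_scoped(monkeypatch):
    # monkeypatch restores the original environment after the test, so the
    # value set here cannot leak into other tests.
    monkeypatch.setenv(ENV_VAR, "uniform_random")
    assert current_strategy() == "uniform_random"

    # raising=False keeps delenv from failing if the variable is already unset.
    monkeypatch.delenv(ENV_VAR, raising=False)
    assert current_strategy() == ""
```

Writing to os.environ directly in a test would leave the value set for every test that runs afterwards in the same process, which is one common source of order-dependent failures.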

@minosfuture minosfuture changed the title from "[Feature] Support routing logic simulation" to "[Misc] Support routing logic simulation" Jul 31, 2025
vllm/envs.py Outdated
Comment on lines 971 to 972
Member

Any plans for routing strategies besides softmax that would result in correct outputs? If not, let's give it a name that indicates that it should be used for testing? E.g. VLLM_MOE_ROUTING_SIMULATOR

Contributor Author

I'm thinking of extending RoutingSimulator into a general Router class in follow-up PRs and breaking FusedMoE.select_experts into multiple RoutingStrategy subclasses. wdyt?

Member

Could you elaborate a bit more on what the specific RoutingStrategy subclasses will be and why there should be an environment variable to select between them?

Are you planning on adding different strategies that have different performance characteristics?

Contributor Author

We can organize strategies into the following:

  1. common strategies provided by vllm
    • GroupedTopk
    • FusedTopk
  2. simulated strategies for benchmark
    • UniformRandomRouting
    • WeightedRandomRouting
    • ImbalancedRouting (to be added)
    • and yes, we can add more strategies here for different performance characteristics
  3. custom strategies registered by model implementation
    • this can replace the current custom_routing_function

Strategy selection can be finalized in the FusedMoE init function, consolidating the VLLM_MOE_ROUTING_STRATEGY envvar and the model configuration.
VLLM_MOE_ROUTING_STRATEGY can then be used to override the model config, e.g., for benchmarking.
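
A minimal sketch of that precedence rule, assuming the envvar wins whenever it is set and the model configuration is used otherwise. The function resolve_routing_strategy, its parameters, and the "fused_topk" default are hypothetical; the proposal above does not specify them.

```python
from typing import Mapping, Optional

ENV_VAR = "VLLM_MOE_ROUTING_STRATEGY"  # envvar name taken from the discussion


def resolve_routing_strategy(model_strategy: Optional[str],
                             env: Mapping[str, str],
                             default: str = "fused_topk") -> str:
    """Pick a strategy name: envvar override first, then model config, then default."""
    override = env.get(ENV_VAR, "").strip()
    if override:
        # An explicit override, e.g. for a benchmark run, beats the model config.
        return override
    return model_strategy or default
```

Passing the environment in as a mapping, instead of reading os.environ inside the function, keeps the rule easy to test without touching process state.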

Contributor

The name "VLLM_MOE_ROUTING_STRATEGY" feels misleading. To me, "MoE routing strategy" refers to token choice, expert choice, threshold-based routing, etc. Maybe we can rename it to something like VLLM_MOE_ROUTING_SIMULATION_STRATEGY?

Contributor Author

Sounds like it's preferred to keep this envvar solely for simulation!
Updated and renamed it. thx!

minosfuture and others added 5 commits August 1, 2025 14:16
Signed-off-by: Ming Yang <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Ming Yang <[email protected]>
Signed-off-by: Ming Yang <[email protected]>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
pass


class UniformRandomRouting(RoutingStrategy):
Contributor

@wpc wpc Aug 4, 2025

Isn't this just a special case of imbalanced routing? Maybe the user could specify a function name that generates the dispatching distribution, instead of having multiple subclasses?

Contributor Author

@minosfuture minosfuture Aug 4, 2025

great point. Updated. I'll also add the imbalanced factor implementation in a followup PR.
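
One way to read the suggestion in this thread is a single routing function parameterized by a named distribution, with uniform routing as just one entry. The sketch below is an illustration under that reading, not the updated code in this PR: the DISTRIBUTIONS table, the "uniform" and "skewed" names, and both functions are hypothetical.

```python
import random
from typing import Callable, Dict, List

# Hypothetical table: name -> function returning unnormalized per-expert weights.
DISTRIBUTIONS: Dict[str, Callable[[int], List[float]]] = {
    "uniform": lambda n: [1.0] * n,
    "skewed": lambda n: [1.0 / (i + 1) for i in range(n)],
}


def sample_experts(weights: List[float], top_k: int,
                   rng: random.Random) -> List[int]:
    """Draw top_k distinct expert ids, weighted, without replacement."""
    remaining = list(range(len(weights)))
    pool = list(weights)
    chosen = []
    for _ in range(top_k):
        idx = rng.choices(range(len(remaining)), weights=pool, k=1)[0]
        chosen.append(remaining.pop(idx))
        pool.pop(idx)
    return chosen


def route_tokens(distribution: str, num_tokens: int, num_experts: int,
                 top_k: int, seed: int = 0) -> List[List[int]]:
    """Route every token by sampling experts from the named distribution."""
    if top_k > num_experts:
        raise ValueError("top_k must not exceed num_experts")
    rng = random.Random(seed)  # fixed seed keeps benchmark runs repeatable
    weights = DISTRIBUTIONS[distribution](num_experts)
    return [sample_experts(weights, top_k, rng) for _ in range(num_tokens)]
```

Under this design, adding an imbalanced variant means adding one more weight function to the table instead of another subclass.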

@tlrmchlsmth tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 5, 2025
@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) August 5, 2025 23:39
@minosfuture
Contributor Author

@tlrmchlsmth the test that failed in CI passes locally. I wonder if it is flaky. Should we bypass it?

@vllm-bot vllm-bot merged commit 82216dc into vllm-project:main Aug 7, 2025
51 of 53 checks passed
@DarkLight1337
Member

Merging

jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Ming Yang <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Jinzhen Lin <[email protected]>
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Ming Yang <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Noam Gat <[email protected]>
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: Ming Yang <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Paul Pak <[email protected]>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: Ming Yang <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Diego-Castan <[email protected]>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Ming Yang <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Ming Yang <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Xiao Yu <[email protected]>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Ming Yang <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>