-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
[Misc] Support routing logic simulation #21990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a flexible framework for simulating token-to-expert routing strategies in MoE models. The changes are well-structured, adding a RoutingSimulator and an extensible RoutingStrategy base class. The integration via the VLLM_MOE_ROUTING_STRATEGY environment variable is a good approach for enabling these custom strategies.
My review focuses on improving test robustness and fixing a critical issue in the new public API. Specifically, I've pointed out:
- The need to use
pytest.monkeypatchfor safely managing environment variables in tests to prevent flakiness. - An unused
pytestfixture in a new test. - A critical bug where a new public method has a default argument that will cause a crash.
Addressing these points will enhance the correctness and maintainability of the new testing and simulation capabilities.
fbaa4a1 to
f02a545
Compare
vllm/envs.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any plans for routing strategies besides softmax that would result in correct outputs? If not, let's give it a name that indicates that it should be used for testing? E.g. VLLM_MOE_ROUTING_SIMULATOR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of extending RoutingSimulator to a general class Router in next PRs and break FusedMoE.select_experts into multiple RoutingStrategy subclasses. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate a bit more on what the specific RoutingStrategy subclasses will be and why there should be an environment variable to select between them?
Are you planning on adding different strategies that have different performance characteristics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can organize strategies into the following
- common strategies provided by vllm
- GroupedTopk
- FusedTopk
- simulated strategies for benchmark
- UniformRandomRouting
- WeightedRandomRouting
- ImbalancedRouting (to be added)
- and yes, we can add more strategies here for different performance characteristics
- custom strategies registered by model implementation
// this can replace the current custom_routing_function
strategy selection can be finalized in FusedMoE init function, consolidating this envvar VLLM_MOE_ROUTING_STRATEGY and model configuration.
And VLLM_MOE_ROUTING_STRATEGY can be used to override model config, e.g., for benchmark purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name "VLLM_MOE_ROUTING_STRATEGY" feels misleading. I feel moe routing strategy is referring token choice or expert choice or threshold based etc. Maybe we can rename it to something like VLLM_MOE_ROUTING_SIMULATION_STRATEGY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds like it's preferred to keep this envvar solely for simulation!
updated and changed the naming. thx!
Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Ming Yang <[email protected]>
Signed-off-by: Ming Yang <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ming Yang <[email protected]>
Signed-off-by: Ming Yang <[email protected]>
b813dca to
d202b95
Compare
| pass | ||
|
|
||
|
|
||
| class UniformRandomRouting(RoutingStrategy): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just a special case for imbalance routing? maybe user can specify a function name to generate dispatching distribution instead of having multiple subclasses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point. Updated. I'll also add the imbalanced factor implementation in a followup PR.
…outing to DistributionBasedRouting Signed-off-by: Ming Yang <[email protected]>
Signed-off-by: Ming Yang <[email protected]>
|
@tlrmchlsmth the failed test in CI is passed locally. I wonder if this test is flaky. Should we bypass? |
|
Merging |
Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Jinzhen Lin <[email protected]>
Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Noam Gat <[email protected]>
Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Paul Pak <[email protected]>
Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Diego-Castan <[email protected]>
Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Xiao Yu <[email protected]>
Signed-off-by: Ming Yang <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Purpose
This is to support token-to-expert routing logic customization. We need this to work around imbalanced token-to-expert selection and support balanced selection for benchmark purpose.
Customization can be implemented by extending RoutingStrategy and implement route_tokens, being registered with RoutingSimulator, and selected by VLLM_MOE_ROUTING_STRATEGY envvar.
Thanks @tlrmchlsmth for initial implementation and handover!
Next PR:
Test Plan
pytest tests/test_routing_simulator.py
Test Result
passed
(Optional) Documentation Update