Allow non-DeepSeekV3 routing with one group#2502
Conversation
📝 WalkthroughWalkthroughThe pull request refactors routing validation in the MOE kernel launcher to add an explicit DeepSeekV3 branch with group-specific constraints, enforces no-groups limits before other routing branches, and consolidates multi-part error message concatenation into single-line formatting. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @dbari, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the flexibility and robustness of the routing mechanism within the fused MoE kernel launcher. It relaxes constraints on routing methods when using a single group and adds validation checks to ensure configurations are consistent with the selected routing method. This change simplifies the codebase and accommodates a wider range of model configurations. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request aims to allow non-DeepSeekV3 routing methods to be used with a single group (n_group=1), which was previously disallowed. The changes correctly modify the validation logic for DeepSeekV3 and add checks for Renormalize and Llama4 routing to support n_group <= 1.
My review identifies a potential issue where other routing methods (like Default, TopK) are no longer checked for n_group, which could lead to unexpected behavior if they are used with more than one group. I've suggested a refactoring to apply the n_group <= 1 check to all non-DeepSeekV3 methods for consistency and robustness, which also improves code clarity and reduces duplication.
7f8ebdd to
7e91b3e
Compare
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
7e91b3e to
21cf038
Compare
|
/bot run |
|
@flashinfer-bot run |
📌 Description
This PR allows running any routing method with one group. Previously, all routing methods except for
DeepSeekV3required the number of groups to be unset or set to zero. However, Mistral Large 3 defines it to be one and usesRenormalizeas routing. This worked only by using a workaround in vLLM to unset the number of groups if it's equal to one.In order to simplify and generalize the code in vLLM, it makes sense to accept any routing as long as the number of groups is at most one.
🔍 Related Issues
Related vLLM issue: vllm-project/vllm#33792
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
pre-commitby runningpip install pre-commit(or used your preferred method).pre-commit install.pre-commit run --all-filesand fixed any reported issues.🧪 Tests
unittest, etc.).Reviewer Notes
The tests are still running locally. I make small adjustments in case anything fails, however this can already be reviewed.
Summary by CodeRabbit