Allow non-DeepSeekV3 routing with one group #2502

Merged
yzh119 merged 1 commit into flashinfer-ai:main from dbari:dbariamis/allow-non-dsv3-routing-with-1-group
Feb 5, 2026
@dbari (Contributor) commented Feb 5, 2026

📌 Description

This PR allows running any routing method with one group. Previously, all routing methods except DeepSeekV3 required the number of groups to be unset or set to zero. However, Mistral Large 3 sets it to one and uses Renormalize routing. This worked only because of a workaround in vLLM that unsets the number of groups when it equals one.

To simplify and generalize the code in vLLM, it makes sense to accept any routing method as long as the number of groups is at most one.
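
As a rough illustration of the change in the accepted range for non-DeepSeekV3 routing, the two rules can be written as predicates. This is only a sketch: the function names are hypothetical, and modelling "unset" as `std::nullopt` is an assumption, not how the launcher represents its parameters.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical names for illustration only; "unset" is modelled as
// std::nullopt, which is an assumption about the parameter's representation.

// Old rule for non-DeepSeekV3 routing: the group count had to be unset or zero.
bool accepted_before(std::optional<int64_t> n_group) {
  return !n_group.has_value() || *n_group == 0;
}

// Relaxed rule: any group count of at most one is accepted, so a model
// that sets the group count to one no longer needs a caller-side workaround.
bool accepted_after(std::optional<int64_t> n_group) {
  return n_group.value_or(0) <= 1;
}
```

With the relaxed rule, a configuration that sets the number of groups to one passes directly, while a value of two or more is still rejected for these routing methods.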

🔍 Related Issues

Related vLLM issue: vllm-project/vllm#33792

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running `pip install pre-commit` (or used your preferred method).
  • I have installed the hooks with `pre-commit install`.
  • I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

The tests are still running locally. I will make small adjustments if anything fails, but this can already be reviewed.

Summary by CodeRabbit

  • Bug Fixes
    • Strengthened routing configuration validation with explicit constraint enforcement for different routing modes to prevent invalid setups.
    • Tightened group-based routing checks to ensure consistent expert selection limits and parameter relationships when groups are enabled.
    • Improved and consolidated error message formatting for configuration validation to make failures clearer and more consistent.

@coderabbitai (bot, Contributor) commented Feb 5, 2026

📝 Walkthrough

The pull request refactors routing validation in the MOE kernel launcher to add an explicit DeepSeekV3 branch with group-specific constraints, enforces no-groups limits before other routing branches, and consolidates multi-part error message concatenation into single-line formatting.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **MOE Kernel Launcher**<br>`csrc/trtllm_fused_moe_kernel_launcher.cu` | Reworked routing validation to add an explicit DeepSeekV3 branch enforcing `n_group != 0`, `topk_group != 0`, divisibility of `num_experts` by `n_group`, and additional group constraints (`top_k <= 8`, `topk_group <= 4`, `topk_group <= n_group`, `top_k < experts_in_selected_groups`). Added a no-groups branch (`n_group <= 1`, `topk_group <= 1`) applied prior to other routing branches. Consolidated multi-part error message construction into single-line concatenations for dtype messages. |
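
The DeepSeekV3 constraints listed in the summary can be sketched as a standalone check. This is not the launcher's code: `check_deepseek_v3_groups` is a hypothetical helper, the error strings are invented for illustration, and computing `experts_in_selected_groups` as `topk_group * (num_experts / n_group)` is an assumption based on the name.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper, not the launcher's actual code. Returns an error
// message for the first violated constraint, or std::nullopt if all hold.
std::optional<std::string> check_deepseek_v3_groups(int64_t num_experts, int64_t top_k,
                                                    int64_t n_group, int64_t topk_group) {
  if (n_group == 0) return std::string("n_group must not be zero");
  if (topk_group == 0) return std::string("topk_group must not be zero");
  if (num_experts % n_group != 0)
    return std::string("num_experts must be divisible by n_group");
  if (top_k > 8) return std::string("top_k must be <= 8");
  if (topk_group > 4) return std::string("topk_group must be <= 4");
  if (topk_group > n_group) return std::string("topk_group must be <= n_group");

  // Assumption: the experts available after group selection are the
  // experts of the topk_group chosen groups.
  const int64_t experts_in_selected_groups = topk_group * (num_experts / n_group);
  if (top_k >= experts_in_selected_groups)
    return std::string("top_k must be < experts_in_selected_groups");

  return std::nullopt;
}
```

For example, 256 experts in 8 groups with `topk_group = 4` and `top_k = 8` satisfy every check in this sketch, whereas `n_group = 0` fails the first one.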

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • djmmoss
  • yzh119
  • cyx-6
  • jimmyzho
  • jiahanc

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Allow non-DeepSeekV3 routing with one group' clearly and concisely describes the main change: enabling routing methods other than DeepSeekV3 to work with a single group. |
| Description check | ✅ Passed | The PR description includes the required sections: a clear explanation of what the PR does and why it's needed, related issues, completed pre-commit checks, and updated tests. |

@gemini-code-assist (bot, Contributor) commented

Summary of Changes

Hello @dbari, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the flexibility and robustness of the routing mechanism within the fused MoE kernel launcher. It relaxes constraints on routing methods when using a single group and adds validation checks to ensure configurations are consistent with the selected routing method. This change simplifies the codebase and accommodates a wider range of model configurations.

Highlights

  • Routing Method Flexibility: This PR broadens the allowed routing methods when using a single group, accommodating configurations like Mistral Large 3's use of Renormalize routing with one group.
  • Code Simplification: The changes aim to simplify vLLM's code by removing workarounds and generalizing the acceptance of routing methods with at most one group.
  • Consistency Checks: The PR introduces additional checks to ensure that n_group and topk_group are appropriately configured for different routing methods, enhancing the robustness of the system.

Changelog
  • csrc/trtllm_fused_moe_kernel_launcher.cu
    • Modified the conditions for DeepSeekV3 routing to ensure n_group is not zero.
    • Added checks to ensure n_group and topk_group are less than or equal to 1 for Renormalize and Llama4 routing methods.
    • Removed redundant logging in trtllm_get_valid_moe_configs.
Activity
  • The PR addresses issue #33792 in the vLLM project.
  • Pre-commit checks have been executed and passed.
  • Tests have been added or updated as needed and are currently running locally.

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request aims to allow non-DeepSeekV3 routing methods to be used with a single group (n_group=1), which was previously disallowed. The changes correctly modify the validation logic for DeepSeekV3 and add checks for Renormalize and Llama4 routing to support n_group <= 1.

My review identifies a potential issue where other routing methods (like Default, TopK) are no longer checked for n_group, which could lead to unexpected behavior if they are used with more than one group. I've suggested a refactoring to apply the n_group <= 1 check to all non-DeepSeekV3 methods for consistency and robustness, which also improves code clarity and reduces duplication.
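
The shape of that suggestion, one shared group check for every method other than DeepSeekV3, might look roughly like the sketch below. It is illustrative rather than the merged code: the `RoutingMethod` enum and `groups_ok` are hypothetical names, and the DeepSeekV3 branch shows only the non-zero checks, with its further constraints omitted.

```cpp
#include <cstdint>

// Hypothetical enum for illustration; the names follow the routing methods
// mentioned in this thread, not necessarily the launcher's identifiers.
enum class RoutingMethod { Default, Renormalize, DeepSeekV3, Llama4, TopK };

// Returns true when the group parameters are acceptable for the method.
bool groups_ok(RoutingMethod method, int64_t n_group, int64_t topk_group) {
  if (method == RoutingMethod::DeepSeekV3) {
    // Grouped routing needs real groups; the remaining DeepSeekV3
    // constraints are intentionally left out of this sketch.
    return n_group != 0 && topk_group != 0;
  }
  // Every other method shares the same no-groups rule, so none of them
  // can slip through with more than one group.
  return n_group <= 1 && topk_group <= 1;
}
```

Placing the shared rule in a single fall-through branch means a newly added routing method is covered by default instead of needing its own copy of the check.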

@dbari force-pushed the dbariamis/allow-non-dsv3-routing-with-1-group branch from 7f8ebdd to 7e91b3e on February 5, 2026 15:12
@aleozlx added the v0.6.3 label on Feb 5, 2026
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
@dbari force-pushed the dbariamis/allow-non-dsv3-routing-with-1-group branch from 7e91b3e to 21cf038 on February 5, 2026 16:44

@aleozlx (Collaborator) left a comment

lgtm

@aleozlx (Collaborator) commented Feb 5, 2026

/bot run

@flashinfer-bot (Collaborator) commented

GitLab MR !299 has been created, and the CI pipeline #43364347 is currently running. I'll report back once the pipeline job completes.

@yongwww (Member) commented Feb 5, 2026

@flashinfer-bot run

@yzh119 merged commit 1e9b237 into flashinfer-ai:main on Feb 5, 2026
43 checks passed