Allow non-DeepSeekV3 routing with one group by dbari · Pull Request #2502 · flashinfer-ai/flashinfer

dbari · 2026-02-05T14:22:17Z

📌 Description

This PR allows running any routing method with one group. Previously, all routing methods except DeepSeekV3 required the number of groups to be unset or set to zero. However, Mistral Large 3 sets it to one and uses Renormalize as its routing method. This worked only through a workaround in vLLM that unsets the number of groups when it equals one.

To simplify and generalize the code in vLLM, it makes sense to accept any routing method as long as the number of groups is at most one.
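
The difference between the old caller-side workaround and the relaxed rule can be sketched in a few lines. This is only an illustration of the description above: both function names are hypothetical and neither is taken from vLLM or FlashInfer.

```python
from typing import Optional


def normalize_num_groups(n_group: Optional[int]) -> Optional[int]:
    """Hypothetical caller-side workaround: treat a single group as 'no groups'."""
    # A configuration with exactly one group was unset before the call,
    # because the launcher used to reject it for non-DeepSeekV3 routing.
    if n_group == 1:
        return None
    return n_group


def accepts_group_count(n_group: Optional[int]) -> bool:
    """Hypothetical relaxed rule for non-DeepSeekV3 routing: unset, zero, or one group."""
    return n_group is None or n_group <= 1
```

With the relaxed rule, a caller can pass `n_group=1` through unchanged, so the normalization step is no longer needed.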

🔍 Related Issues

Related vLLM issue: vllm-project/vllm#33792

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

The tests are still running locally. I will make small adjustments if anything fails, but the PR can already be reviewed.

Summary by CodeRabbit

Bug Fixes
- Strengthened routing configuration validation with explicit constraint enforcement for different routing modes to prevent invalid setups.
- Tightened group-based routing checks to ensure consistent expert selection limits and parameter relationships when groups are enabled.
- Improved and consolidated error message formatting for configuration validation to make failures clearer and more consistent.

coderabbitai · 2026-02-05T14:22:36Z

Walkthrough

The pull request refactors routing validation in the MOE kernel launcher to add an explicit DeepSeekV3 branch with group-specific constraints, enforces no-groups limits before other routing branches, and consolidates multi-part error message concatenation into single-line formatting.

Changes

Cohort / File(s)	Summary
MOE Kernel Launcher `csrc/trtllm_fused_moe_kernel_launcher.cu`	Reworked routing validation to add an explicit `DeepSeekV3` branch enforcing `n_group != 0`, `topk_group != 0`, divisibility of `num_experts` by `n_group`, and additional group constraints (`top_k <= 8`, `topk_group <= 4`, `topk_group <= n_group`, `top_k < experts_in_selected_groups`). Added a no-groups branch (`n_group <= 1`, `topk_group <= 1`) applied prior to other routing branches. Consolidated multi-part error message construction into single-line concatenations for dtype messages.
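
The constraints listed in the summary above can be modelled in Python as a rough sketch. This is not the launcher's C++ code: the enum, the function name, and the error messages are hypothetical, and the definition of `experts_in_selected_groups` as `topk_group * (num_experts // n_group)` is an assumption, since the summary does not spell it out.

```python
from enum import Enum


class RoutingMethod(Enum):
    # Illustrative names only; the real enum lives in the FlashInfer sources.
    DEEPSEEK_V3 = "DeepSeekV3"
    RENORMALIZE = "Renormalize"
    LLAMA4 = "Llama4"


def validate_routing(method, num_experts, top_k, n_group=0, topk_group=0):
    """Raise ValueError if the configuration violates the summarized constraints."""
    if method is RoutingMethod.DEEPSEEK_V3:
        # Grouped routing: both group parameters must be set.
        if n_group == 0 or topk_group == 0:
            raise ValueError("DeepSeekV3 routing requires n_group and topk_group")
        if num_experts % n_group != 0:
            raise ValueError("num_experts must be divisible by n_group")
        if top_k > 8:
            raise ValueError("top_k must be <= 8 with groups")
        if topk_group > 4:
            raise ValueError("topk_group must be <= 4")
        if topk_group > n_group:
            raise ValueError("topk_group must be <= n_group")
        # Assumption: experts available across the selected groups.
        experts_in_selected_groups = topk_group * (num_experts // n_group)
        if top_k >= experts_in_selected_groups:
            raise ValueError("top_k must be < experts_in_selected_groups")
    else:
        # No-groups branch: at most one group is accepted.
        if n_group > 1 or topk_group > 1:
            raise ValueError("routing without groups requires n_group <= 1 and topk_group <= 1")
```

With illustrative values, `validate_routing(RoutingMethod.RENORMALIZE, num_experts=128, top_k=4, n_group=1, topk_group=1)` passes, while the same call with `n_group=2` raises `ValueError`.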

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

tiny support glm routing #2313 — Also modifies DeepSeekV3 MoE routing behavior; related changes to routing logic and constraints.
fix: add DeepSeek routing for Bf16xBf16 and MxIntxBf16 TRT-LLM Gen MoE #2234 — Touches the same launcher file and routing plumbing; overlaps with validation and launcher-path formatting adjustments.

Suggested reviewers

djmmoss
yzh119
cyx-6
jimmyzho
jiahanc

Poem

🐰
Hop-hop through expert lanes I dash,
DeepSeekV3 groups now trimmed and brash,
Counts and topks set tidy and small,
Validation hops in, precise for all,
Kernel carrots crunch—logic finds its stash.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Allow non-DeepSeekV3 routing with one group' clearly and concisely describes the main change: enabling routing methods other than DeepSeekV3 to work with a single group.
Description check	✅ Passed	The PR description includes the required sections: a clear explanation of what the PR does and why it's needed, related issues, completed pre-commit checks, and updated tests.


gemini-code-assist · 2026-02-05T14:23:42Z

Summary of Changes

Hello @dbari, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the flexibility and robustness of the routing mechanism within the fused MoE kernel launcher. It relaxes constraints on routing methods when using a single group and adds validation checks to ensure configurations are consistent with the selected routing method. This change simplifies the codebase and accommodates a wider range of model configurations.

Highlights

Routing Method Flexibility: This PR broadens the allowed routing methods when using a single group, accommodating configurations like Mistral Large 3's use of Renormalize routing with one group.
Code Simplification: The changes aim to simplify vLLM's code by removing workarounds and generalizing the acceptance of routing methods with at most one group.
Consistency Checks: The PR introduces additional checks to ensure that n_group and topk_group are appropriately configured for different routing methods, enhancing the robustness of the system.

Changelog

csrc/trtllm_fused_moe_kernel_launcher.cu
- Modified the conditions for DeepSeekV3 routing to ensure n_group is not zero.
- Added checks to ensure n_group and topk_group are less than or equal to 1 for Renormalize and Llama4 routing methods.
- Removed redundant logging in trtllm_get_valid_moe_configs.

Activity

The PR addresses issue #33792 in the vLLM project.
Pre-commit checks have been executed and passed.
Tests have been added or updated as needed and are currently running locally.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request aims to allow non-DeepSeekV3 routing methods to be used with a single group (n_group=1), which was previously disallowed. The changes correctly modify the validation logic for DeepSeekV3 and add checks for Renormalize and Llama4 routing to support n_group <= 1.

My review identifies a potential issue where other routing methods (like Default, TopK) are no longer checked for n_group, which could lead to unexpected behavior if they are used with more than one group. I've suggested a refactoring to apply the n_group <= 1 check to all non-DeepSeekV3 methods for consistency and robustness, which also improves code clarity and reduces duplication.
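
The gap described in this review can be illustrated with a simplified model. It is a sketch of the concern only, not the launcher's code: routing methods are represented as plain strings, both function names are hypothetical, and only the `n_group` check is modelled.

```python
# Methods that, per the review, received an explicit n_group <= 1 check.
EXPLICITLY_CHECKED = {"Renormalize", "Llama4"}


def per_method_check(method: str, n_group: int) -> bool:
    """Model of per-method validation: unlisted methods are not checked at all."""
    if method == "DeepSeekV3":
        return n_group != 0
    if method in EXPLICITLY_CHECKED:
        return n_group <= 1
    # Methods such as Default or TopK fall through without any group check.
    return True


def unified_check(method: str, n_group: int) -> bool:
    """Model of the suggested refactoring: one check for every non-DeepSeekV3 method."""
    if method == "DeepSeekV3":
        return n_group != 0
    return n_group <= 1
```

In this model, `per_method_check("TopK", 4)` accepts a configuration that `unified_check("TopK", 4)` rejects, which is the inconsistency the suggested refactoring is meant to remove.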

csrc/trtllm_fused_moe_kernel_launcher.cu

Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>

aleozlx

lgtm

aleozlx · 2026-02-05T16:47:06Z

/bot run

flashinfer-bot · 2026-02-05T16:47:39Z

GitLab MR !299 has been created, and the CI pipeline #43364347 is currently running. I'll report back once the pipeline job completes.

yongwww · 2026-02-05T17:04:57Z

@flashinfer-bot run

dbari requested review from cyx-6, djmmoss, jiahanc, jimmyzho and yzh119 as code owners February 5, 2026 14:22

gemini-code-assist bot reviewed Feb 5, 2026

dbari mentioned this pull request Feb 5, 2026

Fix RoutingMethodType logic vllm-project/vllm#33919

Merged

5 tasks

dbari force-pushed the dbariamis/allow-non-dsv3-routing-with-1-group branch from 7f8ebdd to 7e91b3e February 5, 2026 15:12

aleozlx added the v0.6.3 label Feb 5, 2026

aleozlx reviewed Feb 5, 2026

csrc/trtllm_fused_moe_kernel_launcher.cu (outdated, resolved)

Allow non-DeepSeekV3 routing with one group

21cf038

Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>

dbari force-pushed the dbariamis/allow-non-dsv3-routing-with-1-group branch from 7e91b3e to 21cf038 February 5, 2026 16:44

aleozlx approved these changes Feb 5, 2026

flashinfer-bot added the run-ci label Feb 5, 2026

yzh119 approved these changes Feb 5, 2026

yzh119 merged commit 1e9b237 into flashinfer-ai:main Feb 5, 2026
43 checks passed

elvischenv mentioned this pull request Feb 9, 2026

Flashinfer MOE FP8 support for Mistral Large 3. sgl-project/sglang#15422

Merged

6 tasks

yzh119 merged 1 commit into flashinfer-ai:main from dbari:dbariamis/allow-non-dsv3-routing-with-1-group
