[Bugfix] Kimi-K2 grouped_topk usage for Flashinfer monolithic kernels. #33858
simon-mo merged 1 commit into vllm-project:main
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Code Review
This pull request addresses a bug in the DeepseekV2MoE layer where grouped_topk routing was incorrectly disabled for the specific case of (n_group, topk_group) == (1, 1). This caused issues for models like Kimi-K2 that rely on this configuration, particularly when an e_score_correction_bias is used, as the fallback routing mechanism did not account for it. The fix removes this special condition, ensuring that GroupedTopKRouter is consistently used, which correctly handles all configurations, including the (1, 1) case. The resulting code is cleaner and more robust. The significant improvement in accuracy demonstrated in the test results validates this change. The concern regarding Mistral appears to be related to model configurations rather than a direct issue with this code modification, as Mistral models are handled by a separate implementation.
verified AIME and GSM8K passed, thanks for the fix!
Purpose
This PR fixes a bug introduced in PR #33174, which sets n_group and topk_group to None when they are both 1, i.e. (n_group, topk_group) == (1, 1). While this change fixes Kimi-K2, it may introduce an error with Mistral. @dbari, please confirm whether this fix is good or whether the values need to be passed differently.
The Marlin path works because, unlike the INT4 TRTLLM MoE kernels, it does not use a monolithic kernel for routing + MoE.
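To illustrate why the (1, 1) case still matters, here is a minimal pure-Python sketch, not the vLLM or Flashinfer implementation. It assumes sigmoid scoring, a per-expert correction bias that affects only expert selection, and a group score equal to the sum of the top-2 biased scores in each group. The function names and toy values are hypothetical. With a single group, the group selection step is a no-op, but the bias still changes which experts are chosen, so dropping back to plain top-k gives different routing.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def plain_topk(logits, top_k):
    """Hypothetical fallback: top-k over raw scores, no correction bias."""
    scores = [sigmoid(x) for x in logits]
    ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    return ids, [scores[i] for i in ids]


def grouped_topk(logits, bias, top_k, n_group, topk_group):
    """Sketch of grouped top-k routing with a selection-only bias (assumed scheme)."""
    scores = [sigmoid(x) for x in logits]
    biased = [s + b for s, b in zip(scores, bias)]  # used for selection only

    # Split experts into n_group contiguous groups of equal size.
    size = len(scores) // n_group
    groups = [list(range(g * size, (g + 1) * size)) for g in range(n_group)]

    # Score each group by the sum of its two largest biased scores.
    def group_score(members):
        return sum(sorted((biased[i] for i in members), reverse=True)[:2])

    kept = sorted(range(n_group), key=lambda g: group_score(groups[g]), reverse=True)
    kept = kept[:topk_group]

    # Pick the top_k experts among the kept groups, ranked by biased score.
    candidates = [i for g in kept for i in groups[g]]
    ids = sorted(candidates, key=lambda i: biased[i], reverse=True)[:top_k]

    # Routing weights come from the unbiased scores.
    return ids, [scores[i] for i in ids]


# Toy inputs (made up for illustration): four experts, two selected per token.
logits = [2.0, 1.0, 0.0, -1.0]
bias = [-0.9, 0.0, 0.5, 0.0]

fallback_ids, _ = plain_topk(logits, top_k=2)
grouped_ids, grouped_weights = grouped_topk(
    logits, bias, top_k=2, n_group=1, topk_group=1
)
print(fallback_ids, grouped_ids)
```

In this toy case the two routers select different experts even though there is only one group, which is consistent with the accuracy gap described in this PR when the bias is not applied.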
Test Plan
GSM8K before and after.
Test Result
Main
Kimi-K2-Thinking (Buggy)
With PR - (Fixed)
Kimi-K2.5 + Flashinfer