Add GQA group_size 5, 6, 7 to DISPATCH_GQA_GROUP_SIZE#2986
arbi-dev wants to merge 1 commit into flashinfer-ai:main
Conversation
The macro only dispatched group sizes 1, 2, 3, 4, 8 — any other value
hit a runtime error ("Unsupported group_size"). This breaks several
popular models with non-power-of-2 GQA ratios:
- group_size 6: Qwen3.5-27B (24Q/4KV), InternLM2.5-20B (48Q/8KV)
- group_size 7: Qwen2.5-7B (28Q/4KV), Yi-1.5-34B (56Q/8KV)
Add explicit constexpr cases for 5, 6, and 7 so all group sizes 1-8
are supported. Each adds one template instantiation per call site.
The error manifests as:
RuntimeError: Unsupported group_size: 6
when calling BatchDecodeWithPagedKVCache or similar kernel dispatch
paths that go through DISPATCH_GQA_GROUP_SIZE.
Code Review
This pull request expands the DISPATCH_GQA_GROUP_SIZE macro in include/flashinfer/utils.cuh to include support for group sizes 5, 6, and 7. The review feedback suggests refactoring the macro's if-else if chain into a switch statement to improve evaluation efficiency and maintain consistency with other dispatch macros in the project.
```cpp
} else if (group_size == 5) {        \
  constexpr size_t GROUP_SIZE = 5;   \
  __VA_ARGS__                        \
} else if (group_size == 6) {        \
  constexpr size_t GROUP_SIZE = 6;   \
  __VA_ARGS__                        \
} else if (group_size == 7) {        \
  constexpr size_t GROUP_SIZE = 7;   \
  __VA_ARGS__                        \
```
With the addition of more group sizes, the if-else if chain in DISPATCH_GQA_GROUP_SIZE is becoming increasingly long. Consider refactoring the macro to use a switch statement. This would ensure the group_size expression is evaluated only once and would improve consistency with other dispatch macros in this file (such as DISPATCH_CTA_TILE_Q and DISPATCH_HEAD_DIM) that already use switch for exact value matching.
Hi @arbi-dev, can you see my comments in #2684 (review)?
Summary
DISPATCH_GQA_GROUP_SIZE only handles group sizes 1, 2, 3, 4, and 8. Any other value hits a runtime error:

RuntimeError: Unsupported group_size: 6

This breaks several popular models with non-power-of-2 GQA ratios:
- group_size 6: Qwen3.5-27B (24Q/4KV), InternLM2.5-20B (48Q/8KV)
- group_size 7: Qwen2.5-7B (28Q/4KV), Yi-1.5-34B (56Q/8KV)
This PR adds explicit constexpr cases for group sizes 5, 6, and 7, so all sizes 1-8 are supported. Each adds one template instantiation per call site, matching the existing dispatch pattern.

Why this hasn't been reported widely
Most users access FlashInfer through vLLM or SGLang, which use BatchDecodeWithPagedKVCacheWrapper. That wrapper handles GQA at the Python level and doesn't go through DISPATCH_GQA_GROUP_SIZE. The error only manifests when calling the lower-level C++ kernel dispatch directly (e.g., custom attention backends or quantized KV cache implementations that bypass the Python wrapper).

Test plan
AI-assisted: Claude Opus 4.6 assisted with code generation and model survey.