Dispatch GQA group size 5, 6, 7 by ahadnagy · Pull Request #2684 · flashinfer-ai/flashinfer

ahadnagy · 2026-03-04T09:53:19Z

📌 Description

Models like Qwen3.5-27B use 24 Q heads with 4 KV heads, producing a
GQA group_size of 6. The existing macro only handled {1, 2, 3, 4, 8},
causing "Unsupported group_size: 6" errors at batch decode plan time.

Add the missing group sizes 5, 6, and 7 so that non-power-of-2 GQA
ratios are dispatched correctly.
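The arithmetic behind the failure can be sketched as follows. This is a minimal illustration, not FlashInfer code: `gqa_group_size` and `handled_before_change` are hypothetical helper names, and the supported set is copied from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a FlashInfer function: the GQA group size is the
// number of query heads that share a single KV head.
inline std::uint32_t gqa_group_size(std::uint32_t num_qo_heads,
                                    std::uint32_t num_kv_heads) {
  assert(num_kv_heads != 0 && num_qo_heads % num_kv_heads == 0);
  return num_qo_heads / num_kv_heads;
}

// Hypothetical helper: membership in the set {1, 2, 3, 4, 8} that the
// description says the macro handled before this change.
inline bool handled_before_change(std::uint32_t group_size) {
  return group_size == 1 || group_size == 2 || group_size == 3 ||
         group_size == 4 || group_size == 8;
}
```

With 24 Q heads and 4 KV heads, `gqa_group_size(24, 4)` evaluates to 6, which `handled_before_change` rejects; that is the case reported as "Unsupported group_size: 6".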

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

New Features
- Extended inference engine support for additional grouped query attention (GQA) configurations, broadening compatibility across model architectures.

gemini-code-assist · 2026-03-04T09:53:31Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the FlashInfer library by expanding the range of supported GQA group sizes. This ensures compatibility with a broader spectrum of large language models that utilize non-power-of-2 GQA ratios, preventing runtime errors and improving the library's versatility and robustness.

Highlights

Expanded GQA Group Size Support: The DISPATCH_GQA_GROUP_SIZE macro has been updated to include support for GQA group sizes 5, 6, and 7. This change addresses "Unsupported group_size" errors encountered by models such as Qwen3.5-27B, which use non-power-of-2 GQA ratios.

Changelog

include/flashinfer/utils.cuh
- Added else-if branches for GQA group sizes 5, 6, and 7 within the DISPATCH_GQA_GROUP_SIZE macro.

coderabbitai · 2026-03-04T09:53:42Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between 635505f and 1453210.

📒 Files selected for processing (1)

include/flashinfer/utils.cuh

📝 Walkthrough

The DISPATCH_GQA_GROUP_SIZE macro in include/flashinfer/utils.cuh has been extended to support additional group_size values (5, 6, and 7), while maintaining existing support for group_size 1–4 and 8.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Macro Extension `include/flashinfer/utils.cuh` | Extended `DISPATCH_GQA_GROUP_SIZE` macro to handle group_size values 5, 6, and 7 by defining corresponding constexpr branches and invoking variadic arguments for each new case. |
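The branch pattern described in the walkthrough can be sketched in a self-contained form. This is an illustration under stated assumptions, not the actual contents of `utils.cuh`: the macro name `SKETCH_DISPATCH_GQA_GROUP_SIZE`, the stub functions, and the use of `std::invalid_argument` for the unsupported case are all hypothetical. Only the shape of each branch (a `constexpr` `GROUP_SIZE` followed by `__VA_ARGS__`) is taken from the diff in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Sketch only: maps a runtime group size to a compile-time constant so the
// body passed as __VA_ARGS__ can use GROUP_SIZE as a template argument.
#define SKETCH_DISPATCH_GQA_GROUP_SIZE(group_size, GROUP_SIZE, ...)  \
  if ((group_size) == 1) {                                           \
    constexpr std::size_t GROUP_SIZE = 1;                            \
    __VA_ARGS__                                                      \
  } else if ((group_size) == 2) {                                    \
    constexpr std::size_t GROUP_SIZE = 2;                            \
    __VA_ARGS__                                                      \
  } else if ((group_size) == 3) {                                    \
    constexpr std::size_t GROUP_SIZE = 3;                            \
    __VA_ARGS__                                                      \
  } else if ((group_size) == 4) {                                    \
    constexpr std::size_t GROUP_SIZE = 4;                            \
    __VA_ARGS__                                                      \
  } else if ((group_size) == 5) {                                    \
    constexpr std::size_t GROUP_SIZE = 5;                            \
    __VA_ARGS__                                                      \
  } else if ((group_size) == 6) {                                    \
    constexpr std::size_t GROUP_SIZE = 6;                            \
    __VA_ARGS__                                                      \
  } else if ((group_size) == 7) {                                    \
    constexpr std::size_t GROUP_SIZE = 7;                            \
    __VA_ARGS__                                                      \
  } else if ((group_size) == 8) {                                    \
    constexpr std::size_t GROUP_SIZE = 8;                            \
    __VA_ARGS__                                                      \
  } else {                                                           \
    throw std::invalid_argument("Unsupported group_size: " +         \
                                std::to_string(group_size));         \
  }

// Hypothetical stand-in for a kernel templated on the group size.
template <std::size_t GROUP_SIZE>
std::size_t kernel_stub() {
  return GROUP_SIZE;
}

// Hypothetical caller: dispatches a runtime value to the matching template.
inline std::size_t dispatch_stub(std::size_t group_size) {
  std::size_t result = 0;
  SKETCH_DISPATCH_GQA_GROUP_SIZE(group_size, GROUP_SIZE, {
    result = kernel_stub<GROUP_SIZE>();
  });
  // The compile-time constant selected must match the runtime value.
  assert(result == group_size);
  return result;
}
```

In this sketch, `dispatch_stub(6)` instantiates `kernel_stub<6>`, while a value outside 1 to 8 falls through to the final `else` and throws. Each supported size adds one more template instantiation of the body, so extending the chain increases compile time and binary size.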

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 A macro that grows with each new size,
Five, six, seven—what a surprise!
Group sizes bloom like clover on the lawn,
FlashInfer hops into a broader dawn. ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Dispatch GQA group size 5, 6, 7' directly describes the main change: extending the macro to support three new group size values. |
| Description check | ✅ Passed | The description includes the required Description section explaining the problem and solution clearly. The Related Issues and Checklist sections are present but mostly unfilled, which is acceptable for non-critical sections. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

gemini-code-assist

Code Review

This pull request adds support for GQA group sizes 5, 6, and 7 to enable models like Qwen3.5-27B. The change in include/flashinfer/utils.cuh is correct. I've added two comments: one suggesting a refactoring for better maintainability, and another pointing out missing test cases for the new group sizes to ensure correctness and robustness.

gemini-code-assist · 2026-03-04T09:55:32Z

include/flashinfer/utils.cuh

+  } else if (group_size == 5) {                              \
+    constexpr size_t GROUP_SIZE = 5;                         \
+    __VA_ARGS__                                              \
+  } else if (group_size == 6) {                              \
+    constexpr size_t GROUP_SIZE = 6;                         \
+    __VA_ARGS__                                              \
+  } else if (group_size == 7) {                              \
+    constexpr size_t GROUP_SIZE = 7;                         \
+    __VA_ARGS__                                              \


The added group sizes are a good extension. However, to ensure correctness, it's crucial to have corresponding test cases. I noticed in tests/attention/test_batch_attention.py that tests for gqa_group_size 5 and 6 are missing. Please add them to ensure full coverage of the new functionality.

gemini-code-assist · 2026-03-04T09:55:32Z

include/flashinfer/utils.cuh

+  } else if (group_size == 5) {                              \
+    constexpr size_t GROUP_SIZE = 5;                         \
+    __VA_ARGS__                                              \
+  } else if (group_size == 6) {                              \
+    constexpr size_t GROUP_SIZE = 6;                         \
+    __VA_ARGS__                                              \
+  } else if (group_size == 7) {                              \
+    constexpr size_t GROUP_SIZE = 7;                         \
+    __VA_ARGS__                                              \


While adding more else if conditions works, this chain is getting long. For better readability and maintainability, consider refactoring the DISPATCH_GQA_GROUP_SIZE macro to use a switch statement, similar to how DISPATCH_HEAD_DIM is implemented in this file.
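The suggested `switch` form could look roughly like the sketch below. It is an assumption-laden illustration, not the layout of `DISPATCH_HEAD_DIM` or of any macro in `utils.cuh`: `SKETCH_GQA_CASE`, `SKETCH_DISPATCH_GQA_GROUP_SIZE_SWITCH`, the stub functions, and the exception type are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Sketch only: one case label per supported size. The braces give each case
// its own scope so the constexpr GROUP_SIZE declarations do not collide.
#define SKETCH_GQA_CASE(N, GROUP_SIZE, ...) \
  case N: {                                 \
    constexpr std::size_t GROUP_SIZE = N;   \
    __VA_ARGS__                             \
    break;                                  \
  }

#define SKETCH_DISPATCH_GQA_GROUP_SIZE_SWITCH(group_size, GROUP_SIZE, ...) \
  switch (group_size) {                                                    \
    SKETCH_GQA_CASE(1, GROUP_SIZE, __VA_ARGS__)                            \
    SKETCH_GQA_CASE(2, GROUP_SIZE, __VA_ARGS__)                            \
    SKETCH_GQA_CASE(3, GROUP_SIZE, __VA_ARGS__)                            \
    SKETCH_GQA_CASE(4, GROUP_SIZE, __VA_ARGS__)                            \
    SKETCH_GQA_CASE(5, GROUP_SIZE, __VA_ARGS__)                            \
    SKETCH_GQA_CASE(6, GROUP_SIZE, __VA_ARGS__)                            \
    SKETCH_GQA_CASE(7, GROUP_SIZE, __VA_ARGS__)                            \
    SKETCH_GQA_CASE(8, GROUP_SIZE, __VA_ARGS__)                            \
    default:                                                               \
      throw std::invalid_argument("Unsupported group_size: " +             \
                                  std::to_string(group_size));             \
  }

// Hypothetical stand-in for a kernel templated on the group size.
template <std::size_t GROUP_SIZE>
std::size_t switch_kernel_stub() {
  return GROUP_SIZE;
}

// Hypothetical caller used to exercise the switch-based dispatch.
inline std::size_t dispatch_switch_stub(std::size_t group_size) {
  std::size_t result = 0;
  SKETCH_DISPATCH_GQA_GROUP_SIZE_SWITCH(group_size, GROUP_SIZE, {
    result = switch_kernel_stub<GROUP_SIZE>();
  });
  // The compile-time constant selected must match the runtime value.
  assert(result == group_size);
  return result;
}
```

One trade-off of this form: a bare `break` written inside the dispatched body would exit the `switch` rather than an enclosing loop, so callers that rely on `break` inside the body would behave differently than with the `else if` chain.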

yzh119

GQA cases should all go through the tensor-core implementation; you can enable it by passing use_tensor_cores=True in the attention APIs.

The CUDA-core-based implementation (which your PR would affect) performs poorly on GQA.

ahadnagy · 2026-03-05T11:12:58Z

Thanks for the insight!

dispatch: add GQA group sizes 5, 6, 7 to DISPATCH_GQA_GROUP_SIZE

1453210

ahadnagy requested review from IwakuraRein, jiahanc, kahyunnam, nv-yunzheq and yzh119 as code owners March 4, 2026 09:53

gemini-code-assist bot reviewed Mar 4, 2026

yzh119 requested changes Mar 4, 2026

yzh119 closed this Mar 5, 2026

yzh119 mentioned this pull request Mar 27, 2026

Allow BatchDecodeWithPagedKVCacheWrapper for GQA ratio 16 and 32 #2895

Open

5 tasks

yzh119 mentioned this pull request Apr 5, 2026

Add GQA group_size 5, 6, 7 to DISPATCH_GQA_GROUP_SIZE #2986

Open

2 tasks

ahadnagy wants to merge 1 commit into flashinfer-ai:main from ahadnagy:dispatch-gqa-group-size-5-6-7
