
Dispatch GQA group size 5, 6, 7#2684

Closed
ahadnagy wants to merge 1 commit into flashinfer-ai:main from ahadnagy:dispatch-gqa-group-size-5-6-7

Conversation


@ahadnagy ahadnagy commented Mar 4, 2026

📌 Description

Models like Qwen3.5-27B use 24 Q heads with 4 KV heads, producing a
GQA group_size of 6. The existing macro only handled {1, 2, 3, 4, 8},
causing "Unsupported group_size: 6" errors at batch decode plan time.

Add the missing group sizes 5, 6, and 7 so that non-power-of-2 GQA
ratios are dispatched correctly.
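For context, the dispatch pattern the PR extends can be sketched as follows. This is an illustrative standalone reduction, not the actual FlashInfer `DISPATCH_GQA_GROUP_SIZE` macro from include/flashinfer/utils.cuh; the branch set and names are simplified:

```cpp
#include <cstddef>

// Minimal sketch of the constexpr-dispatch pattern (illustrative only): a
// runtime group_size is mapped to a compile-time GROUP_SIZE so each kernel
// instantiation can be specialized per group size.
#define DISPATCH_GQA_GROUP_SIZE_SKETCH(group_size, GROUP_SIZE, ...) \
  if (group_size == 4) {                                            \
    constexpr size_t GROUP_SIZE = 4;                                \
    __VA_ARGS__                                                     \
  } else if (group_size == 6) { /* one of the newly added ratios */ \
    constexpr size_t GROUP_SIZE = 6;                                \
    __VA_ARGS__                                                     \
  } else {                                                          \
    return 0; /* "Unsupported group_size" error in the real macro */\
  }

// Returns the compile-time value the dispatcher selected (0 = unsupported).
size_t dispatch_group_size(size_t group_size) {
  DISPATCH_GQA_GROUP_SIZE_SKETCH(group_size, GROUP_SIZE, {
    return GROUP_SIZE;  // stands in for launching a specialized kernel
  })
  return 0;  // unreachable: every branch above returns
}
```

With 24 Q heads and 4 KV heads, `dispatch_group_size(24 / 4)` now selects a `GROUP_SIZE = 6` branch instead of falling through to the error path.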

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • New Features
    • Extended inference-engine support to additional grouped-query attention (GQA) configurations, broadening compatibility with models whose head ratios were previously unsupported.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the FlashInfer library by expanding the range of supported GQA group sizes. This ensures compatibility with a broader spectrum of large language models that utilize non-power-of-2 GQA ratios, preventing runtime errors and improving the library's versatility and robustness.

Highlights

  • Expanded GQA Group Size Support: The DISPATCH_GQA_OPTIONS macro has been updated to include support for GQA group sizes 5, 6, and 7. This change addresses "Unsupported group_size" errors encountered by models such as Qwen3.5-27B, which use non-power-of-2 GQA ratios.


Changelog
  • include/flashinfer/utils.cuh
    • Added if-else if branches for GQA group sizes 5, 6, and 7 within the DISPATCH_GQA_OPTIONS macro.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

coderabbitai bot commented Mar 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 068094fb-c5ca-42d7-a39b-a64312e9e089

📥 Commits

Reviewing files that changed from the base of the PR and between 635505f and 1453210.

📒 Files selected for processing (1)
  • include/flashinfer/utils.cuh

📝 Walkthrough

Walkthrough

The DISPATCH_GQA_GROUP_SIZE macro in include/flashinfer/utils.cuh has been extended to support additional group_size values (5, 6, and 7), while maintaining existing support for group_size 1–4 and 8.

Changes

  • include/flashinfer/utils.cuh (Macro Extension): Extended the DISPATCH_GQA_GROUP_SIZE macro to handle group_size values 5, 6, and 7 by defining corresponding constexpr branches and invoking the variadic arguments for each new case.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 A macro that grows with each new size,
Five, six, seven—what a surprise!
Group sizes bloom like clover on the lawn,
FlashInfer hops into a broader dawn. ✨

🚥 Pre-merge checks (3 passed)
  • Title check: ✅ Passed. The title 'Dispatch GQA group size 5, 6, 7' directly describes the main change, extending the macro to support three new group size values.
  • Description check: ✅ Passed. The description includes the required Description section explaining the problem and solution clearly. The Related Issues and Checklist sections are present but mostly unfilled, which is acceptable for non-critical sections.
  • Docstring Coverage: ✅ Passed. No functions were found in the changed files, so the docstring coverage check was skipped.



Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for GQA group sizes 5, 6, and 7 to enable models like Qwen3.5-27B. The change in include/flashinfer/utils.cuh is correct. I've added two comments: one suggesting a refactoring for better maintainability, and another pointing out missing test cases for the new group sizes to ensure correctness and robustness.

Comment on lines +137 to +145
} else if (group_size == 5) { \
constexpr size_t GROUP_SIZE = 5; \
__VA_ARGS__ \
} else if (group_size == 6) { \
constexpr size_t GROUP_SIZE = 6; \
__VA_ARGS__ \
} else if (group_size == 7) { \
constexpr size_t GROUP_SIZE = 7; \
__VA_ARGS__ \
Contributor


Severity: high

The added group sizes are a good extension. However, to ensure correctness, it's crucial to have corresponding test cases. I noticed in tests/attention/test_batch_attention.py that tests for gqa_group_size 5 and 6 are missing. Please add them to ensure full coverage of the new functionality.

Comment on lines +137 to +145
} else if (group_size == 5) { \
constexpr size_t GROUP_SIZE = 5; \
__VA_ARGS__ \
} else if (group_size == 6) { \
constexpr size_t GROUP_SIZE = 6; \
__VA_ARGS__ \
} else if (group_size == 7) { \
constexpr size_t GROUP_SIZE = 7; \
__VA_ARGS__ \
Contributor


Severity: medium

While adding more else if conditions works, this chain is getting long. For better readability and maintainability, consider refactoring the DISPATCH_GQA_GROUP_SIZE macro to use a switch statement, similar to how DISPATCH_HEAD_DIM is implemented in this file.
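The suggested refactor might look like the sketch below. The macro names and the comparison to DISPATCH_HEAD_DIM are illustrative assumptions based on the comment above, not FlashInfer's actual code:

```cpp
#include <cstddef>

// Hypothetical switch-based variant of the group-size dispatcher, in the
// spirit of the reviewer's suggestion. Names are illustrative only.
#define DISPATCH_GROUP_SIZE_CASE(case_val, GROUP_SIZE, ...) \
  case case_val: {                                          \
    constexpr size_t GROUP_SIZE = case_val;                 \
    __VA_ARGS__                                             \
    break;                                                  \
  }

#define DISPATCH_GQA_GROUP_SIZE_SWITCH(group_size, GROUP_SIZE, ...) \
  switch (group_size) {                                             \
    DISPATCH_GROUP_SIZE_CASE(1, GROUP_SIZE, __VA_ARGS__)            \
    DISPATCH_GROUP_SIZE_CASE(2, GROUP_SIZE, __VA_ARGS__)            \
    DISPATCH_GROUP_SIZE_CASE(3, GROUP_SIZE, __VA_ARGS__)            \
    DISPATCH_GROUP_SIZE_CASE(4, GROUP_SIZE, __VA_ARGS__)            \
    DISPATCH_GROUP_SIZE_CASE(5, GROUP_SIZE, __VA_ARGS__)            \
    DISPATCH_GROUP_SIZE_CASE(6, GROUP_SIZE, __VA_ARGS__)            \
    DISPATCH_GROUP_SIZE_CASE(7, GROUP_SIZE, __VA_ARGS__)            \
    DISPATCH_GROUP_SIZE_CASE(8, GROUP_SIZE, __VA_ARGS__)            \
    default:                                                        \
      break; /* real macro would report "Unsupported group_size" */ \
  }

// Returns the selected compile-time group size, or 0 when unsupported.
size_t dispatch_group_size_switch(size_t group_size) {
  size_t selected = 0;
  DISPATCH_GQA_GROUP_SIZE_SWITCH(group_size, GROUP_SIZE, {
    selected = GROUP_SIZE;
  })
  return selected;
}
```

One incidental benefit of the switch form: the compiler rejects duplicated case values, which an if/else-if chain silently tolerates.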

Collaborator

@yzh119 yzh119 left a comment


GQA cases should all go through the tensor-cores implementation; you can enable it by setting use_tensor_cores=True in the attention APIs.

The CUDA-cores-based implementation (which your PR would affect) has poor performance on GQA.

@yzh119 yzh119 closed this Mar 5, 2026
Author

ahadnagy commented Mar 5, 2026

Thanks for the insight!

