
Add GQA group_size 5, 6, 7 to DISPATCH_GQA_GROUP_SIZE #2986

Open
arbi-dev wants to merge 1 commit into flashinfer-ai:main from arbi-dev:fix-gqa-group-size-dispatch

Conversation


@arbi-dev arbi-dev commented Apr 5, 2026

Summary

DISPATCH_GQA_GROUP_SIZE only handles group sizes 1, 2, 3, 4, 8. Any other value hits a runtime error:

RuntimeError: Unsupported group_size: 6

This breaks several popular models with non-power-of-2 GQA ratios:

| Model | Q heads | KV heads | group_size |
| --- | --- | --- | --- |
| Qwen3.5-27B | 24 | 4 | 6 |
| InternLM2.5-20B | 48 | 8 | 6 |
| Qwen2.5-7B | 28 | 4 | 7 |
| Yi-1.5-34B | 56 | 8 | 7 |

This PR adds explicit constexpr cases for group sizes 5, 6, and 7, so all sizes 1-8 are supported. Each adds one template instantiation per call site, matching the existing dispatch pattern.

Why this hasn't been reported widely

Most users access FlashInfer through vLLM or SGLang, which use BatchDecodeWithPagedKVCacheWrapper. That wrapper handles GQA at the Python level and doesn't go through DISPATCH_GQA_GROUP_SIZE. The error only manifests when calling the lower-level C++ kernel dispatch directly (e.g., custom attention backends or quantized KV cache implementations that bypass the Python wrapper).

Test plan

  • Verified Qwen3.5-27B (group_size=6) runs correctly after fix
  • CI: existing unit tests should pass (no behavior change for sizes 1-4, 8)

AI-assisted: Claude Opus 4.6 assisted with code generation and model survey.

Summary by CodeRabbit

  • Bug Fixes
    • Extended support for additional group sizes (5, 6, and 7) in grouped query attention operations that were previously unsupported.

The macro only dispatched group sizes 1, 2, 3, 4, 8 — any other value
hit a runtime error ("Unsupported group_size"). This breaks several
popular models with non-power-of-2 GQA ratios:

  - group_size 6: Qwen3.5-27B (24Q/4KV), InternLM2.5-20B (48Q/8KV)
  - group_size 7: Qwen2.5-7B (28Q/4KV), Yi-1.5-34B (56Q/8KV)

Add explicit constexpr cases for 5, 6, and 7 so all group sizes 1-8
are supported. Each adds one template instantiation per call site.

The error manifests as:
  RuntimeError: Unsupported group_size: 6
when calling BatchDecodeWithPagedKVCache or similar kernel dispatch
paths that go through DISPATCH_GQA_GROUP_SIZE.
@coderabbitai

coderabbitai bot commented Apr 5, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between c4cb6e0 and 8efa67f.

📒 Files selected for processing (1)
  • include/flashinfer/utils.cuh

📝 Walkthrough

The DISPATCH_GQA_GROUP_SIZE macro in include/flashinfer/utils.cuh was extended to support additional group size values (5, 6, and 7). Previously, these values would trigger an error; they now execute the provided macro arguments with the corresponding group size defined.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Macro Extension: include/flashinfer/utils.cuh | Added conditional branches in the DISPATCH_GQA_GROUP_SIZE macro to support group_size values of 5, 6, and 7, each defining GROUP_SIZE and executing __VA_ARGS__ instead of falling through to error handling. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

A rabbit hops with glee today, 🐰
Group sizes five through seven at play,
No more errors blocking the way,
The macro now dispatches all day,
More dispatch paths in flashy array! ✨

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: extending DISPATCH_GQA_GROUP_SIZE to support group sizes 5, 6, and 7. |
| Description check | ✅ Passed | The description is comprehensive, covering motivation, affected models, implementation approach, test verification, and AI assistance acknowledgment. However, it does not follow the provided PR template structure with required sections like 'Related Issues' and the pre-commit/testing checklist. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request expands the DISPATCH_GQA_GROUP_SIZE macro in include/flashinfer/utils.cuh to include support for group sizes 5, 6, and 7. The review feedback suggests refactoring the macro's if-else if chain into a switch statement to improve evaluation efficiency and maintain consistency with other dispatch macros in the project.

Comment on lines +150 to +158
} else if (group_size == 5) { \
constexpr size_t GROUP_SIZE = 5; \
__VA_ARGS__ \
} else if (group_size == 6) { \
constexpr size_t GROUP_SIZE = 6; \
__VA_ARGS__ \
} else if (group_size == 7) { \
constexpr size_t GROUP_SIZE = 7; \
__VA_ARGS__ \
Severity: medium

With the addition of more group sizes, the if-else if chain in DISPATCH_GQA_GROUP_SIZE is becoming increasingly long. Consider refactoring the macro to use a switch statement. This would ensure the group_size expression is evaluated only once and would improve consistency with other dispatch macros in this file (such as DISPATCH_CTA_TILE_Q and DISPATCH_HEAD_DIM) that already use switch for exact value matching.


yzh119 commented Apr 5, 2026

Hi @arbi-dev, can you see my comments in #2684 (review)?
