Added GQA support for CUDNNAttention backend #3984

Open
MaciejBalaNV wants to merge 1 commit into vllm-project:main from MaciejBalaNV:gqa_for_cudnn_attn

MaciejBalaNV commented May 29, 2026

This PR adds GQA (grouped-query attention) support to CUDNNAttentionBackend, following the approach used for SDPA in #3728.
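For readers unfamiliar with GQA: the key/value tensors carry fewer heads than the query tensor, and each group of consecutive query heads shares one KV head. The snippet below is a minimal sketch of that head mapping in plain Python. It is an illustration only, not the code in this PR or in #3728; `expand_kv_heads` is a hypothetical helper name, and whether the backend expands KV heads explicitly or handles grouping inside the kernel is not stated here.

```python
def expand_kv_heads(kv_heads, num_q_heads):
    """Repeat each KV head so the result has one entry per query head.

    kv_heads: list with one item per KV head (any per-head payload).
    Hypothetical helper for illustration; not taken from the PR.
    """
    num_kv_heads = len(kv_heads)
    if num_kv_heads == 0 or num_q_heads <= 0 or num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a positive multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    # Query head i reads KV head i // group_size, so consecutive
    # query heads in the same group share a single KV head.
    return [kv_heads[i // group_size] for i in range(num_q_heads)]


# 8 query heads sharing 2 KV heads: groups of 4.
expanded = expand_kv_heads(["kv0", "kv1"], num_q_heads=8)
print(expanded)
```

On tensors, the same layout corresponds to repeating each KV head along the head dimension (for example with `torch.repeat_interleave`), which lets an attention kernel that expects equal head counts be reused at the cost of extra memory.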

Purpose

Enable models that use GQA to run with CUDNNAttentionBackend. One example is Bagel; another is Cosmos3, which is being added in #3454.

Test Plan

Neither CUDNNAttentionBackend nor the GQA path for SDPA has existing tests, so this PR does not add any: doing so is out of scope here, and the change is very simple.

Test Result

No relevant test results

Signed-off-by: Maciej Bala <mbala@nvidia.com>
@MaciejBalaNV

CC @alex-jw-brooks

@MaciejBalaNV MaciejBalaNV mentioned this pull request May 29, 2026
5 tasks