
[Attention] FA2 support more head sizes, ViT support, make default backend#28763

Merged
vllm-bot merged 11 commits into vllm-project:main from MatthewBonanni:fa_uneven_k
Nov 19, 2025
Conversation

@MatthewBonanni (Collaborator) commented Nov 15, 2025

Purpose

This PR is paired with vllm-project/flash-attention#109 (merge that first after CI passes, then I'll update the git tag), which enables FA2 to support the head sizes required for vision transformers (40, 72, and 80) (FA3 supports these by default). This PR also updates the selector to make FlashAttention the default backend over xFormers.
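The selector change described above can be sketched roughly as follows. This is an illustrative sketch, not vLLM's actual selector API: the function names, backend strings, and the exact FA2 head-size set are assumptions.

```python
# Hypothetical sketch of the ViT attention-backend selection change.
# Names and the head-size set are illustrative, not vLLM's real code.

def supports_head_size(backend: str, head_size: int) -> bool:
    if backend == "FLASH_ATTN":
        # With the paired flash-attention change, FA2 gains the ViT
        # head sizes 40, 72, and 80 (FA3 already supported them).
        return head_size in {32, 40, 64, 72, 80, 96, 128, 160, 192, 224, 256}
    return True  # assume the fallback backend handles arbitrary sizes

def select_vit_backend(head_size: int) -> str:
    # FlashAttention is now preferred over xFormers when it is usable.
    if supports_head_size("FLASH_ATTN", head_size):
        return "FLASH_ATTN"
    return "XFORMERS"
```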

Test Plan

pytest tests/kernels/attention/test_flash_attn.py (updated with new head sizes)

Test Result

Passes
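The test update might be parametrized along these lines; this is a stand-in sketch, since the real test in tests/kernels/attention/test_flash_attn.py runs the FA2 kernels themselves rather than a simple predicate.

```python
import pytest

# Sketch of extending a head-size parametrization the way this PR's
# test update does; sizes 40/72/80 are the new ViT head sizes.
NEW_VIT_HEAD_SIZES = [40, 72, 80]
EXISTING_HEAD_SIZES = [64, 128, 256]

@pytest.mark.parametrize("head_size", EXISTING_HEAD_SIZES + NEW_VIT_HEAD_SIZES)
def test_head_size_is_supported(head_size: int) -> None:
    # Stand-in predicate; the real test executes the attention kernel
    # and compares its output against a reference implementation.
    assert head_size % 8 == 0 and head_size <= 256
```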


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request updates FlashAttention to support head sizes required for Vision Transformers (40, 72, 80). This is achieved by updating the dependency to a fork of flash-attention, generalizing the head size check in the FlashAttention backend, and updating tests. The logic for selecting the ViT attention backend is also refactored for clarity. My review has identified two main points. First, a critical issue in cmake/external_projects/vllm_flash_attn.cmake where the dependency points to a personal fork, which must be reverted before merging. Second, a high-severity issue in tests/kernels/attention/test_flash_attn.py where a test case for soft_cap has been removed, potentially hiding a feature regression. The other changes look good.

@mgoin mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 15, 2025
@MatthewBonanni MatthewBonanni changed the title from "[DO NOT MERGE][Attention] FlashAttention ViT support" to "[DO NOT MERGE][Attention] FlashAttention ViT support, make default backend" Nov 17, 2025
@MatthewBonanni MatthewBonanni changed the title from "[DO NOT MERGE][Attention] FlashAttention ViT support, make default backend" to "[Attention] FlashAttention ViT support, make default backend" Nov 17, 2025
@mgoin mgoin mentioned this pull request Nov 17, 2025
1 task
@LucasWilkinson (Collaborator) commented:

Do you know if FA2 is supported too? Do you mind testing this on Ampere? I think it should be OK.

@MatthewBonanni (Collaborator, Author) commented Nov 19, 2025

@LucasWilkinson pytest tests/kernels/attention/test_flash_attn.py passes on A100 with FA2 👍

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Nov 19, 2025
@vllm-bot vllm-bot merged commit 4c23690 into vllm-project:main Nov 19, 2025
84 of 88 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Nov 19, 2025
@youkaichao youkaichao changed the title from "[Attention] FlashAttention ViT support, make default backend" to "[Attention] FA2&FA3 support more head sizes, ViT support, make default backend" Nov 19, 2025
@MatthewBonanni MatthewBonanni deleted the fa_uneven_k branch November 19, 2025 15:11
@ywang96 ywang96 mentioned this pull request Nov 24, 2025
5 tasks
@Victor49152 Victor49152 mentioned this pull request Nov 26, 2025
5 tasks
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
@MoyanZitto (Contributor) commented:

I didn't find any head size checks in https://github.com/vllm-project/flash-attention/blob/main/hopper/flash_api.cpp, so I guess this MR probably just extends the supported head_size list for FA2. In other words, FA3 already supports any head_size < 256 by default. @MatthewBonanni
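The distinction raised in this comment can be sketched as the contrast between an explicit FA2 head-size list and FA3's open bound. The set contents, names, and the exact FA3 bound are assumptions drawn from the discussion, not vLLM's real gating code.

```python
# Illustrative contrast between FA2 and FA3 head-size gating as
# described in the discussion above; names and sets are assumptions.

FA2_HEAD_SIZES = {32, 40, 64, 72, 80, 96, 128, 160, 192, 224, 256}

def fa2_supports(head_size: int) -> bool:
    # FA2 kernels are compiled per head size, so support is a fixed
    # list that this PR extends with the ViT sizes 40, 72, and 80.
    return head_size in FA2_HEAD_SIZES

def fa3_supports(head_size: int) -> bool:
    # Per the comment above, FA3's flash_api.cpp imposes no explicit
    # list, so any head size up to 256 passes by default.
    return 0 < head_size <= 256
```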

@MatthewBonanni MatthewBonanni changed the title [Attention] FA2&FA3 support more head sizes, ViT support, make default backend [Attention] FA2 support more head sizes, ViT support, make default backend Jan 13, 2026
@MatthewBonanni (Collaborator, Author) commented:

@MoyanZitto yes, good point. I've updated the title and description.


Labels

ci/build, nvidia, ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

Status: Done
