
[FA4] Update flash-attention to latest upstream FA4 #38690

Merged
LucasWilkinson merged 2 commits into vllm-project:main from neuralmagic:lwilkinson/update-fa4
Apr 2, 2026
@LucasWilkinson
Collaborator

Testing PR for updating FA4 to latest upstream

Point vllm_flash_attn.cmake to updated FA branch (95e93d2) which
syncs flash_attn/cute/ with upstream Dao-AILab/flash-attention.
Bump nvidia-cutlass-dsl>=4.4.2 and quack-kernels>=0.3.3 to match
upstream FA4 requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
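
The version bumps described above can be checked against a local environment with a short script. This is only a minimal sketch, not part of the PR: the helper names (parse_release, meets_minimum, check_installed) are hypothetical, the comparison only looks at the leading dotted-integer part of each version string, and pre-release suffixes such as "rc1" are ignored, so "4.4.2rc1" is treated the same as "4.4.2". The package names and minimum versions are the ones quoted in the description.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions quoted in the PR description.
MINIMUMS = {
    "nvidia-cutlass-dsl": "4.4.2",
    "quack-kernels": "0.3.3",
}


def parse_release(v):
    """Return the leading dotted-integer part of a version string as a tuple.

    Hypothetical helper: suffixes like 'rc1' or '+cu12' are dropped.
    """
    m = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in m.group(0).split(".")) if m else ()


def meets_minimum(installed, minimum):
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    n = max(len(a), len(b))
    a += (0,) * (n - len(a))
    b += (0,) * (n - len(b))
    return a >= b


def check_installed(minimums=MINIMUMS):
    """Map each package to True/False, or None if it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ok in check_installed().items():
        status = {True: "ok", False: "too old", None: "not installed"}[ok]
        print(f"{name} (>= {MINIMUMS[name]}): {status}")
```

For strict PEP 440 handling (pre-releases, epochs), a dedicated version-parsing library would be the safer choice; the tuple comparison here is only meant to illustrate the ">=" constraints.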
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the vllm-flash-attn Git tag to a newer commit and bumps the minimum versions for nvidia-cutlass-dsl and quack-kernels in the CUDA requirements file. I have no feedback to provide.

@LucasWilkinson LucasWilkinson added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Apr 1, 2026
@MatthewBonanni
Collaborator

This will fix #36763 thanks to the inclusion of Dao-AILab/flash-attention@0293155

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Collaborator

@MatthewBonanni MatthewBonanni left a comment

LGTM

@github-project-automation github-project-automation bot moved this to Ready in NVIDIA Apr 2, 2026
@LucasWilkinson LucasWilkinson changed the title from "[WIP][Do not merge yet] Update flash-attention to latest upstream FA4" to "[FA4] Update flash-attention to latest upstream FA4" Apr 2, 2026
@LucasWilkinson LucasWilkinson enabled auto-merge (squash) April 2, 2026 14:37
@LucasWilkinson LucasWilkinson merged commit cb3935a into vllm-project:main Apr 2, 2026
139 of 140 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Done in NVIDIA Apr 2, 2026
mieshkiwrk pushed a commit to mieshkiwrk/vllm that referenced this pull request Apr 2, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Mieszko Dziadowiec <mdziadowiec@habana.ai>
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 3, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Labels

ci/build, nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

2 participants