[FA4] Update flash-attention to latest upstream FA4 by LucasWilkinson · Pull Request #38690 · vllm-project/vllm

LucasWilkinson · 2026-04-01T05:19:54Z

Testing PR for updating FA4 to latest upstream

Point vllm_flash_attn.cmake to the updated FA branch (95e93d2), which syncs flash_attn/cute/ with upstream Dao-AILab/flash-attention. Bump nvidia-cutlass-dsl>=4.4.2 and quack-kernels>=0.3.3 to match upstream FA4 requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
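For anyone reproducing this locally, the two version bumps above can be checked against an existing environment with something like the sketch below. It is only an illustration and not part of the PR: the helper names (`parse_release`, `meets_minimum`, `check_installed`) are hypothetical, and the comparison is a simplification that looks only at the leading numeric release segment, so pre-release and local version suffixes are ignored.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as stated in the PR description.
# Everything else in this sketch (names, structure) is an assumption.
MINIMUMS = {
    "nvidia-cutlass-dsl": "4.4.2",
    "quack-kernels": "0.3.3",
}


def parse_release(ver: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '4.4.2.post1' -> (4, 4, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"cannot parse version: {ver!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release segments, padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums: dict[str, str] = MINIMUMS) -> dict[str, str]:
    """Return {package: problem} for every minimum that is not satisfied."""
    problems: dict[str, str] = {}
    for name, minimum in minimums.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(found, minimum):
            problems[name] = f"found {found}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_installed().items():
        print(f"{name}: {problem}")
```

An empty result from `check_installed()` means both packages are present at or above the stated minimums. For strict PEP 440 ordering, a dedicated version-parsing library would be the safer choice.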

gemini-code-assist

Code Review

This pull request updates the vllm-flash-attn Git tag to a newer commit and bumps the minimum versions for nvidia-cutlass-dsl and quack-kernels in the CUDA requirements file. I have no feedback to provide.

MatthewBonanni · 2026-04-01T16:10:12Z

This will fix #36763 thanks to the inclusion of Dao-AILab/flash-attention@0293155

MatthewBonanni

LGTM

mergify bot added the ci/build and nvidia labels Apr 1, 2026

github-project-automation bot added this to NVIDIA Apr 1, 2026

gemini-code-assist bot reviewed Apr 1, 2026

LucasWilkinson added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Apr 1, 2026

This was referenced Apr 1, 2026

[Bug]: Kimi-K2.5 outputs only '!!!!!!!!!!' in reasoning field, content is always null #36763

Closed

Update FA4 vllm-project/flash-attention#128

Merged

update to point to main

874d744

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

MatthewBonanni approved these changes Apr 2, 2026

github-project-automation bot moved this to Ready in NVIDIA Apr 2, 2026

LucasWilkinson changed the title ~~[WIP][Do not merge yet] Update flash-attention to latest upstream FA4~~ [FA4] Update flash-attention to latest upstream FA4 Apr 2, 2026

LucasWilkinson enabled auto-merge (squash) April 2, 2026 14:37

MatthewBonanni mentioned this pull request Apr 2, 2026

[Attention][MLA] Re-enable FA4 as default MLA prefill backend #38819

Open

LucasWilkinson merged commit cb3935a into vllm-project:main Apr 2, 2026
139 of 140 checks passed

github-project-automation bot moved this from Ready to Done in NVIDIA Apr 2, 2026

mieshkiwrk pushed a commit to mieshkiwrk/vllm that referenced this pull request Apr 2, 2026

[FA4] Update flash-attention to latest upstream FA4 (vllm-project#38690)

8e012d1

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Mieszko Dziadowiec <mdziadowiec@habana.ai>

yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 3, 2026

[FA4] Update flash-attention to latest upstream FA4 (vllm-project#38690)

5761876

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

LucasWilkinson merged 2 commits into vllm-project:main from neuralmagic:lwilkinson/update-fa4