[Attention][FA3] Update FA3 to include new swizzle optimization #23465

vllm-bot merged 10 commits into vllm-project:main
Conversation
Code Review
This pull request updates the flash-attention dependency to a new commit hash, likely to incorporate the 'FA3 swizzle optimization' mentioned in the PR title. While pinning to a commit is good for reproducibility, using a commit hash that is not part of a tag or a long-lived branch can pose a risk for future builds and maintenance. I've added a comment suggesting the use of a git tag for better long-term stability.
Pinning dependencies to a specific commit hash is good for reproducibility. However, for long-term maintenance and release builds, it's better to use a git tag. Commit hashes that are not part of a branch or tag can be garbage collected by git, or become hard to track. Since this is a work-in-progress, it's acceptable for now, but it would be best to create a tag in the vllm-project/flash-attention repository for this commit before this PR is merged.
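The reviewer's point can be demonstrated with a minimal sketch: a commit that is reachable from a tag survives garbage collection, so tagging the pinned flash-attention commit protects future builds. The tag name `fa3-swizzle-v1` and the throwaway repository below are hypothetical, for illustration only.

```shell
set -e
# Throwaway repo standing in for vllm-project/flash-attention.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "FA3 swizzle optimization"
sha=$(git rev-parse HEAD)

# Tag the pinned commit; a tag makes it permanently reachable,
# so `git gc` will never prune it. (Tag name is hypothetical.)
git tag -a fa3-swizzle-v1 -m "pin FA3 swizzle commit" "$sha"

git gc --prune=now -q
git cat-file -e "$sha" && echo "pinned commit survives gc"
```

A consumer can then reference `fa3-swizzle-v1` instead of the bare hash, which is both human-readable and safe from pruning.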
force-pushed from ed1629a to 7a4e376
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
a75c6e0 to
916840d
Compare
Hi @LucasWilkinson, the pre-commit checks have failed. Please run:

    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files

Then, commit the changes and push to your branch.
1 similar comment
force-pushed from 16aaf77 to 74296b9
force-pushed from 74296b9 to 9d203b4
This pull request has merge conflicts that must be resolved before it can be merged.
force-pushed from 8b8419d to a2aba4a
force-pushed from dfed34c to 0d884bb
Documentation preview: https://vllm--23465.org.readthedocs.build/en/23465/
force-pushed from d83aa79 to 74688d5
force-pushed from a7546ef to 34de5bb
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
The scheduler_metadata buffer was sized using max_num_seqs, but during CUDA graph capture the batch size can be up to max_cudagraph_size, which may be larger. This caused a RuntimeError when the scheduler returned more elements than the buffer could hold.

Example: with max_num_seqs=1 the buffer held 1*4+1 = 5 elements, but max_cudagraph_size=2 led to the scheduler returning 2*4+1 = 9 elements.

Fixes CI failures in test_simple_generation[FLASH_ATTN] and similar tests.

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
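The sizing mismatch described in the commit message above can be illustrated with a minimal sketch. The n*4+1 element-count formula and the parameter names come from the commit message; the helper function itself is hypothetical, not vLLM's actual code.

```python
def scheduler_metadata_size(batch_size: int) -> int:
    # Per the commit message, the FA3 scheduler returns
    # batch_size * 4 + 1 metadata elements.
    return batch_size * 4 + 1

# Before the fix: buffer sized only for max_num_seqs.
max_num_seqs = 1
buf_before = scheduler_metadata_size(max_num_seqs)

# During CUDA graph capture the batch can grow to max_cudagraph_size,
# so the scheduler can return more elements than the buffer holds.
max_cudagraph_size = 2
needed = scheduler_metadata_size(max_cudagraph_size)
assert needed > buf_before  # this overflow triggered the RuntimeError

# After the fix: size the buffer for the larger of the two limits.
buf_after = scheduler_metadata_size(max(max_num_seqs, max_cudagraph_size))
assert buf_after >= needed
```

With max_num_seqs=1 this reproduces the numbers from the commit message: a 5-element buffer versus 9 required elements during capture.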
force-pushed from b5edaac to 19ab5ba
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
…-project#23465) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Pai <416932041@qq.com>
…-project#23465) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
…-project#23465) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
vLLM side of vllm-project/flash-attention#82
meta-llama/Meta-Llama-3-8B-Instruct, 1xH100, 4k and 2k out