
[Attention][FA3] Update FA3 to include new swizzle optimization #23465

Merged
vllm-bot merged 10 commits into vllm-project:main from neuralmagic:lwilkinson/fa3-swizzle
Feb 3, 2026

Conversation

@LucasWilkinson
Collaborator

@LucasWilkinson LucasWilkinson commented Aug 23, 2025

vLLM side of vllm-project/flash-attention#82

meta-llama/Meta-Llama-3-8B-Instruct, 1xH100, 4k and 2k out

branch   rate     num_prompts  req/s    median TTFT (ms)   std TTFT     p99 TTFT     median TPOT (ms)   std TPOT     p99 TPOT    
------   ----     -----------  -----    ----------------   --------     --------     ----------------   --------     --------    
MAIN     1.00     120          0.88     141.95             29.83        287.21       11.07              0.84         11.67       
PR       1.00     120          0.89     137.69             28.51        277.24       10.81              0.84         11.40       
MAIN     2.00     240          1.53     177.97             54.18        415.02       25.73              4.63         30.20       
PR       2.00     240          1.54     176.99             52.59        403.90       25.72              4.60         30.08       
MAIN     3.00     360          1.65     361.44             18180.54     43681.80     42.99              8.20         52.66       
PR       3.00     360          1.66     345.31             17685.66     42551.03     42.69              8.00         52.07       
MAIN     4.00     480          1.66     54165.70           45848.99     119451.76    49.99              9.37         69.16       
PR       4.00     480          1.67     52780.84           45014.34     117440.91    49.40              9.16         68.40 
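
As a reading aid for the table above, the relative change between MAIN and PR can be computed directly from the reported medians. The sketch below copies the median TTFT and TPOT values from the table; the dictionary layout and the `pct_change` helper are illustrative names, not part of the benchmark tooling.

```python
# Median latencies in ms, copied from the table above as (MAIN, PR) pairs,
# keyed by request rate.
median_ttft_ms = {
    1.0: (141.95, 137.69),
    2.0: (177.97, 176.99),
    3.0: (361.44, 345.31),
    4.0: (54165.70, 52780.84),
}
median_tpot_ms = {
    1.0: (11.07, 10.81),
    2.0: (25.73, 25.72),
    3.0: (42.99, 42.69),
    4.0: (49.99, 49.40),
}


def pct_change(main: float, pr: float) -> float:
    """Relative change of PR versus MAIN in percent (negative means PR is lower)."""
    return (pr - main) / main * 100.0


for rate in sorted(median_ttft_ms):
    ttft = pct_change(*median_ttft_ms[rate])
    tpot = pct_change(*median_tpot_ms[rate])
    print(f"rate={rate:.2f}  median TTFT {ttft:+.2f}%  median TPOT {tpot:+.2f}%")
```

Negative values mean the PR branch reported a lower median latency than MAIN at that request rate.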

@mergify mergify bot added the ci/build label Aug 23, 2025
@LucasWilkinson LucasWilkinson added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 23, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the flash-attention dependency to a new commit hash, likely to incorporate the 'FA3 swizzle optimization' mentioned in the PR title. While pinning to a commit is good for reproducibility, using a commit hash that is not part of a tag or a long-lived branch can pose a risk for future builds and maintenance. I've added a comment suggesting the use of a git tag for better long-term stability.

Contributor

high

Pinning dependencies to a specific commit hash is good for reproducibility. However, for long-term maintenance and release builds, it's better to use a git tag. Commit hashes that are not part of a branch or tag can be garbage collected by git, or become hard to track. Since this is a work-in-progress, it's acceptable for now, but it would be best to create a tag in the vllm-project/flash-attention repository for this commit before this PR is merged.

@github-actions

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale Over 90 days of inactivity label Nov 26, 2025
@mergify

mergify bot commented Dec 17, 2025

Hi @LucasWilkinson, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@github-actions github-actions bot added unstale Received activity after being labelled stale and removed stale Over 90 days of inactivity labels Dec 18, 2025
@LucasWilkinson LucasWilkinson force-pushed the lwilkinson/fa3-swizzle branch from 74296b9 to 9d203b4 Compare January 8, 2026 07:09
@mergify

mergify bot commented Jan 8, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @LucasWilkinson.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify

mergify bot commented Jan 16, 2026

Documentation preview: https://vllm--23465.org.readthedocs.build/en/23465/

@mergify mergify bot added the documentation Improvements or additions to documentation label Jan 16, 2026
@LucasWilkinson LucasWilkinson force-pushed the lwilkinson/fa3-swizzle branch 2 times, most recently from d83aa79 to 74688d5 Compare January 16, 2026 05:24
@LucasWilkinson LucasWilkinson changed the title [WIP][Attention][FA3] Update FA3 to include new swizzle optimization [Attention][FA3] Update FA3 to include new swizzle optimization Jan 17, 2026
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
The scheduler_metadata buffer was sized using max_num_seqs, but during
CUDA graph capture the batch size can be up to max_cudagraph_size, which
may be larger. This caused a RuntimeError when the scheduler returned more
elements than the buffer could hold.

Example: with max_num_seqs=1, the buffer held 1*4+1=5 elements, but
max_cudagraph_size=2 led to the scheduler returning 9 elements (2*4+1=9).

Fixes CI failures in test_simple_generation[FLASH_ATTN] and similar tests.

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
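
The sizing arithmetic in the commit message above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names are hypothetical, the `4 * batch_size + 1` formula is taken only from the example in the message, and sizing for the larger of the two limits is an assumed way to avoid the overflow.

```python
def scheduler_metadata_len(batch_size: int) -> int:
    """Number of elements the scheduler returns for a given batch size.

    Formula taken from the commit message's example (4 per sequence, plus 1).
    """
    return batch_size * 4 + 1


def buffer_len(max_num_seqs: int, max_cudagraph_size: int) -> int:
    """Assumed fix: size the buffer for whichever limit is larger."""
    return scheduler_metadata_len(max(max_num_seqs, max_cudagraph_size))


# Values from the commit message's example.
max_num_seqs = 1
max_cudagraph_size = 2

old_len = scheduler_metadata_len(max_num_seqs)           # sized by max_num_seqs only
needed_len = scheduler_metadata_len(max_cudagraph_size)  # returned during graph capture
new_len = buffer_len(max_num_seqs, max_cudagraph_size)

print(f"old buffer: {old_len}, needed at capture: {needed_len}, resized: {new_len}")
```

With these inputs the old buffer is smaller than what the scheduler returns during capture, which is the overflow the message describes.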
@vllm-bot vllm-bot merged commit 2267cb1 into vllm-project:main Feb 3, 2026
93 of 96 checks passed
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
…-project#23465)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Pai <416932041@qq.com>
ProExpertProg added a commit that referenced this pull request Feb 4, 2026
gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request Feb 5, 2026
…-project#23465)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

ci/build, documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed), unstale (Received activity after being labelled stale), v1

3 participants