@MatthewBonanni MatthewBonanni commented Nov 10, 2025

Purpose

vLLM-side PR for vllm-project/flash-attention#107. Removes the unused flash_attn_with_kvcache and sets flags that speed up the build and reduce its size.

The wheel size drops by about 21% (481.69 MB --> 380.61 MB).

Recent commit on main:

[2025-11-14T16:34:18Z] #30 0.877 Wheel dist/vllm-0.11.1rc7.dev169+gc934caee8.cu129-cp38-abi3-linux_x86_64.whl is within the allowed size (481.69 MB).

This PR:

[2025-11-14T15:05:58Z] #30 0.953 Wheel dist/vllm-0.11.1rc7.dev174+g843768001.cu129-cp38-abi3-linux_x86_64.whl is within the allowed size (380.61 MB).
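
As a quick sanity check on the figure above, the reduction can be recomputed from the two log lines. This is only a minimal sketch: the helper names `wheel_size_mb` and `percent_reduction` are illustrative and are not part of vLLM's CI tooling.

```python
import re

# Log lines copied verbatim from the CI output quoted above.
MAIN_LOG = (
    "[2025-11-14T16:34:18Z] #30 0.877 Wheel "
    "dist/vllm-0.11.1rc7.dev169+gc934caee8.cu129-cp38-abi3-linux_x86_64.whl "
    "is within the allowed size (481.69 MB)."
)
PR_LOG = (
    "[2025-11-14T15:05:58Z] #30 0.953 Wheel "
    "dist/vllm-0.11.1rc7.dev174+g843768001.cu129-cp38-abi3-linux_x86_64.whl "
    "is within the allowed size (380.61 MB)."
)

# Matches the trailing "(<number> MB)" part of the size-check message.
SIZE_RE = re.compile(r"\((\d+(?:\.\d+)?) MB\)")


def wheel_size_mb(log_line: str) -> float:
    """Extract the wheel size in MB from a size-check log line (hypothetical helper)."""
    match = SIZE_RE.search(log_line)
    if match is None:
        raise ValueError(f"no wheel size found in: {log_line!r}")
    return float(match.group(1))


def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100.0


before = wheel_size_mb(MAIN_LOG)
after = wheel_size_mb(PR_LOG)
print(f"{before:.2f} MB -> {after:.2f} MB "
      f"({percent_reduction(before, after):.1f}% smaller)")
```

Parsing the sizes from the log lines, rather than hard-coding them, keeps the check tied to what CI actually reported.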

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Matthew Bonanni <[email protected]>
@mergify mergify bot added the ci/build label Nov 10, 2025
@MatthewBonanni MatthewBonanni changed the title from "change git tag" to "[Attention] Bump FA for removed method" Nov 10, 2025
mergify bot commented Nov 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @MatthewBonanni.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 10, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the flash-attention dependency. However, this change introduces two critical issues. First, the new dependency version removes the flash_attn_with_kvcache function, but this function is still used in the test file tests/kernels/attention/test_flash_attn.py, which will likely cause the tests to fail. The test code needs to be updated to reflect this change in the dependency. Second, the dependency now points to a temporary branch on a personal fork. This is a major risk for build stability and security, and it should be pointed to an official repository and a stable commit/tag before merging into a main branch.
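
The first concern, leftover uses of a removed function, can be checked mechanically before bumping a dependency. The following is a minimal sketch under stated assumptions: `find_references` is a hypothetical helper written for illustration and is not part of vLLM or flash-attention. It simply scans Python files for a whole-word match of the identifier.

```python
import re
from pathlib import Path


def find_references(root, name, pattern="*.py"):
    """Return (path, line_number, line) for every line under `root` that mentions `name`.

    Hypothetical helper: it does a plain text search, so matches in comments
    and strings are reported too.
    """
    # \b keeps e.g. "flash_attn_with_kvcache_v2" from matching "flash_attn_with_kvcache".
    word = re.compile(rf"\b{re.escape(name)}\b")
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if word.search(line):
                hits.append((str(path), line_number, line.strip()))
    return hits
```

Calling `find_references("tests", "flash_attn_with_kvcache")` from a repository checkout would list any remaining call sites, such as the test file named in the review, if they still exist.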

@mgoin mgoin added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 10, 2025
Signed-off-by: Matthew Bonanni <[email protected]>
Signed-off-by: Matthew Bonanni <[email protected]>
Signed-off-by: Matthew Bonanni <[email protected]>
@mergify mergify bot removed the needs-rebase label Nov 13, 2025
@LucasWilkinson LucasWilkinson changed the title from "[Attention] Bump FA for removed method" to "[DO NOT MERGE][Attention] Bump FA for removed method" Nov 13, 2025
@MatthewBonanni MatthewBonanni changed the title from "[DO NOT MERGE][Attention] Bump FA for removed method" to "[Attention] Bump FA for removed method" Nov 13, 2025
mgoin commented Nov 14, 2025

Wow, it looks like we see a 20% reduction in wheel size (481MB --> 380MB)

Recent commit on main:

[2025-11-14T16:34:18Z] #30 0.877 Wheel dist/vllm-0.11.1rc7.dev169+gc934caee8.cu129-cp38-abi3-linux_x86_64.whl is within the allowed size (481.69 MB).

This PR:

[2025-11-14T15:05:58Z] #30 0.953 Wheel dist/vllm-0.11.1rc7.dev174+g843768001.cu129-cp38-abi3-linux_x86_64.whl is within the allowed size (380.61 MB).

@vllm-bot vllm-bot merged commit 8cc40f8 into vllm-project:main Nov 14, 2025
87 of 89 checks passed
@MatthewBonanni MatthewBonanni deleted the bump_fa branch November 14, 2025 17:19
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
Signed-off-by: Matthew Bonanni <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: George D. Torres <[email protected]>
bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025
Signed-off-by: Matthew Bonanni <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Bram Wasti <[email protected]>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
4 participants