[Core] Support all head sizes up to 256 with FlashAttention backend #8910
njhill wants to merge 2 commits into vllm-project:main
Conversation
We were previously restricting to specific sizes, but the native FA kernels pad and support arbitrary sizes up to 256.
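The change described above can be sketched as follows. This is a hypothetical illustration, not vLLM's actual code: it replaces an explicit allow-list of head sizes with a simple upper-bound check, on the assumption (stated in the PR) that the FlashAttention kernels internally pad arbitrary head sizes up to 256. The function name, the old allow-list contents, and the constant are assumptions for the sketch.

```python
# Hypothetical sketch of the head-size check, assuming the FA kernels
# pad arbitrary head dims internally (per the PR description).

MAX_FA_HEAD_SIZE = 256  # assumed kernel limit from the PR title


def validate_head_size(head_size: int) -> None:
    # Before (illustrative): a fixed allow-list such as
    #   if head_size not in (32, 64, 96, 128, 160, 192, 224, 256): raise
    # After: any positive size up to the kernel maximum is accepted,
    # relying on the kernels to pad to a supported internal width.
    if not 0 < head_size <= MAX_FA_HEAD_SIZE:
        raise ValueError(
            f"Head size {head_size} is not supported by the "
            f"FlashAttention backend (max {MAX_FA_HEAD_SIZE}).")


validate_head_size(80)  # a size the old allow-list might have rejected
```

The point of the relaxation is that models with unusual head dimensions (e.g. not a multiple of 32) no longer fall back to a different attention backend purely because of the allow-list.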
tlrmchlsmth left a comment:
Could you add some unit tests? It looks like we may be able to just extend this list here 🤞
vllm/tests/kernels/test_attention.py
Lines 32 to 34 in c2ec430
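Extending that test parametrization might look roughly like the sketch below. The original contents of the referenced lines are not shown in this conversation, so both the existing `HEAD_SIZES` values and the added ones are assumptions; the idea is simply to add sizes that are not in the old allow-list so the kernels' internal padding path gets exercised.

```python
# Hypothetical parametrization for tests/kernels/test_attention.py.
# The baseline values below are an assumption about what the list contains.
HEAD_SIZES = [64, 80, 96, 112, 128, 192, 256]

# Assumed additions: irregular sizes that force the FA kernels to pad,
# which is exactly the behavior this PR starts relying on.
HEAD_SIZES += [40, 72, 100, 156]
```

A test parametrized over this list would then run each attention-kernel case once per head size, so any size the backend mishandles shows up as a distinct test failure.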
Looks like we need to build flash without the
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has merge conflicts that must be resolved before it can be merged.