[Core] Support all head sizes up to 256 with FlashAttention backend #8910
njhill wants to merge 2 commits into vllm-project:main
Conversation
We were previously restricting to specific sizes, but the native FA kernels pad and support arbitrary sizes up to 256.
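The change described above can be sketched as follows. This is a hypothetical illustration, not vLLM's actual code: it replaces an explicit allow-list of head sizes with a simple upper-bound check, on the assumption (stated in the PR) that the FlashAttention kernels internally pad arbitrary head sizes up to 256. The function name, the old allow-list contents, and the constant are assumptions for the sketch.

```python
# Hypothetical sketch of the head-size check, assuming the FA kernels
# pad arbitrary head dims internally (per the PR description).

MAX_FA_HEAD_SIZE = 256  # assumed kernel limit from the PR title


def validate_head_size(head_size: int) -> None:
    # Before (illustrative): a fixed allow-list such as
    #   if head_size not in (32, 64, 96, 128, 160, 192, 224, 256): raise
    # After: any positive size up to the kernel maximum is accepted,
    # relying on the kernels to pad to a supported internal width.
    if not 0 < head_size <= MAX_FA_HEAD_SIZE:
        raise ValueError(
            f"Head size {head_size} is not supported by the "
            f"FlashAttention backend (max {MAX_FA_HEAD_SIZE}).")


validate_head_size(80)  # a size the old allow-list might have rejected
```

The point of the relaxation is that models with unusual head dimensions (e.g. not a multiple of 32) no longer fall back to a different attention backend purely because of the allow-list.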
tlrmchlsmth left a comment:
Could you add some unit tests? It looks like we may be able to just extend this list here 🤞
vllm/tests/kernels/test_attention.py
Lines 32 to 34 in c2ec430
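Extending that test parametrization might look roughly like the sketch below. The original contents of the referenced lines are not shown in this conversation, so both the existing `HEAD_SIZES` values and the added ones are assumptions; the idea is simply to add sizes that are not in the old allow-list so the kernels' internal padding path gets exercised.

```python
# Hypothetical parametrization for tests/kernels/test_attention.py.
# The baseline values below are an assumption about what the list contains.
HEAD_SIZES = [64, 80, 96, 112, 128, 192, 256]

# Assumed additions: irregular sizes that force the FA kernels to pad,
# which is exactly the behavior this PR starts relying on.
HEAD_SIZES += [40, 72, 100, 156]
```

A test parametrized over this list would then run each attention-kernel case once per head size, so any size the backend mishandles shows up as a distinct test failure.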
Looks like we need to build flash without the
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has merge conflicts that must be resolved before it can be merged.