[bugfix] fix apply_rotary_emb error on Ascend NPU #38491
Conversation
Force-pushed from 841f29d to d7a6a1d.
cc @SunMarc!
SunMarc left a comment:
Thanks for fixing this! There are still a few places where is_flash_attn_2_available is used instead of the is_flash_attn_available that you introduced. Could you fix this? Other than that, LGTM!
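For context, a minimal sketch of the kind of availability check the review refers to; the exact import path is an assumption, not taken from this PR:

```python
# Sketch only: replace the CUDA-specific check with the backend-agnostic
# helper mentioned in the review (import path assumed).
from transformers.utils import is_flash_attn_available

if is_flash_attn_available():
    # take the flash-attention code path (intended to cover both the CUDA
    # flash-attn 2 package and the Ascend NPU implementation)
    ...
else:
    # fall back to the eager attention implementation
    ...
```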
Force-pushed from 660f1a3 to bb57f04.
@SunMarc Thanks for your suggestion. After rechecking the places where is_flash_attn_2_available was still used, we have updated them to is_flash_attn_available.
Additionally, this PR still requires some further self-tests, so we are changing it to a draft for now. When it is ready, we will invite you to review it. Thanks!
Thanks for the update! Please ping me when this is ready for review!
Force-pushed from bb57f04 to 5b44cfc.
@SunMarc We have finished self-testing the modifications in this PR; it is ready for review and merge :)
SunMarc left a comment:
LGTM!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
What does this PR do?
When using the Qwen2.5-VL model with Flash Attention 2, we found that the behavior of the torch_npu.npu_rotary_mul API differs slightly from the corresponding apply_rotary_emb API in the flash-attn package. The former only accepts 4-dimensional x and sin/cos inputs whose last dimension is the full attention head dimension, while the latter accepts 2-dimensional sin/cos inputs whose last dimension is half of the attention head dimension.
We also found that apply_rotary_emb is used in Qwen2.5-Omni in the same situation as in Qwen2.5-VL. This PR therefore fixes the problem above and, at the same time, updates the flash attention availability check in Qwen2.5-Omni and the esm model from is_flash_attn_2_available to is_flash_attn_available.
Fixes #38189
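For illustration, here is a minimal sketch of how the shape mismatch described above could be bridged, assuming the non-interleaved (rotate_half) RoPE convention; the helper name and the exact broadcasting are assumptions for illustration, not the code merged in this PR:

```python
import torch


def npu_apply_rotary_emb(x, cos, sin):
    """Hypothetical adapter from flash-attn style cos/sin tables to the
    shapes torch_npu.npu_rotary_mul expects.

    x:   (batch, seq_len, num_heads, head_dim), on an NPU device
    cos: (seq_len, head_dim // 2)   # flash-attn style, half head dim
    sin: (seq_len, head_dim // 2)
    """
    import torch_npu  # only importable in an Ascend NPU environment

    # npu_rotary_mul wants 4-D cos/sin covering the full head dimension,
    # so duplicate the half-dim tables and add broadcast axes:
    # (seq_len, head_dim // 2) -> (1, seq_len, 1, head_dim)
    cos = torch.cat([cos, cos], dim=-1)[None, :, None, :]
    sin = torch.cat([sin, sin], dim=-1)[None, :, None, :]
    return torch_npu.npu_rotary_mul(x, cos, sin)
```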