
Conversation

@FightingZhen (Contributor) commented May 30, 2025

What does this PR do?

When using the Qwen2.5-VL model with Flash Attention 2 on Ascend NPU, we found that the torch_npu.npu_rotary_mul API behaves slightly differently from the corresponding apply_rotary_emb API in the flash-attn package.

The former only accepts x and sin/cos as 4-dimensional tensors whose attention-head dimension matches, while the latter accepts 2-dimensional sin/cos tensors whose attention-head dimension is chunked in half.
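For illustration, here is a minimal sketch of the shape adaptation being described. The function name apply_rotary_emb_npu, the tensor layouts, and the use of torch_npu.npu_rotary_mul as a drop-in kernel are assumptions based on this description, not the exact code added by the PR.

```python
import torch
import torch_npu  # Ascend NPU extension; assumed to be installed


def apply_rotary_emb_npu(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Hypothetical NPU-side equivalent of flash_attn.layers.rotary.apply_rotary_emb.

    Assumes x has shape (batch, seqlen, nheads, head_dim), cos/sin have shape
    (seqlen, head_dim // 2), and the rotary dimension covers the full head_dim.
    """
    # flash-attn passes half-width, 2-D cos/sin; duplicate them to the full
    # head dimension ...
    cos = torch.cat([cos, cos], dim=-1)   # (seqlen, head_dim)
    sin = torch.cat([sin, sin], dim=-1)   # (seqlen, head_dim)
    # ... and reshape to 4-D so they broadcast against x's layout, which is
    # what torch_npu.npu_rotary_mul expects.
    cos = cos[None, :, None, :]           # (1, seqlen, 1, head_dim)
    sin = sin[None, :, None, :]           # (1, seqlen, 1, head_dim)
    # Fused RoPE kernel: expected to compute x * cos + rotate_half(x) * sin.
    return torch_npu.npu_rotary_mul(x, cos, sin)
```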

We also found that apply_rotary_emb is used in Qwen2.5-Omni, which is affected in the same way as Qwen2.5-VL.

Therefore, this PR fixes the problem described above, and also updates the flash-attention availability check in the Qwen2.5-Omni and ESM models from is_flash_attn_2_available to is_flash_attn_available.
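As a rough sketch of the availability check mentioned above: is_flash_attn_2_available and is_torch_npu_available are existing transformers utilities, but the body of the combined helper shown here is an assumption and may differ from what the repository actually implements.

```python
from transformers.utils import is_flash_attn_2_available, is_torch_npu_available


def is_flash_attn_available() -> bool:
    """Hypothetical combined check: True if either the CUDA flash-attn package
    or the Ascend NPU flash-attention path is usable."""
    return is_flash_attn_2_available() or is_torch_npu_available()


# Model code can then guard its flash-attention path in a device-agnostic way:
if is_flash_attn_available():
    pass  # take the flash-attention code path
```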

Fixes #38189

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@FightingZhen force-pushed the bugfix_apply_rotary_emb_npu branch 3 times, most recently from 841f29d to d7a6a1d, on May 30, 2025 09:36
@Rocketknight1 (Member)

cc @SunMarc!

@SunMarc (Member) left a comment

Thanks for fixing this! There are still a few places where is_flash_attn_2_available is used instead of the is_flash_attn_available helper you introduced. Could you fix those? Other than that, LGTM!

@SunMarc marked this pull request as ready for review on May 30, 2025 14:37
@FightingZhen changed the title from "[bugfix] fix apply_rotary_emb error on Ascend NPU" to "[bugfix] [WIP] fix apply_rotary_emb error on Ascend NPU" on May 30, 2025
@FightingZhen force-pushed the bugfix_apply_rotary_emb_npu branch 4 times, most recently from 660f1a3 to bb57f04, on May 30, 2025 15:46
@FightingZhen marked this pull request as draft on May 30, 2025 15:46
@FightingZhen (Contributor, Author)

Thanks for fixing this! There are still a few places where is_flash_attn_2_available is used instead of the is_flash_attn_available helper you introduced. Could you fix those? Other than that, LGTM!

@SunMarc Thanks for your suggestion. After rechecking the places where is_flash_attn_2_available is used, there are still some where it cannot be replaced yet. They are listed below with the corresponding reasons:

  1. Necessary ones: the usages in modeling_flash_attention_utils.py, modeling_utils.py, and testing_utils.py, which check whether the flash-attn package is installed.
  2. Still-in-progress ones: the usages in flash_paged.py and the modernbert model. Further experiments are required to make sure they work correctly on Ascend NPU; once the results are confirmed, a new PR will be committed as soon as possible. The pending experiments are:
    2.1. checking whether paged attention is ready on Ascend NPU;
    2.2. developing a corresponding implementation of the apply_rotary function, which flash-attn implements in Triton.

Additionally, this PR still requires some further self-testing, so we are changing it to a draft for now. When it is ready, we will invite you to review it. Thanks!

@SunMarc (Member) commented May 30, 2025

Thanks for the update! Please ping me when this is ready for review!

@FightingZhen force-pushed the bugfix_apply_rotary_emb_npu branch from bb57f04 to 5b44cfc on June 3, 2025 03:04
@FightingZhen marked this pull request as ready for review on June 3, 2025 03:09
@FightingZhen (Contributor, Author)

@SunMarc We have finished self-testing the modifications in this PR; it is ready for review and merge :)

@SunMarc (Member) left a comment

LGTM!

@SunMarc enabled auto-merge (squash) on June 3, 2025 08:53
@FightingZhen changed the title from "[bugfix] [WIP] fix apply_rotary_emb error on Ascend NPU" to "[bugfix] fix apply_rotary_emb error on Ascend NPU" on June 3, 2025
@SunMarc merged commit fdf86fb into huggingface:main on June 3, 2025
20 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

bvantuan pushed a commit to bvantuan/transformers that referenced this pull request Jun 12, 2025
@FightingZhen deleted the bugfix_apply_rotary_emb_npu branch on August 14, 2025 01:52