Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine deepseekv2 modeling for to_static #9851

Open
wants to merge 25 commits into
base: develop
Choose a base branch
from

Conversation

zhangbo9674
Copy link
Contributor

@zhangbo9674 zhangbo9674 commented Feb 12, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --file XXXX.py
  • Add test cases into tests folder. If there are codecov issues, please add tests cases first.

PR types

Bug fixes

PR changes

Others

Description

Refine deepseekv2 modeling for to_static

Copy link

paddle-bot bot commented Feb 12, 2025

Thanks for your contribution!

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 51.34%. Comparing base (5eeb7aa) to head (4b83dee).
Report is 22 commits behind head on develop.

Files with missing lines Patch % Lines
paddlenlp/transformers/deepseek_v2/modeling.py 0.00% 3 Missing ⚠️
...addlenlp/transformers/deepseek_v2/modeling_auto.py 0.00% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (51.34%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9851      +/-   ##
===========================================
- Coverage    51.54%   51.34%   -0.21%     
===========================================
  Files          741      745       +4     
  Lines       117904   118584     +680     
===========================================
+ Hits         60775    60886     +111     
- Misses       57129    57698     +569     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -717,9 +716,6 @@ def forward(
attention_mask = self._prepare_decoder_attention_mask(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果attention_mask是支持use_cache的版本,那么就不是casual_mask,或者推理时使用left-padding,那么attention_mask也不是casual_mask,这里的修改不能覆盖之前的情况

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的改动,主要是因为动转静不支持如下场景:

if self.config.use_flash_attention:
    attention_mask = None if is_casual_mask(attention_mask) else attention_mask

当前这样该是基于目前的场景下 use_flash_attention 下 attention_mask 一定为 None,还有一种改法,就是将控制流判断后移到调用的地方,但是目前这种情况下,自动并行的切分推导对控制流的场景支持还存在一些问题,因此为了不影响后续流程,先按照前面的改法实现

if is_casual_mask(attention_mask):
    layer_outputs = decoder_layer(
                hidden_states=hidden_states,
                position_ids=position_ids,
                attention_mask=None,
                output_attentions=output_attentions,
                past_key_value=past_key_value,
                use_cache=use_cache,
                attn_mask_startend_row_indices=attn_mask_startend_row_indices)
else:
    layer_outputs = decoder_layer(
                hidden_states=hidden_states,
                position_ids=position_ids,
                attention_mask=attention_mask,
                output_attentions=output_attentions,
                past_key_value=past_key_value,
                use_cache=use_cache,
                attn_mask_startend_row_indices=attn_mask_startend_row_indices)

DrownFish19
DrownFish19 previously approved these changes Feb 18, 2025
Copy link
Collaborator

@DrownFish19 DrownFish19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants