Refine deepseekv2 modeling for to_static #9851
base: develop
Conversation
Thanks for your contribution!
Codecov Report

❌ Attention: the patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@           Coverage Diff            @@
##           develop    #9851   +/-   ##
===========================================
- Coverage    51.54%   51.34%   -0.21%
===========================================
  Files          741      745       +4
  Lines       117904   118584     +680
===========================================
+ Hits         60775    60886     +111
- Misses       57129    57698     +569

View full report in Codecov by Sentry.
@@ -717,9 +716,6 @@ def forward(
        attention_mask = self._prepare_decoder_attention_mask(
If the attention_mask is a version that supports use_cache, it is not a causal mask; likewise, if inference uses left-padding, the attention_mask is not a causal mask either. This change does not cover those existing cases.
The change here is mainly because dynamic-to-static conversion (to_static) does not support the following pattern:

if self.config.use_flash_attention:
    attention_mask = None if is_casual_mask(attention_mask) else attention_mask

The current fix relies on the assumption that, in our present scenarios, attention_mask is always None when use_flash_attention is enabled. An alternative fix would be to move the control-flow branch out to the call site, as shown below. However, auto-parallel sharding propagation still has some issues with control-flow scenarios, so to avoid blocking the downstream pipeline we went with the first approach for now:
if is_casual_mask(attention_mask):
    layer_outputs = decoder_layer(
        hidden_states=hidden_states,
        position_ids=position_ids,
        attention_mask=None,
        output_attentions=output_attentions,
        past_key_value=past_key_value,
        use_cache=use_cache,
        attn_mask_startend_row_indices=attn_mask_startend_row_indices)
else:
    layer_outputs = decoder_layer(
        hidden_states=hidden_states,
        position_ids=position_ids,
        attention_mask=attention_mask,
        output_attentions=output_attentions,
        past_key_value=past_key_value,
        use_cache=use_cache,
        attn_mask_startend_row_indices=attn_mask_startend_row_indices)
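For reference, the kind of check `is_casual_mask` performs can be sketched in plain NumPy (a minimal illustration only; `is_causal_mask_np` is a hypothetical helper, not PaddleNLP's actual implementation, which operates on paddle.Tensor). It also shows why the left-padding case flagged above is not causal:

```python
import numpy as np

def is_causal_mask_np(attention_mask: np.ndarray) -> bool:
    """Return True if a [batch, 1, q_len, k_len] boolean-style mask equals
    a pure lower-triangular (causal) pattern. Minimal NumPy sketch."""
    *_, q_len, k_len = attention_mask.shape
    causal = np.tril(np.ones((q_len, k_len), dtype=bool))
    # The mask is causal iff every batch entry matches the triangle exactly.
    return bool(np.all(attention_mask.astype(bool) == causal))

seq = 4
causal_mask = np.tril(np.ones((1, 1, seq, seq)))
print(is_causal_mask_np(causal_mask))   # True: a pure causal mask

# Left-padding masks out leading key positions, so the result is no
# longer lower-triangular -- the case the review comment warns about.
padded_mask = causal_mask.copy()
padded_mask[..., :, 0] = 0
print(is_causal_mask_np(padded_mask))   # False
```

Because the branch condition depends on tensor *values*, a Python `if` over it cannot be resolved at graph-capture time, which is why to_static rejects the original pattern.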
LGTM
Before submitting
- Add test cases into the tests folder. If there are codecov issues, please add test cases first.

PR types
Bug fixes
PR changes
Others
Description
Refine deepseekv2 modeling for to_static