Refine deepseekv2 modeling for to_static #9851

Open · wants to merge 25 commits into develop
Commits (25)
2f9956a  refine log (zhangbo9674, Nov 8, 2024)
ee5f151  Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (zhangbo9674, Nov 29, 2024)
377962a  Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (zhangbo9674, Nov 29, 2024)
3aa70a8  Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (zhangbo9674, Dec 6, 2024)
3caaac7  Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (zhangbo9674, Feb 11, 2025)
796bff7  fix (zhangbo9674, Feb 12, 2025)
87c33ac  add model args (zhangbo9674, Feb 12, 2025)
29b05b0  suppoort_deepseekv2_autoparallel_with_DP/MP (xuxinyi389, Feb 13, 2025)
20be84b  poolish (xuxinyi389, Feb 13, 2025)
31ec76c  remove_env_set (xuxinyi389, Feb 13, 2025)
34009e7  update_code (xuxinyi389, Feb 13, 2025)
e553a3a  add_v3 (xuxinyi389, Feb 14, 2025)
0c96a56  support_sharding (xuxinyi389, Feb 14, 2025)
0aa9fb0  move_to_v3 (xuxinyi389, Feb 14, 2025)
3e84fc6  fix_typo (xuxinyi389, Feb 14, 2025)
495d123  update_v3_config (xuxinyi389, Feb 14, 2025)
14470cf  Merge commit 'refs/pull/9862/head' of https://github.com/PaddlePaddle… (zhangbo9674, Feb 17, 2025)
5835f1e  refine (zhangbo9674, Feb 17, 2025)
909ffe8  refine (zhangbo9674, Feb 17, 2025)
93cfe79  refine (zhangbo9674, Feb 17, 2025)
c48d6c2  fix (zhangbo9674, Feb 18, 2025)
7f7a486  fix (zhangbo9674, Feb 19, 2025)
428bd09  fix (zhangbo9674, Feb 19, 2025)
41b9107  fix (zhangbo9674, Feb 24, 2025)
4b83dee  fix (zhangbo9674, Feb 24, 2025)
14 changes: 3 additions & 11 deletions in paddlenlp/transformers/deepseek_v2/modeling.py

@@ -231,11 +231,8 @@
         )

         attn_weights = attn_weights + attention_mask
-        if not paddle.in_dynamic_mode():
+        with paddle.amp.auto_cast(False):
             attn_weights = F.softmax(attn_weights, axis=-1, dtype="float32").astype(query_states.dtype)
-        else:
-            with paddle.amp.auto_cast(False):
-                attn_weights = F.softmax(attn_weights, axis=-1, dtype="float32").astype(query_states.dtype)

         attn_weights = F.dropout(attn_weights, p=config.attention_dropout, training=training)

Codecov: added line #L234 was not covered by tests.
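Why this matters for to_static: in the old code, the `paddle.in_dynamic_mode()` branch skipped `paddle.amp.auto_cast(False)` when the model ran as a static graph, so the softmax could execute in reduced precision after conversion. Unifying the branches keeps the float32 softmax on a single code path in both modes. A minimal sketch of the resulting pattern (the toy layer and shapes are assumptions, not code from this PR):

import paddle
import paddle.nn.functional as F

class ToyAttnSoftmax(paddle.nn.Layer):
    """Hypothetical layer illustrating the unified softmax path."""

    def forward(self, attn_scores):
        # Softmax runs in float32 with amp disabled, then casts back, so
        # bf16/fp16 runs keep full-precision normalization in both modes.
        with paddle.amp.auto_cast(False):
            return F.softmax(attn_scores, axis=-1, dtype="float32").astype(attn_scores.dtype)

layer = ToyAttnSoftmax()
static_layer = paddle.jit.to_static(layer)  # traces one branch-free graph
out = static_layer(paddle.randn([2, 8, 16], dtype="float32"))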
@@ -335,12 +332,7 @@
                 f"Implementation of fused_rms_norm is not available on {get_env_device()}. Please install paddle_xpu to use this feature"
             )

-        if paddle.in_dynamic_mode():
-            with paddle.amp.auto_cast(False):
-                hidden_states = hidden_states.astype("float32")
-                variance = hidden_states.pow(2).mean(-1, keepdim=True)
-                hidden_states = paddle.rsqrt(variance + self.variance_epsilon) * hidden_states
-        else:
+        with paddle.amp.auto_cast(False):
             hidden_states = hidden_states.astype("float32")
             variance = hidden_states.pow(2).mean(-1, keepdim=True)
             hidden_states = paddle.rsqrt(variance + self.variance_epsilon) * hidden_states

Codecov: added line #L335 was not covered by tests.
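The RMSNorm fallback gets the same treatment: `paddle.amp.auto_cast(False)` now wraps the float32 statistics in static mode too, not just in dynamic mode. A self-contained sketch of the computation (the standalone helper, its signature, and the final weight multiply are illustrative assumptions; the actual module in modeling.py may differ):

import paddle

def rms_norm_fp32(hidden_states, weight, variance_epsilon=1e-6):
    # Statistics are taken in float32 with amp disabled, so low-precision
    # (bf16/fp16) training does not degrade the variance estimate.
    orig_dtype = hidden_states.dtype
    with paddle.amp.auto_cast(False):
        hidden_states = hidden_states.astype("float32")
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = paddle.rsqrt(variance + variance_epsilon) * hidden_states
    return (hidden_states * weight.astype("float32")).astype(orig_dtype)

x = paddle.randn([2, 16, 64], dtype="float32")
y = rms_norm_fp32(x, paddle.ones([64]))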
@@ -543,7 +535,7 @@

         t = paddle.arange(seq_len, dtype=paddle.float32)

-        freqs = paddle.outer(t, self.inv_freq)
+        freqs = paddle.outer(t, paddle.cast(self.inv_freq, dtype="float32"))

         _mscale = float(
             yarn_get_mscale(self.scaling_factor, self.mscale)

Codecov: added line #L538 was not covered by tests.
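The `paddle.cast` guard ensures the RoPE frequency table is built in float32 even if `self.inv_freq` ends up in a lower precision elsewhere; `paddle.outer` would otherwise follow the buffer's dtype. A standalone sketch with illustrative sizes (`dim`, `base`, and the concat layout are assumptions, not the model's configuration):

import paddle

dim, base, seq_len = 64, 10000.0, 16  # illustrative values only
inv_freq = 1.0 / (base ** (paddle.arange(0, dim, 2, dtype="float32") / dim))

t = paddle.arange(seq_len, dtype=paddle.float32)
# The explicit cast mirrors the fix: the outer product stays float32
# even if inv_freq were converted to bf16/fp16 somewhere along the way.
freqs = paddle.outer(t, paddle.cast(inv_freq, dtype="float32"))
emb = paddle.concat([freqs, freqs], axis=-1)  # one common RoPE layout
cos, sin = emb.cos(), emb.sin()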
6 changes: 5 additions & 1 deletion in paddlenlp/transformers/deepseek_v2/modeling_auto.py

@@ -705,7 +705,11 @@
             inputs_embeds = self.embed_tokens(input_ids)

         # embed positions
-        if attn_mask_startend_row_indices is not None or get_use_casual_mask():
+        if (
+            attn_mask_startend_row_indices is not None
+            or get_use_casual_mask()
+            or (self.config.use_flash_attention and self.training)
+        ):
             attention_mask = None
         else:
             # [bs, seq_len]

Codecov: added line #L708 was not covered by tests.
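The added clause means that when flash attention is enabled during training, the model no longer materializes a dense attention mask: the fused kernel handles causal masking internally, so `attention_mask` can stay `None`. A sketch of the decision as a standalone predicate (the function and its arguments are stand-ins for the model's attributes, not an API from this PR):

def needs_dense_attention_mask(
    attn_mask_startend_row_indices,
    use_casual_mask: bool,
    use_flash_attention: bool,
    training: bool,
) -> bool:
    # The dense [bs, 1, seq, seq] mask is built only when no cheaper
    # masking path can take over: start/end row indices, the causal-mask
    # flag, or the flash-attention kernel during training.
    return not (
        attn_mask_startend_row_indices is not None
        or use_casual_mask
        or (use_flash_attention and training)
    )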