Refine deepseekv2 modeling for to_static #9851
base: develop
Conversation
Thanks for your contribution!
Codecov Report

❌ Attention: the patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@           Coverage Diff            @@
##           develop    #9851   +/-   ##
===========================================
- Coverage    51.54%   51.34%   -0.21%
===========================================
  Files          741      745       +4
  Lines       117904   118584     +680
===========================================
+ Hits         60775    60886     +111
- Misses       57129    57698     +569

View full report in Codecov by Sentry.
@@ -717,9 +716,6 @@ def forward(
        attention_mask = self._prepare_decoder_attention_mask(
If the attention_mask is a version that supports use_cache, it is not a causal mask; likewise, if inference uses left-padding, the attention_mask is not a causal mask either. This change does not cover those existing cases.
The change here is mainly because dynamic-to-static conversion (to_static) does not support the following pattern:

if self.config.use_flash_attention:
    attention_mask = None if is_casual_mask(attention_mask) else attention_mask

The current fix relies on the assumption that, in our present scenarios, attention_mask is always None when use_flash_attention is enabled. An alternative fix would be to move the control-flow branch out to the call site, as shown below. However, auto-parallel sharding propagation still has some issues with control-flow scenarios, so to avoid blocking the downstream pipeline we went with the first approach for now:
if is_casual_mask(attention_mask):
    layer_outputs = decoder_layer(
        hidden_states=hidden_states,
        position_ids=position_ids,
        attention_mask=None,
        output_attentions=output_attentions,
        past_key_value=past_key_value,
        use_cache=use_cache,
        attn_mask_startend_row_indices=attn_mask_startend_row_indices)
else:
    layer_outputs = decoder_layer(
        hidden_states=hidden_states,
        position_ids=position_ids,
        attention_mask=attention_mask,
        output_attentions=output_attentions,
        past_key_value=past_key_value,
        use_cache=use_cache,
        attn_mask_startend_row_indices=attn_mask_startend_row_indices)
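For reference, the kind of check `is_casual_mask` performs can be sketched in plain NumPy (a minimal illustration only; `is_causal_mask_np` is a hypothetical helper, not PaddleNLP's actual implementation, which operates on paddle.Tensor). It also shows why the left-padding case flagged above is not causal:

```python
import numpy as np

def is_causal_mask_np(attention_mask: np.ndarray) -> bool:
    """Return True if a [batch, 1, q_len, k_len] boolean-style mask equals
    a pure lower-triangular (causal) pattern. Minimal NumPy sketch."""
    *_, q_len, k_len = attention_mask.shape
    causal = np.tril(np.ones((q_len, k_len), dtype=bool))
    # The mask is causal iff every batch entry matches the triangle exactly.
    return bool(np.all(attention_mask.astype(bool) == causal))

seq = 4
causal_mask = np.tril(np.ones((1, 1, seq, seq)))
print(is_causal_mask_np(causal_mask))   # True: a pure causal mask

# Left-padding masks out leading key positions, so the result is no
# longer lower-triangular -- the case the review comment warns about.
padded_mask = causal_mask.copy()
padded_mask[..., :, 0] = 0
print(is_causal_mask_np(padded_mask))   # False
```

Because the branch condition depends on tensor *values*, a Python `if` over it cannot be resolved at graph-capture time, which is why to_static rejects the original pattern.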
LGTM
Before submitting
- Add test cases into the tests folder. If there are codecov issues, please add test cases first.

PR types
Bug fixes
PR changes
Others
Description
Refine deepseekv2 modeling for to_static