Refine deepseekv2 modeling for to_static #9851
Open
zhangbo9674 wants to merge 25 commits into PaddlePaddle:develop from zhangbo9674:dev/test_deep_seekv3
+8 −12
Commits
2f9956a refine log (zhangbo9674)
ee5f151 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (zhangbo9674)
377962a Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (zhangbo9674)
3aa70a8 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (zhangbo9674)
3caaac7 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (zhangbo9674)
796bff7 fix (zhangbo9674)
87c33ac add model args (zhangbo9674)
29b05b0 suppoort_deepseekv2_autoparallel_with_DP/MP (xuxinyi389)
20be84b poolish (xuxinyi389)
31ec76c remove_env_set (xuxinyi389)
34009e7 update_code (xuxinyi389)
e553a3a add_v3 (xuxinyi389)
0c96a56 support_sharding (xuxinyi389)
0aa9fb0 move_to_v3 (xuxinyi389)
3e84fc6 fix_typo (xuxinyi389)
495d123 update_v3_config (xuxinyi389)
14470cf Merge commit 'refs/pull/9862/head' of https://github.com/PaddlePaddle… (zhangbo9674)
5835f1e refine (zhangbo9674)
909ffe8 refine (zhangbo9674)
93cfe79 refine (zhangbo9674)
c48d6c2 fix (zhangbo9674)
7f7a486 fix (zhangbo9674)
428bd09 fix (zhangbo9674)
41b9107 fix (zhangbo9674)
4b83dee fix (zhangbo9674)
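If attention_mask is the version that supports use_cache, it is not a causal mask. Likewise, if left-padding is used during inference, attention_mask is not a causal mask either. The change here does not cover these previously supported cases.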
The change here is mainly because dynamic-to-static conversion (to_static) does not support the following scenario:
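The current change relies on the fact that, in the present scenario, attention_mask is always None when use_flash_attention is enabled. An alternative would be to move the control-flow check to the call site, but auto-parallel sharding inference still has some problems supporting control flow. To avoid blocking the follow-up work, the change is implemented as described above for now.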
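To make the reviewer's concern concrete, here is a minimal sketch in plain Python (no Paddle) of why a code path that assumes a purely causal mask can drop information. It assumes boolean masks where True means "this query position may attend to this key position"; the helper names `causal_mask`, `left_padded_mask`, `cached_causal_mask`, and `is_pure_causal` are hypothetical and are not PaddleNLP APIs, and real masks in the model may use a different dtype or convention.

```python
def causal_mask(seq_len):
    # Hypothetical helper: square lower-triangular mask; query i sees keys 0..i.
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def left_padded_mask(seq_len, num_pad):
    # Hypothetical helper: causal mask that also blocks the first num_pad key
    # positions, which hold left-padding tokens during batched inference.
    base = causal_mask(seq_len)
    return [[allowed and j >= num_pad for j, allowed in enumerate(row)] for row in base]


def cached_causal_mask(q_len, past_len):
    # Hypothetical helper: with a KV cache (use_cache), q_len new queries attend
    # to past_len cached keys plus themselves, so the mask is q_len x (past_len + q_len).
    k_len = past_len + q_len
    return [[j <= past_len + i for j in range(k_len)] for i in range(q_len)]


def is_pure_causal(mask):
    # True only if the mask is exactly the square causal mask of its size,
    # i.e. a "causal-only" fast path would reproduce it without the mask.
    return mask == causal_mask(len(mask))


if __name__ == "__main__":
    print(is_pure_causal(causal_mask(4)))            # square causal: safe to drop
    print(is_pure_causal(left_padded_mask(4, 1)))    # padding info would be lost
    print(is_pure_causal(cached_causal_mask(1, 3)))  # decode step with cache
```

Only the first mask can be replaced by an implicit causal assumption; the left-padded and KV-cache masks carry extra structure, which is why a change that assumes attention_mask is None (or purely causal) under use_flash_attention needs those cases to be ruled out.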