[LLM] Update model convert and fix TP for deepseekv3 #9797
Conversation
Codecov Report

Attention: Patch coverage is

@@            Coverage Diff             @@
##           develop    #9797      +/-   ##
===========================================
+ Coverage    51.64%   52.21%    +0.57%
===========================================
  Files          741      730       -11
  Lines      119141   115812     -3329
===========================================
- Hits        61525    60477     -1048
+ Misses      57616    55335     -2281
@@ -1189,6 +1191,15 @@ def _get_name_mappings(cls, config: DeepseekV2Config) -> list[StateDictNameMappi
    model_mappings.append([f"layers.{layer_index}.mlp.shared_experts.up_proj.weight", None, "transpose"])
    model_mappings.append([f"layers.{layer_index}.mlp.shared_experts.down_proj.weight", None, "transpose"])

    # MTP (eagle) parameters for inference
    if layer_index == config.num_hidden_layers:
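The diff above appends per-layer name mappings and adds a branch for the MTP (eagle) layer, which sits one index past the regular hidden layers. A minimal sketch of that accumulation pattern follows; `DummyConfig` and the MTP parameter name `enorm.weight` are illustrative assumptions, not the exact names used by the PR:

```python
# Sketch of how per-layer name mappings are accumulated, mirroring the diff
# above. The MTP parameter name below is a hypothetical placeholder.
class DummyConfig:
    num_hidden_layers = 61  # DeepseekV3 has 61 hidden layers; layer 61 is the MTP layer


def build_mappings(config):
    model_mappings = []
    # iterate one index past the regular layers to cover the MTP layer
    for layer_index in range(config.num_hidden_layers + 1):
        model_mappings.append(
            [f"layers.{layer_index}.mlp.shared_experts.up_proj.weight", None, "transpose"]
        )
        model_mappings.append(
            [f"layers.{layer_index}.mlp.shared_experts.down_proj.weight", None, "transpose"]
        )
        # MTP (eagle) parameters for inference live only on the extra layer
        if layer_index == config.num_hidden_layers:
            model_mappings.append(
                [f"layers.{layer_index}.enorm.weight", None]  # hypothetical MTP param
            )
    return model_mappings


mappings = build_mappings(DummyConfig())
```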
Fixed.
LGTM
Before submitting

Add test cases under the tests folder. If there are codecov issues, please add test cases first.

PR types
Bug fixes
PR changes
Models
Description
In _get_name_mapping, add e_score_correction_bias and the shared weights of embed_tokens.weight and layers.61.embed_tokens.weight.

In tp_action, the action for embed_tokens.weight and layers.61.embed_tokens.weight was chosen at random, which could select the wrong action and miss one of the two weights.

Fix TP for the MoE module.
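The tp_action bug described above stems from matching two state-dict keys that share a weight against one table of tensor-parallel actions. A minimal sketch of a deterministic lookup that prefers exact matches follows; the names `resolve_tp_action`, `tp_actions`, and `"split_rows"` are illustrative assumptions, not PaddleNLP's actual API:

```python
# Sketch: resolve the TP action for a key deterministically. An exact match is
# preferred, so "embed_tokens.weight" and "layers.61.embed_tokens.weight" each
# get their own action instead of an arbitrary suffix match picking one of them.
# All names here are hypothetical placeholders.

def resolve_tp_action(key, tp_actions):
    if key in tp_actions:  # exact match first
        return tp_actions[key]
    # fall back to a suffix match only when exactly one candidate applies
    candidates = [action for k, action in tp_actions.items() if key.endswith(k)]
    return candidates[0] if len(candidates) == 1 else None


tp_actions = {
    "embed_tokens.weight": "split_rows",
    "layers.61.embed_tokens.weight": "split_rows",
}
```

With this lookup, each shared-weight key resolves independently, so neither is skipped.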