[LLM] Update model convert and fix TP for deepseekv3 #9797
Conversation
Codecov Report

Attention: Patch coverage is

@@            Coverage Diff             @@
##           develop    #9797      +/-   ##
===========================================
+ Coverage    51.64%   52.21%    +0.57%
===========================================
  Files          741      730       -11
  Lines      119141   115812     -3329
===========================================
- Hits        61525    60477     -1048
+ Misses      57616    55335     -2281
@@ -1189,6 +1191,15 @@ def _get_name_mappings(cls, config: DeepseekV2Config) -> list[StateDictNameMappi
    model_mappings.append([f"layers.{layer_index}.mlp.shared_experts.up_proj.weight", None, "transpose"])
    model_mappings.append([f"layers.{layer_index}.mlp.shared_experts.down_proj.weight", None, "transpose"])

    # MTP (eagle) parameters for inference
    if layer_index == config.num_hidden_layers:
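The diff above appends per-layer name mappings and adds a branch for the MTP (eagle) layer, which sits one index past the regular hidden layers. A minimal sketch of that accumulation pattern follows; `DummyConfig` and the MTP parameter name `enorm.weight` are illustrative assumptions, not the exact names used by the PR:

```python
# Sketch of how per-layer name mappings are accumulated, mirroring the diff
# above. The MTP parameter name below is a hypothetical placeholder.
class DummyConfig:
    num_hidden_layers = 61  # DeepseekV3 has 61 hidden layers; layer 61 is the MTP layer


def build_mappings(config):
    model_mappings = []
    # iterate one index past the regular layers to cover the MTP layer
    for layer_index in range(config.num_hidden_layers + 1):
        model_mappings.append(
            [f"layers.{layer_index}.mlp.shared_experts.up_proj.weight", None, "transpose"]
        )
        model_mappings.append(
            [f"layers.{layer_index}.mlp.shared_experts.down_proj.weight", None, "transpose"]
        )
        # MTP (eagle) parameters for inference live only on the extra layer
        if layer_index == config.num_hidden_layers:
            model_mappings.append(
                [f"layers.{layer_index}.enorm.weight", None]  # hypothetical MTP param
            )
    return model_mappings


mappings = build_mappings(DummyConfig())
```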
Fixed.
LGTM
Before submitting

Add test cases under the tests folder. If there are codecov issues, please add test cases first.

PR types
Bug fixes
PR changes
Models
Description
In _get_name_mapping, add e_score_correction_bias and the shared weights of embed_tokens.weight and layers.61.embed_tokens.weight.

In tp_action, the action for embed_tokens.weight and layers.61.embed_tokens.weight was chosen at random, which could select the wrong action and miss one of the two weights.

Fix TP for the MoE module.
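The tp_action bug described above stems from matching two state-dict keys that share a weight against one table of tensor-parallel actions. A minimal sketch of a deterministic lookup that prefers exact matches follows; the names `resolve_tp_action`, `tp_actions`, and `"split_rows"` are illustrative assumptions, not PaddleNLP's actual API:

```python
# Sketch: resolve the TP action for a key deterministically. An exact match is
# preferred, so "embed_tokens.weight" and "layers.61.embed_tokens.weight" each
# get their own action instead of an arbitrary suffix match picking one of them.
# All names here are hypothetical placeholders.

def resolve_tp_action(key, tp_actions):
    if key in tp_actions:  # exact match first
        return tp_actions[key]
    # fall back to a suffix match only when exactly one candidate applies
    candidates = [action for k, action in tp_actions.items() if key.endswith(k)]
    return candidates[0] if len(candidates) == 1 else None


tp_actions = {
    "embed_tokens.weight": "split_rows",
    "layers.61.embed_tokens.weight": "split_rows",
}
```

With this lookup, each shared-weight key resolves independently, so neither is skipped.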