
Auto sft #9728

Open · wants to merge 88 commits into base: develop

Conversation

blacksheep-Aristotle
Contributor

PR types

New features

PR changes

Others

Description

Adds SFT training for NLP models on top of the intermediate-level API / base API.
Supports full-parameter SFT fine-tuning.
Supports LoRA fine-tuning (a minimal usage sketch is shown below).
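
For context, here is a minimal sketch of what LoRA fine-tuning looks like with PaddleNLP's existing `paddlenlp.peft` API; the auto-parallel variant added in this PR (e.g. `LoRAAutoConfig`) is assumed to follow the same pattern. The model id and hyperparameters are placeholders, not values taken from this PR.

```python
# Sketch only: LoRA fine-tuning setup with PaddleNLP's peft API.
# The auto-parallel path added in this PR (LoRAAutoConfig, etc.) is assumed
# to mirror this flow; the model id and hyperparameters are placeholders.
from paddlenlp.peft import LoRAConfig, LoRAModel
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("facebook/llama-7b")
tokenizer = AutoTokenizer.from_pretrained("facebook/llama-7b")

lora_config = LoRAConfig(
    target_modules=[".*q_proj.*", ".*v_proj.*"],  # regexes selecting layers to adapt
    r=8,
    lora_alpha=16,
)
model = LoRAModel(model, lora_config)
model.mark_only_lora_as_trainable()  # freeze base weights; only LoRA adapters train
model.print_trainable_parameters()
```

Full-parameter SFT would skip the LoRA wrapping and train the base model directly through the Trainer.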


paddle-bot bot commented Jan 6, 2025

Thanks for your contribution!

blacksheep-Aristotle and others added 12 commits February 20, 2025 20:34
#### Before submitting



```shell
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --file XXXX.py
```




### Description

1. rebase
2. fix SFT file dependency errors and a LoRAAutoConfig error
@CLAassistant

CLAassistant commented Feb 21, 2025

CLA assistant check
All committers have signed the CLA.

liufengwei0103 and others added 12 commits February 22, 2025 14:30
#### Before submitting

- [ ] Lint code. If there are lint issues, please format the code first.

```shell
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --file XXXX.py
```

- [ ] Add test cases into `tests` folder. If there are codecov issues, please add test cases first.


### Description
Add CI guard.
--auto_parallel_resume_form_hybrid_parallel true \
--sharding_parallel_config "enable_stage1_tensor_fusion" \
--tensor_parallel_config "enable_mp_async_allreduce" \
--pipeline_parallel_config "enable_sharding_comm_overlap enable_dp_comm_overlap enable_overlap_p2p_comm disable_p2p_cache_shape" \
Contributor

Do the files llm/run_sft_hand.sh and llm/run_lora_hand.sh need to be added?

Comment on lines +417 to +419
if "labels" in batch.keys():
value = batch.pop("labels")
batch["labels"] = value
Contributor

Why do we need to add this code? It seems that this code does not change batch?
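
One possible (assumed) reading is that the pop-and-reinsert is meant to move the "labels" key to the end of the batch dict, since Python dicts preserve insertion order; a minimal illustration:

```python
# Illustration only: pop + reassign moves a key to the end of a dict
# because Python dicts preserve insertion order. Whether downstream code
# in this PR actually relies on "labels" being last is an assumption.
batch = {"labels": [1, 2], "input_ids": [3, 4], "attention_mask": [5, 6]}
value = batch.pop("labels")
batch["labels"] = value
print(list(batch.keys()))  # ['input_ids', 'attention_mask', 'labels']
```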

Comment on lines +476 to +489
# #NOTE(zhangwl):this may move to wrap
# trainable_parameters = [p for p in model.parameters() if not p.stop_gradient]
# trainer.set_optimizer_grouped_parameters(trainable_parameters)

# def forward_pre_hook(layer, input):
# print(f"{layer} forward start")

# def forward_post_hook(layer, input, output):
# print(f"{layer} forward done")

# for layer in trainer.model.sublayers():
# layer.register_forward_pre_hook(forward_pre_hook)
# layer.register_forward_post_hook(forward_post_hook)
# Train
Contributor

Should it be deleted?

@@ -2769,6 +2769,7 @@ def merge_auto_dist_configs(self, configs):
},....
]
"""
# import pdb;pdb.set_trace()
Contributor

Should it be deleted?

@jeff41404
Contributor

The title and description are too simple. The title should state that this PR uses auto parallel to validate SFT/LoRA for the llama model, and the description should include a comparison of SFT/LoRA loss and performance between manual parallel and auto parallel.
