Auto sft #9728
base: develop
Conversation
56c9890 to 93041e1
Thanks for your contribution!
<!-- Demo: PaddlePaddle#26 -->

#### Before submitting

```shell
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --file XXXX.py
```

### PR types

<!-- One of [ New features | Bug fixes | Function optimization | Performance optimization | Breaking changes | Others ] -->

### PR changes

<!-- One of [ Models | APIs | Docs | Others ] -->

### Description

1. Rebase.
2. Fix the SFT file dependency errors and the `LoRAAutoConfig` error.
<!-- Demo: PaddlePaddle#26 -->

#### Before submitting

- [ ] Lint code. If there are lint issues, please format the code first.

  ```shell
  # Install and register `pre-commit` in the project folder
  pip install pre-commit && pre-commit install

  # Process previous code files separately
  pre-commit run --file XXXX.py
  ```

- [ ] Add test cases into the `tests` folder. If there are codecov issues, please add test cases first.

### PR types

<!-- One of [ New features | Bug fixes | Function optimization | Performance optimization | Breaking changes | Others ] -->

### PR changes

<!-- One of [ Models | APIs | Docs | Others ] -->

### Description

Add CI guard.
--auto_parallel_resume_form_hybrid_parallel true \
--sharding_parallel_config "enable_stage1_tensor_fusion" \
--tensor_parallel_config "enable_mp_async_allreduce" \
--pipeline_parallel_config "enable_sharding_comm_overlap enable_dp_comm_overlap enable_overlap_p2p_comm disable_p2p_cache_shape" \
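For context, a minimal sketch of how these flags might be combined into a full distributed launch. The launcher module is standard Paddle, but the script path, GPU list, and config file are illustrative assumptions and not part of this diff:

```shell
# Illustrative invocation only -- just the four flags above come from this PR's diff;
# the script path and config file are assumptions.
python -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" \
    llm/run_finetune.py ./config/llama/sft_argument.json \
    --auto_parallel_resume_form_hybrid_parallel true \
    --sharding_parallel_config "enable_stage1_tensor_fusion" \
    --tensor_parallel_config "enable_mp_async_allreduce" \
    --pipeline_parallel_config "enable_sharding_comm_overlap enable_dp_comm_overlap enable_overlap_p2p_comm disable_p2p_cache_shape"
```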
Do the files `llm/run_sft_hand.sh` and `llm/run_lora_hand.sh` need to be added?
if "labels" in batch.keys(): | ||
value = batch.pop("labels") | ||
batch["labels"] = value |
Why do we need to add this code? It seems that it does not change the contents of `batch`?
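If it helps the discussion: popping and re-inserting a key does have one observable effect, it moves `"labels"` to the end of the dict's key order. A minimal standalone sketch with made-up batch contents:

```python
# Made-up batch; real entries would be paddle.Tensor objects.
batch = {"labels": [0, 1], "input_ids": [2, 3], "attention_mask": [1, 1]}

if "labels" in batch.keys():
    value = batch.pop("labels")  # remove "labels" from its current position
    batch["labels"] = value      # re-insert it, so it becomes the last key

print(list(batch.keys()))  # ['input_ids', 'attention_mask', 'labels']
```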
# #NOTE(zhangwl):this may move to wrap
# trainable_parameters = [p for p in model.parameters() if not p.stop_gradient]
# trainer.set_optimizer_grouped_parameters(trainable_parameters)

# def forward_pre_hook(layer, input):
#     print(f"{layer} forward start")

# def forward_post_hook(layer, input, output):
#     print(f"{layer} forward done")

# for layer in trainer.model.sublayers():
#     layer.register_forward_pre_hook(forward_pre_hook)
#     layer.register_forward_post_hook(forward_post_hook)

# Train
Should this commented-out debugging code be deleted?
@@ -2769,6 +2769,7 @@ def merge_auto_dist_configs(self, configs):
        },....
    ]
    """
    # import pdb;pdb.set_trace()
Should this debugging statement be deleted?
The title and description are too simple. The title should state that auto parallel is used to validate SFT/LoRA for the Llama model, and the description should include a comparison of loss and performance for SFT/LoRA between manual parallel and auto parallel.
PR types
New features
PR changes
Others
Description
NLP supports SFT training with the intermediate-level API / base API.
Supports full-parameter SFT fine-tuning.
Supports LoRA fine-tuning.
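For readers unfamiliar with the LoRA path this PR enables, here is a minimal sketch using PaddleNLP's standard `paddlenlp.peft` LoRA interface. The model name, target-module patterns, and hyperparameters are illustrative only, and the auto-parallel `LoRAAutoConfig` touched by this PR is assumed to accept analogous fields:

```python
# Minimal LoRA fine-tuning setup sketch with PaddleNLP's peft module.
# Model name, target modules, and hyperparameters below are illustrative assumptions.
from paddlenlp.peft import LoRAConfig, LoRAModel
from paddlenlp.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/llama-7b")

lora_config = LoRAConfig(
    target_modules=[".*q_proj.*", ".*v_proj.*"],  # regex patterns of layers to adapt
    r=8,                                          # LoRA rank
    lora_alpha=16,                                # scaling factor
)

model = LoRAModel(model, lora_config)
model.mark_only_lora_as_trainable()  # freeze base weights, train adapter weights only
model.print_trainable_parameters()
```

The wrapped model can then be passed to the trainer in place of the base model, so full-parameter SFT and LoRA SFT share the same training loop.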