[New Features] LoRA-GA supports pipeline parallel #9819
base: develop
Conversation
Thanks for your contribution!
Codecov Report

Attention: Patch coverage is 0.00%.

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##           develop    #9819      +/-  ##
===========================================
+ Coverage    52.22%   52.49%    +0.27%
===========================================
  Files          730      730
  Lines       115793   115836       +43
===========================================
+ Hits         60475    60813      +338
+ Misses       55318    55023      -295

☔ View full report in Codecov by Sentry.
@@ -180,7 +262,12 @@ def get_module_gradient(

        if tp_degree > 1:
            # remove prefix and suffix in name
            model_split_key = local_grad_name.split(base_model_prefix)[-1].rsplit(rank_suffix, 1)[0]
            if pp_to_single_mapping is not None:
In the pp + dp/sharding case, is pp_to_single_mapping not needed?
pp_to_single_mapping is used to find the names that correspond to lora_split_mapping and base_model_split_mapping; when tp is not enabled, no split is needed.
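For illustration, a minimal sketch of the name translation described above (the helper name normalize_grad_name is hypothetical; pp_to_single_mapping, lora_split_mapping and base_model_split_mapping are the mappings introduced in this PR, and the exact control flow may differ from the actual get_module_gradient):

def normalize_grad_name(local_grad_name, base_model_prefix, rank_suffix,
                        tp_degree, pp_to_single_mapping=None):
    # Sketch only: turn a stage-/rank-local gradient name into the key used by
    # lora_split_mapping / base_model_split_mapping.
    if tp_degree > 1:
        # remove prefix and suffix in name, as in get_module_gradient
        key = local_grad_name.split(base_model_prefix)[-1].rsplit(rank_suffix, 1)[0]
        if pp_to_single_mapping is not None:
            # map the pipeline-stage-local name back to its single-card name so it
            # matches the keys recorded in the split mappings
            key = pp_to_single_mapping[key]
        return key
    # without tensor parallelism the split mappings are not consulted, so no split is needed
    return local_grad_name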
        model = fleet.distributed_model(model)

        return model

    def training_pipeline_step(self, model: nn.Layer, inputs: Dict[str, Union[paddle.Tensor, Any]]) -> paddle.Tensor:
What is the purpose of rewriting this part? Will it affect LoRA?
LoRAGATrainer does not need an optimizer, so I removed the optimizer-related parts of Trainer's training_pipeline_step. LoRA is not affected; only LoRA-GA goes through LoRAGATrainer during initialization.
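For context, a rough sketch of what such an optimizer-free pipeline step might look like (assuming Paddle's PipelineParallel forward_backward_pipeline interface and the Trainer's _prepare_inputs helper; this is not the exact code in the PR):

def training_pipeline_step(self, model, inputs):
    # Sketch only: run forward/backward over the scheduled micro-batches so that
    # gradients accumulate on the parameters. No optimizer is created or stepped,
    # because LoRA-GA only needs the gradients for its initialization.
    inputs = self._prepare_inputs(inputs)
    loss = model.forward_backward_pipeline(inputs)
    return loss.detach()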
Before submitting
- Add test cases into the tests folder. If there are codecov issues, please add test cases first.

PR types
New features
PR changes
APIs
Description
LoRA-GA supports pipeline parallelism (pp).
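For readers unfamiliar with LoRA-GA: it initializes the LoRA matrices from the estimated gradient of each frozen base weight, which is why those gradients must be gathered and renamed correctly across pipeline stages. A rough sketch of the idea (not the PaddleNLP implementation; the exact singular-vector slices vary between LoRA-GA variants):

import paddle

def loraga_init(weight_grad: paddle.Tensor, rank: int):
    # weight_grad: estimated gradient of a frozen base weight with its full
    # single-card shape [in, out], i.e. already gathered across pp/tp ranks.
    U, S, Vh = paddle.linalg.svd(weight_grad, full_matrices=False)
    lora_A = U[:, :rank]          # [in, rank]
    lora_B = Vh[rank:2 * rank]    # [rank, out]; exact slices differ by variant
    return lora_A, lora_B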