feat: Support LoraConfig in TorchTune BuiltinTrainer #102
Conversation
@Electronic-Waste: GitHub didn't allow me to request PR reviews from the following users: kubeflow/kubeflow-trainer-team, kubeflow/kubeflow-sdk-team. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Pull Request Test Coverage Report for Build 18339419302

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

💛 - Coveralls
Thanks @Electronic-Waste for your PR.
@szaher Yes, TorchTune has been deprecated, so we plan to support the "Kubeflow Dynamic LLM Trainer Framework" (kubeflow/trainer#2839) in the future. However, TorchTune is still one of the best choices for SFT across a wide range of models. We've also recently added a new runtime for Qwen in Kubeflow Trainer (kubeflow/trainer#2835) to provide an initial out-of-the-box LLM fine-tuning experience on Kubernetes. We will follow this workflow:

This will maintain backward compatibility and also provide early support for an out-of-the-box LLM fine-tuning experience on Kubernetes. I think it's a more graceful and user-friendly workflow than deprecating TorchTune immediately. What's more, we also want to present this feature at Kubeflow Summit NA this year, so it would be better if we could implement this feature before then. WDYT? @kubeflow/kubeflow-trainer-team @kubeflow/kubeflow-sdk-team
+1 to what @Electronic-Waste mentioned! As we agreed on the Dynamic LLM Trainer Framework as part of kubeflow/trainer#2752, I think we shouldn't deprecate the TorchTune LLM Trainer for now, but instead keep it extensible to support multiple backends.

Agree with @Electronic-Waste to support LoRA configurations in the TorchTune config for now.
Thanks @Electronic-Waste!
I left a few comments.
cc @kubeflow/kubeflow-sdk-team
@Electronic-Waste Please also add a unit test for it: https://github.com/kubeflow/sdk/blob/main/kubeflow/trainer/backends/kubernetes/backend_test.py#L694
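A minimal sketch of such a test, assuming pytest; `build_torchtune_command` is a hypothetical stand-in for whatever helper the Kubernetes backend actually uses to render the trainer config into container arguments, and `peft_config` is an assumed field name:

```python
# Sketch only: build_torchtune_command is a hypothetical helper, and
# peft_config is an assumed TorchTuneConfig field, not confirmed API.
from kubeflow.trainer import LoraConfig, TorchTuneConfig


def test_lora_config_propagates_to_command():
    config = TorchTuneConfig(
        peft_config=LoraConfig(lora_rank=8, lora_alpha=16),
    )
    command = build_torchtune_command(config)  # hypothetical helper
    # TorchTune recipes accept key=value overrides on the command line,
    # so the LoRA fields should show up as overrides like these.
    assert "model.lora_rank=8" in command
    assert "model.lora_alpha=16" in command
```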
/milestone v0.2
I think this PR should be ready now.
Thank you for this @Electronic-Waste!
/lgtm
/approve
/hold for kubeflow/trainer#2832
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
/hold cancel
* feat: Add lora types.
* chore: propagate lora parameters in command.
* feat(lora): Add support for QLoRA.
* fix(lora): remove extra quote symbol in lora attn module.
* fix(lora): replace direct field override with field map.
* fix(lora): remove extra flags.
* fix(lora): fix wrong default list value in LoraConfig.
* fix(lora): remove outdated code.
* test(backend): Add test for lora.

Signed-off-by: Electronic-Waste <[email protected]>
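One fix in the list above, "fix wrong default list value in LoraConfig", points at a classic Python pitfall: a mutable default such as a list cannot be used directly as a dataclass field default. A minimal sketch of the safe pattern, where the field names mirror TorchTune's LoRA recipe options and are illustrative rather than the SDK's exact definition:

```python
from dataclasses import dataclass, field


@dataclass
class LoraConfig:
    # A bare list default (lora_attn_modules: list = ["q_proj", "v_proj"])
    # raises ValueError at class-definition time: dataclasses forbid
    # mutable defaults, which would otherwise be shared across instances.
    lora_attn_modules: list[str] = field(
        default_factory=lambda: ["q_proj", "v_proj"]
    )
    lora_rank: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.0
```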
What this PR does / why we need it:
This PR introduces `LoraConfig` in `TorchTuneConfig`, and propagates the configs into the TorchTune command lines. I tested it with the following script:
Results:
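For illustration (this is not the author's test script), a minimal sketch of driving the new `LoraConfig` through the SDK's `TrainerClient`; the runtime name and the `peft_config` field name are assumptions, not confirmed API:

```python
from kubeflow.trainer import (
    BuiltinTrainer,
    LoraConfig,
    TorchTuneConfig,
    TrainerClient,
)

client = TrainerClient()

job_name = client.train(
    # Runtime name is an assumption; pick a TorchTune runtime installed
    # in your cluster (see client.list_runtimes()).
    runtime=client.get_runtime("torchtune-llama3.2-1b"),
    trainer=BuiltinTrainer(
        config=TorchTuneConfig(
            # The LoRA settings are rendered into TorchTune CLI overrides.
            peft_config=LoraConfig(
                lora_rank=8,
                lora_alpha=16,
                lora_attn_modules=["q_proj", "v_proj"],
            ),
        ),
    ),
)
print(job_name)
```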
/cc @kubeflow/kubeflow-trainer-team @kubeflow/kubeflow-sdk-team @kramaranya @szaher @eoinfennessy @franciscojavierarceo @rudeigerc
Which issue(s) this PR fixes (optional, in `Fixes #<issue number>, #<issue number>, ...` format, will close the issue(s) when PR gets merged):

Fixes #
Checklist: