
FEAT / Trainer: Add adamw 4bit optimizer #31865

Merged: 13 commits merged into main on Aug 22, 2024

Conversation

@SunMarc (Member) commented Jul 9, 2024

What does this PR do?

This PR adds the 4-bit optimizer from the torchao library to the HF Trainer. For now, it requires the main branch of torchao and torch >= 2.3 (maybe we can wait a bit before merging). For those who want to try it, you can pass optim="adamw_torch_4bit" in TrainingArguments, as sketched below.
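
For anyone trying this out, here is a minimal sketch of the corresponding TrainingArguments; the output directory is a placeholder, and torch >= 2.3 plus torchao are assumed to be installed:

```python
from transformers import TrainingArguments

# Minimal sketch: select torchao's 4-bit AdamW via the Trainer's optim flag.
# Assumes torch >= 2.3 and the torchao library (main branch at the time of this PR).
args = TrainingArguments(
    output_dir="out",          # placeholder output directory
    optim="adamw_torch_4bit",  # the optimizer name added by this PR
)
# `args` is then passed to a Trainer as usual, e.g.
# Trainer(model=model, args=args, train_dataset=train_dataset).train()
```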

Since we already have the 8-bit optimizer from bnb that works well, I'm not adding torchao's 8-bit one.

Related thread: https://x.com/marksaroufim/status/1809398186198593566

cc @muellerzr as you might be interested

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@muellerzr (Contributor) left a comment

Nice! LG2M, cc @msaroufim :)

@msaroufim commented Jul 9, 2024

There's also an AdamWFp8 btw, and it's the fastest one we've found when the HW supports it: https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim#benchmarks

Also cc @gau-nernst this is very exciting!
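
For readers who want to experiment with these low-bit optimizers outside the Trainer, here is a minimal sketch of using torchao directly, assuming the prototype low_bit_optim module exposes AdamWFp8 (and AdamW4bit) as drop-in replacements for torch.optim.AdamW; the linear layer and input batch are placeholders:

```python
import torch
# Assumption: torchao's prototype module exposes AdamWFp8 and AdamW4bit
# as drop-in replacements for torch.optim.AdamW.
from torchao.prototype.low_bit_optim import AdamWFp8  # or AdamW4bit

model = torch.nn.Linear(1024, 1024).cuda()         # placeholder model
# FP8 states need supporting hardware to be fast; AdamW4bit is used the same way.
optimizer = AdamWFp8(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda")            # placeholder batch
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```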

@amyeroberts (Collaborator) left a comment

Thanks for adding!

@gau-nernst mentioned this pull request Jul 10, 2024
@SunMarc (Member, Author) commented Jul 10, 2024

> There's also an AdamWFp8 btw and it's the fastest one we've found when the HW supports it https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim#benchmarks

Nice! I'll add it in a separate PR!

@msaroufim commented

Heads up @SunMarc, we just released torchao 0.4! https://github.com/pytorch/ao/releases/tag/v0.4.0

@SunMarc (Member, Author) commented Aug 8, 2024

Nice! I'll merge it as soon as we merge the torchao quantization PR in transformers, as there is some overlap!

@SunMarc merged commit c42d264 into main on Aug 22, 2024
24 checks passed
@SunMarc deleted the add-4bit-optim branch on August 22, 2024 at 13:07
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Aug 30, 2024
* add 4bit optimizer

* style

* fix msg

* style

* add qgalore

* Revert "add qgalore"

This reverts commit 25278e8.

* style

* version check