
[Low-bit optim] Improve compile time + Fix PyTorch 2.3 support for 4-bit optim #812

Merged
merged 7 commits into pytorch:main from low_bit_optim_fix on Sep 5, 2024

Conversation

@gau-nernst (Collaborator) commented on Sep 5, 2024

Compile the optimizer step for a single parameter with static shapes, and disable the compile cache size limit.

  • For a given model, the number of distinct argument combinations passed to single_param_adam() is fixed, so it is safe to disable the cache limit without risking endless re-compilation (see the sketch below).
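
Below is a minimal sketch of the idea, not the exact PR code: the body of single_param_adam is simplified (the real one also handles the quantized 8-bit/4-bit/FP8 optimizer state), and the cache_size_limit value is a placeholder. `torch.compile(..., dynamic=False)` and `torch._dynamo.config.cache_size_limit` are standard PyTorch knobs.

```python
import torch

# Simplified per-parameter AdamW-style step. Illustrative only: the real
# single_param_adam also dequantizes/requantizes the low-bit optimizer state.
def single_param_adam(p, grad, step, exp_avg, exp_avg_sq, lr, beta1, beta2, eps):
    exp_avg.lerp_(grad, 1 - beta1)             # m = beta1 * m + (1 - beta1) * g
    exp_avg_sq.lerp_(grad * grad, 1 - beta2)   # v = beta2 * v + (1 - beta2) * g^2
    bias_corr1 = 1 - beta1 ** step
    bias_corr2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias_corr2).sqrt().add_(eps)
    p.addcdiv_(exp_avg, denom, value=-lr / bias_corr1)

# Compile once with static shapes: each distinct parameter shape/dtype gets its own
# specialized graph instead of one dynamic-shape graph over all parameters.
single_param_adam = torch.compile(single_param_adam, fullgraph=True, dynamic=False)

# The set of shapes seen is fixed by the model, so lifting dynamo's recompile limit
# cannot cause unbounded recompilation (the exact value here is arbitrary).
torch._dynamo.config.cache_size_limit = 10_000
```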

Benefits

  • Much faster compile for the 8-bit and FP8 optimizers, since the optim step is no longer compiled over all parameters at once: compile time is now negligible 🤯
  • Faster 4-bit optimizer thanks to static shapes: it is now on par with the 8-bit optimizer
  • Fixes PyTorch 2.3 support for the 4-bit optimizer (broken by Move more utils to TorchAOBaseTensor #784 (comment))
  • (Unintended) Fixes the unusually high memory usage of the FP8 optimizer: it now has the same memory footprint as the 8-bit optimizer


Llama2-7B benchmarks

Fine-tuning Llama2-7B on the Alpaca dataset: PyTorch 2.4, full BF16, 1 epoch, A100, fixed random seed. Benchmarks were run with torchtune 52d1b838.

| AdamW impl      | Peak memory allocated (GB) | toks/s | truthfulqa_mc2 acc |
|-----------------|----------------------------|--------|--------------------|
| Not fine-tuned  | -                          | -      | 38.95              |
| PyTorch (fused) | 51.6                       | 3200   | 42.61              |
| bnb 8-bit       | 39.3                       | 3000   | 42.75              |
| ao 8-bit        | 39.1                       | 2900   | 41.50              |
| ao 4-bit        | 33.2                       | 2900   | 42.27              |

NOTE: lpmm's 4-bit AdamW does not support BF16 weights.
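
For reference, a hedged sketch of how the ao optimizers in the table plug into a training loop. The import path follows torchao's prototype low-bit optim module around the time of this PR (it may differ in other versions), and the tiny model and hyperparameters are placeholders rather than the torchtune Llama2-7B recipe used above.

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # or AdamW4bit / AdamWFp8

# Toy stand-in for the fine-tuned model; the numbers above come from Llama2-7B via torchtune.
model = torch.nn.Linear(4096, 4096, device="cuda", dtype=torch.bfloat16)
optim = AdamW8bit(model.parameters(), lr=1e-5)

x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)
loss = model(x).float().pow(2).mean()   # placeholder loss
loss.backward()
optim.step()        # optimizer state is kept in 8 bits; the per-parameter step runs compiled
optim.zero_grad()
```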



pytorch-bot (bot) commented on Sep 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/812

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d144f42 with merge base 599319f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Sep 5, 2024
@gau-nernst marked this pull request as draft September 5, 2024 02:43
@gau-nernst marked this pull request as ready for review September 5, 2024 09:56
@gau-nernst requested a review from msaroufim September 5, 2024 11:35
@msaroufim merged commit 1e7f132 into pytorch:main Sep 5, 2024
17 checks passed
@gau-nernst deleted the low_bit_optim_fix branch September 5, 2024 16:02
HDCharles pushed a commit that referenced this pull request Sep 9, 2024
…bit optim (#812)

* disable recompile limit

* remove _prepare_param_groups()

* re-enable FSDP test. update ViT benchmarks

* update

* update

* update readme
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024
* clean up unused files

* fix tests: HF TOKEN not available on-pr, add evaluation.md to tests

* markup docs

* fix evaluations.md

* add markup to native execution md

* install wget for gguf.md testing, prevent evaluation.md failures

* remove secrets from yml files

* update

* remove copy pasta from macos and macos-mps tests

* typo

* format