Enhance Auto-Round #870
Conversation
Signed-off-by: yiliu30 <[email protected]>
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/870
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 47103a5 with merge base bd264f9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The reason why `accumulate_gradient` is better is:
Hi @jerryzh168, could you please take a look, thanks!
LGTM!
Signed-off-by: yiliu30 <[email protected]>
* Bring `torch.compile` to `quant_block_v2_`. (#18)
* Add `AO_USE_DETERMINISTIC_ALGORITHMS` for reproducing results (#19)
* Add `gradient_accumulate_steps` and update results (#20)
* update the readme
* update
* update the desc
* rename `train_bs` to `batch_size`
* update the eval
* update

---------

Signed-off-by: yiliu30 <[email protected]>
This PR includes several enhancements for Auto-Round:

1. Bring `torch.compile` to speed up the Auto-Round optimization process (yiliu30#18); a minimal sketch follows this list:
   - `meta-llama/Llama-2-7b-chat-hf`: about 1.29x speedup, 27 sec/block -> 21 sec/block
   - `meta-llama/Meta-Llama-3.1-8B-Instruct`: about 1.23x speedup, 32 sec/block -> 26 sec/block
2. Add `AO_USE_DETERMINISTIC_ALGORITHMS` for reproducing the `lm-eval` results (yiliu30#19); see the second sketch below.
3. Expose `gradient_accumulate_steps` to users and update the results for `meta-llama/Meta-Llama-3.1-8B-Instruct` (yiliu30#20); a generic accumulation loop is sketched below. For the updated numbers and more details, please refer to the README.md.
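Below is a minimal sketch of enhancement 1: compiling the hot per-block function with `torch.compile`. The `quant_block_` body and its arguments are hypothetical stand-ins; the PR's actual function is `quant_block_v2_`, whose exact signature is not shown here.

```python
import torch

# Hypothetical stand-in for the per-block optimization step; the PR's real
# function is `quant_block_v2_`.
def quant_block_(block: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    return block(inputs)

# Compile the per-block function once; later calls reuse the compiled graph,
# which is where the reported ~1.2-1.3x per-block speedup comes from.
quant_block_compiled = torch.compile(quant_block_)

block = torch.nn.Linear(16, 16)
out = quant_block_compiled(block, torch.randn(2, 16))
```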
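For enhancement 2, this is one common way such an environment-variable switch is wired; the flag name comes from this PR, but the exact handling inside torchao may differ.

```python
import os
import torch

# Assumed wiring for the AO_USE_DETERMINISTIC_ALGORITHMS flag; the actual
# torchao code may read or propagate it differently.
if os.environ.get("AO_USE_DETERMINISTIC_ALGORITHMS", "0") == "1":
    # Force deterministic kernels so repeated lm-eval runs produce identical
    # numbers, at some performance cost. On CUDA, setting
    # CUBLAS_WORKSPACE_CONFIG=:4096:8 may also be required.
    torch.use_deterministic_algorithms(True)
```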
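For enhancement 3, a generic gradient-accumulation loop illustrates what `gradient_accumulate_steps` controls; the function, loss, and data layout here are illustrative, not the torchao implementation.

```python
import torch
import torch.nn.functional as F

def optimize_block(block, batches, optimizer, gradient_accumulate_steps=4):
    """Illustrative loop (not the torchao code): accumulating gradients over
    several small batches approximates one large-batch step while keeping
    peak memory low."""
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(batches, start=1):
        loss = F.mse_loss(block(inputs), targets)
        # Scale so the accumulated gradient matches a single large batch.
        (loss / gradient_accumulate_steps).backward()
        if step % gradient_accumulate_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

With a fixed per-step batch size, raising `gradient_accumulate_steps` emulates a larger effective batch without increasing peak memory.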
TODO:

- Add `gradient_accumulate_steps` and `compile_optimization_process` to `eval.py` after "Move `autoround` from `generate.py` to `eval.py`" (#868) lands. Done by a5722d8.

cc @wenhuach21 @thuang6