Enhance Auto-Round #870
Conversation
Signed-off-by: yiliu30 <[email protected]>
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/870
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 47103a5 with merge base bd264f9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The reason why `accumulate_gradient` is better is:
Hi @jerryzh168, could you please take a look, thanks!
LGTM!
Signed-off-by: yiliu30 <[email protected]>
* Bring `torch.compile` to `quant_block_v2_`. (#18)
* Add `AO_USE_DETERMINISTIC_ALGORITHMS` for reproducing results (#19)
* Add `gradient_accumulate_steps` and update results (#20)
* update the readme
* update
* update the desc
* rename `train_bs` to `batch_size`
* update the eval
* update

---------

Signed-off-by: yiliu30 <[email protected]>
This PR includes several enhancements for Auto-Round:

1. Bring `torch.compile` to speed up the Auto-Round optimization process (yiliu30#18); a minimal sketch follows this list:
   - `meta-llama/Llama-2-7b-chat-hf`: about 1.29x speedup, 27 sec/block -> 21 sec/block
   - `meta-llama/Meta-Llama-3.1-8B-Instruct`: about 1.23x speedup, 32 sec/block -> 26 sec/block
2. Add `AO_USE_DETERMINISTIC_ALGORITHMS` for reproducing the `lm-eval` results (yiliu30#19); see the second sketch below.
3. Expose `gradient_accumulate_steps` to users and update the results for `meta-llama/Meta-Llama-3.1-8B-Instruct` (yiliu30#20); a generic accumulation loop is sketched below. For the updated numbers and more details, please refer to the README.md.
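Below is a minimal sketch of enhancement 1: compiling the hot per-block function with `torch.compile`. The `quant_block_` body and its arguments are hypothetical stand-ins; the PR's actual function is `quant_block_v2_`, whose exact signature is not shown here.

```python
import torch

# Hypothetical stand-in for the per-block optimization step; the PR's real
# function is `quant_block_v2_`.
def quant_block_(block: torch.nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    return block(inputs)

# Compile the per-block function once; later calls reuse the compiled graph,
# which is where the reported ~1.2-1.3x per-block speedup comes from.
quant_block_compiled = torch.compile(quant_block_)

block = torch.nn.Linear(16, 16)
out = quant_block_compiled(block, torch.randn(2, 16))
```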
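For enhancement 2, this is one common way such an environment-variable switch is wired; the flag name comes from this PR, but the exact handling inside torchao may differ.

```python
import os
import torch

# Assumed wiring for the AO_USE_DETERMINISTIC_ALGORITHMS flag; the actual
# torchao code may read or propagate it differently.
if os.environ.get("AO_USE_DETERMINISTIC_ALGORITHMS", "0") == "1":
    # Force deterministic kernels so repeated lm-eval runs produce identical
    # numbers, at some performance cost. On CUDA, setting
    # CUBLAS_WORKSPACE_CONFIG=:4096:8 may also be required.
    torch.use_deterministic_algorithms(True)
```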
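For enhancement 3, a generic gradient-accumulation loop illustrates what `gradient_accumulate_steps` controls; the function, loss, and data layout here are illustrative, not the torchao implementation.

```python
import torch
import torch.nn.functional as F

def optimize_block(block, batches, optimizer, gradient_accumulate_steps=4):
    """Illustrative loop (not the torchao code): accumulating gradients over
    several small batches approximates one large-batch step while keeping
    peak memory low."""
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(batches, start=1):
        loss = F.mse_loss(block(inputs), targets)
        # Scale so the accumulated gradient matches a single large batch.
        (loss / gradient_accumulate_steps).backward()
        if step % gradient_accumulate_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

With a fixed per-step batch size, raising `gradient_accumulate_steps` emulates a larger effective batch without increasing peak memory.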
TODO:

- Add `gradient_accumulate_steps` and `compile_optimization_process` to `eval.py` after "Move `autoround` from `generate.py` to `eval.py`" (#868) lands. Done by a5722d8.

cc @wenhuach21 @thuang6