Allow Int4WeightOnlyQuantizer to set different dtype for scales_and_zeros #479
Conversation
As titled. Currently `Int4WeightOnlyQuantizer` is hardcoded to return `scales_and_zeros` with dtype `torch.bfloat16`. This PR adds a `dtype` argument to the flow so that a different dtype can be used.
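For context, here is a minimal sketch of group-wise int4 quantization where the dtype of the returned `scales_and_zeros` is a parameter instead of hardcoded `torch.bfloat16`. This is illustrative only: the function name, the `scales_dtype` keyword, and the packing layout below are assumptions, not the torchao implementation.

```python
import torch

def group_quantize_int4(w: torch.Tensor, groupsize: int = 128,
                        scales_dtype: torch.dtype = torch.bfloat16):
    """Hypothetical sketch of group-wise int4 affine quantization.

    `scales_dtype` stands in for the new knob this PR adds: the dtype of
    the returned scales_and_zeros, previously always torch.bfloat16.
    """
    out_features, in_features = w.shape
    assert in_features % groupsize == 0
    w_grouped = w.reshape(out_features, in_features // groupsize, groupsize)

    # Per-group affine parameters mapping float values onto [0, 15].
    w_min = w_grouped.amin(dim=-1, keepdim=True)
    w_max = w_grouped.amax(dim=-1, keepdim=True)
    scales = (w_max - w_min).clamp(min=1e-6) / 15.0
    zeros = w_min

    q = ((w_grouped - zeros) / scales).round().clamp(0, 15).to(torch.int32)

    # Pack scales and zeros together in the caller-chosen dtype.
    scales_and_zeros = torch.cat(
        [scales.to(scales_dtype), zeros.to(scales_dtype)], dim=-1
    )
    return q.reshape(out_features, in_features), scales_and_zeros
```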
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/479
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f3c320a with merge base a35a1cd.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
) -> None:
    super().__init__()
    self.padding = not _check_linear_int4_k(in_features, groupsize, inner_k_tiles)
    if self.padding:
        from model import find_multiple
```
I don't think there's a module called `model`.
Thanks, I think this is a relic of when GPTQ was more deeply coupled with gpt-fast.
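For readers who don't know the gpt-fast helper being discussed: `find_multiple` is conventionally a small rounding utility used to pad `in_features` up to an alignment the int4 kernel accepts. A sketch under that assumption:

```python
def find_multiple(n: int, k: int) -> int:
    # Round n up to the nearest multiple of k (no-op when already aligned).
    if n % k == 0:
        return n
    return n + k - (n % k)

# Example: pad an in_features of 4000 to the kernel's alignment.
padded = find_multiple(4000, 1024)  # -> 4096
```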
This seems fine to merge, although I do worry that most of our GPTQ tests are disabled right now.
Mostly looks fine, but FYI we don't really have anyone maintaining the GPTQ example, so if there's a use case for it, please let me know.
I'm migrating torchchat to use these APIs, to be prepared for shared kernels across ET and PyTorch eager/compile.
Allow Int4WeightOnlyQuantizer to set different dtype for scales_and_zeros (pytorch#479)

* Allow Int4WeightOnlyQuantizer to set different dtype for scales_and_zeros. As titled. Currently `Int4WeightOnlyQuantizer` is hardcoded to return `scales_and_zeros` with dtype `torch.bfloat16`. Adding a `dtype` argument into the flow so that it can be a different dtype.
* Add comment
readme update
* Update quantize.py to use torchao Quantizers

Summary: Remove duplicate code for Int4WeightOnlyQuantizer and Int8DynActInt4WeightQuantizer and use the torchao API.

Test Plan:
```
python torchchat.py generate llama2 --quantize '{"linear:int4": {"groupsize": 256}, "precision": {"dtype":"float16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256
python torchchat.py generate llama2 --quantize '{"linear:a8w4dq": {"groupsize": 256}, "precision": {"dtype":"float16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256
```

* Fix import
* Install torchao from gh
* Explain import
* Fix dependencies
* Test ao PR pytorch#479
* Update torchao hash
* Update torchao pin
* Fix scheduler bf16/fp16 mix error
* Incorporate torchao changes
* update hash
* Fix GPU CI job
* More fix
* Fix executorch CI job
* Use quant api for int4 weight only quantization
* Fix
* Fix again
* Fix 3
* Fix 4
* Try something
* debug
* Only migrate 8a4w

---------

Co-authored-by: Jack Zhang <[email protected]>
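To illustrate the torchchat migration described above, here is a hedged sketch of how the quantizer might be invoked with a non-default scales dtype once this PR lands. The import path, the `precision` keyword, and the `quantize` call shape are assumptions, not the confirmed final API.

```python
import torch
import torch.nn as nn
# Assumed import path for the quantizer; may differ across torchao versions.
from torchao.quantization.GPTQ import Int4WeightOnlyQuantizer

model = nn.Sequential(nn.Linear(4096, 4096)).to(torch.float16)

# `precision=torch.float16` is the assumed spelling of the new dtype knob;
# before this PR, scales_and_zeros always came back as torch.bfloat16.
quantizer = Int4WeightOnlyQuantizer(groupsize=256, precision=torch.float16)
model = quantizer.quantize(model)
```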