Allow Int4WeightOnlyQuantizer to set different dtype for scales_and_zeros #479
Conversation
As titled. Currently `Int4WeightOnlyQuantizer` is hardcoded to return `scales_and_zeros` with dtype `torch.bfloat16`. This PR adds a `dtype` argument to the flow so that a different dtype can be used.
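For context, here is a minimal sketch of group-wise int4 quantization where the dtype of the returned `scales_and_zeros` is a parameter instead of hardcoded `torch.bfloat16`. This is illustrative only: the function name, the `scales_dtype` keyword, and the packing layout below are assumptions, not the torchao implementation.

```python
import torch

def group_quantize_int4(w: torch.Tensor, groupsize: int = 128,
                        scales_dtype: torch.dtype = torch.bfloat16):
    """Hypothetical sketch of group-wise int4 affine quantization.

    `scales_dtype` stands in for the new knob this PR adds: the dtype of
    the returned scales_and_zeros, previously always torch.bfloat16.
    """
    out_features, in_features = w.shape
    assert in_features % groupsize == 0
    w_grouped = w.reshape(out_features, in_features // groupsize, groupsize)

    # Per-group affine parameters mapping float values onto [0, 15].
    w_min = w_grouped.amin(dim=-1, keepdim=True)
    w_max = w_grouped.amax(dim=-1, keepdim=True)
    scales = (w_max - w_min).clamp(min=1e-6) / 15.0
    zeros = w_min

    q = ((w_grouped - zeros) / scales).round().clamp(0, 15).to(torch.int32)

    # Pack scales and zeros together in the caller-chosen dtype.
    scales_and_zeros = torch.cat(
        [scales.to(scales_dtype), zeros.to(scales_dtype)], dim=-1
    )
    return q.reshape(out_features, in_features), scales_and_zeros
```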
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/479
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f3c320a with merge base a35a1cd.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
) -> None:
    super().__init__()
    self.padding = not _check_linear_int4_k(in_features, groupsize, inner_k_tiles)
    if self.padding:
        from model import find_multiple
```
I don't think there's a module called `model`.
Thanks, I think this is a relic of when GPTQ was more deeply coupled with gpt-fast.
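For readers who don't know the gpt-fast helper being discussed: `find_multiple` is conventionally a small rounding utility used to pad `in_features` up to an alignment the int4 kernel accepts. A sketch under that assumption:

```python
def find_multiple(n: int, k: int) -> int:
    # Round n up to the nearest multiple of k (no-op when already aligned).
    if n % k == 0:
        return n
    return n + k - (n % k)

# Example: pad an in_features of 4000 to the kernel's alignment.
padded = find_multiple(4000, 1024)  # -> 4096
```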
This seems fine to merge, although I do worry that most of our GPTQ tests are disabled right now.
Mostly looks fine, but FYI we don't really have anyone maintaining the GPTQ example, so if there's a use case for it, please let me know.
I'm migrating torchchat to use these APIs, to be prepared for shared kernels across ET and PyTorch eager/compile.
Allow Int4WeightOnlyQuantizer to set different dtype for scales_and_zeros (pytorch#479)

* Allow Int4WeightOnlyQuantizer to set different dtype for scales_and_zeros. As titled. Currently `Int4WeightOnlyQuantizer` is hardcoded to return `scales_and_zeros` with dtype `torch.bfloat16`. Adding a `dtype` argument into the flow so that it can be a different dtype.
* Add comment
readme update
* Update quantize.py to use torchao Quantizers

Summary: Remove duplicate code for Int4WeightOnlyQuantizer and Int8DynActInt4WeightQuantizer and use the torchao API.

Test Plan:
```
python torchchat.py generate llama2 --quantize '{"linear:int4": {"groupsize": 256}, "precision": {"dtype":"float16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256
python torchchat.py generate llama2 --quantize '{"linear:a8w4dq": {"groupsize": 256}, "precision": {"dtype":"float16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256
```

* Fix import
* Install torchao from gh
* Explain import
* Fix dependencies
* Test ao PR pytorch#479
* Update torchao hash
* Update torchao pin
* Fix scheduler bf16/fp16 mix error
* Incorporate torchao changes
* update hash
* Fix GPU CI job
* More fix
* Fix executorch CI job
* Use quant api for int4 weight only quantization
* Fix
* Fix again
* Fix 3
* Fix 4
* Try something
* debug
* Only migrate 8a4w

---------

Co-authored-by: Jack Zhang <[email protected]>
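To illustrate the torchchat migration described above, here is a hedged sketch of how the quantizer might be invoked with a non-default scales dtype once this PR lands. The import path, the `precision` keyword, and the `quantize` call shape are assumptions, not the confirmed final API.

```python
import torch
import torch.nn as nn
# Assumed import path for the quantizer; may differ across torchao versions.
from torchao.quantization.GPTQ import Int4WeightOnlyQuantizer

model = nn.Sequential(nn.Linear(4096, 4096)).to(torch.float16)

# `precision=torch.float16` is the assumed spelling of the new dtype knob;
# before this PR, scales_and_zeros always came back as torch.bfloat16.
quantizer = Int4WeightOnlyQuantizer(groupsize=256, precision=torch.float16)
model = quantizer.quantize(model)
```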