Remove input_quant_func from AffineQuantizedTensor subclass #243

jerryzh168 · 2024-05-15T01:21:35Z

Summary:
Currently `AffineQuantizedTensor` carries an `input_quant_func`, which is a bit convoluted. Instead, we want to use a separate `LinearActAffineQuantizedTensor` subclass for activation quantization (dynamic quantization).

Also added dispatch for int8-activation, int8-weight dynamic quantization, which ends up calling the `int_scaled_matmul` kernel.

Test Plan:
python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_8da4w
python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int8_dyn_quant
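For readers new to the design, here is a minimal, library-free sketch of the idea in the summary: the weight object only stores quantized data, a separate wrapper carries the function that quantizes activations at call time, and the linear dispatch does an integer matmul followed by rescaling. All names here (`QuantizedWeight`, `ActQuantWeight`, `quantize_rows_int8`, `linear`) are hypothetical, and symmetric per-row int8 scaling is an assumption. This is not the torchao implementation and it does not call `int_scaled_matmul`; it only mimics the arithmetic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Matrix = List[List[float]]
IntMatrix = List[List[int]]


def quantize_rows_int8(x: Matrix) -> Tuple[IntMatrix, List[float]]:
    """Symmetric per-row int8 quantization (assumed scheme): returns (ints, scales)."""
    q_rows, scales = [], []
    for row in x:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


@dataclass
class QuantizedWeight:
    """Hypothetical stand-in for a quantized weight; knows nothing about activations."""
    int_data: IntMatrix      # shape (out_features, in_features)
    scales: List[float]      # one scale per output channel


@dataclass
class ActQuantWeight:
    """Hypothetical wrapper pairing a weight with the activation quantizer."""
    weight: QuantizedWeight
    input_quant_func: Callable[[Matrix], Tuple[IntMatrix, List[float]]]


def linear(x: Matrix, w: ActQuantWeight) -> Matrix:
    """Quantize x dynamically, accumulate in integers, then apply both scales."""
    x_int, x_scales = w.input_quant_func(x)
    out = []
    for x_row, x_scale in zip(x_int, x_scales):
        out_row = []
        for w_row, w_scale in zip(w.weight.int_data, w.weight.scales):
            acc = sum(a * b for a, b in zip(x_row, w_row))  # integer accumulation
            out_row.append(acc * x_scale * w_scale)
        out.append(out_row)
    return out


weight_fp = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
w_int, w_scales = quantize_rows_int8(weight_fp)
wrapped = ActQuantWeight(QuantizedWeight(w_int, w_scales), quantize_rows_int8)

x = [[1.0, 2.0, -3.0]]
y = linear(x, wrapped)
# Float reference for comparison: x @ weight_fp^T
y_ref = [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in weight_fp]
         for x_row in x]
```

The point of the split is that `QuantizedWeight` can be reused for weight-only quantization, while the wrapper alone decides whether and how activations are quantized.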


pytorch-bot · 2024-05-15T01:21:37Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/243

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 166353f with merge base cae3d82:

NEW FAILURE - The following job has failed:

.github/workflows/build.yml (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.


torchao/quantization/subclass.py

cpuhrsch

Great :) Let's move AffineQuantizedTensor into dtypes next and create a PyTorch-style conversion function? We also shouldn't need to use `__torch_function__` to override linear, but it makes sense to do that as a follow-up, because it'll require us to add support for detach, view, addmm, etc. to AffineQuantizedTensor.

jerryzh168 · 2024-05-15T23:57:47Z

> Great :) Let's move AffineQuantizedTensor into dtypes next and create a PyTorch-style conversion function? We also shouldn't need to use `__torch_function__` to override linear, but it makes sense to do that as a follow-up, because it'll require us to add support for detach, view, addmm, etc. to AffineQuantizedTensor.

Sounds good. The main thing is transpose: we need to think about how to support it given the scales/zero_point and the block_size arg.
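To make the transpose concern concrete, here is a small sketch of one way it could work for a 2D tensor with block-wise scales: transpose the integer data and the scales together, and swap the two entries of `block_size`. The helper names (`dequantize`, `transpose2d`, `transpose_quantized`) are hypothetical, zero_point is left out for brevity (it would presumably follow the scales), and nothing here is claimed to be how torchao handles it.

```python
from typing import List, Tuple


def dequantize(int_data: List[List[int]],
               scales: List[List[float]],
               block_size: Tuple[int, int]) -> List[List[float]]:
    """Dequantize a 2D tensor where each block of shape block_size shares one scale."""
    bh, bw = block_size
    return [
        [v * scales[i // bh][j // bw] for j, v in enumerate(row)]
        for i, row in enumerate(int_data)
    ]


def transpose2d(m):
    """Plain 2D transpose of a list of rows."""
    return [list(col) for col in zip(*m)]


def transpose_quantized(int_data, scales, block_size):
    """Transpose int data and scales together and swap the block_size dims."""
    bh, bw = block_size
    return transpose2d(int_data), transpose2d(scales), (bw, bh)


# 2x4 tensor with one scale per row: block_size = (1, 4), scales shape (2, 1).
int_data = [[1, 2, 3, 4], [5, 6, 7, 8]]
scales = [[0.5], [0.25]]
block_size = (1, 4)

t_int, t_scales, t_block = transpose_quantized(int_data, scales, block_size)

# Transposing in the quantized domain should match transposing the float tensor.
lhs = dequantize(t_int, t_scales, t_block)
rhs = transpose2d(dequantize(int_data, scales, block_size))
```

After the transpose, the per-row scales become per-column scales, which is why `block_size` cannot stay fixed while only the data is transposed.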

* Remove input_quant_func from AffineQuantizedTensor subclass

  Summary: Currently we have an input_quant_func in the AffineQuantizedTensor, which is a bit convoluted. We want to use a separate LinearActAffineQuantizedTensor subclass for activation quantization (dynamic quantization) instead.

  Test Plan: python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_8da4w

* Add dispatch for dynamic quantization in `AffineQuantizedTensor`

  Summary: This PR added dispatch for int8act-int8 weight dynamic quantization that calls the `int_scaled_matmul` kernel in the end.

  Test Plan: python test/quantization/test_quant_api.py -k test_quantized_tensor_subclass_int8_dyn_quant

* Fix test

facebook-github-bot added the CLA Signed label May 15, 2024

jerryzh168 force-pushed the dyn_quant branch from 0691839 to d71a65a May 15, 2024 01:34

jerryzh168 requested review from cpuhrsch, HDCharles, msaroufim and andrewor14 May 15, 2024 22:25

cpuhrsch reviewed May 15, 2024

torchao/quantization/subclass.py (outdated, resolved)

cpuhrsch reviewed May 15, 2024

torchao/quantization/subclass.py (outdated, resolved)

cpuhrsch approved these changes May 15, 2024

Fix test

b43bce7

jerryzh168 force-pushed the dyn_quant branch from be66b2f to b43bce7 May 16, 2024 00:08

Merge branch 'main' into dyn_quant

166353f

jerryzh168 merged commit cda787c into pytorch:main May 16, 2024
13 checks passed

jerryzh168 deleted the dyn_quant branch May 16, 2024 00:45

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

kludge workaround for AOTI fail on x86 Linux (pytorch#243)

3b213b0
