
Add more options in choose_qparams_affine for tinygemm op #227

Merged 2 commits into pytorch:main from the tinygemm branch on May 9, 2024

Conversation

@jerryzh168 (Contributor) commented May 7, 2024

Summary:
This is in preparation for replacing tinygemm q/dq ops with the unified quant primitive ops

The tinygemm choose_qparams op (and its quantize/dequantize ops): https://github.com/pytorch/ao/blob/main/torchao/quantization/quant_primitives.py#L36 differs from the other choose_qparams op in that it does not enforce zero_point to be exactly representable, meaning the floating point value 0 can't be exactly represented by an integer value in the quantized tensor. This PR adds a flag to produce a zero_point value that can be adapted for use by the tinygemm kernels.
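To make the difference concrete, here is a minimal sketch of the two zero_point conventions (illustrative only, not the torchao implementation; the function names and the 4-bit range are assumptions): the standard affine variant keeps an integer zero_point so the float value 0.0 is exactly representable, while the tinygemm-style variant keeps zero_point in floating point at the mid-point of the range.

```python
import torch

def choose_qparams_preserve_zero(x: torch.Tensor, quant_min: int = 0, quant_max: int = 15):
    # Standard affine qparams: zero_point is an integer, so the float value
    # 0.0 maps exactly onto one quantized integer after rounding.
    min_val = torch.minimum(x.min(), torch.zeros((), dtype=x.dtype))
    max_val = torch.maximum(x.max(), torch.zeros((), dtype=x.dtype))
    scale = (max_val - min_val) / (quant_max - quant_min)
    zero_point = quant_min - torch.round(min_val / scale)
    return scale, zero_point.to(torch.int32)

def choose_qparams_tinygemm_style(x: torch.Tensor, n_bit: int = 4):
    # tinygemm-style qparams: zero_point stays a floating point value at the
    # mid-point of the range, so 0.0 is generally not exactly representable.
    min_val, max_val = x.min(), x.max()
    scale = (max_val - min_val) / (2**n_bit - 1)
    zero_point = min_val + scale * (2 ** (n_bit - 1))
    return scale, zero_point  # both stay floating point

# Per-tensor qparams for a small weight slice
w = torch.randn(32)
print(choose_qparams_preserve_zero(w))
print(choose_qparams_tinygemm_style(w))
```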

Test Plan:
python test/quantization/test_quant_primitives.py -k test_tinygemm_get_groupwise_affine_qparams

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot bot commented May 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/227

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a0458db with merge base f6d56ca:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label May 7, 2024
@HDCharles (Contributor) left a comment

We need to compare with the existing numerics of choosing qparams and q/dq. If we're introducing error for no benefit then I don't think we should deprecate the other primitives. At the very least we need to discuss the functional changes first.

We also need to test whether these primitives work with GPTQ; there are nuances that I'm not sure are being captured.

@jerryzh168 changed the title from "Add test for choose_qparams for tinygemm ops" to "Add more options in choose_qparams_affine for tinygemm op" May 8, 2024
@jerryzh168 requested a review from HDCharles May 8, 2024 00:58
@jerryzh168 (Contributor, Author)

> We need to compare with the existing numerics of choosing qparams and q/dq. If we're introducing error for no benefit then I don't think we should deprecate the other primitives. At the very least we need to discuss the functional changes first.
>
> We also need to test whether these primitives work with GPTQ; there are nuances that I'm not sure are being captured.

Sounds good, updated the op to work with tinygemm numerics.

For GPTQ and other types of e2e quant tests, we could establish a dashboard for perf and accuracy first; we can prioritize this in 0.3.

@jerryzh168 force-pushed the tinygemm branch 5 times, most recently from 8d76b57 to 40d5a75 on May 8, 2024 16:42
@jerryzh168 dismissed HDCharles's stale review May 8, 2024 16:45:

Updated the op to close the accuracy gap, please take a look again.

- (scale_ao, _) = get_group_qparams_symmetric(weight, n_bit, groupsize)
- torch.testing.assert_allclose(scale_obs, scale_ao, rtol=0, atol=0)
+ (scale_ao, _) = get_group_qparams_symmetric(weight, n_bit, groupsize, precision=torch.float16)
+ torch.testing.assert_close(scale_obs, scale_ao, rtol=0, atol=0)
Contributor

why this change?

@jerryzh168 (Contributor, Author)

You mean assert_close? I saw assert_allclose is deprecated, that's why I updated them.
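For reference, the substitution is mechanical (a small illustrative snippet, not code from this PR): torch.testing.assert_allclose emits a deprecation warning in recent PyTorch releases, while torch.testing.assert_close is the supported API and accepts the same rtol/atol keywords.

```python
import torch

actual = torch.tensor([1.0, 2.0])
expected = torch.tensor([1.0, 2.0])

# Deprecated; warns in recent PyTorch releases:
# torch.testing.assert_allclose(actual, expected, rtol=0, atol=0)

# Supported replacement; with rtol=0 and atol=0 it demands exact equality:
torch.testing.assert_close(actual, expected, rtol=0, atol=0)
```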

target_dtype (torch.dtype): dtype for target quantized Tensor
quant_min (Optional[int]): minimum quantized value for target quantized Tensor
quant_max (Optional[int]): maximum quantized value for target quantized Tensor
eps (Optional[float]): minimum scale, if not provided, default to eps of input.dtype
scale_dtype (torch.dtype): dtype for scale Tensor
zero_point_dtype (torch.dtype): dtype for zero_point Tensor
_is_zero_exactly_representable (bool): a private flag to indicate whether we need zero to be exactly representable
Contributor

Interesting. I'd expect symmetric without zero_point to imply this. Is that true?

@jerryzh168 (Contributor, Author)

This is talking about whether the floating point value 0 is exactly representable by a quantized value or not; it's not related to symmetric/asymmetric quant. This is assumed by most of the existing quantized kernels.
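A small worked example of the distinction (simplified affine formulas for illustration, not the torchao ops; the scale and zero_point values are made up): a 0.0 coming from, say, zero padding survives quantize/dequantize exactly when zero_point is an integer, but generally does not with a tinygemm-style floating point zero_point.

```python
import torch

x = torch.tensor([0.0, 0.3, 1.2])   # the 0.0 could come from zero padding
scale = 0.1

# Integer zero_point (float 0.0 exactly representable): 0.0 round-trips exactly.
zp_int = 3
q = torch.clamp(torch.round(x / scale) + zp_int, 0, 15)
dq = (q - zp_int) * scale            # first element is exactly 0.0

# tinygemm-style floating point zero_point: 0.0 does not round-trip exactly.
zp_fp = 0.45
mid = 8                              # 2 ** (n_bit - 1) for 4-bit
q2 = torch.clamp(torch.round((x - zp_fp) / scale) + mid, 0, 15)
dq2 = (q2 - mid) * scale + zp_fp     # first element comes back as roughly 0.05
```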

target_dtype (torch.dtype): dtype for target quantized Tensor
quant_min (Optional[int]): minimum quantized value for target quantized Tensor
quant_max (Optional[int]): maximum quantized value for target quantized Tensor
eps (Optional[float]): minimum scale, if not provided, default to eps of input.dtype
scale_dtype (torch.dtype): dtype for scale Tensor
zero_point_dtype (torch.dtype): dtype for zero_point Tensor
is_exact_zero (bool): a flag to indicate whether we need zero to be exactly representable
@cpuhrsch (Contributor) commented May 8, 2024

nit: Can you also define what "zero being exactly representable" means for the outputs? You can add zero_padding as an example of when this is useful. Also I'd remove the is_ part and choose preserve_zero or some other verb. Since this is a kwarg to a function and not a property of a class it seems more consistent.
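For orientation, here is a hypothetical sketch of how the keyword could read after the suggested rename (names, defaults, and docstring wording are assumptions, not necessarily what landed in the PR):

```python
from typing import Optional, Tuple
import torch

def choose_qparams_affine(
    input: torch.Tensor,
    mapping_type: str,                          # e.g. "symmetric" or "asymmetric"
    block_size: Tuple[int, ...],
    target_dtype: torch.dtype,
    quant_min: Optional[int] = None,
    quant_max: Optional[int] = None,
    eps: Optional[float] = None,
    scale_dtype: Optional[torch.dtype] = None,
    zero_point_dtype: Optional[torch.dtype] = None,
    preserve_zero: bool = True,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """preserve_zero=True keeps zero_point an integer so the float value 0.0
    (e.g. introduced by zero padding) is exactly representable; preserve_zero=False
    matches tinygemm, where zero_point is a floating point value in the input
    domain."""
    ...
```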

@jerryzh168 merged commit b91b6be into pytorch:main May 9, 2024
13 checks passed
@jerryzh168 deleted the tinygemm branch May 9, 2024 02:46
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024