deduplicate code for some torchao q/dq ops #173
Conversation
Force-pushed from 68dc554 to 28bd36c
Force-pushed from d181692 to b969c58
Do you want to try applying this to more APIs in one PR? I think this function isn't used in a torch.compile context, so we can't get a signal on whether the new API works well with compile.
Thank you!
OK sure, I can apply this to the rest tomorrow
get_group_qparams_symmetric
if zero_point is not None:
    y -= zero_point
return y * scale
eps = torch.finfo(torch.float32).eps
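For readers skimming the diff, here is a minimal, self-contained sketch of the dequantization pattern quoted above; the function name and defaults are illustrative only, not the actual torchao helper.

```python
import torch

def dequantize_sketch(q, scale, zero_point=None, output_dtype=torch.float32):
    # Cast up before subtracting so low-bit integer inputs cannot overflow.
    y = q.to(output_dtype)
    if zero_point is not None:
        y -= zero_point
    return y * scale

# Quick check: int8 values with scale 0.5 and zero_point 2.
q = torch.tensor([0, 2, 4], dtype=torch.int8)
print(dequantize_sketch(q, scale=0.5, zero_point=2))  # tensor([-1., 0., 1.])
```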
This seems unused; just in case you meant to use it. We could add a linter to CI to help catch this, but it's not super important at the moment. I'll add it to the list for 0.3.
quant = torch.clamp(x_zp, quant_min, quant_max).to(target_dtype)
eps = torch.finfo(torch.float32).eps
block_size = (1, x.shape[1])
scale_dtype = torch.float32
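For context on the quoted parameters, here is a rough, self-contained sketch of how block_size = (1, x.shape[1]) reduces to per-row scale/zero_point selection for a 2D tensor. The helper name is made up, and the real choose_qparams_affine in torchao supports more dtypes, mapping types, and edge cases.

```python
import torch

def choose_qparams_per_row_sketch(x, quant_min=-128, quant_max=127,
                                  eps=torch.finfo(torch.float32).eps,
                                  scale_dtype=torch.float32):
    # block_size == (1, x.shape[1]): each row of a 2D tensor is one block,
    # so min/max reduce over dim=1 and we get one (scale, zero_point) per row.
    min_val = torch.clamp(x.amin(dim=1), max=0.0)  # quantization range must include 0
    max_val = torch.clamp(x.amax(dim=1), min=0.0)
    scale = (max_val - min_val) / float(quant_max - quant_min)
    scale = torch.clamp(scale, min=eps).to(scale_dtype)
    zero_point = torch.round(quant_min - min_val / scale).to(torch.int32)
    zero_point = torch.clamp(zero_point, quant_min, quant_max)
    return scale, zero_point

x = torch.randn(4, 16)
scale, zp = choose_qparams_per_row_sketch(x)
print(scale.shape, zp.shape)  # torch.Size([4]) torch.Size([4])
```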
Should we also tie this to x.dtype? Previous code did scale = torch.clamp(scale, min=eps).to(x.dtype).
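A small illustration of the difference under discussion, assuming a bfloat16 input; the values and variable names are hypothetical:

```python
import torch

x = torch.randn(8, dtype=torch.bfloat16)
eps = torch.finfo(torch.float32).eps
raw_scale = (x.amax() - x.amin()) / 255.0

# Behaviour of the previous code quoted above: scale inherits the input dtype.
scale_old = torch.clamp(raw_scale, min=eps).to(x.dtype)        # bfloat16
# Behaviour with a fixed scale_dtype=torch.float32: scale stays in float32.
scale_new = torch.clamp(raw_scale, min=eps).to(torch.float32)  # float32

print(scale_old.dtype, scale_new.dtype)  # torch.bfloat16 torch.float32
```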
Maybe. I'm not sure if we'll have use cases that have a different dtype though; maybe I can make this default to x.dtype instead of torch.float32?
I'm not sure. If we always expect float32 then we could add an assert just so it doesn't fail.
I just changed this to x.dtype in choose_qparams_affine
eps has to be float32's eps to pass the tests; I guess we could change this later
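A quick illustration of why the eps choice matters for the tests: the float32 and bfloat16 machine epsilons differ by several orders of magnitude, so clamping a very small scale gives very different results.

```python
import torch

print(torch.finfo(torch.float32).eps)   # ~1.19e-07
print(torch.finfo(torch.bfloat16).eps)  # ~7.81e-03

scale = torch.tensor(1e-6)
print(torch.clamp(scale, min=torch.finfo(torch.float32).eps))   # stays ~1e-6
print(torch.clamp(scale, min=torch.finfo(torch.bfloat16).eps))  # bumped to ~7.8e-3
```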
Please see comments around the use sites. I also think it's worthwhile to compare the output of TORCH_LOGS='output_code' for one or two of these to see if the resulting code is still fused.
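For anyone reproducing this check, one way to dump the code Inductor generates is to run a small script with the TORCH_LOGS environment variable set. This is a rough sketch assuming a recent PyTorch; dq_mul is just a stand-in for one of the refactored ops.

```python
# Run as: TORCH_LOGS="output_code" python fusion_check.py
# Inductor then prints the Triton/C++ it generates, so the before/after versions
# of a refactored op can be diffed to confirm the kernels are still fused.
import torch

def dq_mul(q, scale, zero_point):
    # A dequantize-then-multiply pattern we would like to see fused into one kernel.
    return (q.to(torch.float32) - zero_point) * scale

compiled = torch.compile(dq_mul)
q = torch.randint(-128, 127, (1024,), dtype=torch.int8)
out = compiled(q, torch.tensor(0.05), torch.tensor(3))
print(out.shape)
```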
@cpuhrsch I'll need to fix the CI first btw, but thanks for the review
do we have performance benchmarks for these things?
@jerryzh168 - mostly within other repositories (aside from https://github.com/pytorch/ao/tree/739e62d197b25d40422fe23fad3df2c7d2efb9d7/tutorials/quantize_vit). But if the refactor here ends up generating the same code, it should perform the same way. We can optimize more after.
Verified that quantize_vit gives the same result: https://www.diffchecker.com/DoqCSkRC/
Summary: This just removes the implementation; we can have follow-up PRs to remove the call altogether after we have replaced all implementations with the new blockwise quant code.
Test Plan: CI
zero_point_dtype = torch.int32

qscheme_to_mapping_type = {
    torch.per_tensor_affine: MappingType.ASYMMETRIC,
Very, very nit: Hm, I'm wondering if MappingType is the right name... We can definitely do this in a follow-up.
So MappingType means how we map from floating point to quantized values. I'm open to other suggestions as well, although we may remove this and just split the function into two in the future, so we could discuss this a little bit later (after we've verified this with executorch).
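For readers outside the review, a rough sketch of what the two MappingType values distinguish; the real choose_qparams_affine is blockwise and far more configurable, so this only shows the idea.

```python
import torch

def qparams_sketch(x, mapping_type, quant_min=-128, quant_max=127):
    if mapping_type == "SYMMETRIC":
        # Scale from the max magnitude; zero_point pinned to 0.
        scale = x.abs().amax() / max(abs(quant_min), quant_max)
        zero_point = torch.tensor(0, dtype=torch.int32)
    else:  # "ASYMMETRIC"
        # Scale from the full (min, max) range; zero_point shifts it onto [quant_min, quant_max].
        min_val = x.amin().clamp(max=0.0)
        max_val = x.amax().clamp(min=0.0)
        scale = (max_val - min_val) / (quant_max - quant_min)
        zero_point = (quant_min - torch.round(min_val / scale)).to(torch.int32)
    return scale, zero_point

x = torch.randn(64)
print(qparams_sketch(x, "SYMMETRIC"))
print(qparams_sketch(x, "ASYMMETRIC"))
```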
@@ -218,8 +218,11 @@ def dequantize(self, dtype=None):
    """
    Obtain the dequantized version of the quantized tensor subclass
    """
    zero_points = torch.zeros(self.q_scales.shape, device=self.q_scales.device, dtype=self.q_scales.dtype)
I'm surprised this didn't cause a regression. Seems like a big change.
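One plausible reason there is no numerical change: for symmetric quantization the zero_point is zero everywhere, so subtracting an explicit all-zeros tensor is a no-op in the dequant math. A tiny check (shapes are hypothetical):

```python
import torch

q = torch.randint(-128, 127, (4, 8), dtype=torch.int8)
scales = torch.rand(4, 1) + 0.01

without_zp = q.to(torch.float32) * scales
zero_points = torch.zeros(scales.shape, device=scales.device, dtype=scales.dtype)
with_zp = (q.to(torch.float32) - zero_points) * scales

print(torch.equal(without_zp, with_zp))  # True: subtracting exact zeros is bit-identical
```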
Summary: This just removes the implementation; we can have follow-up PRs to remove the call altogether after we have replaced all implementations with the new blockwise quant code.
Test Plan: CI
Co-authored-by: cpuhrsch <[email protected]>
Summary:
This just removes the implementation of the following ops; we can have follow-up PRs to remove the calls altogether after we have replaced all implementations with the new blockwise quant code:
get_group_qparams_symmetric
dynamically_quantize_per_tensor
dynamically_quantize_per_channel
dequantize_per_tensor
dequantize_per_channel
Note that there are some tinygemm-specific ops that calculate zero_point in the float domain; we could think about how to replace these later, e.g. we could have a flag to indicate whether zero_point is calculated in the float domain or the quantized domain (a sketch of the two conventions follows below).
Test Plan:
CI
Reviewers:
Subscribers:
Tasks:
Tags:
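As a supplement to the note about tinygemm above, here is a rough sketch of the two zero_point conventions being contrasted. The exact tinygemm formula also recenters around the mid-point of the quantized range, so this only illustrates the float-domain vs. quantized-domain distinction.

```python
import torch

x = torch.randn(32)
qmin, qmax = 0, 15                               # e.g. a uint4 range
min_val, max_val = x.amin(), x.amax()
scale = (max_val - min_val) / (qmax - qmin)

# Quantized-domain zero_point (integer): dequant is (q - zp_int) * scale.
zp_int = (qmin - torch.round(min_val / scale)).clamp(qmin, qmax)
q1 = torch.clamp(torch.round(x / scale) + zp_int, qmin, qmax)
x_hat1 = (q1 - zp_int) * scale

# Float-domain zero_point: the offset stays a float and dequant is q * scale + zp_float.
zp_float = min_val
q2 = torch.clamp(torch.round((x - zp_float) / scale), qmin, qmax)
x_hat2 = q2 * scale + zp_float

# Reconstruction error is bounded by roughly one scale step for both conventions.
print((x - x_hat1).abs().max(), (x - x_hat2).abs().max())
```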