Allow quantized linear registration in a different file #783

Merged: 5 commits into pytorch:main on Sep 3, 2024

Conversation

@jerryzh168 (Contributor) commented Aug 30, 2024

Summary:

Previously there was an ordering we needed to maintain for the quantized linear dispatch table in AffineQuantizedTensor. The reason is that there is a fallback entry that dequantizes the input (https://github.com/pytorch/ao/blob/ba2d3b1333b90ccd0186216649a1c58c6a17ce56/torchao/dtypes/affine_quantized_tensor.py#L1195):

(_linear_quantized_act_fallback_check, _linear_quantized_act_fallback_impl),

so dispatches with two quantized inputs (static or dynamic quantization) had to come before this entry and before the weight only quantization dispatches. However, the fallback is not really used or needed in practice, since people typically just want to call into a very specific kernel.
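
For context, a minimal sketch of what that fallback conceptually does, assuming both activation and weight expose a dequantize() method; _fallback_impl is a placeholder name for illustration, not the exact torchao source:

import torch.nn.functional as F

def _fallback_impl(input_tensor, weight_tensor, bias):
    # Dequantize both tensors back to floating point and run the plain
    # linear op. Numerically fine but slow: it bypasses every quantized
    # kernel, which is why kernel-specific dispatches must be checked first.
    return F.linear(input_tensor.dequantize(), weight_tensor.dequantize(), bias)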

From offline discussions with @drisspg and @HDCharles, it might be useful to have a "quantized_linear_impl" field on LayoutType. This allows people to specify and check which quantized_linear_impl they want, making sure they can call into the specific kernel. When this field is set, we also won't run the fallback path for quantized linear (dequantize all activation and weight tensors and run the floating point linear op).
I think this can be added on a specific layout type when people want it, and we don't have to enforce it in the base LayoutType; otherwise we'd have to specify it for every LayoutType, which seems a bit cumbersome. The contract here is that when people want to skip the fallback path, they can do:

class MyLayoutType(LayoutType):
    quantized_linear_impl: str = "some_impl"
    ...

and have the dispatch_condition function check for this:

def dispatch_condition(...):
    return weight_tensor.layout_type.quantized_linear_impl == "some_impl" and ...

If there are issues with the implementation, the fallback will not be called in this case, so the problem surfaces instead of being masked by the dequantize path.
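
Putting this together, here is a minimal sketch of registering a new dispatch from a separate file. It assumes the (input_tensor, weight_tensor, bias) signature used by the existing dispatch conditions and import paths as of this PR; register_aqt_quantized_linear_dispatch is the helper this PR makes public, while MyLayoutType, _my_dispatch_condition, and _my_impl are placeholders:

from dataclasses import dataclass

from torchao.dtypes.affine_quantized_tensor import register_aqt_quantized_linear_dispatch
from torchao.dtypes.utils import LayoutType

@dataclass(frozen=True)
class MyLayoutType(LayoutType):
    quantized_linear_impl: str = "some_impl"

def _my_dispatch_condition(input_tensor, weight_tensor, bias):
    # Only claim the linear op when the weight was quantized with our layout
    # type and it explicitly asks for this impl; otherwise let other table
    # entries match (or let dispatch fail, rather than silently falling back).
    return (
        isinstance(weight_tensor.layout_type, MyLayoutType)
        and weight_tensor.layout_type.quantized_linear_impl == "some_impl"
    )

def _my_impl(input_tensor, weight_tensor, bias):
    # Call into the specific quantized kernel here.
    raise NotImplementedError("placeholder for the actual kernel call")

# Global registration; it can live in any file, which is the point of this PR.
register_aqt_quantized_linear_dispatch(_my_dispatch_condition, _my_impl)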

Test Plan:
python test/dtypes/test_affine_quantized.py -k test_register_new_dispatch

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot bot commented Aug 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/783

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b7c1512 with merge base e2dad4a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Aug 30, 2024
@jerryzh168 jerryzh168 requested a review from vayuda August 30, 2024 20:24
@jerryzh168 jerryzh168 requested a review from jcaip September 2, 2024 17:24
"""
_AQT_QLINEAR_DISPATCH_TABLE[dispatch_condition] = impl

def _deregister_aqt_quantized_linear_dispatch(dispatch_condition):
Collaborator commented: When should this be used?

@jerryzh168 (Contributor, Author) commented Sep 2, 2024: This is used to remove a registration. Most of the time you probably don't need it; I'm using it here just to make sure the test passes, since the registration is global.
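
A sketch of that test-cleanup pattern, assuming the public register/deregister pair from this PR and the placeholder _my_dispatch_condition/_my_impl from the description above:

from torchao.dtypes.affine_quantized_tensor import (
    register_aqt_quantized_linear_dispatch,
    deregister_aqt_quantized_linear_dispatch,
)

def test_register_new_dispatch():
    register_aqt_quantized_linear_dispatch(_my_dispatch_condition, _my_impl)
    try:
        ...  # exercise a linear op that should hit _my_impl
    finally:
        # The dispatch table is global state, so always deregister to avoid
        # leaking the entry into other tests.
        deregister_aqt_quantized_linear_dispatch(_my_dispatch_condition)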

@jerryzh168 force-pushed the improve-quantized-linear branch from 915e8d3 to b7c1512 on September 2, 2024 17:55
@jerryzh168 jerryzh168 merged commit e15e509 into pytorch:main Sep 3, 2024
17 checks passed
@jerryzh168 jerryzh168 deleted the improve-quantized-linear branch September 3, 2024 03:06
jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Sep 4, 2024
* Allow quantized linear registration in a different file

* fix error

* de-register dispatch

* make register/deregister fn public

* rebase and fix error