Add support for `AQTLayout`, `PlainAQTLayout` and `TensorCoreTiledAQTLayout` #278

jerryzh168 · 2024-05-25T05:41:48Z

Summary:
Today AffineQuantizedTensor has hardcoded storage format of int_data, scale, zero_point. But this does not work if we want to support packed weight. In this PR, we added support to hide the storage details for AffineQuantizedTensor in a family of tensor subclasses, all should inherit from the base Layout storage type: AQTLayout (affine quantized tensor storage). This is part of the project of refactoring AffineQuantizedTensor to be a proper clean API for both developers (as a showcase of how people can add new dtypes/quantization techniques) and modeling users.

This PR just added support for

base AQTLayout
plain layout tensor (PlainAQTLayout) that stores int_data, scale and zero_point tensors directly
tensor core tiled storage tensor (TensorCoreTiledAQTLayout) storing packed weight (result of torch.ops.aten._convert_weight_to_int4pack) and the combined scale_and_zero tensor, for now we only have numeric checks, we'll do performance checks on example models later
this improves performance for current AffineQuantizedTensor because we are doing packing beforehand, instead of on the fly.

AffineQuantizedTensor will have the following:

layout_tensor: AQTLayout (can store data of different storage formats)
layout: str (a string represents the type of storage_tensor we have, can be used in dispatch) (overwrite the original tensor layout property)

Users can add a new type of layout tensor for a specific layout with:

@register_aqt_layout_cls("my_layout")
class MyAQTLayout(AQTLayout):
    ...

Target Audience:
As mentioned before,
(1). this serves as an example for developers who are developing new quantization techniques/dtypes, and who also need packing / special storage layout for the weight, they can copy paste what we are doing here and apply to their own use cases
(2). for modeling users, packing weight before hand would give a performance boost to the model, but they'll need to specify the specific extended_layout they want to pack into, e.g. tensor_core_tiled

Test Plan:
python test/quantization/test_quant_api.py

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2024-05-25T05:41:50Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/278

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4758982 with merge base 62745fc ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

cpuhrsch · 2024-05-28T22:39:55Z

Nice! Nit: Let's please call it Layout :)

Summary: Today `AffineQuantizedTensor` has hardcoded storage format of `int_data`, `scale`, `zero_point`. But this does not work if we want to support packed weight. In this PR, we added support to hide the storage details for `AffineQuantizedTensor` in a family of tensor subclasses, all should inherit from the base Storage type: `AQTStorage` (affine quantized tensor storage) This PR just added support for a plain storage tensor (`PlainAQTStorage`) that stores `int_data`, `scale` and `zero_point` tensors directly, in the next PR we'll also support storing packed weight (result of `torch.ops.aten._convert_weight_to_int4pack`) in a different type of `AQTStorage`. `AffineQuantizedTensor` will have the following: - storage_tensor: AQTStorage (can store data of different storage formats) - storage_layout: str (a string represents the type of storage_tensor we have, can be used in dispatch) Test Plan: python test/quantization/test_quant_api.py Reviewers: Subscribers: Tasks: Tags:

msaroufim · 2024-05-29T19:17:14Z

test/quantization/test_quant_api.py

@@ -496,7 +496,7 @@ def test_quantized_tensor_subclass_int8_dyn_quant(self):
        m = ToyLinearModel(1024, 1024, 1024).eval().to(torch.bfloat16).to("cuda")
        m_copy = copy.deepcopy(m)
        # setting batch_size to 20 to be compatible with the kernel
-        example_inputs = tuple(map(lambda x: x.to(torch.bfloat16).to("cuda"), m.example_inputs(batch_size=20)))
+        example_inputs = m.example_inputs(batch_size=20, dtype=torch.bfloat16, device="cuda")
        m = quantize(m, get_apply_int8dyn_quant())



I don't see any tests explicitly testing TensorCoreTiledAQTLayout

msaroufim · 2024-05-29T19:18:53Z

torchao/dtypes/aqt.py

-        self.int_data = int_data
-        self.scale = scale
-        self.zero_point = zero_point
+        self.layout_tensor = layout_tensor


I understand the benefit of group the above 3 configs into the same object but I don't think I'm quite clear on why this needs to be a subclass? Is it because we need to override the behavior of detach to do bit unpacking?

we need a subclass so that we can support different packing format, like the ones that's used by tensor core

so the ideal abstraction here I feel should be "storage subclass" cc @ezyang @albanD

With a non-standard layout, I don't doubt the need for a subclass (since you need to override the behavior of operations to know how to deal with the layout), but I doubt the need for a wrapper subclass, just do a basic storage and have all the smarts in the tensor. That's how non-contiguous layouts work on dense tensor!

@ezyang IIUC you mean we just put the layout related logic to tensor subclass itself and create different tensor subclasses for different layout?

the problem here is that we still want the same tensor (AffineQuantizedTensor) but can be packed into different formats (different storage)

btw, this is just a use case I want to make sure you are aware, not a feature request since we can workaround with tensor subclass, even though not that "naturally"

When the tensor is repacked multiple ways, what kind of lifetime do you have for the packing?

oh sorry, I meant we still use the same tensor subclass, not the same tensor instance

@ezyang - the analogy here is changing from strided to COO to BSR to strided etc.

Essentially we want to introduce new layouts for this Tensor.

This will also be useful for https://github.com/pytorch/ao/blob/4c1d568263582db3e48299d2eba78d8ad1f6274d/torchao/dtypes/uint4.py to implement different bitpacking formats (different layouts).

There are many ways to skin this cat. But the way I would think about it is that if you want a bunch of different behaviors to live in a single class (no extra subclass hierarchy), well, you can't use the class dispatch mechanism to dispatch them. So you need your own dispatch mechanism. Luckily, in Python, that's pretty easy to build.

Yes, agreed. Coming up with a good way to skin this cat could help us make it a generic recipe for PyTorch users. @jerryzh168 and I have been chatting a bit about it.

jerryzh168 · 2024-05-29T19:57:57Z

torchao/quantization/quant_api.py

@@ -348,7 +348,7 @@ def apply_int4wo_quant(weight):
        preserve_zero = False
        zero_point_dtype = torch.bfloat16
        zero_point_domain = ZeroPointDomain.FLOAT
-        return to_aq(weight, mapping_type, block_size, target_dtype, quant_min, quant_max, eps, zero_point_dtype=zero_point_dtype, preserve_zero=preserve_zero, zero_point_domain=zero_point_domain)
+        return to_aq(weight, mapping_type, block_size, target_dtype, quant_min, quant_max, eps, zero_point_dtype=zero_point_dtype, preserve_zero=preserve_zero, zero_point_domain=zero_point_domain, extended_layout="tensor_core_tiled")


@msaroufim "tensor_core_tiled" is tested here

…Layout` (pytorch#278) Summary: Today `AffineQuantizedTensor` has hardcoded storage format of `int_data`, `scale`, `zero_point`. But this does not work if we want to support packed weight. In this PR, we added support to hide the storage details for `AffineQuantizedTensor` in a family of tensor subclasses, all should inherit from the base Storage type: `AQTStorage` (affine quantized tensor storage) This PR just added support for a plain storage tensor (`PlainAQTStorage`) that stores `int_data`, `scale` and `zero_point` tensors directly, in the next PR we'll also support storing packed weight (result of `torch.ops.aten._convert_weight_to_int4pack`) in a different type of `AQTStorage`. `AffineQuantizedTensor` will have the following: - storage_tensor: AQTStorage (can store data of different storage formats) - storage_layout: str (a string represents the type of storage_tensor we have, can be used in dispatch) Test Plan: python test/quantization/test_quant_api.py Reviewers: Subscribers: Tasks: Tags:

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 25, 2024

jerryzh168 requested a review from cpuhrsch May 25, 2024 05:42

jerryzh168 force-pushed the layout branch 3 times, most recently from 4d232b8 to 5dd3200 Compare May 25, 2024 22:51

jerryzh168 changed the title ~~Add support for AQTStorage and PlainAQTStorage~~ Add support for AQTStorage, PlainAQTStorage and TiledAQTStorage May 25, 2024

jerryzh168 changed the title ~~Add support for AQTStorage, PlainAQTStorage and TiledAQTStorage~~ Add support for AQTStorage, PlainAQTStorage and TensorCoreTiledAQTStorage May 25, 2024

jerryzh168 force-pushed the layout branch from 5dd3200 to e310a4f Compare May 25, 2024 23:01

cpuhrsch requested a review from msaroufim May 28, 2024 17:09

jerryzh168 force-pushed the layout branch from e310a4f to 8353c20 Compare May 29, 2024 15:59

jerryzh168 force-pushed the layout branch from 8353c20 to f08f6de Compare May 29, 2024 16:22

jerryzh168 changed the title ~~Add support for AQTStorage, PlainAQTStorage and TensorCoreTiledAQTStorage~~ Add support for AQTLayout, PlainAQTLayout and TensorCoreTiledAQTLayout May 29, 2024

Merge branch 'main' into layout

ec3f79f

msaroufim reviewed May 29, 2024

View reviewed changes

msaroufim approved these changes May 29, 2024

View reviewed changes

jerryzh168 commented May 29, 2024

View reviewed changes

Merge branch 'main' into layout

4758982

jerryzh168 merged commit 374fec4 into pytorch:main May 29, 2024
13 checks passed

jerryzh168 deleted the layout branch May 29, 2024 21:31

jerryzh168 mentioned this pull request Jun 18, 2024

[RFC] torchao Contributor Guide #391

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for `AQTLayout`, `PlainAQTLayout` and `TensorCoreTiledAQTLayout` #278

Add support for `AQTLayout`, `PlainAQTLayout` and `TensorCoreTiledAQTLayout` #278

jerryzh168 commented May 25, 2024 •

edited

Loading

pytorch-bot bot commented May 25, 2024 •

edited

Loading

cpuhrsch commented May 28, 2024

msaroufim May 29, 2024

msaroufim May 29, 2024 •

edited

Loading

jerryzh168 May 29, 2024 •

edited

Loading

ezyang May 29, 2024

jerryzh168 May 29, 2024 •

edited

Loading

ezyang May 29, 2024

jerryzh168 May 29, 2024

cpuhrsch May 30, 2024 •

edited

Loading

cpuhrsch May 30, 2024

ezyang May 31, 2024

cpuhrsch May 31, 2024

jerryzh168 May 29, 2024

Add support for AQTLayout, PlainAQTLayout and TensorCoreTiledAQTLayout #278

Add support for AQTLayout, PlainAQTLayout and TensorCoreTiledAQTLayout #278

Conversation

jerryzh168 commented May 25, 2024 • edited Loading

pytorch-bot bot commented May 25, 2024 • edited Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/278

✅ No Failures

cpuhrsch commented May 28, 2024

Choose a reason for hiding this comment

msaroufim May 29, 2024 • edited Loading

Choose a reason for hiding this comment

jerryzh168 May 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerryzh168 May 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cpuhrsch May 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Add support for `AQTLayout`, `PlainAQTLayout` and `TensorCoreTiledAQTLayout` #278

Add support for `AQTLayout`, `PlainAQTLayout` and `TensorCoreTiledAQTLayout` #278

jerryzh168 commented May 25, 2024 •

edited

Loading

pytorch-bot bot commented May 25, 2024 •

edited

Loading

msaroufim May 29, 2024 •

edited

Loading

jerryzh168 May 29, 2024 •

edited

Loading

jerryzh168 May 29, 2024 •

edited

Loading

cpuhrsch May 30, 2024 •

edited

Loading