Migrate to config for Int8DynamicActivationIntxWeightConfig #1836
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1836
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit fc46e34 with merge base ada4c02.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@drisspg @jerryzh168 are we OK with adding tensor_impl_ctr_kwargs to from_hp_to_intx? It can be used to propagate a bias when constructing the weight tensor subclass via from_plain.
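For reference, a minimal sketch of what the proposal amounts to (the parameter list is heavily abridged and `from_hp_to_intx_sketch` and its quantization math are stand-ins, not the actual torchao API): the only change of interest is an optional dict forwarded to the tensor-impl constructor, so a layout's `from_plain` can receive extras such as a packed bias.

```python
from typing import Optional

import torch


def from_hp_to_intx_sketch(
    input_float: torch.Tensor,
    _layout,
    tensor_impl_ctr,
    tensor_impl_ctr_kwargs: Optional[dict] = None,
):
    # Stand-in for the real quantization math, which produces these tensors.
    data = input_float.round().to(torch.int8)
    scale = torch.ones(input_float.shape[0])
    zero_point = torch.zeros(input_float.shape[0])
    # The new optional dict is simply forwarded to the tensor-impl constructor,
    # e.g. tensor_impl_ctr_kwargs={"bias": bias} for a layout that packs the bias.
    return tensor_impl_ctr(
        data, scale, zero_point, _layout, **(tensor_impl_ctr_kwargs or {})
    )
```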
Force-pushed from e332b54 to 4b3a742
Force-pushed from f138c3d to fc46e34
Mostly nits
Not terribly familiar with this code, but passes the gut test
```python
if tensor_impl_ctr_kwargs is None:
    tensor_impl_ctr_kwargs = {}
tensor_impl = tensor_impl_ctr(
    data, scale, zero_point, _layout, **tensor_impl_ctr_kwargs
)
```
Don't know which style AO uses, no strong preference. Suggested change:

```python
tensor_impl = tensor_impl_ctr(
    data, scale, zero_point, _layout, **(tensor_impl_ctr_kwargs or {})
)
```
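(Both versions behave the same; `tensor_impl_ctr_kwargs or {}` just folds the None check into the call site.)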
I'd like to hear from @drisspg or someone from torchao on this change. Not so much on the style preference, but more on whether they're OK with adding tensor_impl_ctr_kwargs to the to_affine_quantized_intx signature.
```python
quantized_model_reference = copy.deepcopy(model)
quantize_(
    quantized_model_reference,
    int8_dynamic_activation_intx_weight(
        weight_dtype=weight_dtype,
        granularity=granularity,
        has_weight_zeros=has_weight_zeros,
        layout=reference_layout,
    ),
)

with torch.no_grad():
    result = quantized_model(activations)
    expected_result = quantized_model_reference(activations)
```
nit: We can factor out the creation of expected_results since it's just PlainLayout in both cases (different models)
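One possible shape for that refactor, as a sketch only (the helper name is made up, and it assumes the quantize_, int8_dynamic_activation_intx_weight, and PlainLayout imports the test file already has):

```python
import copy

import torch


def _reference_result(model, activations, weight_dtype, granularity, has_weight_zeros):
    # Build the reference the same way for both test cases: quantize a copy of
    # the model with the plain layout and run it once under no_grad.
    reference_model = copy.deepcopy(model)
    quantize_(
        reference_model,
        int8_dynamic_activation_intx_weight(
            weight_dtype=weight_dtype,
            granularity=granularity,
            has_weight_zeros=has_weight_zeros,
            layout=PlainLayout(),
        ),
    )
    with torch.no_grad():
        return reference_model(activations)
```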
```python
    and layout.target == Target.ATEN
)
weight_dtype: torch.dtype = torch.int4
granularity: Union[PerRow, PerGroup] = PerRow()
```
Why not `granularity: Union[PerRow, PerGroup] = PerGroup(128)`, like int8_dynamic_activation_intx_weight?
PerRow is a safer default because it doesn't depend on the input data size. I expect users should always specify this parameter anyway.
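A standalone illustration of the point (shapes made up; not torchao code):

```python
import torch

weight = torch.randn(32, 200)  # inner dim 200 is not a multiple of 128

# PerRow: one scale per output row, valid for any weight shape.
per_row_scales = weight.abs().amax(dim=1) / 7.0  # int4 symmetric range [-8, 7]

# PerGroup(128): only applies when the inner dim divides evenly by the group
# size, so a fixed default like PerGroup(128) silently depends on model shapes.
group_size = 128
print(weight.shape[1] % group_size == 0)  # False -> PerGroup(128) would not fit here
```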
```python
)


@register_quantize_module_handler(Int8DynamicActivationIntxWeightConfig)
def _int8_dynamic_activation_intx_weigh_transform(
```
Suggested change (typo fix):

```python
def _int8_dynamic_activation_intx_weight_transform(
```
```python
tensor_impl_ctr_kwargs = None
if isinstance(layout, PackedLinearInt8DynamicActivationIntxWeightLayout):
    # We need to create a new layout object for each module because when
    # granulairty is PerRow, the layout objects cannot share the group_size
```
Suggested change (typo fix):

```python
    # granularity is PerRow, the layout objects cannot share the group_size
```
```python
if weight_tensor.tensor_impl.get_layout().has_bias:
    assert (
        bias is None
    ), "bias should be None because it is already packed with the weights (has_bias=True)"
```
nit: `if: assert`; also fine with leaving it as-is for legibility. Suggested change:

```python
assert (
    not weight_tensor.tensor_impl.get_layout().has_bias or bias is None
), "bias should be None because it is already packed with the weights (has_bias=True)"
```
```python
if torch.backends.kleidiai.is_available():
    if isinstance(granularity, PerGroup):
        scale_dtype = (
            torch.bfloat16
        )  # KleidiAI kernel requires bfloat16 scale_dtype
```
Seems like we always use float32 in to_affine_quantized_intx. Is this intentional?
KleidiAI tests pass with this. The scale_dtype was only used by the Python-based quantization that computes qvals, scales, and zeros; it is not what was passed to the kernel itself.
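A simplified illustration of that point (toy shapes, not the actual torchao code path): casting the scales to bfloat16 changes the values produced by the Python-side quantization math, but the packed kernel only ever sees whatever the packing routine hands it.

```python
import torch

w = torch.randn(8, 128)
group_size = 32
w_grouped = w.reshape(8, -1, group_size)  # (8, 4, 32)

# scale_dtype only affects this reference computation of scales and qvals.
scale_dtype = torch.bfloat16
scales = (w_grouped.abs().amax(dim=-1) / 7.0).to(scale_dtype)  # int4 symmetric range
qvals = torch.clamp(
    torch.round(w_grouped / scales.unsqueeze(-1)), -8, 7
).to(torch.int8)
```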
This PR: