
TorchAO int4 hqq quant has accuracy issue on CPU #1823

Open
leslie-fang-intel opened this issue Mar 4, 2025 · 2 comments · May be fixed by #1824
leslie-fang-intel commented Mar 4, 2025

Here is the script to reproduce this issue, using the following commits:

  • PyTorch: 98bf2f1170f500308b29f54ee69d62a53447b649
  • TorchAO: 1c76736

import torch
import torch._inductor.config as config
import torchao
import torchao.quantization.quant_api as quant_api
from torchao.dtypes import Int4CPULayout
from torchao.utils import unwrap_tensor_subclass
import numpy as np
import random

local_seed = 2024

torch.manual_seed(local_seed) # Set PyTorch seed
np.random.seed(seed=local_seed) # Set Numpy seed
random.seed(local_seed) # Set the Python seed

config.freezing = True
config.max_autotune = True
config.max_autotune_gemm_backends = "ATEN,"
config.epilogue_fusion = False

in_feature = 64
out_feature = 16
M = 1
dtype = torch.float16
has_bias = False

use_hqq = True

device = "cpu"

class Mod(torch.nn.Module):
    def __init__(self, dtype: torch.dtype, has_bias: bool):
        super().__init__()
        self.linear = torch.nn.Linear(in_feature, out_feature, bias=has_bias).to(dtype)

    def forward(self, a):
        tmp = self.linear(a)
        return tmp

with torch.no_grad():
    mod = Mod(dtype, has_bias).eval().to(device)
    a = torch.randn(M, in_feature).to(dtype).to(device)
    ref_res = mod(a)
    quant_api.quantize_(
        mod,
        quant_api.int4_weight_only(group_size=64, use_hqq=use_hqq, layout=Int4CPULayout()),
        set_inductor_config=False,
    )
    print("mod is: {}".format(mod), flush=True)
    unwrap_tensor_subclass(mod)
    cmod = torch.compile(mod)
    res = cmod(a)
    print("ref_res is: {}".format(ref_res), flush=True)
    print("res is: {}".format(res), flush=True)
    print(torch.testing.assert_allclose(ref_res, res, rtol=1e-2, atol=1e-2), flush=True)

And the output is

ref_res is: tensor([[-0.3396,  0.4819, -0.0533, -0.2271, -0.2703,  0.9180, -0.6357,  0.0704,
         -0.7354, -1.0088, -0.0609, -0.2749, -0.1572,  0.7842, -0.0208,  0.1705]],
       dtype=torch.float16)
res is: tensor([[37.4062, 44.0000, 37.7188, 43.6250, 43.3438, 40.2500, 38.8750, 39.0938,
         43.0625, 37.3750, 43.7812, 39.7188, 37.9375, 39.1562, 38.1875, 43.8750]],
       dtype=torch.float16)
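
To narrow down whether the error comes from the quantization itself or from the compiled kernel, one option is to also run the quantized module in eager mode. A minimal sketch, reusing mod, cmod, a, and ref_res from the script above:

# Sketch: compare the quantized module in eager mode and after torch.compile
# against the reference result. Assumes `mod`, `cmod`, `a`, and `ref_res`
# from the reproduction script above.
with torch.no_grad():
    eager_res = mod(a)        # quantized weights, eager execution
    compiled_res = cmod(a)    # quantized weights, compiled execution
    print("max abs diff (eager vs ref):   ", (eager_res - ref_res).abs().max().item())
    print("max abs diff (compiled vs ref):", (compiled_res - ref_res).abs().max().item())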
leslie-fang-intel self-assigned this Mar 4, 2025

leslie-fang-intel commented Mar 4, 2025

After looking into this issue, two underlying problems were found.

leslie-fang-intel commented

Hi @jerryzh168, do you know how popular hqq quantization is in the community and in TorchAO? If it's not widely used, can I explicitly throw an error in TorchAO for this case for now?
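
For reference, a minimal sketch of what such a guard could look like (the helper name and check are hypothetical, not the actual TorchAO API):

# Hypothetical guard (illustrative only, not the actual TorchAO code):
# reject the unsupported hqq + Int4CPULayout combination up front instead of
# silently producing wrong results.
from torchao.dtypes import Int4CPULayout

def _check_int4_hqq_cpu_support(use_hqq: bool, layout) -> None:
    if use_hqq and isinstance(layout, Int4CPULayout):
        raise NotImplementedError(
            "int4 weight-only quantization with use_hqq=True is not yet "
            "supported for Int4CPULayout on CPU; use use_hqq=False instead."
        )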

leslie-fang-intel linked a pull request (#1824) Mar 4, 2025 that will close this issue