
TorchAO int4 hqq quant has accuracy issue on CPU #1823

Open
leslie-fang-intel opened this issue Mar 4, 2025 · 2 comments · May be fixed by #1824
leslie-fang-intel commented Mar 4, 2025

Here is the script to reproduce this issue, using the following commits:

  • PyTorch: 98bf2f1170f500308b29f54ee69d62a53447b649
  • TorchAO: 1c76736

import torch
import torch._inductor.config as config
import torchao
import torchao.quantization.quant_api as quant_api
from torchao.dtypes import Int4CPULayout
from torchao.utils import unwrap_tensor_subclass
import numpy as np
import random

local_seed = 2024

torch.manual_seed(local_seed) # Set PyTorch seed
np.random.seed(seed=local_seed) # Set Numpy seed
random.seed(local_seed) # Set the Python seed

config.freezing = True
config.max_autotune = True
config.max_autotune_gemm_backends = "ATEN,"
config.epilogue_fusion = False

in_feature = 64
out_feature = 16
M = 1
dtype = torch.float16
has_bias = False

use_hqq = True

device = "cpu"

class Mod(torch.nn.Module):
    def __init__(self, dtype: torch.dtype, has_bias: bool):
        super().__init__()
        self.linear = torch.nn.Linear(in_feature, out_feature, bias=has_bias).to(dtype)

    def forward(self, a):
        tmp = self.linear(a)
        return tmp

with torch.no_grad():
    mod = Mod(dtype, has_bias).eval().to(device)
    a = torch.randn(M, in_feature).to(dtype).to(device)
    ref_res = mod(a)
    quant_api.quantize_(
        mod,
        quant_api.int4_weight_only(group_size=64, use_hqq=use_hqq, layout=Int4CPULayout()),
        set_inductor_config=False,
    )
    print("mod is: {}".format(mod), flush=True)
    unwrap_tensor_subclass(mod)
    cmod = torch.compile(mod)
    res = cmod(a)
    print("ref_res is: {}".format(ref_res), flush=True)
    print("res is: {}".format(res), flush=True)
    print(torch.testing.assert_allclose(ref_res, res, rtol=1e-2, atol=1e-2), flush=True)

And the output is

ref_res is: tensor([[-0.3396,  0.4819, -0.0533, -0.2271, -0.2703,  0.9180, -0.6357,  0.0704,
         -0.7354, -1.0088, -0.0609, -0.2749, -0.1572,  0.7842, -0.0208,  0.1705]],
       dtype=torch.float16)
res is: tensor([[37.4062, 44.0000, 37.7188, 43.6250, 43.3438, 40.2500, 38.8750, 39.0938,
         43.0625, 37.3750, 43.7812, 39.7188, 37.9375, 39.1562, 38.1875, 43.8750]],
       dtype=torch.float16)
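
To narrow down whether the error comes from the quantization itself or from the compiled kernel, one option is to also run the quantized module in eager mode. A minimal sketch, reusing mod, cmod, a, and ref_res from the script above:

# Sketch: compare the quantized module in eager mode and after torch.compile
# against the reference result. Assumes `mod`, `cmod`, `a`, and `ref_res`
# from the reproduction script above.
with torch.no_grad():
    eager_res = mod(a)        # quantized weights, eager execution
    compiled_res = cmod(a)    # quantized weights, compiled execution
    print("max abs diff (eager vs ref):   ", (eager_res - ref_res).abs().max().item())
    print("max abs diff (compiled vs ref):", (compiled_res - ref_res).abs().max().item())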
leslie-fang-intel self-assigned this Mar 4, 2025

leslie-fang-intel commented Mar 4, 2025

After looking into this issue, two underlying problems were found.

leslie-fang-intel commented

Hi @jerryzh168, do you know how popular hqq quantization is in the community and in TorchAO? If it's not widely used, can I explicitly throw an error in TorchAO for this case for now?
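
For reference, a minimal sketch of what such a guard could look like (the helper name and check are hypothetical, not the actual TorchAO API):

# Hypothetical guard (illustrative only, not the actual TorchAO code):
# reject the unsupported hqq + Int4CPULayout combination up front instead of
# silently producing wrong results.
from torchao.dtypes import Int4CPULayout

def _check_int4_hqq_cpu_support(use_hqq: bool, layout) -> None:
    if use_hqq and isinstance(layout, Int4CPULayout):
        raise NotImplementedError(
            "int4 weight-only quantization with use_hqq=True is not yet "
            "supported for Int4CPULayout on CPU; use use_hqq=False instead."
        )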

leslie-fang-intel linked a pull request (#1824) Mar 4, 2025 that will close this issue