fp8dq requires both dimensions to be divisible by 16 #1268
When trying to quantize a model, an exception is raised stating that fp8dq requires both dimensions to be divisible by 16. Minimal code to reproduce the issue: see the snippet quoted in the comments below. Is this by design, or is it a bug? Currently this prevents many models from being quantized.

Comments
Hey, yes, that's a requirement of scaled_mm in general (see Line 635 in 4120526). You can use something like Lines 776 to 782 in 4120526 as a filter_fn argument in quantize_. We're working on other kernels that are more flexible.
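For illustration, a minimal sketch of such a filter, assuming quantize_'s filter_fn callback receives the module and its fully qualified name; the helper name fp8_compatible and the toy model are hypothetical, not the code in the linked lines:

```python
import torch
from torchao.quantization import (
    float8_dynamic_activation_float8_weight,
    quantize_,
)


def fp8_compatible(module: torch.nn.Module, fqn: str) -> bool:
    # Hypothetical filter: only quantize Linear layers whose in/out
    # features are both divisible by 16, which is what scaled_mm requires.
    return (
        isinstance(module, torch.nn.Linear)
        and module.in_features % 16 == 0
        and module.out_features % 16 == 0
    )


model = torch.nn.Sequential(
    torch.nn.Linear(32, 64),  # both dims divisible by 16 -> quantized
    torch.nn.Linear(64, 15),  # 15 not divisible by 16 -> skipped
).to("cuda")

quantize_(model, float8_dynamic_activation_float8_weight(), filter_fn=fp8_compatible)
```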
We should probably add this check to ao/torchao/quantization/quant_api.py (Line 947 in aeff75b).
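A rough sketch of what such a guard could look like; the function name and the error wording are illustrative, not the actual torchao code:

```python
import torch


def _check_fp8_dims(weight: torch.Tensor) -> None:
    # Illustrative guard (not the actual torchao implementation): raise a
    # clear error up front instead of failing inside scaled_mm when the
    # weight dimensions are not multiples of 16.
    out_features, in_features = weight.shape
    if out_features % 16 != 0 or in_features % 16 != 0:
        raise ValueError(
            "float8 dynamic quantization requires both weight dimensions "
            f"to be divisible by 16, got {tuple(weight.shape)}"
        )
```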
```python
import logging

import torch
from torchao.quantization import (
    float8_dynamic_activation_float8_weight,
    quantize_,
)

logging.getLogger("torchao").setLevel(logging.INFO)
logging.basicConfig(level=logging.INFO)

dim1 = 32
dim2 = 15  # intentionally not divisible by 16


class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(dim1, dim2)

    def forward(self, x):
        return self.model(x)


model = ToyModel().to("cuda").eval()
quantize_(model, float8_dynamic_activation_float8_weight())
model = torch.compile(model=model, fullgraph=True, mode="max-autotune")
model(torch.randn(2, 32).to("cuda"))
```

Running this, we do properly raise.
Maybe it's an issue with torchao versions. @piotr-bazan-nv, what torchao version are you using?
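For reference, one quick way to check the installed torchao version:

```python
import torchao

print(torchao.__version__)
```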
@jerryzh168 It's
#1194 was added after the release, I think. You should be able to get the change in the nightly build or in 0.7.
Thanks @jerryzh168. Closing the issue then. |