
fp8dq requires both dimensions to be divisible by 16 #1268

Closed · piotr-bazan-nv opened this issue Nov 12, 2024 · 7 comments

piotr-bazan-nv commented Nov 12, 2024

When trying to quantize a model, the following exception is raised:

TorchRuntimeError: Failed running call_function <built-in function linear>(*(FakeTensor(..., device='cuda:0', size=(2, 32)), LinearActivationQuantizedTensor(AffineQuantizedTensor(layout_tensor=Float8AQTLayout(
float8_data=FakeTensor(..., device='cuda:0', size=(15, 32), dtype=torch.float8_e4m3fn),
scale=FakeTensor(..., device='cuda:0', size=()),
transposed=False, layout_type=Float8LayoutType(mm_config=Float8MMConfig(emulate=False, use_fast_accum=True, pad_inner_dim=False))), block_size=torch.Size([15, 32]), shape=torch.Size([15, 32]), device=cuda:0, dtype=torch.float32, requires_grad=False), functools.partial(<function _input_activation_quant_func_fp8 at 0x7a94b4f4d120>, activation_granularity=PerTensor(), activation_dtype=torch.float8_e4m3fn)), Parameter(FakeTensor(..., device='cuda:0', size=(15,), requires_grad=True))), **{}):
Expected both dimensions of mat2 to be divisble by 16 but got torch.Size([32, 15])

Minimal code to reproduce the issue:

import torch
from torchao.quantization import (
    float8_dynamic_activation_float8_weight,
    quantize_,
)
dim1 = 32
dim2 = 15

class ToyModel(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(dim1, dim2)

    def forward(self, x):
        return self.model(x)

model = ToyModel().to("cuda").eval()

quantize_(model, float8_dynamic_activation_float8_weight())
model = torch.compile(model=model, fullgraph=True, mode="max-autotune")
model(torch.randn(2, 32).to('cuda'))

Is this by design, or is it a bug? Currently this prevents many models from being quantized.


HDCharles commented Nov 12, 2024

Hey, yes, that's a requirement of scaled_mm in general though

"Expected trailing dimension of mat1 to be divisible by 16 but got mat1 shape: (16x41)."

you can use something like this (from ao/test/float8/test_base.py, lines 776 to 782 at commit 4120526):

def module_filter_fn(mod, fqn):
    return (
        mod.in_features >= size_limit
        and mod.out_features >= size_limit
        and mod.in_features % 16 == 0
        and mod.out_features % 16 == 0
    )

passed as the filter_fn argument to quantize_
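
Concretely, wiring that into the repro above might look like this sketch; size_limit is just an illustrative threshold, and filter_fn is the quantize_ argument that receives each module and its fully qualified name:

size_limit = 16  # illustrative threshold, not a torchao constant

quantize_(
    model,
    float8_dynamic_activation_float8_weight(),
    filter_fn=module_filter_fn,  # only quantize layers _scaled_mm can handle
)

With dim2 = 15, the filter rejects the toy model's layer, so it stays in high precision instead of erroring.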

we're working on other kernels that are more flexible

jerryzh168 commented

We should probably add this check to apply_float8_dynamic_activation_quant, if it applies to the float8 quant method itself.
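
For illustration, the check being suggested might look something like the sketch below; the helper name _is_scaled_mm_compatible is hypothetical, not existing torchao API:

import torch

def _is_scaled_mm_compatible(weight: torch.Tensor) -> bool:
    # Hypothetical guard: torch._scaled_mm requires both weight
    # dimensions to be multiples of 16.
    out_features, in_features = weight.shape
    return out_features % 16 == 0 and in_features % 16 == 0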


drisspg commented Nov 19, 2024

import torch
from torchao.quantization import (
    float8_dynamic_activation_float8_weight,
    quantize_,
)
import logging

logging.getLogger("torchao").setLevel(logging.INFO)

logging.basicConfig(level=logging.INFO)
dim1 = 32
dim2 = 15

class ToyModel(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(dim1, dim2)

    def forward(self, x):
        return self.model(x)

model = ToyModel().to("cuda").eval()

quantize_(model, float8_dynamic_activation_float8_weight())
model = torch.compile(model=model, fullgraph=True, mode="max-autotune")
model(torch.randn(2, 32).to('cuda'))

we do now properly log and skip:

INFO:torchao.quantization.quant_api:Skipping float8 quantization: weight shape torch.Size([15, 32]) is not compatible with _scaled_mm. Both input dimension (32) and output dimension (15) must be multiples of 16. 
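
As a quick sanity check (run right after quantize_, before torch.compile, assuming the ToyModel from the repro): printing the weight shows whether the layer was skipped, since a quantized weight prints as a LinearActivationQuantizedTensor while a skipped one prints as a plain tensor.

# Inspect the weight to see whether quantization was applied or skipped.
print(model.model.weight)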

jerryzh168 commented

Maybe it's an issue with the torchao version. @piotr-bazan-nv, what torchao version are you using?

piotr-bazan-nv commented

@jerryzh168 It's 0.6.1

jerryzh168 commented

#1194 was added after that release, I think; you should be able to get the change in the nightly build or in 0.7.
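
A quick way to confirm which version you are running; the fix from #1194 needs a nightly build or 0.7+:

import torchao

print(torchao.__version__)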

piotr-bazan-nv commented

Thanks @jerryzh168. Closing the issue then.
