From f842c66f0e72fff96222f503571b60caa51bf311 Mon Sep 17 00:00:00 2001
From: Vasiliy Kuznetsov
Date: Mon, 16 Sep 2024 16:08:57 -0700
Subject: [PATCH] Update README.md for float8 inference (#896)

---
 torchao/quantization/README.md | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/torchao/quantization/README.md b/torchao/quantization/README.md
index 673aae1b5f..433688e44b 100644
--- a/torchao/quantization/README.md
+++ b/torchao/quantization/README.md
@@ -97,7 +97,7 @@ change_linear_weights_to_int4_woqtensors(model)
 
 Note: The quantization error incurred by applying int4 quantization to your model can be fairly significant, so using external techniques like GPTQ may be necessary to obtain a usable model.
 
-#### A16W8 WeightOnly Quantization
+#### A16W8 Int8 WeightOnly Quantization
 
 ```python
 # for torch 2.4+
@@ -109,7 +109,7 @@ from torchao.quantization.quant_api import change_linear_weights_to_int8_woqtens
 change_linear_weights_to_int8_woqtensors(model)
 ```
 
-#### A8W8 Dynamic Quantization
+#### A8W8 Int8 Dynamic Quantization
 
 ```python
 # for torch 2.4+
@@ -121,6 +121,22 @@ from torchao.quantization.quant_api import change_linear_weights_to_int8_dqtenso
 change_linear_weights_to_int8_dqtensors(model)
 ```
 
+#### A16W8 Float8 WeightOnly Quantization
+
+```python
+# for torch 2.5+
+from torchao.quantization import quantize_, float8_weight_only
+quantize_(model, float8_weight_only())
+```
+
+#### A8W8 Float8 Dynamic Quantization with Rowwise Scaling
+
+```python
+# for torch 2.5+
+from torchao.quantization.quant_api import quantize_, PerRow, float8_dynamic_activation_float8_weight
+quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerRow()))
+```
+
 #### A16W6 Floating Point WeightOnly Quantization
 
 ```python
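
For context, a minimal end-to-end sketch of the float8 inference flow that the new README sections document. The toy model, its shapes, the CUDA/bfloat16 setup, and the `torch.compile` step are illustrative assumptions, not part of the patch; float8 matmul kernels also generally assume a GPU with native float8 support (e.g. compute capability 8.9+ such as H100-class hardware).

```python
# Illustrative sketch, not part of the patch: exercising the two float8
# recipes documented above on a toy model. Assumes torch 2.5+, torchao
# installed, and a CUDA GPU with float8 support; the model, shapes, and
# the torch.compile step are arbitrary demonstration choices.
import torch
import torch.nn as nn

from torchao.quantization import quantize_, float8_weight_only
from torchao.quantization.quant_api import (
    PerRow,
    float8_dynamic_activation_float8_weight,
)


def make_toy_model() -> nn.Module:
    # quantize_ rewrites the weights of nn.Linear modules in place.
    return nn.Sequential(
        nn.Linear(1024, 4096),
        nn.GELU(),
        nn.Linear(4096, 1024),
    ).to(device="cuda", dtype=torch.bfloat16)


# A16W8: weights stored in float8, activations left in bfloat16.
wo_model = make_toy_model()
quantize_(wo_model, float8_weight_only())

# A8W8: weights in float8 with one scale per row, activations quantized
# to float8 dynamically at runtime.
dyn_model = make_toy_model()
quantize_(dyn_model, float8_dynamic_activation_float8_weight(granularity=PerRow()))

# Inference proceeds as usual; torch.compile typically fuses the
# quantize/dequantize overhead into the surrounding kernels.
dyn_model = torch.compile(dyn_model)
with torch.no_grad():
    x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)
    y = dyn_model(x)
```

On the granularity choice: `PerRow` keeps one scale per output row of each weight matrix rather than a single per-tensor scale, which typically tracks outliers better and preserves accuracy, at a small extra cost in scale storage and kernel complexity.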