Add AUTOQUANT_CACHE docs for reusing the same quantization plan (py…
RobinKa authored Jun 6, 2024
1 parent a4cc35e commit 2023756
Showing 1 changed file with 15 additions and 0 deletions: torchao/quantization/README.md
@@ -36,6 +36,21 @@ model = torchao.autoquant(torch.compile(model, mode='max-autotune'))
model(input)
```

Sometimes it is desirable to reuse the quantization plan that `autoquant` came up with. `torchao.quantization.AUTOQUANT_CACHE` is a dictionary holding autoquant's benchmark results; saving it and restoring it later causes `autoquant` to choose the same quantization methods instead of re-benchmarking.

```python
import pickle
import torchao.quantization

# After the first forward pass (when quantization was done)
with open("quantization-cache.pkl", "wb") as f:
    pickle.dump(torchao.quantization.AUTOQUANT_CACHE, f)

# On load
with open("quantization-cache.pkl", "rb") as f:
torchao.quantization.AUTOQUANT_CACHE.update(pickle.load(f))
```
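
The round trip above is just standard dictionary pickling. A minimal self-contained sketch of the same pattern, using a plain dict with a made-up key/value as a stand-in for `AUTOQUANT_CACHE` (the real cache's key and value types are an implementation detail of torchao):

```python
import pickle

# Stand-in for torchao.quantization.AUTOQUANT_CACHE: a dict of
# (hypothetical) benchmark results keyed by op/shape.
cache = {("linear", (16, 16)): "int8_weight_only"}

# Save after the first forward pass, once benchmarking has populated it.
blob = pickle.dumps(cache)

# In a fresh process the cache starts empty; update it in place so any
# existing references to the dict see the restored entries.
restored = {}
restored.update(pickle.loads(blob))

assert restored == cache
```

Updating the live dict in place (rather than rebinding the name) matters, since other code holds a reference to the original `AUTOQUANT_CACHE` object.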


## A8W8 Dynamic Quantization

