[WIP] Codebook quantization flow #1299

DerekLiu35 · 2024-11-16T19:57:50Z

This PR adds a codebook quantization flow, per #1195.

Usage

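import torch
from torchao.prototype.quantization.codebook.codebook_quantized_tensor import CodebookQuantizedTensor

# Tensor to quantize (this example requires a CUDA device)
input_tensor = torch.randn(1024, 1024, device='cuda')

# block_size = (1, 1): every element is quantized on its own (scalar codebook)
block_size = (1, 1)
# 4-bit codes: each code indexes one of up to 2**4 = 16 codebook entries
code_dtype = torch.uint4

quantized_tensor = CodebookQuantizedTensor.from_float(input_tensor, block_size, code_dtype)

# Reconstruct an approximate float tensor from the codes and the codebook
dequantized_tensor = quantized_tensor.dequantize()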
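For readers new to the idea, here is a minimal pure-PyTorch sketch of what scalar codebook quantization does with block_size = (1, 1) and a 4-bit code: each element is replaced by the index of its nearest codebook entry, and dequantization is a table lookup. The helpers codebook_quantize and codebook_dequantize are hypothetical, and the quantile-based codebook is an illustrative stand-in for a k-means-fitted one; this is not the PR's CodebookQuantizedTensor implementation.

```python
import torch


def codebook_quantize(x: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: index of the nearest codebook entry for each scalar."""
    # |x - c| for every (element, entry) pair: shape (numel, K)
    dist = (x.reshape(-1, 1) - codebook.reshape(1, -1)).abs()
    # 4-bit codes held in uint8 here; a packed layout would store two per byte
    codes = dist.argmin(dim=1).to(torch.uint8)
    return codes.reshape(x.shape)


def codebook_dequantize(codes: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: dequantization is a lookup into the codebook."""
    return codebook[codes.long()]


torch.manual_seed(0)
x = torch.randn(64, 64)
num_codes = 2 ** 4  # a 4-bit code can address 16 entries

# Illustrative codebook: 16 evenly spaced quantiles of the data (stand-in for k-means)
codebook = torch.quantile(x.flatten(), torch.linspace(0, 1, num_codes))

codes = codebook_quantize(x, codebook)
x_hat = codebook_dequantize(codes, codebook)
print("mean abs error:", (x - x_hat).abs().mean().item())
```

The quality of the result depends entirely on how well the codebook fits the data, which is why the PR fits it with k-means rather than using fixed levels.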

ToDo

Make fit_kmeans faster. Right now it takes more than 1 hour to quantize a 1B-parameter model.
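To see where the time goes, here is a minimal generic Lloyd's k-means loop in PyTorch, assuming the codebook is fit over flattened blocks stored as rows of shape (N, d). The name fit_kmeans_sketch and its tol and seed parameters are hypothetical, and the tol-based early exit is a common k-means option, not something this PR describes. Each iteration builds an (N, K) distance matrix, so the work is roughly O(N·K·d) per iteration and grows linearly with max_iter.

```python
import torch


def fit_kmeans_sketch(data: torch.Tensor, k: int, max_iter: int = 200,
                      tol: float = 1e-4, seed: int = 0):
    """Hypothetical Lloyd's k-means over the rows of `data` (shape (N, d)).

    Not the PR's fit_kmeans; returns (centroids, labels, iterations_run).
    Assumes max_iter >= 1 and N >= k.
    """
    g = torch.Generator(device=data.device).manual_seed(seed)
    # Initialize centroids from k distinct random rows
    idx = torch.randperm(data.shape[0], generator=g, device=data.device)[:k]
    centroids = data[idx].clone()

    n_iter = 0
    for _ in range(max_iter):
        n_iter += 1
        # Assignment step: (N, k) distance matrix, the dominant cost
        labels = torch.cdist(data, centroids).argmin(dim=1)
        # Update step: per-cluster means, computed for all clusters at once
        sums = torch.zeros_like(centroids).index_add_(0, labels, data)
        counts = torch.bincount(labels, minlength=k)
        new_centroids = sums / counts.clamp(min=1).unsqueeze(1).to(data.dtype)
        # Keep the previous centroid for clusters that received no points
        empty = counts == 0
        new_centroids[empty] = centroids[empty]
        # Stop early once no centroid moves more than tol
        shift = (new_centroids - centroids).norm(dim=1).max()
        centroids = new_centroids
        if shift < tol:
            break

    # Final assignment against the final centroids
    labels = torch.cdist(data, centroids).argmin(dim=1)
    return centroids, labels, n_iter


torch.manual_seed(0)
data = torch.randn(20_000, 1)  # 20k scalars, i.e. block_size = (1, 1) gives d = 1
centroids, labels, n_iter = fit_kmeans_sketch(data, k=16)
print("iterations run:", n_iter)
```

With a fixed iteration count every call pays for all max_iter iterations; an early exit only helps when the centroids actually settle before that.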

pytorch-bot · 2024-11-16T19:57:53Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1299

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

[DomainsOnly] Jobs fail with GLIBC version not found

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 · 2024-11-19T18:20:35Z

Thanks for the contribution! Yeah, "> 1 hour" seems a bit too slow. Any ideas for speeding it up?

jerryzh168 · 2024-11-19T18:22:22Z

Also, after this is done, it would be useful if you could add codebookquant to generate.py and eval.py to test the e2e model performance and accuracy:

ao/torchao/_models/llama/generate.py, line 209 in b714026:

if quantization:

ao/torchao/_models/llama/eval.py, line 71 in b714026:

if quantization:

DerekLiu35 · 2024-11-19T23:53:48Z

> Thanks for the contribution! Yeah, "> 1 hour" seems a bit too slow. Any ideas for speeding it up?

I think:

- For block_size = (1, 1), it's similar to nf4tensor, so we can use absolute distance between scalars instead of Euclidean distance.
- We could also decrease max_iter for fit_kmeans from 1000 to 200, but this would increase quantization error.
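The first idea can be illustrated for scalars: |x - c| equals the Euclidean distance sqrt((x - c)^2), so the nearest codebook entry is unchanged and the norm machinery can be skipped. Going one step further (an assumption, not proposed in the comment above), a scalar codebook can be sorted so that each value's nearest entry is found by binary search over the midpoints between neighbouring entries with torch.searchsorted, avoiding the (N, K) distance matrix entirely. Both helper names are hypothetical.

```python
import torch


def assign_abs(x: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Hypothetical: nearest entry via absolute distance, O(N * K) work and memory."""
    return (x.unsqueeze(1) - codebook.unsqueeze(0)).abs().argmin(dim=1)


def assign_sorted(x: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Hypothetical: nearest entry via binary search over midpoints, O(N log K)."""
    sorted_cb, order = torch.sort(codebook)
    # Entry i of sorted_cb is nearest for x between midpoints[i - 1] and midpoints[i]
    midpoints = (sorted_cb[1:] + sorted_cb[:-1]) / 2
    pos = torch.searchsorted(midpoints, x)  # position in sorted_cb
    return order[pos]  # map back to indices in the original codebook


torch.manual_seed(0)
x = torch.randn(10_000)
codebook = torch.randn(16)

a = assign_abs(x, codebook)
b = assign_sorted(x, codebook)

# Both choose an entry at the minimal distance; indices may differ only on
# (near-)exact ties, so compare distances rather than indices.
assert torch.allclose((x - codebook[a]).abs(), (x - codebook[b]).abs(), atol=1e-5)
```

This speeds up only the assignment step; the number of k-means iterations (max_iter) still multiplies the total cost, which is why lowering it trades speed for quantization error.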

DerekLiu35 added 2 commits November 16, 2024 14:53

Add codebook_ops · a6ddac0

Add codebook_quanized_tensor · ebfcf6c

facebook-github-bot added the CLA Signed label Nov 16, 2024

Add __init__.py · 9d06e13

Merge branch 'pytorch:main' into main · f9c548b
