Add generic fake quantized embedding for QAT #1085

andrewor14 · 2024-10-15T21:54:32Z

Summary: This is equivalent to #1020 but for nn.Embedding. This commit adds a generic fake quantized embedding module to replace the uses of the existing more specific QAT embeddings. For example, Int4WeightOnlyQATEmbedding can be expressed as follows:

import torch
from torchao.quantization.prototype.qat.api import FakeQuantizeConfig
from torchao.quantization.prototype.qat.embedding import FakeQuantizedEmbedding

weight_config = FakeQuantizeConfig(
    dtype=torch.int4,
    group_size=group_size,
    is_symmetric=True,
)
fq_embedding = FakeQuantizedEmbedding(16, 32, weight_config=weight_config)
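
For readers unfamiliar with the term, the following is a minimal sketch of what symmetric, per-group int4 fake quantization does numerically to one row of an embedding table. It is written in plain Python and is not the torchao implementation: the helper names `fake_quantize_group` and `fake_quantize_row`, the int4 range of [-8, 7], and the choice of scale as `max_abs / qmax` are all assumptions made for illustration.

```python
# Illustrative sketch only: symmetric per-group int4 fake quantization.
# Helper names, the [-8, 7] range, and the scale formula are assumptions,
# not taken from torchao.

def fake_quantize_group(values, qmin=-8, qmax=7):
    """Quantize one group to the int4 grid, then dequantize back to floats."""
    max_abs = max(abs(v) for v in values)
    # Symmetric scheme: the zero point is fixed at 0, one scale per group.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for v in values:
        q = round(v / scale)          # snap to the nearest integer level
        q = max(qmin, min(qmax, q))   # clamp to the int4 range
        out.append(q * scale)         # dequantize: result stays a float
    return out


def fake_quantize_row(row, group_size):
    """Apply fake quantization independently to each group of a row."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be divisible by group_size")
    out = []
    for start in range(0, len(row), group_size):
        out.extend(fake_quantize_group(row[start:start + group_size]))
    return out


# One hypothetical embedding row with 8 entries, split into 2 groups of 4.
row = [0.7, -0.35, 0.1, 0.0, 2.0, -1.0, 0.5, 0.25]
fq_row = fake_quantize_row(row, group_size=4)
```

The output keeps the floating-point type and shape of the input, but each value is rounded to one of at most 16 levels per group. That rounding error is what a model is exposed to during QAT, which is why a fake quantized module can stand in for the original during training.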

Test Plan:
python test/quantization/test_qat.py -k test_qat_4w_embedding
python test/quantization/test_qat.py -k test_fake_quantized_embedding_4w

pytorch-bot · 2024-10-15T21:54:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1085

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 53239e2 with merge base 48bc81c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.



andrewor14 requested a review from jerryzh168 October 15, 2024 21:54

facebook-github-bot added the CLA Signed label Oct 15, 2024

jerryzh168 approved these changes Oct 15, 2024

jerryzh168 reviewed Oct 15, 2024

torchao/quantization/prototype/qat/embedding.py (outdated, resolved)

jerryzh168 reviewed Oct 15, 2024

torchao/quantization/prototype/qat/embedding.py (outdated, resolved)

andrewor14 force-pushed the fq-embedding branch from e88f00e to 997e2ce October 16, 2024 02:52

andrewor14 force-pushed the fq-embedding branch from 997e2ce to 53239e2 October 16, 2024 02:52

andrewor14 merged commit 0b71b8d into main Oct 16, 2024
17 checks passed

andrewor14 deleted the fq-embedding branch October 21, 2024 14:31
