Support `model.to` int8 weight only quantized model #122

jerryzh168 · 2024-04-04T06:49:12Z

Summary:
Register the quantized module's fields as buffers so they get picked up in `model.to`.
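As a rough illustration of why this matters, the sketch below contrasts a tensor stored as a plain attribute with one registered through `register_buffer`. The two classes (`PlainAttrLinear`, `BufferLinear`) and the field names `weight` and `scales` are hypothetical stand-ins, not the classes touched by this PR. A dtype cast is used instead of a device move so the example runs on a CPU-only machine; `model.to(device)` follows the same traversal.

```python
import torch
from torch import nn


class PlainAttrLinear(nn.Module):
    """Hypothetical module that stores quantized fields as plain attributes."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # Plain tensor attributes are invisible to nn.Module.to and state_dict.
        self.weight = torch.zeros(out_features, in_features, dtype=torch.int8)
        self.scales = torch.ones(out_features)


class BufferLinear(nn.Module):
    """Hypothetical module that registers the same fields as buffers."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # Buffers are tracked by the module, so .to and state_dict see them.
        self.register_buffer(
            "weight", torch.zeros(out_features, in_features, dtype=torch.int8)
        )
        self.register_buffer("scales", torch.ones(out_features))


plain = PlainAttrLinear(4, 2).to(torch.float64)
buffered = BufferLinear(4, 2).to(torch.float64)

print(plain.scales.dtype)      # unchanged: the attribute was not picked up
print(buffered.scales.dtype)   # cast along with the module
print(buffered.weight.dtype)   # dtype casts only affect floating-point tensors
print(list(plain.state_dict().keys()))
print(sorted(buffered.state_dict().keys()))
```

Because buffers also appear in `state_dict()`, registering them is what makes a save/load round trip possible, which is what the test plan below exercises.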

Test Plan:
python test/quantization/test_quant_api.py -k test_int8_wo_quant_save_load

Reviewers:

Subscribers:

Tasks:

Tags:


* add int4 non-gptq and bugfixes (#119)
  Summary: int4weightlinear had a bug that made it not pad when it should have
  Test Plan: python test/quantization/test_quant_api.py -k "int4wo"
* fixing bug in GPTQ (#120)
* fixing bug in GPTQ
  Summary: shape was always padded even when not needed.
  Test Plan: python test/quantization/test_quant_api.py -k "test_gptq_quantizer_int4wo"
* removing extra spaces
* Support `model.to` int8 weight only quantized model (#122)
  Summary: registering fields as buffers so they get picked up in `model.to`
  Test Plan: python test/quantization/test_quant_api.py -k test_int8_wo_quant_save_load

---------

Co-authored-by: HDCharles <[email protected]>

cpuhrsch · 2024-04-04T16:28:25Z

test/quantization/test_quant_api.py

+        apply_weight_only_int8_quant(m)
+        example_inputs = m.example_inputs()
+        ref = m(*example_inputs)
+        _TMP_FN = "_test.pt"


Using named temporary files might be better

https://docs.python.org/3/library/tempfile.html#tempfile.NamedTemporaryFile
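A minimal sketch of what that suggestion could look like, assuming the test saves and reloads a `state_dict`. The `nn.Linear` model here is a stand-in for the quantized test model, and none of the names below come from the PR's test file. Passing the open file object to `torch.save` and `torch.load` avoids reopening the file by name and leaves no `_test.pt` behind.

```python
import tempfile

import torch
from torch import nn

# Stand-in for the quantized model under test (assumption, not the PR's model).
model = nn.Linear(4, 2)
example_input = torch.randn(1, 4)
ref = model(example_input)

# The temporary file is deleted automatically when the block exits.
with tempfile.NamedTemporaryFile() as f:
    torch.save(model.state_dict(), f)
    f.seek(0)  # rewind so torch.load reads from the start of the file
    state_dict = torch.load(f, weights_only=True)

restored = nn.Linear(4, 2)
restored.load_state_dict(state_dict)
res = restored(example_input)

# Same weights and same input should give identical outputs.
assert torch.equal(res, ref)
```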


jerryzh168 requested a review from cpuhrsch April 4, 2024 06:49

facebook-github-bot added the CLA Signed label Apr 4, 2024

jerryzh168 requested a review from HDCharles April 4, 2024 06:49

Support model.to int8 weight only quantized model

bdb7bc2


jerryzh168 force-pushed the int8 branch from 595a2cc to bdb7bc2 Compare April 4, 2024 06:59

jerryzh168 merged commit 76e2ef5 into main Apr 4, 2024
7 checks passed

jerryzh168 deleted the int8 branch April 4, 2024 07:10

cpuhrsch reviewed Apr 4, 2024


yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

Fix name_to_dtype error in export.py (pytorch#122)

66390f4
