Initial support for 8da4w QAT #138
Conversation
Force-pushed from 5b1d404 to 044280a
nit: you don't need the extra underscore in _prototype, but otherwise looks good as a way to get started :)
Force-pushed from 653a07a to d5cd97f
    return (qmin, qmax)


def replace_linear_8da4w_qat(
This seems to be the same as replace_linear_8da4w; maybe we want to abstract out a helper function, to be less error-prone.
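For context, a rough sketch of what such a shared helper might look like, with the concrete linear class to swap in passed as an argument; the signature, the linear_class argument name, and the padding check below are assumptions for illustration, not the code in this PR:

import torch

def _replace_linear_8da4w(
    module: torch.nn.Module,
    groupsize: int,
    padding_allowed: bool,
    precision: torch.dtype,
    scales_precision: torch.dtype,
    linear_class: type,  # PTQ or QAT linear module class to swap in (assumed argument)
):
    """Recursively swap nn.Linear children of `module` for `linear_class` instances."""
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Linear) and (
            child.in_features % groupsize == 0 or padding_allowed
        ):
            setattr(
                module,
                name,
                linear_class(
                    child.in_features,
                    child.out_features,
                    groupsize=groupsize,
                    precision=precision,
                    scales_precision=scales_precision,
                ),
            )
        else:
            # Recurse into submodules, forwarding linear_class so nested linears
            # are also swapped
            _replace_linear_8da4w(
                child,
                groupsize,
                padding_allowed,
                precision,
                scales_precision,
                linear_class,
            )

With this shape, replace_linear_8da4w (PTQ) and replace_linear_8da4w_qat (QAT) could both be thin wrappers that call the helper with their respective linear class.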
looks good, please make sure CI passes before landing
Force-pushed from ae8bd77 to 83dd03a
Summary: This commit adds support for QAT, where linear layers are fake quantized with int8 per token dynamic activations (8da) and int4 grouped per channel weights (4w). This initial implementation uses the same module swap approach as 8da4w PTQ for simplicity and code reuse. In the future, we may wish to consider migrating both flows to use tensor subclasses for better composability with other PyTorch features.

Test Plan:
python test/quantization/test_qat.py -k test_fake_quantize_per_channel_group
python test/quantization/test_qat.py -k test_fake_quantize_per_token
python test/quantization/test_qat.py -k test_qat_8da4w_linear
python test/quantization/test_qat.py -k test_qat_8da4w_quantizer

Reviewers: jerryzh168, cpuhrsch, HDCharles

Subscribers: jerryzh168, cpuhrsch, HDCharles, supriyar

Tasks: #86
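To illustrate the weight side of the scheme described above, here is a minimal sketch of int4 grouped per channel fake quantization, assuming symmetric quantization with one scale per group; this is a simplified illustration rather than the exact math in this PR, and real QAT would additionally use a straight-through estimator so gradients can flow through the rounding:

import torch

def fake_quantize_4w_grouped(weight: torch.Tensor, groupsize: int) -> torch.Tensor:
    # int4 range for symmetric quantization (assumed)
    qmin, qmax = -8, 7
    out_features, in_features = weight.shape
    assert in_features % groupsize == 0
    # one scale per (output channel, group) chunk of the weight
    grouped = weight.reshape(-1, groupsize)
    scales = grouped.abs().amax(dim=1, keepdim=True).clamp(min=1e-9) / qmax
    # quantize, clamp, then immediately dequantize ("fake" quantization)
    q = torch.clamp(torch.round(grouped / scales), qmin, qmax)
    return (q * scales).reshape(out_features, in_features)

The int8 per token dynamic activation side is analogous, except that the scales are computed per token at runtime rather than per weight group.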
@@ -1144,14 +1144,30 @@ def replace_linear_8da4w(
                 ),
             )
         else:
-            replace_linear_8da4w(
+            _replace_linear_8da4w(
                 child,
                 groupsize,
                 padding_allowed,
                 precision,
                 scales_precision,
             )
Does it miss the linear_class? cc @jerryzh168 @andrewor14
oh right
I can fix it in fbcode first I guess
oh my bad, let me submit a PR