Add convert path for 8da4w QAT #154

Merged: 1 commit merged into main from 8da4w_qat_convert on Apr 24, 2024

Conversation

andrewor14 (Contributor) commented:
Summary: This commit implements the convert path for 8da4w QAT, which swaps the QAT linear with the quantized linear and quantizes the weights the same way the PTQ flow does. The result is a model identical to the one produced by the PTQ flow.

Test Plan:
python test/quantization/test_qat.py -k test_qat_8da4w_quantizer

Reviewers: jerryzh168, cpuhrsch

Subscribers: jerryzh168, cpuhrsch, supriyar
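
For context, here is a minimal usage sketch of the end-to-end flow this PR completes. The import path, class name, and groupsize argument reflect torchao's prototype QAT API at the time and are assumptions for illustration, not lines from this diff:

import torch
from torchao.quantization.prototype.qat import Int8DynActInt4WeightQATQuantizer

model = torch.nn.Sequential(torch.nn.Linear(256, 256))
quantizer = Int8DynActInt4WeightQATQuantizer(groupsize=32)

# prepare: swap each nn.Linear for a QAT linear that fake-quantizes
# activations (8-bit dynamic) and weights (4-bit, group-wise) in forward
model = quantizer.prepare(model)

# ... fine-tune here so the model adapts to the quantization error ...

# convert (this PR): swap the QAT linear for the quantized linear,
# quantizing the weights the same way as the PTQ flow
model = quantizer.convert(model)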

@facebook-github-bot added the CLA Signed label on Apr 22, 2024
@@ -123,6 +155,7 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
)
return torch.nn.functional.linear(x_fq, w_fq)

# TODO: move this to common util

Contributor:

right, this probably doesn't have to live here; I'm also adding a new util for this in quant_primitives.py
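
For illustration, a hedged sketch of the kind of group-wise symmetric quantization helper being discussed; the name and signature are hypothetical, not the actual util landing in quant_primitives.py:

import torch

def group_quantize_symmetric(w: torch.Tensor, n_bit: int = 4, group_size: int = 32):
    # Symmetric per-group quantization of a 2D weight tensor (hypothetical helper).
    assert w.shape[-1] % group_size == 0
    to_quant = w.reshape(-1, group_size)         # (total_groups, group_size)
    max_val = to_quant.abs().amax(dim=1, keepdim=True)
    qmax = 2 ** (n_bit - 1) - 1                  # 7 for 4-bit
    scales = max_val.clamp(min=1e-6) / qmax
    q = torch.round(to_quant / scales).clamp(-(qmax + 1), qmax)
    return q.reshape(w.shape).to(torch.int8), scales.reshape(w.shape[0], -1)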

@jerryzh168 jerryzh168 left a comment:

looks good!

@andrewor14 andrewor14 force-pushed the 8da4w_qat_convert branch 2 times, most recently from 4c8663a to 7d93bf7 on April 23, 2024 14:05
@andrewor14 andrewor14 merged commit 03c3529 into main Apr 24, 2024
13 checks passed
@andrewor14 andrewor14 deleted the 8da4w_qat_convert branch April 24, 2024 22:06
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024
* clean up gguf loading.  Move model loading to meta.

* remove cpu

* Fix CI and validation scripts (pytorch#154)

* missing device (pytorch#232)

* Use generator args to group all arguments to generator (pytorch#231)

* prompt

* chat_mode, num_samples

* Move more generator args to use dataclass (pytorch#233)

* prompt

* chat_mode, num_samples

* move more args

* more gen args

* update

* args

* undo some changes

* typos

* Minor lint fixes (pytorch#236)

* remove redundancy & remove int4 linear test from ET tests (pytorch#237)

* remove redundancy

* no int4 linear on ET

* small changes

---------

Co-authored-by: Guang Yang <[email protected]>
Co-authored-by: Michael Gschwind <[email protected]>
Co-authored-by: Mergen Nachin <[email protected]>