Add convert path for 8da4w QAT #154

Merged: 1 commit merged into main from 8da4w_qat_convert on Apr 24, 2024

Conversation

andrewor14 (Contributor) commented:
Summary: This commit implements the convert path for 8da4w QAT, which swaps the QAT linear with the quantized linear and quantizes the weights the same way the PTQ flow does. The result is a model identical to the one produced by the PTQ flow.

Test Plan:
python test/quantization/test_qat.py -k test_qat_8da4w_quantizer

Reviewers: jerryzh168, cpuhrsch

Subscribers: jerryzh168, cpuhrsch, supriyar
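
For context, here is a minimal usage sketch of the end-to-end flow this PR completes. The import path, class name, and groupsize argument reflect torchao's prototype QAT API at the time and are assumptions for illustration, not lines from this diff:

import torch
from torchao.quantization.prototype.qat import Int8DynActInt4WeightQATQuantizer

model = torch.nn.Sequential(torch.nn.Linear(256, 256))
quantizer = Int8DynActInt4WeightQATQuantizer(groupsize=32)

# prepare: swap each nn.Linear for a QAT linear that fake-quantizes
# activations (8-bit dynamic) and weights (4-bit, group-wise) in forward
model = quantizer.prepare(model)

# ... fine-tune here so the model adapts to the quantization error ...

# convert (this PR): swap the QAT linear for the quantized linear,
# quantizing the weights the same way as the PTQ flow
model = quantizer.convert(model)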

@facebook-github-bot added the CLA Signed label on Apr 22, 2024
@@ -123,6 +155,7 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
)
return torch.nn.functional.linear(x_fq, w_fq)

# TODO: move this to common util

Contributor:

right, this probably doesn't have to live here; I'm also adding a new util for this in quant_primitives.py
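
For illustration, a hedged sketch of the kind of group-wise symmetric quantization helper being discussed; the name and signature are hypothetical, not the actual util landing in quant_primitives.py:

import torch

def group_quantize_symmetric(w: torch.Tensor, n_bit: int = 4, group_size: int = 32):
    # Symmetric per-group quantization of a 2D weight tensor (hypothetical helper).
    assert w.shape[-1] % group_size == 0
    to_quant = w.reshape(-1, group_size)         # (total_groups, group_size)
    max_val = to_quant.abs().amax(dim=1, keepdim=True)
    qmax = 2 ** (n_bit - 1) - 1                  # 7 for 4-bit
    scales = max_val.clamp(min=1e-6) / qmax
    q = torch.round(to_quant / scales).clamp(-(qmax + 1), qmax)
    return q.reshape(w.shape).to(torch.int8), scales.reshape(w.shape[0], -1)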

@jerryzh168 jerryzh168 left a comment:

looks good!

@andrewor14 andrewor14 force-pushed the 8da4w_qat_convert branch 2 times, most recently from 4c8663a to 7d93bf7 on April 23, 2024 14:05
@andrewor14 andrewor14 merged commit 03c3529 into main Apr 24, 2024
13 checks passed
@andrewor14 andrewor14 deleted the 8da4w_qat_convert branch April 24, 2024 22:06
dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024
* clean up gguf loading.  Move model loading to meta.

* remove cpu

* Fix CI and validation scripts (pytorch#154)

* missing device (pytorch#232)

* Use generator args to group all arguments to generator (pytorch#231)

* prompt

* chat_mode, num_samples

* Move more generator args to use dataclass (pytorch#233)

* prompt

* chat_mode, num_samples

* move more args

* more gen args

* update

* args

* undo some changes

* typos

* Minor lint fixes (pytorch#236)

* remove redundancy & remove int4 linear test from ET tests (pytorch#237)

* remove redundancy

* no int4 linear on ET

* small changes

---------

Co-authored-by: Guang Yang <[email protected]>
Co-authored-by: Michael Gschwind <[email protected]>
Co-authored-by: Mergen Nachin <[email protected]>