Move quant API to quantization README #142
Conversation
README.md (Outdated)
4. Integration with other PyTorch native libraries like torchtune and ExecuTorch
2. [Quantization algorithms](./torchao/quantization) such as dynamic quant, smoothquant, GPTQ that run on CPU/GPU and Mobile.
3. [Sparsity algorithms](./torchao/sparsity) such as Wanda that help improve accuracy of sparse networks
4. Integration with other PyTorch native libraries like [torchtune](https://github.com/pytorch/torchtune) and [ExecuTorch](https://github.com/pytorch/executorch)

## Key Features
* Native PyTorch techniques, composable with torch.compile
can you link to the quantization readme from the main page?
This technique works best when the torch._inductor.config.use_mixed_mm option is enabled. This avoids dequantizing the weight tensor before the matmul, instead fusing the dequantization into the matmul, thereby avoiding materialization of a large floating point weight tensor.

## A16W4 WeightOnly Quantization
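For reference, a minimal sketch of the setup described above, assuming the int8 weight-only entry point `change_linear_weights_to_int8_woqtensors` from torchao's quant_api (the exact import path is an assumption):

```python
import torch
import torch.nn as nn
from torchao.quantization import quant_api  # import path is an assumption

# Fuse dequantization into the matmul rather than materializing
# a large floating point weight tensor first (CUDA + inductor path).
torch._inductor.config.use_mixed_mm = True

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to("cuda").eval()

# int8 weight-only quantization: weights are stored as int8 and
# dequantized inside the (now fused) matmul at runtime.
quant_api.change_linear_weights_to_int8_woqtensors(model)

model = torch.compile(model, mode="max-autotune")
out = model(torch.randn(16, 1024, device="cuda"))
```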
is it also possible to add an example on how to use GPTQ?
yeah sure
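For context, a rough sketch of what such a GPTQ example might look like; the quantizer name, its arguments, and the calibration-input format below are all assumptions rather than a confirmed torchao signature:

```python
import torch
import torch.nn as nn
# The quantizer name and module path are assumptions, not a confirmed API.
from torchao.quantization.GPTQ import Int4WeightOnlyGPTQQuantizer

model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128)).eval()

# GPTQ calibrates against representative inputs; the expected format
# (a list of example batches) is an assumption here.
calibration_inputs = [torch.randn(4, 128) for _ in range(8)]

quantizer = Int4WeightOnlyGPTQQuantizer(groupsize=128)  # arguments assumed
model = quantizer.quantize(model, calibration_inputs)
```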
2. Quantization [algorithms](./torchao/quantization) such as dynamic quant, smoothquant, GPTQ that run on CPU/GPU and Mobile.
3. Sparsity [algorithms](./torchao/sparsity) such as Wanda that help improve accuracy of sparse networks
4. Integration with other PyTorch native libraries like torchtune and ExecuTorch
2. [Quantization algorithms](./torchao/quantization) such as dynamic quant, smoothquant, GPTQ that run on CPU/GPU and Mobile.
@supriyar I was hoping that people could just find the quantization README from here, or do you feel we want to make it more explicit?
I do agree that a getting started section on the main page would be important to keep. So if GPTQ is the algorithm we feel most people would be interested in, let's show only that and then link to a broader set of algorithms as well
we got some comments from torchtune that the community has moved on to other techniques now, so I feel it's fine to keep it on the separate quantization page
Force-pushed from 9346a94 to 64e0035
2. Quantization [algorithms](./torchao/quantization) such as dynamic quant, smoothquant, GPTQ that run on CPU/GPU and Mobile.
3. Sparsity [algorithms](./torchao/sparsity) such as Wanda that help improve accuracy of sparse networks
4. Integration with other PyTorch native libraries like torchtune and ExecuTorch
## Our Goals
@supriyar I also restructured the README a bit, please take a look
2. While these techniques are designed to improve model performance, in some cases the opposite can occur. This is because quantization adds overhead to the model that is hopefully made up for by faster matmuls (dynamic quantization) or faster weight loading (weight-only quantization). If your matmuls are small enough, or your non-quantized perf isn't bottlenecked by weight load time, these techniques may reduce performance.
3. Use the PyTorch nightlies so you can leverage [tensor subclasses](https://pytorch.org/docs/stable/notes/extending.html#subclassing-torch-tensor), which are preferred over older module-swap based methods because they don't modify the graph and are generally more composable and flexible.

## Get Started
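To make the tradeoff in item 2 above concrete, a minimal sketch assuming the `change_linear_weights_to_*` entry points from torchao's quant_api (the import path is an assumption):

```python
import torch
import torch.nn as nn
from torchao.quantization import quant_api  # import path is an assumption

model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)).to("cuda").eval()

# Compute-bound (large matmuls): dynamic quantization quantizes activations
# at runtime, trading that overhead for faster int8 matmuls.
quant_api.change_linear_weights_to_int8_dqtensors(model)

# Memory-bound (weight loads dominate, e.g. small-batch inference):
# weight-only quantization shrinks weights so they load faster.
# quant_api.change_linear_weights_to_int8_woqtensors(model)

# Both entry points swap Linear weights for tensor subclasses (item 3), so
# the module structure is untouched and the result composes with torch.compile.
model = torch.compile(model)
```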
@msaroufim @supriyar I added a get started section here and linked to the API READMEs
lgtm
* fast start instructions
* Update GETTING-STARTED.md
* Update GETTING-STARTED.md
* Update GETTING-STARTED.md

Co-authored-by: Nikita Shulga <[email protected]>
Summary:
att
Test Plan:
/
Reviewers:
Subscribers:
Tasks:
Tags: