
Deprecate top level quantization APIs #344

Merged
merged 1 commit into main from deprecate-quant-api
Jun 13, 2024

Conversation

jerryzh168 (Contributor) commented Jun 11, 2024

Summary:
This PR deprecates a few quantization APIs

Deprecation summary:

deprecated for all pytorch versions (2.2.2, 2.3 and 2.4+): apply_weight_only_int8_quant and apply_dynamic_quant

also deprecated for 2.4+: change_linear_weights_to_int8_woqtensors, change_linear_weights_to_int8_dqtensors and change_linear_weights_to_int4_wotensors

BC-breaking notes

for torch version 2.3 and before, we are keeping the change_linear_weights_... APIs, since the new quantize API needs a parametrization fix (pytorch/pytorch#124888) to work

1. int8 weight only quantization

torch 2.4+

```
apply_weight_only_int8_quant(model)
# or
change_linear_weights_to_int8_woqtensors(model)
```

-->

```
quantize(model, "int8_weight_only")
```

torch 2.2.2 and 2.3

```
apply_weight_only_int8_quant(model)
```

-->

```
change_linear_weights_to_int8_woqtensors(model)
```

2. int8 dynamic quantization

torch 2.4+

```
apply_dynamic_quant(model)
# or
change_linear_weights_to_int8_dqtensors(model)
```

-->

```
quantize(model, "int8_dynamic")
```

torch 2.2.2 and 2.3

```
apply_dynamic_quant(model)
```

-->

```
change_linear_weights_to_int8_dqtensors(model)
```

3. int4 weight only quantization

torch 2.4+

```
change_linear_weights_to_int4_wotensors(model)
```

-->

```
quantize(model, "int4_weight_only")
```

torch 2.2.2 and 2.3

no change
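Because the unified `quantize` API only exists on torch 2.4+ while the older entry points remain for 2.2.2/2.3, downstream callers may want a version guard around the migration above. The following is a minimal sketch of such a guard; `parse_version` and `select_int8_wo_api` are hypothetical helpers written for illustration, not torchao APIs, and the returned names simply mirror the tables above.

```python
# Illustrative version guard for the int8 weight-only migration above.
# parse_version and select_int8_wo_api are hypothetical helpers, not torchao
# APIs; the names they return come from the deprecation tables in this PR.
def parse_version(version):
    """Turn a version string like '2.4.0.dev20240611' into a tuple (2, 4, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def select_int8_wo_api(torch_version):
    """Pick the int8 weight-only entry point appropriate for a torch version."""
    if parse_version(torch_version) >= (2, 4):
        # torch 2.4+: unified API, i.e. quantize(model, "int8_weight_only")
        return "quantize"
    # torch 2.2.2 / 2.3: keep using the older tensor subclass API
    return "change_linear_weights_to_int8_woqtensors"
```

A caller would then import and invoke whichever entry point the guard selects, the same way torchtune gates its torchao usage on the installed torch version (mentioned in the review discussion below).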

Test Plan:
python test/quantization/test_quant_api.py
python test/integration/test_integration.py

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot commented Jun 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/344

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1648e69 with merge base 0bde6d5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 11, 2024
@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch from d382147 to 5f59cbb Compare June 11, 2024 23:46
HDCharles (Contributor)

Seems OK, but I would check partners like torchchat/torchtune etc. for these APIs, since they're what had been used previously.

Also, is it possible to check for usage of these APIs and give a better error?

For example, if someone tried to use change_linear_weights_to_int8_dqtensors, it would be nice if we directly caught that and said "use this instead".

jerryzh168 (Contributor, Author)

torchtune has a version guard, so it should be fine, I think. executorch is not using the APIs touched by this PR, and torchchat is not using these APIs yet.

Yeah, we could catch the usage and give a better error, although that would mean keeping these names in the code base a bit longer. I can add that, though.
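The "better error" idea discussed here could be implemented by leaving a stub behind under each removed name that fails loudly with a pointer to the replacement. This is a hedged sketch, not the code this PR landed; `_removed_api` is a made-up helper, and the old/new names come from the deprecation tables in the PR description.

```python
# Sketch of catching use of a removed API and pointing at the replacement.
# _removed_api is a hypothetical helper; it returns a stub that raises with a
# migration hint instead of silently failing with an ImportError/AttributeError.
def _removed_api(old_name, replacement):
    def stub(*args, **kwargs):
        raise NotImplementedError(
            f"{old_name} has been removed; use {replacement} instead"
        )
    return stub

# Old public name now bound to a stub that explains the migration.
change_linear_weights_to_int8_dqtensors = _removed_api(
    "change_linear_weights_to_int8_dqtensors",
    'quantize(model, "int8_dynamic")',
)
```

The tradeoff noted in the reply applies: the old names stay importable (as stubs) for a while, which is exactly the extra code-base residency being weighed here.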

jerryzh168 (Contributor, Author)

Actually, I still want to remove these APIs from the list, so let's just break BC for now.

@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch 2 times, most recently from d206f3c to 51f8441 Compare June 12, 2024 02:04
torchao/quantization/README.md

`torch.export.export` and `torch.aot_compile` with the following workaround:
```
from torchao.quantization.utils import unwrap_tensor_subclass
m_unwrapped = unwrap_tensor_subclass(m)
torch._export.aot_compile(m_unwrapped, example_inputs)
```

Member: this comes out of nowhere and should either be eliminated as part of the quantize API or explained better.

jerryzh168 (Contributor, Author): This is temporary, I think. Also, users don't need to understand the details of this one? Can you clarify a bit how to explain this better?

But we expect this will be integrated into the export path by default in the future.

Member: Don't add TODOs in docs; add them in GitHub issues and assign them to yourself.

jerryzh168 (Contributor, Author): added #345

jerryzh168 (Contributor, Author)

So the new quant API + unwrap_tensor_subclass workaround actually only works on 2.4+ (since we have the fix pytorch/pytorch#124888).

That means we can't really remove the old implementations at this point. I'm thinking of keeping the old APIs as private APIs for now and removing them when we set our minimum supported pytorch to 2.4+.
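"Keep the old APIs as private APIs" could look like the following sketch: the implementation moves behind an underscore-prefixed name, and the public name becomes a thin wrapper that emits a DeprecationWarning before forwarding. This is illustrative only; the placeholder body of `_change_linear_weights_to_int8_dqtensors` stands in for torchao's actual subclass-swapping logic.

```python
import warnings

# Hedged sketch of keeping a deprecated entry point around as a private API.
# The private function body below is a placeholder, not the real implementation.
def _change_linear_weights_to_int8_dqtensors(model):
    return model  # stand-in for the actual tensor-subclass swap

def change_linear_weights_to_int8_dqtensors(model):
    """Deprecated public wrapper: warn once per call site, then forward."""
    warnings.warn(
        "change_linear_weights_to_int8_dqtensors is deprecated; "
        'on torch 2.4+ use quantize(model, "int8_dynamic") instead',
        DeprecationWarning,
        stacklevel=2,
    )
    return _change_linear_weights_to_int8_dqtensors(model)
```

With this shape, deleting the deprecated surface later is a two-line change (drop the wrapper, rename the private function), which fits the plan of waiting until the minimum supported pytorch is 2.4+.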

@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch from 51f8441 to b66f0cf Compare June 12, 2024 03:21
@jerryzh168 jerryzh168 requested a review from msaroufim June 12, 2024 03:23
@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch 8 times, most recently from a75b199 to f2b9890 Compare June 12, 2024 17:36
@jerryzh168 jerryzh168 dismissed msaroufim’s stale review June 12, 2024 18:37

addressed comments, please take a look again

@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch 2 times, most recently from f5961b2 to 209ab7a Compare June 12, 2024 23:00
Summary:
This PR deprecates a few quantization APIs and here are the bc-breaking notes:

1. int8 weight only quantization
int8 weight only quant module swap API
```
apply_weight_only_int8_quant(model)
```

and
int8 weight only tensor subclass API
```
change_linear_weights_to_int8_woqtensors(model)
```

-->

unified tensor subclass API
```
quantize(model, get_apply_int8wo_quant())
```

2. int8 dynamic quantization

```
apply_dynamic_quant(model)
```
or
```
change_linear_weights_to_int8_dqtensors(model)
```

-->

unified tensor subclass API
```
quantize(model, get_apply_int8dyn_quant())
```

3. int4 weight only quantization
```
change_linear_weights_to_int4_wotensors(model)
```

-->

unified tensor subclass API
```
quantize(model, get_apply_int4wo_quant())
```

Test Plan:
python test/quantization/test_quant_api.py
python test/integration/test_integration.py

Reviewers:

Subscribers:

Tasks:

Tags:
@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch from 209ab7a to 1648e69 Compare June 12, 2024 23:10
@msaroufim msaroufim mentioned this pull request Jun 13, 2024
@jerryzh168 jerryzh168 merged commit c2235af into pytorch:main Jun 13, 2024
13 checks passed
@jerryzh168 jerryzh168 deleted the deprecate-quant-api branch June 13, 2024 01:56
jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Jun 13, 2024
jerryzh168 added a commit that referenced this pull request Jun 13, 2024

```
# for torch 2.4+
from torchao.quantization.quant_api import quantize
quantize(model, "int8_dynamic_quant")
```

Contributor: @jerryzh168 this should be "int8_dynamic", right?

jerryzh168 (Contributor, Author): oh right

ebsmothers (Contributor)

Hi @jerryzh168, this breaks torchtune when we run on ao nightlies. Ref

dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024