
Deprecate top level quantization APIs #344

Merged
merged 1 commit into main from deprecate-quant-api
Jun 13, 2024

Conversation

jerryzh168 (Contributor) commented Jun 11, 2024

Summary:
This PR deprecates a few quantization APIs

Deprecation summary:

deprecated for all pytorch versions (2.2.2, 2.3 and 2.4+): apply_weight_only_int8_quant and apply_dynamic_quant

also deprecated for 2.4+: change_linear_weights_to_int8_woqtensors, change_linear_weights_to_int8_dqtensors and change_linear_weights_to_int4_wotensors

BC-breaking notes

for torch version 2.3 and before, we are keeping the change_linear_weights_... APIs, since the new quantize API needs a parametrization fix (pytorch/pytorch#124888) to work

1. int8 weight only quantization

torch 2.4+

```
apply_weight_only_int8_quant(model)
# or
change_linear_weights_to_int8_woqtensors(model)
```

-->

```
quantize(model, "int8_weight_only")
```

torch 2.2.2 and 2.3

```
apply_weight_only_int8_quant(model)
```

-->

```
change_linear_weights_to_int8_woqtensors(model)
```

2. int8 dynamic quantization

torch 2.4+

```
apply_dynamic_quant(model)
# or
change_linear_weights_to_int8_dqtensors(model)
```

-->

```
quantize(model, "int8_dynamic")
```

torch 2.2.2 and 2.3

```
apply_dynamic_quant(model)
```

-->

```
change_linear_weights_to_int8_dqtensors(model)
```

3. int4 weight only quantization

torch 2.4+

```
change_linear_weights_to_int4_wotensors(model)
```

-->

```
quantize(model, "int4_weight_only")
```

torch 2.2.2 and 2.3

no change
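Because the unified `quantize` API only exists on torch 2.4+ while the older entry points remain for 2.2.2/2.3, downstream callers may want a version guard around the migration above. The following is a minimal sketch of such a guard; `parse_version` and `select_int8_wo_api` are hypothetical helpers written for illustration, not torchao APIs, and the returned names simply mirror the tables above.

```python
# Illustrative version guard for the int8 weight-only migration above.
# parse_version and select_int8_wo_api are hypothetical helpers, not torchao
# APIs; the names they return come from the deprecation tables in this PR.
def parse_version(version):
    """Turn a version string like '2.4.0.dev20240611' into a tuple (2, 4, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def select_int8_wo_api(torch_version):
    """Pick the int8 weight-only entry point appropriate for a torch version."""
    if parse_version(torch_version) >= (2, 4):
        # torch 2.4+: unified API, i.e. quantize(model, "int8_weight_only")
        return "quantize"
    # torch 2.2.2 / 2.3: keep using the older tensor subclass API
    return "change_linear_weights_to_int8_woqtensors"
```

A caller would then import and invoke whichever entry point the guard selects, the same way torchtune gates its torchao usage on the installed torch version (mentioned in the review discussion below).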

Test Plan:
python test/quantization/test_quant_api.py
python test/integration/test_integration.py

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot commented Jun 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/344

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1648e69 with merge base 0bde6d5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 11, 2024
@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch from d382147 to 5f59cbb Compare June 11, 2024 23:46
HDCharles (Contributor)

Seems OK, but I would check partners like torchchat/torchtune etc. for these APIs, since they're what had been used previously.

Also, is it possible to check for usage of these APIs and give a better error?

For example, if someone tried to use change_linear_weights_to_int8_dqtensors, it would be nice if we directly caught that and said "use this instead".

jerryzh168 (Contributor, Author)

torchtune has a version guard, so it should be fine, I think. executorch is not using the APIs touched by this PR, and torchchat is not using these APIs yet.

Yeah, we could catch the usage and give a better error, although that would mean keeping these names in the code base a bit longer. I can add that, though.
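The "better error" idea discussed here could be implemented by leaving a stub behind under each removed name that fails loudly with a pointer to the replacement. This is a hedged sketch, not the code this PR landed; `_removed_api` is a made-up helper, and the old/new names come from the deprecation tables in the PR description.

```python
# Sketch of catching use of a removed API and pointing at the replacement.
# _removed_api is a hypothetical helper; it returns a stub that raises with a
# migration hint instead of silently failing with an ImportError/AttributeError.
def _removed_api(old_name, replacement):
    def stub(*args, **kwargs):
        raise NotImplementedError(
            f"{old_name} has been removed; use {replacement} instead"
        )
    return stub

# Old public name now bound to a stub that explains the migration.
change_linear_weights_to_int8_dqtensors = _removed_api(
    "change_linear_weights_to_int8_dqtensors",
    'quantize(model, "int8_dynamic")',
)
```

The tradeoff noted in the reply applies: the old names stay importable (as stubs) for a while, which is exactly the extra code-base residency being weighed here.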

jerryzh168 (Contributor, Author)

Actually, I still want to remove these APIs from the list, so let's just break BC for now.

@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch 2 times, most recently from d206f3c to 51f8441 Compare June 12, 2024 02:04
torchao/quantization/README.md

`torch.export.export` and `torch.aot_compile` with the following workaround:
```
from torchao.quantization.utils import unwrap_tensor_subclass
m_unwrapped = unwrap_tensor_subclass(m)
torch._export.aot_compile(m_unwrapped, example_inputs)
```

Member: this comes out of nowhere and should either be eliminated as part of the quantize API or explained better.

jerryzh168 (Contributor, Author): This is temporary, I think. Also, users don't need to understand the details of this one? Can you clarify a bit how to explain this better?

But we expect this will be integrated into the export path by default in the future.

Member: Don't add TODOs in docs; add them in GitHub issues and assign them to yourself.

jerryzh168 (Contributor, Author): added #345

jerryzh168 (Contributor, Author)

So the new quant API + unwrap_tensor_subclass workaround actually only works on 2.4+ (since we have the fix pytorch/pytorch#124888).

That means we can't really remove the old implementations at this point. I'm thinking of keeping the old APIs as private APIs for now and removing them when we set our minimum supported pytorch to 2.4+.
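"Keep the old APIs as private APIs" could look like the following sketch: the implementation moves behind an underscore-prefixed name, and the public name becomes a thin wrapper that emits a DeprecationWarning before forwarding. This is illustrative only; the placeholder body of `_change_linear_weights_to_int8_dqtensors` stands in for torchao's actual subclass-swapping logic.

```python
import warnings

# Hedged sketch of keeping a deprecated entry point around as a private API.
# The private function body below is a placeholder, not the real implementation.
def _change_linear_weights_to_int8_dqtensors(model):
    return model  # stand-in for the actual tensor-subclass swap

def change_linear_weights_to_int8_dqtensors(model):
    """Deprecated public wrapper: warn once per call site, then forward."""
    warnings.warn(
        "change_linear_weights_to_int8_dqtensors is deprecated; "
        'on torch 2.4+ use quantize(model, "int8_dynamic") instead',
        DeprecationWarning,
        stacklevel=2,
    )
    return _change_linear_weights_to_int8_dqtensors(model)
```

With this shape, deleting the deprecated surface later is a two-line change (drop the wrapper, rename the private function), which fits the plan of waiting until the minimum supported pytorch is 2.4+.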

@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch from 51f8441 to b66f0cf Compare June 12, 2024 03:21
@jerryzh168 jerryzh168 requested a review from msaroufim June 12, 2024 03:23
@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch 8 times, most recently from a75b199 to f2b9890 Compare June 12, 2024 17:36
@jerryzh168 jerryzh168 dismissed msaroufim’s stale review June 12, 2024 18:37

addressed comments, please take a look again

@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch 2 times, most recently from f5961b2 to 209ab7a Compare June 12, 2024 23:00
Summary:
This PR deprecates a few quantization APIs and here are the bc-breaking notes:

1. int8 weight only quantization
int8 weight only quant module swap API
```
apply_weight_only_int8_quant(model)
```

and
int8 weight only tensor subclass API
```
change_linear_weights_to_int8_woqtensors(model)
```

-->

unified tensor subclass API
```
quantize(model, get_apply_int8wo_quant())
```

2. int8 dynamic quantization

```
apply_dynamic_quant(model)
```
or
```
change_linear_weights_to_int8_dqtensors(model)
```

-->

unified tensor subclass API
```
quantize(model, get_apply_int8dyn_quant())
```

3. int4 weight only quantization
```
change_linear_weights_to_int4_wotensors(model)
```

-->

unified tensor subclass API
```
quantize(model, get_apply_int4wo_quant())
```

Test Plan:
python test/quantization/test_quant_api.py
python test/integration/test_integration.py

Reviewers:

Subscribers:

Tasks:

Tags:
@jerryzh168 jerryzh168 force-pushed the deprecate-quant-api branch from 209ab7a to 1648e69 Compare June 12, 2024 23:10
@msaroufim msaroufim mentioned this pull request Jun 13, 2024
@jerryzh168 jerryzh168 merged commit c2235af into pytorch:main Jun 13, 2024
13 checks passed
@jerryzh168 jerryzh168 deleted the deprecate-quant-api branch June 13, 2024 01:56
jerryzh168 added a commit to jerryzh168/ao that referenced this pull request Jun 13, 2024
jerryzh168 added a commit that referenced this pull request Jun 13, 2024

```
# for torch 2.4+
from torchao.quantization.quant_api import quantize
quantize(model, "int8_dynamic_quant")
```

Contributor: @jerryzh168 this should be "int8_dynamic", right?

jerryzh168 (Contributor, Author): oh right

ebsmothers (Contributor)

Hi @jerryzh168, this breaks torchtune when we run on ao nightlies. Ref

dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024