Add layout option to woq int4 api #670
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/670. Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 500a456 with merge base 88a263a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
@@ -21,7 +21,11 @@
 import torch.nn.functional as F
 from typing import Any, Callable, Union, Dict, Optional
 
-from torchao.dtypes import PlainLayoutType
+from torchao.dtypes import (
+    to_affine_quantized,
```
thanks, if there is no circular dep you can remove the import from other functions as well, e.g. int8_weight_only
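For illustration, a minimal hypothetical sketch of what hoisting the function-local import to module level could look like (abbreviated, not the actual quant_api.py source):

```python
# Hypothetical, abbreviated sketch: not the actual torchao quant_api.py.

# Before: the config function imports lazily inside its body,
# originally done to sidestep a potential circular dependency.
def int8_weight_only():
    def apply_int8wo_quant(weight):
        from torchao.dtypes import to_affine_quantized  # function-local import
        return to_affine_quantized(weight)  # the real call takes more arguments
    return apply_int8wo_quant

# After: one module-level import shared by all config functions,
# which is fine once there is no circular dependency.
from torchao.dtypes import to_affine_quantized
```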
done
thanks! Although this is BC-breaking, I don't think there are many call sites that need to change.
Could you write BC-breaking notes for this? See the format in the BC-breaking section of https://github.com/pytorch/ao/releases, something like:
on it! where should I place the bc-breaking notes? as a comment on this PR or someplace else?
```python
# for torchao 0.4
from torchao.quantization import quantize_, int4_weight_only
quantize_(my_model, int4_weight_only(inner_k_tiles=8))

# for torchao 0.5
from torchao.quantization import quantize_, int4_weight_only
from torchao.dtypes import TensorCoreTiledLayoutType
quantize_(my_model, int4_weight_only(layout_type=TensorCoreTiledLayoutType(inner_k_tiles=8)))
```
you can put this in the PR summary
looks good, thanks!
done ✅
@jerryzh168 do we need to communicate this change to anyone?
seems fine I think, I didn't see people using |
This reverts commit 009f55f.
* add ET runner to benchmark
* remove spurious end
* add mps runner and groupsize kludge
* adjust groupsize
* fortify runners
* handle device for export_et
Summary
This PR updates the int4_weight_only API by introducing a layout_type parameter. This will allow users to quantize models to int4 while being able to choose between available layouts. This change also breaks backwards compatibility for all users that explicitly define the inner_k_tiles parameter.

BC-breaking notes for release statement

inner_k_tiles was deprecated in favor of layout_type, enabling users to select from various layout options #670
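As requested in the review, here is a before/after sketch of the change for the release notes, based on the example discussed above (the torchao.dtypes import path for TensorCoreTiledLayoutType is an assumption):

```python
# torchao 0.4: the tiling parameter is passed directly
from torchao.quantization import quantize_, int4_weight_only
quantize_(my_model, int4_weight_only(inner_k_tiles=8))

# torchao 0.5: the tiling is configured through a layout_type object
from torchao.dtypes import TensorCoreTiledLayoutType  # import path assumed
quantize_(my_model, int4_weight_only(layout_type=TensorCoreTiledLayoutType(inner_k_tiles=8)))
```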
Tasks

PRs pending on this one
Questions for Reviewers
Let me know if there is anything that I need to update. I would be happy to make any necessary changes.