Add layout option to woq int4 api #670
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/670. Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 500a456 with merge base 88a263a. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
@@ -21,7 +21,11 @@
 import torch.nn.functional as F
 from typing import Any, Callable, Union, Dict, Optional
 
-from torchao.dtypes import PlainLayoutType
+from torchao.dtypes import (
+    to_affine_quantized,
```
thanks, if there is no circular dep you can remove the import from other functions as well, e.g. int8_weight_only
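For illustration, a minimal hypothetical sketch of what hoisting the function-local import to module level could look like (abbreviated, not the actual quant_api.py source):

```python
# Hypothetical, abbreviated sketch: not the actual torchao quant_api.py.

# Before: the config function imports lazily inside its body,
# originally done to sidestep a potential circular dependency.
def int8_weight_only():
    def apply_int8wo_quant(weight):
        from torchao.dtypes import to_affine_quantized  # function-local import
        return to_affine_quantized(weight)  # the real call takes more arguments
    return apply_int8wo_quant

# After: one module-level import shared by all config functions,
# which is fine once there is no circular dependency.
from torchao.dtypes import to_affine_quantized
```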
done
thanks! Although this is BC-breaking, I don't think there are many call sites that need to change.
Could you write BC-breaking notes for this? See the format in the BC-breaking section of https://github.com/pytorch/ao/releases, something like:
on it! where should I place the bc-breaking notes? as a comment on this PR or someplace else?
```python
# for torchao 0.4
from torchao.quantization import quantize_, int4_weight_only
quantize_(my_model, int4_weight_only(inner_k_tiles=8))

# for torchao 0.5
from torchao.quantization import quantize_, int4_weight_only
from torchao.dtypes import TensorCoreTiledLayoutType
quantize_(my_model, int4_weight_only(layout_type=TensorCoreTiledLayoutType(inner_k_tiles=8)))
```
you can put this in the PR summary
looks good, thanks!
done ✅
@jerryzh168 do we need to communicate this change to anyone?
seems fine I think, I didn't see people using |
This reverts commit 009f55f.
* add ET runner to benchmark
* remove spurious end
* add mps runner and groupsize kludge
* adjust groupsize
* fortify runners
* handle device for export_et
Summary
This PR updates the int4_weight_only API by introducing a layout_type parameter. This will allow users to quantize models to int4 while being able to choose between available layouts. This change also breaks backwards compatibility for all users that explicitly define the inner_k_tiles parameter.

BC-breaking notes for release statement

inner_k_tiles was deprecated in favor of layout_type, enabling users to select from various layout options #670
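As requested in the review, here is a before/after sketch of the change for the release notes, based on the example discussed above (the torchao.dtypes import path for TensorCoreTiledLayoutType is an assumption):

```python
# torchao 0.4: the tiling parameter is passed directly
from torchao.quantization import quantize_, int4_weight_only
quantize_(my_model, int4_weight_only(inner_k_tiles=8))

# torchao 0.5: the tiling is configured through a layout_type object
from torchao.dtypes import TensorCoreTiledLayoutType  # import path assumed
quantize_(my_model, int4_weight_only(layout_type=TensorCoreTiledLayoutType(inner_k_tiles=8)))
```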
Tasks

PRs pending on this one
Questions for Reviewers
Let me know if there is anything that I need to update. I would be happy to make any necessary changes.