Allow cpu and gpu in int4wo and int4wo-gptq quantizer #131
Conversation
Do we need to release a 0.1.1 for this?
It's fine; this is for torchat, and it will be using torchao-nightly. I'm still looking at a perf issue for this, so I'll merge after that.
@@ -762,11 +762,15 @@ def _check_linear_int4_k(k, groupsize = 1, inner_k_tiles = None):
    return k_divisible_by_groupsize


def linear_forward_int4(x, weight_int4pack, scales_and_zeros, out_features, groupsize):
So these conversions to bfloat16 are primarily needed because of `_weight_int4pack_mm`?
yes, that's correct
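For readers following along: the `scales_and_zeros` in the `linear_forward_int4` signature above come from groupwise int4 weight-only quantization. A minimal numpy sketch of that idea follows; the function names and the asymmetric min/max scheme are my own illustration of the concept, not torchao's exact implementation (and in the real kernel path, `x` is additionally cast to bfloat16 because `_weight_int4pack_mm` expects it, as confirmed in the exchange above):

```python
import numpy as np

def quantize_int4_groupwise(w, groupsize):
    """Asymmetric 4-bit quantization, one scale/zero per group of columns.

    w: (out_features, in_features), in_features divisible by groupsize.
    """
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // groupsize, groupsize)
    zeros = g.min(axis=-1, keepdims=True)                     # per-group zero point
    scales = (g.max(axis=-1, keepdims=True) - zeros) / 15.0   # 4 bits -> codes 0..15
    q = np.clip(np.round((g - zeros) / scales), 0, 15).astype(np.uint8)
    return q, scales, zeros

def dequantize_int4_groupwise(q, scales, zeros):
    """Recover an approximate float weight from codes + per-group scales/zeros."""
    g = q.astype(np.float32) * scales + zeros
    return g.reshape(g.shape[0], -1)
```

Round-to-nearest keeps the per-element reconstruction error within half a quantization step (scale / 2) of the original weight, which is what makes the groupwise scales worth their storage cost.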
Summary:
att

Test Plan:
verified in torchat

Reviewers:

Subscribers:

Tasks:

Tags:
Looks like there are no perf issues; I'll just merge.