Allow cpu and gpu in int4wo and int4wo-gptq quantizer #131
Conversation
Do we need to release a 0.1.1 for this?
It's fine; this is for torchat, and it will be using torchao-nightly. I'm still looking at a perf issue for this, so I'll merge after that.
@@ -762,11 +762,15 @@ def _check_linear_int4_k(k, groupsize = 1, inner_k_tiles = None):
    return k_divisible_by_groupsize


def linear_forward_int4(x, weight_int4pack, scales_and_zeros, out_features, groupsize):
So these conversions to bfloat16 are primarily needed because of `_weight_int4pack_mm`?
yes, that's correct
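For readers following along: the `scales_and_zeros` in the `linear_forward_int4` signature above come from groupwise int4 weight-only quantization. A minimal numpy sketch of that idea follows; the function names and the asymmetric min/max scheme are my own illustration of the concept, not torchao's exact implementation (and in the real kernel path, `x` is additionally cast to bfloat16 because `_weight_int4pack_mm` expects it, as confirmed in the exchange above):

```python
import numpy as np

def quantize_int4_groupwise(w, groupsize):
    """Asymmetric 4-bit quantization, one scale/zero per group of columns.

    w: (out_features, in_features), in_features divisible by groupsize.
    """
    out_f, in_f = w.shape
    g = w.reshape(out_f, in_f // groupsize, groupsize)
    zeros = g.min(axis=-1, keepdims=True)                     # per-group zero point
    scales = (g.max(axis=-1, keepdims=True) - zeros) / 15.0   # 4 bits -> codes 0..15
    q = np.clip(np.round((g - zeros) / scales), 0, 15).astype(np.uint8)
    return q, scales, zeros

def dequantize_int4_groupwise(q, scales, zeros):
    """Recover an approximate float weight from codes + per-group scales/zeros."""
    g = q.astype(np.float32) * scales + zeros
    return g.reshape(g.shape[0], -1)
```

Round-to-nearest keeps the per-element reconstruction error within half a quantization step (scale / 2) of the original weight, which is what makes the groupwise scales worth their storage cost.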
Summary:
att

Test Plan:
verified in torchat

Reviewers:

Subscribers:

Tasks:

Tags:
Looks like there are no perf issues; I'll just merge.