
[Torchax] Support Running AWQ models#618

Merged
kyuyeunk merged 1 commit into main from support_awq
Sep 3, 2025
Conversation

@kyuyeunk
Collaborator

@kyuyeunk kyuyeunk commented Sep 2, 2025

Description

  • This PR adds support for loading and running AWQ models in torchax.
  • Using the model Qwen/Qwen2.5-32B-Instruct-AWQ, I have verified that it runs correctly.
    • Numeric validation
      • Command TPU_BACKEND_TYPE=jax MODEL_IMPL_TYPE=vllm python3 examples/offline_inference.py --model=Qwen/Qwen2.5-32B-Instruct-AWQ --tensor_parallel_size=8 --task=generate --max_model_len=1024 --download_dir=/mnt/disks/persist
      • Result: http://gpaste/6205721003425792
    • Performance validation (degradation over baseline is expected)
  • Limitation / Future work
    • AWQ uses w4a16 subchannel quantization. For the initial implementation, I have used XLA instead of a custom kernel. However, there is a known performance issue with subchannel quantization in XLA, so we need to add w4a16 subchannel support to the kernel for optimal performance.
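To make the "subchannel" part concrete, here is a minimal NumPy sketch of groupwise w4a16 dequantization. The function name, tensor layout, and group size are illustrative assumptions, not the PR's actual implementation (which expresses this math with XLA ops):

```python
import numpy as np

def dequantize_w4a16(qweight, scales, zeros, group_size=128):
    """Groupwise (subchannel) w4a16 dequantization sketch.

    qweight: (in_features, out_features) unpacked uint4 values in [0, 15]
    scales, zeros: (in_features // group_size, out_features)
    Each group of `group_size` input rows shares one scale/zero per column,
    which is what "subchannel" quantization means here.
    """
    in_features, out_features = qweight.shape
    num_groups = in_features // group_size
    q = qweight.reshape(num_groups, group_size, out_features).astype(np.float32)
    w = (q - zeros[:, None, :]) * scales[:, None, :]
    return w.reshape(in_features, out_features)
```

Because the scale/zero pair varies along the reduction (input) dimension, the dequantize-then-matmul pattern is harder for XLA to fuse efficiently, which is the performance gap a dedicated kernel would close.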

Tests

pytest -v tests/models/vllm/layers/test_awq.py

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have added necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@kyuyeunk kyuyeunk requested a review from hfan September 2, 2025 03:56
@kyuyeunk kyuyeunk force-pushed the support_awq branch 2 times, most recently from aa27141 to 2f54950 Compare September 2, 2025 13:39
# AWQ packs 8 uint4 into 32-bits in this order.
awq_order = (0, 2, 4, 6, 1, 3, 5, 7)

orig_shape = weight.shape
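The interleaved packing order in the snippet above can be illustrated with a small standalone sketch (not the PR's code); the helper names are made up, and only the permutation itself comes from the snippet:

```python
# AWQ packs 8 uint4 into one 32-bit word in this nibble order.
AWQ_ORDER = (0, 2, 4, 6, 1, 3, 5, 7)
# Inverse permutation: where logical element i ends up physically.
AWQ_REVERSE_ORDER = tuple(AWQ_ORDER.index(i) for i in range(8))  # (0, 4, 1, 5, 2, 6, 3, 7)

def pack_awq(vals):
    """Pack 8 uint4 values (given in logical order) into one 32-bit word."""
    word = 0
    for k in range(8):
        # Physical nibble k holds logical element AWQ_ORDER[k].
        word |= (vals[AWQ_ORDER[k]] & 0xF) << (4 * k)
    return word

def unpack_awq(word):
    """Recover the 8 uint4 values in logical order from a packed word."""
    nibbles = [(word >> (4 * k)) & 0xF for k in range(8)]
    return [nibbles[AWQ_REVERSE_ORDER[i]] for i in range(8)]
```

A round trip (`unpack_awq(pack_awq(vals)) == vals`) confirms the two permutations are inverses; unpacking real AWQ checkpoints applies the reverse order the same way before dequantization.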
Collaborator

add this test "pytest -v tests/models/vllm/layers/test_awq.py" to the CI?

Collaborator


I see. Thanks. If it's framework agnostic, then it's fine to leave it as is.

@kyuyeunk kyuyeunk left a comment


vllm-project/vllm#23902 has been merged. This PR is ready to be submitted.

@kyuyeunk kyuyeunk force-pushed the support_awq branch 2 times, most recently from bea8d41 to b43020f Compare September 3, 2025 03:35
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
@kyuyeunk kyuyeunk merged commit 753da7f into main Sep 3, 2025
1 of 2 checks passed
@kyuyeunk kyuyeunk deleted the support_awq branch September 3, 2025 07:30


3 participants