
[Torchax] Support Running AWQ models#618

Merged
kyuyeunk merged 1 commit into main from support_awq
Sep 3, 2025
Conversation

@kyuyeunk
Collaborator

@kyuyeunk kyuyeunk commented Sep 2, 2025

Description

  • This PR adds support for loading and running AWQ models in torchax.
  • Using the model Qwen/Qwen2.5-32B-Instruct-AWQ, I have verified that it runs correctly.
    • Numeric validation
      • Command TPU_BACKEND_TYPE=jax MODEL_IMPL_TYPE=vllm python3 examples/offline_inference.py --model=Qwen/Qwen2.5-32B-Instruct-AWQ --tensor_parallel_size=8 --task=generate --max_model_len=1024 --download_dir=/mnt/disks/persist
      • Result: http://gpaste/6205721003425792
    • Performance validation (degradation over baseline is expected)
  • Limitation / Future work
    • AWQ uses w4a16 subchannel quantization. For the initial implementation, I have used XLA instead of a custom kernel. However, there is a known performance issue with subchannel quantization in XLA, so we need to add w4a16 subchannel support to the kernel for optimal performance.
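To make the "subchannel" part concrete, here is a minimal NumPy sketch of groupwise w4a16 dequantization. The function name, tensor layout, and group size are illustrative assumptions, not the PR's actual implementation (which expresses this math with XLA ops):

```python
import numpy as np

def dequantize_w4a16(qweight, scales, zeros, group_size=128):
    """Groupwise (subchannel) w4a16 dequantization sketch.

    qweight: (in_features, out_features) unpacked uint4 values in [0, 15]
    scales, zeros: (in_features // group_size, out_features)
    Each group of `group_size` input rows shares one scale/zero per column,
    which is what "subchannel" quantization means here.
    """
    in_features, out_features = qweight.shape
    num_groups = in_features // group_size
    q = qweight.reshape(num_groups, group_size, out_features).astype(np.float32)
    w = (q - zeros[:, None, :]) * scales[:, None, :]
    return w.reshape(in_features, out_features)
```

Because the scale/zero pair varies along the reduction (input) dimension, the dequantize-then-matmul pattern is harder for XLA to fuse efficiently, which is the performance gap a dedicated kernel would close.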

Tests

pytest -v tests/models/vllm/layers/test_awq.py

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have added necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@kyuyeunk kyuyeunk requested a review from hfan September 2, 2025 03:56
@kyuyeunk kyuyeunk force-pushed the support_awq branch 2 times, most recently from aa27141 to 2f54950 Compare September 2, 2025 13:39
# AWQ packs 8 uint4 into 32-bits in this order.
awq_order = (0, 2, 4, 6, 1, 3, 5, 7)

orig_shape = weight.shape
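The interleaved packing order in the snippet above can be illustrated with a small standalone sketch (not the PR's code); the helper names are made up, and only the permutation itself comes from the snippet:

```python
# AWQ packs 8 uint4 into one 32-bit word in this nibble order.
AWQ_ORDER = (0, 2, 4, 6, 1, 3, 5, 7)
# Inverse permutation: where logical element i ends up physically.
AWQ_REVERSE_ORDER = tuple(AWQ_ORDER.index(i) for i in range(8))  # (0, 4, 1, 5, 2, 6, 3, 7)

def pack_awq(vals):
    """Pack 8 uint4 values (given in logical order) into one 32-bit word."""
    word = 0
    for k in range(8):
        # Physical nibble k holds logical element AWQ_ORDER[k].
        word |= (vals[AWQ_ORDER[k]] & 0xF) << (4 * k)
    return word

def unpack_awq(word):
    """Recover the 8 uint4 values in logical order from a packed word."""
    nibbles = [(word >> (4 * k)) & 0xF for k in range(8)]
    return [nibbles[AWQ_REVERSE_ORDER[i]] for i in range(8)]
```

A round trip (`unpack_awq(pack_awq(vals)) == vals`) confirms the two permutations are inverses; unpacking real AWQ checkpoints applies the reverse order the same way before dequantization.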
Collaborator

add this test "pytest -v tests/models/vllm/layers/test_awq.py" to the CI?

Collaborator


I see. Thanks. If it's framework agnostic, then it's fine to leave it as is.

@kyuyeunk kyuyeunk left a comment


vllm-project/vllm#23902 has been merged. This PR is ready to be submitted.

@kyuyeunk kyuyeunk force-pushed the support_awq branch 2 times, most recently from bea8d41 to b43020f Compare September 3, 2025 03:35
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
@kyuyeunk kyuyeunk merged commit 753da7f into main Sep 3, 2025
1 of 2 checks passed
@kyuyeunk kyuyeunk deleted the support_awq branch September 3, 2025 07:30


3 participants