Description
Start with a short description of what the PR does and how this is a change from
The rest of the description includes relevant details and context, examples:
If the change fixes a bug or a GitHub issue, please include a link, e.g.,:
Tests
Please describe how you tested this change, and include any instructions and/or
Checklist
Before submitting this PR, please make sure:
# AWQ packs 8 uint4 into 32-bits in this order.
awq_order = (0, 2, 4, 6, 1, 3, 5, 7)

orig_shape = weight.shape
Should we add this test ("pytest -v tests/models/vllm/layers/test_awq.py") to the CI?
This already runs as part of the JAX unit tests: https://github.com/vllm-project/tpu_commons/blob/4470371755e316bf7c3a5e5940133923d7ebc325/.buildkite/pipeline_jax.yml#L76-L88
I confirmed this in a (failed...) CI run: https://buildkite.com/tpu-commons/tpu-commons-ci/builds/2546#01990bbf-4525-4e7c-ad33-667076f9482d
Would it be better to add it to the torch CI instead?: https://github.com/vllm-project/tpu_commons/blob/main/.buildkite/pipeline_torch.yml
I see. Thanks. If it's framework agnostic, then it's fine to leave it as is.
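For readers following the `awq_order` comment in the diff, here is a minimal pure-Python sketch of what that ordering means for a single 32-bit word. It assumes that 4-bit slot `i` (bits `4*i` to `4*i + 3`) holds logical element `awq_order[i]`, which is how the AWQ packing convention is commonly described. The helper names `pack_awq_word` and `unpack_awq_word` are hypothetical and are not taken from this PR.

```python
# Illustrative only: not the implementation from this PR.
# Assumption: slot i (bits 4*i .. 4*i+3) stores logical element AWQ_ORDER[i].
AWQ_ORDER = (0, 2, 4, 6, 1, 3, 5, 7)


def pack_awq_word(values):
    """Pack 8 uint4 values, given in logical order, into one 32-bit integer."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for slot, logical in enumerate(AWQ_ORDER):
        # Mask to 4 bits, then place the value in its slot.
        word |= (values[logical] & 0xF) << (4 * slot)
    return word


def unpack_awq_word(word):
    """Recover the 8 uint4 values in logical order from one packed word."""
    values = [0] * 8
    for slot, logical in enumerate(AWQ_ORDER):
        values[logical] = (word >> (4 * slot)) & 0xF
    return values


if __name__ == "__main__":
    packed = pack_awq_word(list(range(8)))
    print(hex(packed))              # 0x75316420
    print(unpack_awq_word(packed))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Unpacking by plain bit position would return the values in slot order (0, 2, 4, 6, 1, 3, 5, 7), so a reorder step like the one above is needed to get back to logical order.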
kyuyeunk left a comment:
vllm-project/vllm#23902 has been merged. This PR is ready to be submitted.
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
Description
TPU_BACKEND_TYPE=jax MODEL_IMPL_TYPE=vllm python3 examples/offline_inference.py --model=Qwen/Qwen2.5-32B-Instruct-AWQ --tensor_parallel_size=8 --task=generate --max_model_len=1024 --download_dir=/mnt/disks/persist
Tests
Checklist
Before submitting this PR, please make sure: