[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER #17153
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Can you make a test for this, or at least confirm you've manually tested it? Specifically, a kernel test would be great.
Hi @mgoin,
I'll plan to add kernel-level or architecture-specific tests as a follow-up. I've attached logs from model tests on POWER and the output of `python collect_env.py`.
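For reference, a minimal sketch of the kind of kernel-level test requested above, assuming a symmetric per-tensor INT8 scheme with int32 accumulation; the helper, model sizes, and tolerances are illustrative, not taken from the PR:

```python
# Hypothetical kernel-level test sketch (not from the PR): quantize weights
# and activations to int8, run an int8 matmul with int32 accumulation (as an
# int8 GEMM kernel would), dequantize, and compare against the fp32 reference.
import torch


def quantize_per_tensor(x: torch.Tensor):
    """Symmetric per-tensor int8 quantization; returns (int8 tensor, scale)."""
    scale = x.abs().max() / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale


def test_w8a8_int8_matmul_matches_fp32():
    torch.manual_seed(0)
    a = torch.randn(16, 64)   # activations
    w = torch.randn(64, 32)   # weights
    qa, sa = quantize_per_tensor(a)
    qw, sw = quantize_per_tensor(w)
    # Accumulate in int32, then dequantize with the product of the scales.
    acc = qa.to(torch.int32) @ qw.to(torch.int32)
    out = acc.to(torch.float32) * (sa * sw)
    ref = a @ w
    # int8 quantization is lossy, so compare with a loose tolerance.
    assert torch.allclose(out, ref, atol=0.5, rtol=0.1)
```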
Hi @mgoin @DarkLight1337,
Hi @mgoin, thanks for approving the changes.
This PR adds support for compressed-tensors W8A8 INT8 quantization on the POWER architecture using oneDNN.
Key changes include:
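As a usage sketch (not part of the PR diff), a compressed-tensors W8A8 INT8 checkpoint can be loaded the same way as on other platforms once vLLM is built for the CPU/POWER backend; the model name below is illustrative:

```python
# Illustrative usage, assuming a vLLM build targeting CPU on POWER and a
# checkpoint quantized to W8A8 INT8 in the compressed-tensors format (the
# model name is an example, not one verified in this PR).
from vllm import LLM, SamplingParams

llm = LLM(
    model="neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a8",
    dtype="bfloat16",
)
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What is W8A8 INT8 quantization?"], params)
print(outputs[0].outputs[0].text)
```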