[Quantization] Pool model support bitsandbytes #18087
vllm-bot merged 4 commits into vllm-project:main
Conversation

👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label (🚀) to the PR.

Is there any existing model we can use to test this?

We can use

Can you add this test to the CI?

The reason I didn't add it was concern about CI load. If you think related tests should be added, I'll implement them ASAP.

The quantization model test is conditional, so it should be fine to add it.
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```diff
- hf_model_kwargs = {"load_in_4bit": True}
+ hf_model_kwargs = dict(quantization_config=BitsAndBytesConfig(
+     load_in_4bit=True))
```
This change avoids the deprecation warning below:

> The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.

I have added the test @DarkLight1337
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
It looks like we need to force merge
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Test snippet
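A hedged sketch of how the new support might be exercised (the model name is a placeholder, not the one used in the PR's test, and the exact flags may differ across vLLM versions; running this requires a GPU):

```python
# Sketch: load a pooling/embedding model with bitsandbytes 4-bit weights.
from vllm import LLM

llm = LLM(
    model="BAAI/bge-multilingual-gemma2",  # placeholder pooling model
    task="embed",                 # run the model in embedding/pooling mode
    quantization="bitsandbytes",  # quantize weights with bitsandbytes
)
outputs = llm.embed(["vLLM now supports bitsandbytes for pooling models."])
print(len(outputs[0].outputs.embedding))  # embedding dimensionality
```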