support Qwen2-VL with pytorch backend #2449
Conversation
Environment: lmdeploy@9ee6abe, py311, cuda121, V100-32GB. Running the official Qwen2-VL-7B model released by Qwen raises an error:
Currently, the pytorch backend does not support loading AWQ models; that is still being worked on. Only Qwen2-VL-2B-Instruct is supported for now.
Okay, thanks for your hard work.
@@ -128,6 +133,14 @@ def _fill_inputs(self, input_ids: torch.Tensor, position_ids: torch.Tensor,
                self.input_buffers['inputs_embeds'] = inputs_embeds.new_zeros(
                    1, max_num_tokens, emb_size)
            self.input_buffers['inputs_embeds'][:, :num_tokens] = inputs_embeds
            if mrope_position_ids is not None:
Does mrope_position_ids always exist?
For the qwen2-vl model, it will always exist.
For the qwen2-vl model, it will always exist.
If there is no image input, it can be None. It just raises an error if you use the lmdeploy chat command; however, the pipeline works even if there is no image input.
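For illustration, here is a minimal sketch of how the buffering shown in the diff above could tolerate a missing mrope_position_ids. The helper name, buffer shape, and None handling are assumptions for the sketch, not the actual implementation in this PR.

```python
def fill_mrope_buffer(input_buffers: dict, mrope_position_ids,
                      max_num_tokens: int, num_tokens: int):
    """Sketch: mirror the inputs_embeds buffering pattern for m-RoPE ids.

    mrope_position_ids is assumed to have shape (3, num_tokens), following
    qwen2-vl's temporal/height/width axes; None means a text-only request.
    """
    if mrope_position_ids is None:
        # Text-only request: nothing to buffer; the caller can fall back to
        # the regular position_ids instead of raising.
        return
    if 'mrope_position_ids' not in input_buffers:
        input_buffers['mrope_position_ids'] = mrope_position_ids.new_zeros(
            3, max_num_tokens)
    input_buffers['mrope_position_ids'][:, :num_tokens] = mrope_position_ids
```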
Hi, thanks for the exciting work, but I encountered this problem while using it: 'Qwen2VLForConditionalGeneration' object has no attribute 'lm_head'
Got the following error:
cannot import name 'Qwen2VLForConditionalGeneration' from 'transformers'
However, I used the version of transformers required in config.json.
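A quick way to check whether the installed transformers actually provides the class (a hedged check, not taken from this thread): Qwen2VLForConditionalGeneration only exists in transformers 4.45 and later, so an older environment reproduces the import error above regardless of what config.json requests.

```python
# Verify the installed transformers is new enough for Qwen2-VL.
import transformers

print(transformers.__version__)  # should report 4.45.0.dev0 or newer

# This import is what the Qwen2-VL support needs; it fails with
# "cannot import name ..." on transformers <= 4.44.x.
from transformers import Qwen2VLForConditionalGeneration  # noqa: F401
```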
Are there any plans regarding the Turbomind engine?
@ldknight @AllentDan @chenzhengda
Thanks for your reply. I have tried to reinstall transformers, but the problem still occurs. Currently I use torch==2.4.0, transformers==4.45.0.dev0, accelerate==0.34.0, and qwen-vl-utils==0.0.4.
Hi @ldknight, I was able to perform inference with the Qwen/Qwen2-VL-7B-Instruct model using @irexyc's git repo, so I would like to share the steps I followed:
chatglm3 failed with transformers==4.45.0.dev0
It seems the latest version of transformers has changed its API: https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/tokenization_utils_base.py#L3526
LGTM
As for the transformers version, shall we add a restriction inside the qwen2-vl code for users?
We may also update the README and support_models.md.
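One possible shape for such a restriction, as a sketch only (the placement inside the qwen2-vl code, the minimum version constant, and the error message are assumptions):

```python
# Sketch: guard the qwen2-vl model code against a too-old transformers.
from packaging import version

import transformers

MIN_TRANSFORMERS = '4.45.0'  # assumed minimum; Qwen2VLForConditionalGeneration landed in 4.45

if version.parse(transformers.__version__) < version.parse(MIN_TRANSFORMERS):
    raise ImportError(
        f'Qwen2-VL requires transformers>={MIN_TRANSFORMERS}, but '
        f'{transformers.__version__} is installed. Please upgrade, '
        'e.g. by installing transformers from source.')
```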
Hi, I see that the list of supported models does not include Qwen2-VL-72B (https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct). Is it being added too?
+1
I compared the config of Qwen2-VL-72B with Qwen2-VL-7B; only some layers' input/output dimensions are different. So I think the current code should support the 72B model. I'll check it, but it will take some time to download the model.
Hi @irexyc, just to confirm: the current code indeed supports the 72B-parameter model. I am able to perform inference with this model (following the same steps mentioned here).
One issue I am encountering, though: offline inference works fine, but during online inference using the lmdeploy server (api_server), I consistently run into a CUDA out-of-memory error, even though the model is distributed across 8 H100 GPUs.
You can use
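The reply above appears truncated. One knob that commonly helps with KV-cache related OOM is lowering the cache fraction; whether that is what the author meant here is an assumption on my part. For the api_server this is the --cache-max-entry-count option; a minimal sketch of the equivalent offline configuration:

```python
# Sketch: reduce the fraction of free GPU memory reserved for the KV cache.
# The values below are illustrative, not a recommendation from this thread.
from lmdeploy import PytorchEngineConfig, pipeline

backend_config = PytorchEngineConfig(
    tp=8,                       # model sharded across 8 GPUs
    cache_max_entry_count=0.4,  # lower than the default to leave headroom
)
pipe = pipeline('Qwen/Qwen2-VL-72B-Instruct', backend_config=backend_config)
```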
@PiyushSawarkar Hello, I tried to deploy Qwen2-VL 7B on four 4090 GPUs (24 GB) using the code below, but it failed.
When using the pytorch backend with tp > 1, you have to put your code in
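The comment above appears cut off. The usual requirement for Python multiprocessing-based tensor parallelism (an assumption about what was meant here) is to put the pipeline construction under an `if __name__ == '__main__':` guard so the script stays import-safe when worker processes are spawned. A minimal sketch:

```python
# Sketch: with the pytorch backend and tp > 1, worker processes are spawned,
# so the entry point must be guarded.
from lmdeploy import PytorchEngineConfig, pipeline


def main():
    pipe = pipeline('Qwen/Qwen2-VL-7B-Instruct',
                    backend_config=PytorchEngineConfig(tp=4))
    print(pipe('Hello'))


if __name__ == '__main__':
    main()
```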
@irexyc I can successfully run
Motivation
Support https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct with pytorch backend
Currently, it only supports image input; video input support should wait until after the refactoring of the vision model.
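For reference, a minimal offline usage sketch with an image input (the model name comes from this PR; the image path is a placeholder, not part of the PR):

```python
from lmdeploy import PytorchEngineConfig, pipeline
from lmdeploy.vl import load_image

# Qwen2-VL currently runs on the pytorch backend only.
pipe = pipeline('Qwen/Qwen2-VL-2B-Instruct',
                backend_config=PytorchEngineConfig())

image = load_image('demo.jpg')  # placeholder: any local image path or URL
response = pipe(('describe this image', image))
print(response.text)
```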
#2436
#2415
#2411