
Shape mismatch when sending parallel requests to a vLLM Ray Serve deployment based on the OpenAI chat completions API with Qwen2-VL #50033

Open
javasy opened this issue Jan 23, 2025 · 0 comments

javasy commented Jan 23, 2025

Parallel requests are sent to a Ray Serve "OpenAI Chat Completions API" deployment set up per this guide: Serve a Large Language Model with vLLM — Ray 2.41.0.

The model is Qwen2-VL, and each request contains both text and image prompts.

Sending one request at a time works fine, but parallel requests fail whenever `max_ongoing_requests >= 2`.
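The concurrent-request pattern that triggers the failure can be sketched as follows. Note this is a minimal stand-in sketch: `send_chat_request` is a hypothetical stub, not the real client call; in the actual setup it would POST a text+image chat completion to the Ray Serve `/v1/chat/completions` endpoint (e.g. via the `openai` client).

```python
import asyncio

# Hypothetical stub standing in for a real text+image chat-completion
# request to the Ray Serve /v1/chat/completions endpoint.
async def send_chat_request(request_id: int) -> str:
    await asyncio.sleep(0)  # yield control so requests interleave
    return f"response-{request_id}"

async def main(n_parallel: int = 4) -> list:
    # With max_ongoing_requests >= 2, these requests reach the vLLM
    # engine concurrently and can be batched into one forward pass.
    return await asyncio.gather(
        *(send_chat_request(i) for i in range(n_parallel))
    )

results = asyncio.run(main())
print(results)
```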

The error stack is shown below:

ERROR 2025-01-23 00:22:21,963 vl_VLLMDeployment 4gtvteb2 e1d433cc-e551-4e5e-b10e-986dea9fe1ad /v1/chat/completions llm.py:128 - Error in generate()
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.9/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
return func(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1654, in execute_model
hidden_or_intermediate_states = model_executable(
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1287, in forward
inputs_embeds = self._merge_multimodal_embeddings(
File "/home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1237, in _merge_multimodal_embeddings
inputs_embeds[mask, :] = multimodal_embeddings
RuntimeError: shape mismatch: value tensor of shape [644, 3584] cannot be broadcast to indexing result of shape [322, 3584]
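For context, the failing assignment requires exactly one image-embedding row per masked placeholder position, and 644 is exactly 2 × 322, which is consistent with two concurrent requests' image features landing on one request's placeholder slots. The check can be illustrated with this hypothetical, heavily simplified sketch (plain Python lists instead of tensors; this is not vLLM's actual `_merge_multimodal_embeddings` code):

```python
# Hypothetical, simplified sketch of the masked-assignment invariant:
# every True position in the placeholder mask must receive exactly one
# image-embedding row, or the assignment fails with a shape mismatch.
def merge_multimodal_embeddings(inputs_embeds, mask, mm_embeds):
    n_slots = sum(mask)  # number of image-placeholder token positions
    if len(mm_embeds) != n_slots:
        raise RuntimeError(
            f"shape mismatch: value tensor of shape [{len(mm_embeds)}, 3584] "
            f"cannot be broadcast to indexing result of shape [{n_slots}, 3584]"
        )
    it = iter(mm_embeds)
    # Replace each masked row with the next embedding row, in order.
    return [next(it) if m else row for row, m in zip(inputs_embeds, mask)]

# One request's prompt contains 322 image-placeholder tokens...
mask = [True] * 322 + [False] * 10
embeds = [[0.0]] * 332

# ...but 644 embedding rows arrive, as if two concurrent requests'
# image features were merged into a single request's slots:
err = None
try:
    merge_multimodal_embeddings(embeds, mask, [[0.0]] * 644)
except RuntimeError as e:
    err = str(e)
print(err)
```

With 322 rows the merge succeeds; with 644 it reproduces the same error message as the traceback above.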
