Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request updates the qwen3vl model to support torch.Tensor as an input type for grid_thw, which was previously only a list. While the change is functionally correct, I've identified a performance issue in the implementation. The conversion from a torch.Tensor to a NumPy array and back to a tensor is inefficient and can be optimized. I've provided a suggestion to refactor this logic for better performance.
```diff
 if isinstance(grid_thw, list):
     grid_thw_list = grid_thw
     grid_thw = np.array(grid_thw, dtype=np.int32)
 else:
     grid_thw_list = grid_thw.tolist()
     grid_thw = grid_thw.numpy()

 pos_embeds = self.fast_pos_embed_interpolate(grid_thw_list)
 hidden_states = hidden_states + pos_embeds
-rotary_pos_emb = self.rot_pos_emb(grid_thw)
+rotary_pos_emb = self.rot_pos_emb(grid_thw_list)
 grid_thw_tensor = torch.tensor(grid_thw,
                                device=self.device,
                                dtype=torch.int32)
```
The current implementation for handling grid_thw when it is a torch.Tensor is inefficient. It converts the tensor to a NumPy array using .numpy() (which can cause a GPU-to-CPU data transfer) and then converts it back to a tensor using torch.tensor(). This can be optimized by handling the list and torch.Tensor cases separately to avoid unnecessary conversions.
```python
if isinstance(grid_thw, list):
    grid_thw_list = grid_thw
    grid_thw_tensor = torch.tensor(grid_thw,
                                   device=self.device,
                                   dtype=torch.int32)
else:
    grid_thw_list = grid_thw.tolist()
    grid_thw_tensor = grid_thw.to(device=self.device, dtype=torch.int32)
pos_embeds = self.fast_pos_embed_interpolate(grid_thw_list)
hidden_states = hidden_states + pos_embeds
rotary_pos_emb = self.rot_pos_emb(grid_thw_list)
```
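The suggested branching can be exercised in isolation. The sketch below is a standalone illustration, not the model code: the helper name `normalize_grid_thw` is hypothetical, and a plain `"cpu"` device stands in for `self.device`. Both input types should yield the same list and the same `int32` tensor, with the tensor path avoiding any NumPy round-trip.

```python
import torch

def normalize_grid_thw(grid_thw, device="cpu"):
    """Return (grid_thw_list, grid_thw_tensor) without a NumPy round-trip.

    Hypothetical helper mirroring the suggested review change: the list
    branch builds a fresh tensor, while the tensor branch reuses the input
    via .to(), which is a no-op when device and dtype already match.
    """
    if isinstance(grid_thw, list):
        grid_thw_list = grid_thw
        grid_thw_tensor = torch.tensor(grid_thw, device=device, dtype=torch.int32)
    else:
        grid_thw_list = grid_thw.tolist()
        grid_thw_tensor = grid_thw.to(device=device, dtype=torch.int32)
    return grid_thw_list, grid_thw_tensor

# Both input forms produce equivalent results.
lst_a, t_a = normalize_grid_thw([[1, 2, 2], [1, 4, 4]])
lst_b, t_b = normalize_grid_thw(torch.tensor([[1, 2, 2], [1, 4, 4]]))
print(lst_a == lst_b, torch.equal(t_a, t_b))
```

Note that when the incoming tensor is already on the target device with `torch.int32` dtype, `.to()` returns the tensor itself, so the else branch costs nothing; the original code always paid for `.numpy()` plus `torch.tensor()`.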
LGTM.
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
We no longer allow changes to model files. Please use another approach or contribute to vLLM directly.
What this PR does / why we need it?
Follow the interface changes from the upstream vLLM repository:
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (vllm-project/vllm#28168)
Does this PR introduce any user-facing change?
How was this patch tested?