Conversation
Signed-off-by: wangli <wangli858794774@gmail.com>
Code Review
This pull request addresses a potential RuntimeError in _merge_multimodal_embeddings by ensuring dtype consistency between the input embeddings and the multimodal embeddings. The change correctly casts the flattened multimodal embeddings to the dtype of the input embeddings before performing the in-place assignment. This is a robust fix that prevents crashes due to dtype mismatches. The implementation is correct and I approve of this change.
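The fix described above can be sketched as follows. This is a simplified, hypothetical illustration of the dtype cast, not vLLM's actual `_merge_multimodal_embeddings` signature; the names `inputs_embeds`, `is_multimodal`, and `multimodal_embeddings` mirror the discussion but the function body is an assumption for clarity.

```python
import torch

def merge_multimodal_embeddings_sketch(
    inputs_embeds: torch.Tensor,          # (num_tokens, hidden), e.g. bfloat16
    is_multimodal: torch.Tensor,          # (num_tokens,) bool mask
    multimodal_embeddings: torch.Tensor,  # multimodal features, possibly float32
) -> torch.Tensor:
    # Flatten the multimodal embeddings to (num_mm_tokens, hidden).
    flattened = multimodal_embeddings.reshape(-1, inputs_embeds.shape[-1])
    # Without the cast, an in-place index assignment with mismatched dtypes
    # raises "Index put requires the source and destination dtypes match".
    inputs_embeds[is_multimodal] = flattened.to(dtype=inputs_embeds.dtype)
    return inputs_embeds
```

With the cast in place, merging float32 multimodal features into a bfloat16 embedding buffer succeeds and preserves the buffer's dtype.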
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Will there be any accuracy problems without this PR? So is this a Bugfix or a Misc change? BTW, there is a typo in the PR title:
@ApsarasX For some models, e.g.:
also cc @booker123456 @gcanlin @shen-shanshan |
Are there any recorded issues describing this error without this PR? I'm not sure about the background of this PR.
Tested locally with the script https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/vision_language.py
The root cause is in this
LGTM. Keeping the data types compatible is reasonable.

@@ -37,8 +37,9 @@ def _merge_multimodal_embeddings(
    This updates ``inputs_embeds`` in place.
any plan to remove this patch?
I think that we should request for torch_npu/CANN team to support torch.Tensor.masked_scatter_ then we can remove this patch.
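For context, the following is a minimal CPU example of the `torch.Tensor.masked_scatter_` operator that the patch works around: it copies elements from a source tensor, in order, into the positions where the mask is `True`. The tensor shapes here are chosen purely for illustration.

```python
import torch

dest = torch.zeros(4, 3)
# Rows 0 and 2 are "multimodal" positions; expand the row mask to all columns.
mask = torch.tensor([True, False, True, False]).unsqueeze(-1).expand(4, 3)
src = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Fills the masked positions of dest with src's elements in flattened order.
dest.masked_scatter_(mask, src)
```

After the call, row 0 holds the first three source elements and row 2 the next three, while the unmasked rows stay zero.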
After communicating offline with the author of this patch, I learned that it was added for performance reasons. The original masked_scatter operator has no functional issues. Therefore, we may need to push for the addition of a new Ascend branch upstream.
After testing on NPU, it really doesn't have functional issues. @booker123456 is there any performance test for this patch change?
I suggest we consider directly removing this patch to reduce the maintenance cost. It doesn't seem to bring much performance gain. @booker123456 WDYT?
I think this patch is still necessary until torch_npu's masked_scatter_ performance catches up with index_put.
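To make the trade-off being discussed concrete, here is a small sketch of the two equivalent formulations: boolean-mask index assignment (the index_put path the patch uses) versus `masked_scatter_` (the original op). The performance difference is claimed for torch_npu only; on CPU both produce identical results, which is the only thing this sketch demonstrates.

```python
import torch

a = torch.zeros(4, 3)
b = torch.zeros(4, 3)
mask = torch.tensor([True, False, True, True])
src = torch.ones(3, 3)

# index_put path: advanced indexing with a boolean row mask.
a[mask] = src
# masked_scatter_ path: element-wise mask expanded to the full shape.
b.masked_scatter_(mask.unsqueeze(-1).expand_as(b), src)

assert torch.equal(a, b)  # functionally interchangeable
```

Which one is faster is backend-specific, which is why the patch keeps the index_put form for NPU.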
|
So can we merge it? @wangxiyuan |
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [feature] mooncake support pcp/dcp in common conditions (vllm-project#5224)
  [Bugfix] Fix mm_merge (vllm-project#5249)
  [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)
  [Feature] Refactor PCP & DCP related code (vllm-project#5214)
  [main][test] Refactor the mtp and eagle test case (vllm-project#5326)
  [smoke][bugfix] moe_init_routing_v2 active_expert_range use int type (vllm-project#5521)
  [2/N] Upgrade nightly doc (vllm-project#5534)
  [Doc] Add new contributors. (vllm-project#5537)
  [3/N][Nightly] Move ops tests to nightly (vllm-project#5538)
### What this PR does / why we need it?
We should transfer the mm_embed to the dtype of input_embed before performing the in-place assignment.
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
### What this PR does / why we need it?
We should cast the mm_embed to the dtype of input_embed before performing the in-place assignment.

### Does this PR introduce any user-facing change?

### How was this patch tested?