
[Bugfix] Fix mm_merge #5249

Merged
wangxiyuan merged 1 commit into vllm-project:main from Potabk:fix_mm_merge_2
Dec 31, 2025

Conversation

@Potabk (Collaborator) commented Dec 22, 2025

What this PR does / why we need it?

We should cast mm_embed to the dtype of input_embed before performing the in-place assignment.
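As a minimal sketch of the described fix (the names and shapes here are illustrative, not the actual vllm-ascend code): PyTorch's in-place index-put typically rejects a source tensor whose dtype differs from the destination, so the multimodal embeddings are cast to the input dtype first.

```python
import torch

# Illustrative shapes: 4 token positions, hidden size 8.
inputs_embeds = torch.zeros(4, 8, dtype=torch.bfloat16)
mm_embeds = torch.ones(2, 8, dtype=torch.float32)  # some models emit a wider dtype
is_mm_token = torch.tensor([True, False, True, False])

# Without the cast, the in-place index-put typically raises
# "Index put requires the source and destination dtypes match".
try:
    inputs_embeds[is_mm_token] = mm_embeds
except RuntimeError:
    pass

# The fix: cast mm_embeds to the dtype of inputs_embeds before assigning.
inputs_embeds[is_mm_token] = mm_embeds.to(dtype=inputs_embeds.dtype)
```

After the cast, the assignment succeeds and the destination keeps its original dtype.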

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: wangli <wangli858794774@gmail.com>
@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request addresses a potential RuntimeError in _merge_multimodal_embeddings by ensuring dtype consistency between the input embeddings and the multimodal embeddings. The change correctly casts the flattened multimodal embeddings to the dtype of the input embeddings before performing the in-place assignment. This is a robust fix that prevents crashes due to dtype mismatches. The implementation is correct and I approve of this change.

@github-actions (bot, Contributor)

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@ApsarasX (Collaborator) commented Dec 22, 2025

Will there be any accuracy problems without this PR?

So is this a Bugfix or a Misc?

BTW, typo in PR title: Msic -> Misc

@Potabk Potabk changed the title [Msic] Fix mm_merge [Bugfix] Fix mm_merge Dec 22, 2025
@Potabk (Collaborator, Author) commented Dec 22, 2025

@ApsarasX For some models, e.g. bagel, the dtype of mm_embeds differs from that of input_embeds, so without this PR there is a functional issue.

@Potabk (Collaborator, Author) commented Dec 23, 2025

also cc @booker123456 @gcanlin @shen-shanshan

@shen-shanshan (Collaborator) commented Dec 23, 2025

Are there any issues that recorded this error before this PR? I'm not sure about the background of this PR.

@Potabk (Collaborator, Author) commented Dec 23, 2025

Tested locally with the script https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/vision_language.py: `python examples/offline_inference/vision_language.py --model-type bagel` does not work without this fix.

@Potabk (Collaborator, Author) commented Dec 23, 2025

The root cause is in the bagel modeling implementation: the dtypes of mm_embeds and input_embeds do not match.

@gcanlin (Collaborator) left a comment

LGTM if CI passes; vLLM also has this dtype conversion.

@booker123456 (Contributor)

LGTM. Making the data types compatible is reasonable.

@Potabk Potabk requested a review from wangxiyuan December 23, 2025 03:51
@@ -37,8 +37,9 @@ def _merge_multimodal_embeddings(
This updates ``inputs_embeds`` in place.
Collaborator:

any plan to remove this patch?

Collaborator:

I think that we should request for torch_npu/CANN team to support torch.Tensor.masked_scatter_ then we can remove this patch.

@Potabk (Collaborator, Author) Dec 23, 2025:

After communicating offline with the author of this patch, I learned that it was added for performance reasons. The original masked_scatter operator has no functional issues. Therefore, we may need to push for the addition of a new Ascend branch upstream.

Collaborator:

After testing on NPU, it really doesn't have functional issues. @booker123456 is there any performance test for this patch change?

Collaborator:

I suggest we consider removing this patch directly to reduce the maintenance cost. It doesn't seem to bring much performance benefit. @booker123456 WDYT?

Collaborator:

I think this patch is still necessary until torch_npu's masked_scatter_ performance catches up with index_put.
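For context, the patch under discussion swaps torch.Tensor.masked_scatter_ for a boolean index-put on NPU for performance. The two are functionally equivalent, which this small CPU sketch (arbitrary shapes, not the actual patch code) illustrates:

```python
import torch

hidden = 3
out_scatter = torch.zeros(5, hidden)
out_index_put = torch.zeros(5, hidden)
mask = torch.tensor([True, False, True, False, True])
src = torch.arange(9, dtype=torch.float32).reshape(3, hidden)

# Index-put path (what the patch uses for speed on NPU):
out_index_put[mask] = src

# masked_scatter_ path (the stock op): the mask is broadcast to the
# destination shape and src is consumed in row-major order.
out_scatter.masked_scatter_(mask.unsqueeze(-1).expand_as(out_scatter), src)

assert torch.equal(out_index_put, out_scatter)
```

Both write src rows into the masked positions (rows 0, 2, 4) in order, which is why the patch can substitute one for the other without changing results.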

@Potabk (Collaborator, Author) commented Dec 31, 2025

So can we merge it? @wangxiyuan

@wangxiyuan wangxiyuan merged commit a5ae07a into vllm-project:main Dec 31, 2025
15 checks passed
@Potabk Potabk deleted the fix_mm_merge_2 branch December 31, 2025 02:10
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 31, 2025
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [feature] mooncake support pcp/dcp in common conditions (vllm-project#5224)
  [Bugfix] Fix mm_merge (vllm-project#5249)
  [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)
  [Feature] Refactor PCP &DCP related code (vllm-project#5214)
  [main][test] Refactor the mtp and eagle test case (vllm-project#5326)
  [smoke][bugfix] moe_init_routing_v2 active_expert_range use int type (vllm-project#5521)
  [2/N] Upgrade nightly doc (vllm-project#5534)
  [Doc] Add new contributors. (vllm-project#5537)
  [3/N][Nightly] Move ops tests to nightly (vllm-project#5538)
wangyibo1005 pushed a commit to wangyibo1005/vllm-ascend that referenced this pull request Dec 31, 2025
### What this PR does / why we need it?
We should transfer the mm_embed to the dtype of input_embed before
performing the in-place assignment

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: wangli <wangli858794774@gmail.com>
Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request Jan 8, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026