
[Bugfix] Correctly handle the output shape in multimodal attention #5443

Merged: wangxiyuan merged 3 commits into vllm-project:main from Potabk:fix_hunyuan on Dec 27, 2025

Conversation

@Potabk (Collaborator) commented Dec 27, 2025

What this PR does / why we need it?

Fixes #5297: in the `AscendMMEncoderAttention` forward pass, the output shape should stay consistent with the shape of the input query. A minimal sketch of the intended behavior is shown below.
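
A minimal sketch of the idea, using hypothetical shapes and a stand-in attention op (the real forward runs Ascend NPU kernels, not `scaled_dot_product_attention`); the point is only that the result is reshaped back to the query's original shape:

```python
import torch

def forward_sketch(query: torch.Tensor, key: torch.Tensor, value: torch.Tensor,
                   num_heads: int, head_dim: int) -> torch.Tensor:
    """Hypothetical sketch, not the actual AscendMMEncoderAttention.forward."""
    input_shape = query.shape  # e.g. (batch, seq_len, num_heads * head_dim)

    # Flatten into the (tokens, heads, head_dim) layout an attention kernel expects.
    q = query.reshape(-1, num_heads, head_dim)
    k = key.reshape(-1, num_heads, head_dim)
    v = value.reshape(-1, num_heads, head_dim)

    # Stand-in for the real Ascend attention op.
    out = torch.nn.functional.scaled_dot_product_attention(
        q.transpose(0, 1), k.transpose(0, 1), v.transpose(0, 1)
    ).transpose(0, 1)

    # The fix in spirit: return the output in the same shape as the input query,
    # rather than a hard-coded or differently flattened shape.
    return out.reshape(input_shape)
```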

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: wangli <wangli858794774@gmail.com>
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request corrects the output shape handling in `AscendMMEncoderAttention` so that it is consistent with the input query's dimensions, which fixes a bug for certain multimodal models. The logic looks correct. However, I've raised a critical concern about the removal of `.contiguous()`, which could break callers that expect a contiguous tensor. Please see the detailed comment.
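
For context, a small self-contained illustration of that concern (generic PyTorch, not the vllm-ascend code): a caller that later calls `.view()` on a non-contiguous result fails unless the tensor is made contiguous first.

```python
import torch

# A transpose produces a non-contiguous tensor; a downstream caller that
# relies on .view() then fails with a RuntimeError.
x = torch.randn(2, 4, 8)
y = x.transpose(1, 2)                     # (2, 8, 4), non-contiguous view of x
print(y.is_contiguous())                  # False

try:
    y.view(2, 32)                         # .view() needs compatible (contiguous) strides
except RuntimeError as err:
    print("caller breaks without .contiguous():", err)

print(y.contiguous().view(2, 32).shape)   # torch.Size([2, 32]) once made contiguous
```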

Comment thread on vllm_ascend/ops/mm_encoder_attention.py (outdated)
Potabk requested a review from shen-shanshan on December 27, 2025 10:09
Signed-off-by: wangli <wangli858794774@gmail.com>
@github-actions bot (Contributor) commented

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message and fill in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@wangxiyuan wangxiyuan merged commit 58adf7c into vllm-project:main Dec 27, 2025
18 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 29, 2025
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend: (46 commits)
  [Feature] Support to use fullgraph with eagle (vllm-project#5118)
  [EPLB][refactor] Modification of the initialization logic for expert_map and log2phy(depend on pr5285) (vllm-project#5311)
  [Refactor]6/N Extract common code of class AscendMLAImpl (vllm-project#5314)
  [Refactor] cache cos/sin in mla & remove parameter model in builder. (vllm-project#5277)
  update vllm pin to 12.27 (vllm-project#5412)
  [ReleaseNote] Add release note for v0.13.0rc1 (vllm-project#5334)
  [Bugfix] Correctly handle the output shape in multimodal attention (vllm-project#5443)
  Fix nightly (vllm-project#5413)
  [bugfix] fix typo of _skip_all_reduce_across_dp_group (vllm-project#5435)
  [Doc]modify pcp tutorial doc (vllm-project#5440)
  [Misc] fast fail for exiting if tools/install_flash_infer_attention_score_ops_a2.sh (vllm-project#5422)
  [Doc] Update DeepSeek V3.1/R1 2P1D doc (vllm-project#5387)
  [DOC]Fix model weight download links (vllm-project#5436)
  [Doc] Modify DeepSeek-R1/V3.1 documentation (vllm-project#5426)
  Revert "[feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5300)" (vllm-project#5434)
  [Bugfix] fix greedy temperature detection (vllm-project#5417)
  [doc] Update Qwen3-235B doc for reproducing latest performance (vllm-project#5323)
  [feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5300)
  [Doc] delete environment variable HCCL_OP_EXPANSION_MODE in DeepSeekV3.1/R1 (vllm-project#5419)
  [Doc] add long_sequence feature user guide (vllm-project#5343)
  ...
Mercykid-bash pushed a commit to Mercykid-bash/vllm-ascend that referenced this pull request Dec 29, 2025
…llm-project#5443)

### What this PR does / why we need it?
Fixes vllm-project#5297: for the `AscendMMEncoderAttention` forward pass, keep the output shape consistent with the input.

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@81786c8

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026


Development

Successfully merging this pull request may close these issues.

[Bug]: Tencent-Hunyuan/HunyuanOCR model execute failed with linear op input shape wrong

2 participants