
[Model] Fix hunyuan-vl shape mismatch#31403

Closed
Potabk wants to merge 1 commit into vllm-project:main from Potabk:hunyuan

Conversation

Contributor

@Potabk Potabk commented Dec 27, 2025

Purpose

After the multimodal encoder attention (mmencoderattn), the output has shape [B, S, N, D]. We should merge the attention heads before entering the O projection matrix.
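A minimal sketch of the head-merge step described above, using plain PyTorch with hypothetical sizes (not the actual vLLM module):

```python
import torch

# Hypothetical sizes: batch, sequence length, num heads, head dim.
B, S, N, D = 2, 8, 4, 16
out = torch.randn(B, S, N, D)  # attention output, one slice per head

# Merge the head and head-dim axes before the O projection,
# which expects a [B, S, N * D] input.
merged = out.view(B, -1, N * D)
print(merged.shape)  # torch.Size([2, 8, 64])
```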

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a comparison of results before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: wangli <wangli858794774@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a shape mismatch in HunYuanVisionAttention by correctly reshaping the attention output before the final projection. The fix appears to be correct given the context of a 4D tensor output from the attention layer. I have one suggestion to make the implementation more maintainable and robust.

Comment on lines +250 to +255
out = out.view(
    x.size(0),
    -1,
    self.num_attention_heads_per_partition
    * self.hidden_size_per_attention_head,
)


Severity: high

For better maintainability and readability, you can simplify the calculation of the last dimension in the view operation. Instead of re-calculating the partitioned hidden size from the number of heads and head size, you can directly use the shape of the weight from the subsequent projection layer o_proj. This makes the code more robust to future changes as it directly references the expected input dimension of the next layer.

        out = out.view(
            x.size(0),
            -1,
            self.o_proj.weight.shape[1],
        )
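For reference, `torch.nn.Linear` stores its weight as `[out_features, in_features]`, so `weight.shape[1]` is the input size the projection expects. A quick standalone check with hypothetical sizes (not the vLLM `o_proj`):

```python
import torch.nn as nn

# Hypothetical projection: 64 input features, 32 output features.
o_proj = nn.Linear(64, 32, bias=False)

# nn.Linear weight layout is [out_features, in_features].
print(o_proj.weight.shape)     # torch.Size([32, 64])
print(o_proj.weight.shape[1])  # 64, the expected input dim
```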

Comment on lines +250 to +255
out = out.view(
    x.size(0),
    -1,
    self.num_attention_heads_per_partition
    * self.hidden_size_per_attention_head,
)
Member


After the multimodal encoder attention (mmencoderattn), the output has shape [B, S, N, D]; we should merge the attention heads before entering the O projection matrix.

Not really, we will automatically reshape output to make sure it align with q, k, v input:

is_reshaped = query.dim() != 4
query, key, value = self.maybe_reshape_qkv_to_4d(
    query, key, value, bsz, q_len, kv_len
)
output = vit_flash_attn_wrapper(
    q=query,
    k=key,
    v=value,
    cu_seqlens=cu_seqlens,
    max_seqlen=max_seqlen,
    batch_size=bsz,
    is_rocm_aiter=(self.attn_backend == AttentionBackendEnum.ROCM_AITER_FA),
    fa_version=self._fa_version,
)
if is_reshaped:
    output = output.reshape(bsz, q_len, -1)

So I think we won't hit the shape mismatch issue here. 🤔
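The round-trip described above can be sketched minimally as follows (hypothetical names and sizes; `attend` is a stand-in for the real flash-attention wrapper):

```python
import torch

def attend(q4d):
    # Stand-in for the flash-attention kernel; the real kernel also
    # returns a [B, S, N, D] tensor here.
    return q4d

bsz, q_len, num_heads, head_dim = 2, 4, 8, 16
query = torch.randn(bsz, q_len, num_heads * head_dim)  # caller passes 3-D

is_reshaped = query.dim() != 4  # True: input arrived as 3-D
q4d = query.view(bsz, q_len, num_heads, head_dim)
output = attend(q4d)
if is_reshaped:
    # Restore the caller's 3-D [B, S, N * D] layout automatically.
    output = output.reshape(bsz, q_len, -1)
print(output.shape)  # torch.Size([2, 4, 128])
```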

Contributor Author

@Potabk Potabk Dec 27, 2025


So it seems this issue is specific to Ascend; I'll check further.

Contributor Author

@Potabk Potabk Dec 27, 2025


Thanks, confirmed it is an issue in forward_oot: vllm-project/vllm-ascend#5443

@Potabk Potabk closed this Dec 27, 2025