Update qwen2_5_vl attention forward #908

Merged: mgawarkiewicz-intel merged 1 commit into vllm-project:releases/v0.14.1 from shepark:shepark/update_qwen2_5_vl_attention on Feb 2, 2026
shepark (Contributor) commented Jan 31, 2026

  • Prevent cu_seqlens/mask mix-ups that can trigger performance regressions or incorrect attention behavior.
  • Remove the lens = (cu_seqlens[1:] - cu_seqlens[:-1]).tolist() computation from the Qwen2.5 path.

This calculation is not required for Qwen2.5 and was causing a performance regression after PR #884. Removing it restores the previous performance without changing model behavior.
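
To make the removed expression concrete, here is a minimal pure-Python sketch of what it computes. `cu_seqlens` holds cumulative sequence boundaries, and the difference of adjacent entries gives each sequence's length. The function name `segment_lengths` and the sample values are hypothetical, chosen only for illustration; the actual code operates on a tensor. On a device tensor, `.tolist()` copies the values back to the host, which is a plausible source of the overhead, though the PR does not state the exact cause.

```python
def segment_lengths(cu_seqlens):
    """Return per-sequence lengths from cumulative boundaries.

    Pure-Python equivalent of (cu_seqlens[1:] - cu_seqlens[:-1]).tolist()
    for a 1-D sequence of cumulative offsets. Hypothetical helper, not
    part of the PR.
    """
    # Pair each boundary with the next one and take the difference.
    return [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]


# Hypothetical boundaries for three sequences of lengths 4, 6, and 6.
cu_seqlens = [0, 4, 10, 16]
lens = segment_lengths(cu_seqlens)
print(lens)  # [4, 6, 6]
```

If a code path never reads `lens`, computing it on every forward call is pure overhead, which is consistent with the description's claim that removing it does not change model behavior.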

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
d7de043d55d1dd629554467e23874097e1c48993

mgawarkiewicz-intel merged commit f4a4797 into vllm-project:releases/v0.14.1 on Feb 2, 2026
53 checks passed
slokesha pushed a commit to slokesha/vllm-gaudi that referenced this pull request Feb 2, 2026
michalkuligowski (Collaborator) commented

Is this needed on main?
