Slokesha/update qwen from v0.14.1 #5

Merged
slokesha merged 8 commits into libinta:libinta/remove_gather_scatter from slokesha:slokesha/Update_qwen_from_v0.14.1 on Feb 9, 2026

@slokesha slokesha commented Feb 2, 2026

No description provided.

libinta and others added 4 commits February 2, 2026 22:31
Qwen3-VL has an accuracy issue when a single request contains multiple
images; this PR fixes it. After the fix, vision attention takes one of
3 paths, depending on the number of images in the request:
1. Single image: use FusedSDPA without an attention mask.
3. Multiple images with threshold: use FusedSDPA without attn_mask, one
image at a time.
This PR also enables Qwen3-VL MoE.
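
As a rough illustration of what running attention "one by one" means in path 3, here is a minimal pure-Python sketch. It is not the PR's implementation. It assumes the patch tokens of all images are packed into one sequence and delimited by cumulative offsets (called `cu_seqlens` here). `attend` is a hypothetical stand-in for an unmasked FusedSDPA call, and `attend_per_image` is a hypothetical name.

```python
import math

def attend(q, k, v):
    """Plain softmax attention over one segment (hypothetical stand-in for
    an unmasked fused kernel). No mask is needed because every token in
    the segment belongs to the same image."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out

def attend_per_image(q, k, v, cu_seqlens):
    """Run unmasked attention on each image's slice, one image at a time.
    Assumes cu_seqlens = [0, end_of_image_0, end_of_image_1, ...]."""
    out = []
    for start, end in zip(cu_seqlens, cu_seqlens[1:]):
        out.extend(attend(q[start:end], k[start:end], v[start:end]))
    return out
```

Because each slice is processed separately, tokens from one image never attend to tokens from another, which is the same isolation a block-diagonal attention mask would enforce on the packed sequence.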

---------

Signed-off-by: slokesha <slokeshappa@habana.ai>
Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Signed-off-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Signed-off-by: Artur Fierka <artur.fierka@intel.com>
Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: Seunghyuk Park <separk@habana.ai>
Co-authored-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Radosław Smyrek <radoslawx.smyrek@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Artur Fierka <artur.fierka@intel.com>
Co-authored-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: slokesha <slokeshappa@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <seunghyuk.h.park@intel.com>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
Co-authored-by: Katarzyna Fojcik <kfojcik@habana.ai>
Co-authored-by: Krzysztof Smusz <ksmusz@habana.ai>
Co-authored-by: Jozef Mamza <jmamzax@habana.ai>
Signed-off-by: slokesha <spurthi.lokeshappa@intel.com>
* Prevent cu_seqlens/mask mix-ups that can trigger performance
regressions or incorrect attention behavior.
* Remove the lens = (cu_seqlens[1:] - cu_seqlens[:-1]).tolist()
computation from the Qwen2.5 path.

This calculation is not required for Qwen2.5 and was causing a
performance regression after PR vllm-project#884. Removing it restores
the previous performance without changing model behavior.
Signed-off-by: slokesha <spurthi.lokeshappa@intel.com>
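
For readers unfamiliar with the removed expression: `cu_seqlens` holds cumulative sequence offsets, so the difference between adjacent entries gives each sequence's length. The sketch below is a pure-Python equivalent for illustration only; `lengths_from_cu_seqlens` is a hypothetical name, and the original expression operates on a tensor rather than a list. When that tensor lives on an accelerator, `.tolist()` has to copy the values back to the host, which is one plausible (unconfirmed here) reason the unused computation cost performance.

```python
def lengths_from_cu_seqlens(cu_seqlens):
    """Per-sequence lengths from cumulative offsets.

    List-based equivalent of
    (cu_seqlens[1:] - cu_seqlens[:-1]).tolist()
    """
    return [end - start for start, end in zip(cu_seqlens, cu_seqlens[1:])]

# Three sequences packed back to back: tokens 0-3, 4-9, and 10-12.
cu_seqlens = [0, 4, 10, 13]
print(lengths_from_cu_seqlens(cu_seqlens))  # [4, 6, 3]
```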
@slokesha slokesha marked this pull request as ready for review February 3, 2026 18:44
@slokesha slokesha merged commit 816ac11 into libinta:libinta/remove_gather_scatter Feb 9, 2026
slokesha added a commit that referenced this pull request Feb 9, 2026
* Qwen3vl accuracy fixes (vllm-project#884)

Qwen3-VL has an accuracy issue when a single request contains multiple
images; this PR fixes it. After the fix, vision attention takes one of
3 paths, depending on the number of images in the request:
1. Single image: use FusedSDPA without an attention mask.
3. Multiple images with threshold: use FusedSDPA without attn_mask, one
image at a time.
This PR also enables Qwen3-VL MoE.

Signed-off-by: slokesha <slokeshappa@habana.ai>

3 participants