Slokesha/update qwen from v0.14.1 by slokesha · Pull Request #5 · libinta/vllm-gaudi

slokesha · 2026-02-02T22:42:22Z

No description provided.

For Qwen3-VL, there is an accuracy issue when one request contains multiple images; this PR fixes that. After the fix, there are 3 paths for vision attention, depending on the image count inside one request:

1. single image: use fusedsdpa without attn_mask
3. multi-images with threshold: use fusedsdpa without attn_mask, one image at a time

This PR also enables Qwen3-VL MoE.

---------

Signed-off-by: slokesha <slokeshappa@habana.ai>
Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Signed-off-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Signed-off-by: Artur Fierka <artur.fierka@intel.com>
Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: Seunghyuk Park <separk@habana.ai>
Co-authored-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Radosław Smyrek <radoslawx.smyrek@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Artur Fierka <artur.fierka@intel.com>
Co-authored-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: slokesha <slokeshappa@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <seunghyuk.h.park@intel.com>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
Co-authored-by: Katarzyna Fojcik <kfojcik@habana.ai>
Co-authored-by: Krzysztof Smusz <ksmusz@habana.ai>
Co-authored-by: Jozef Mamza <jmamzax@habana.ai>

Signed-off-by: slokesha <spurthi.lokeshappa@intel.com>
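To illustrate why running attention "one by one" without a mask can stand in for a single masked call, here is a minimal pure-Python sketch. It is not the PR's code and does not use the HPU fusedsdpa kernel; the helper names (`attention`, `block_diagonal_mask`, `attention_one_by_one`) are hypothetical, and the sketch assumes the images are packed back to back with boundaries given by cu_seqlens.

```python
import math


def softmax(xs):
    # Numerically stable softmax; math.exp(-inf) evaluates to 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask=None):
    # Plain scaled dot-product attention over lists of vectors.
    # mask[i][j] == True means query i may attend to key j.
    scale = math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            s = sum(a * b for a, b in zip(qi, kj)) / scale
            if mask is not None and not mask[i][j]:
                s = float("-inf")
            scores.append(s)
        w = softmax(scores)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out


def block_diagonal_mask(cu_seqlens):
    # Tokens may only attend to tokens of the same image.
    total = cu_seqlens[-1]
    mask = [[False] * total for _ in range(total)]
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        for i in range(start, end):
            for j in range(start, end):
                mask[i][j] = True
    return mask


def attention_one_by_one(q, k, v, cu_seqlens):
    # Run unmasked attention separately on each image's slice.
    out = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        out.extend(attention(q[start:end], k[start:end], v[start:end]))
    return out


# Two images packed into one sequence: 2 tokens, then 3 tokens.
cu_seqlens = [0, 2, 5]
q = [[0.1 * (i + 1), 0.2 * (i - 1)] for i in range(5)]
k = [[0.3 * (i - 2), 0.05 * (i + 1)] for i in range(5)]
v = [[float(i), float(i * i)] for i in range(5)]

masked = attention(q, k, v, block_diagonal_mask(cu_seqlens))
split = attention_one_by_one(q, k, v, cu_seqlens)
```

In this toy setting the two results agree up to floating-point rounding, which is the property the per-image path relies on; how the real kernel behaves, and where the threshold sits, is determined by the PR's code rather than by this sketch.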

* Prevent cu_seqlens/mask mix-ups that can trigger performance regressions or incorrect attention behavior.
* Remove the lens = (cu_seqlens[1:] - cu_seqlens[:-1]).tolist() computation from the Qwen2.5 path. This calculation is not required for Qwen2.5 and was causing a performance regression after PR vllm-project#884. Removing it restores the previous performance without changing model behavior.

Signed-off-by: slokesha <spurthi.lokeshappa@intel.com>
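For readers unfamiliar with the removed expression, the following sketch shows what it computes, using plain Python lists in place of tensors. The function name `segment_lengths` is hypothetical; the only assumption is that cu_seqlens holds cumulative sequence boundaries starting at 0.

```python
def segment_lengths(cu_seqlens):
    # List equivalent of (cu_seqlens[1:] - cu_seqlens[:-1]).tolist():
    # the difference between consecutive boundaries is each segment's length.
    return [end - start for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


# Three images occupying 4, 6 and 3 tokens of a packed sequence.
lens = segment_lengths([0, 4, 10, 13])
```

On a tensor, .tolist() also converts the result into a Python list, so a path that never reads lens gains nothing from computing it.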

…n_from_v0.14.1

* Qwen3vl accuracy fixes (vllm-project#884)

For Qwen3-VL, there is an accuracy issue when one request contains multiple images; this PR fixes that. After the fix, there are 3 paths for vision attention, depending on the image count inside one request:

1. single image: use fusedsdpa without attn_mask
3. multi-images with threshold: use fusedsdpa without attn_mask, one image at a time

This PR also enables Qwen3-VL MoE.

Signed-off-by: slokesha <slokeshappa@habana.ai>

libinta and others added 4 commits February 2, 2026 22:31

Resolved conflict in HPU_model_runner

fb9a0f8

Signed-off-by: slokesha <spurthi.lokeshappa@intel.com>

Fixed MultiModalprofiler Import failure

daeb39b

Signed-off-by: slokesha <spurthi.lokeshappa@intel.com>

slokesha marked this pull request as ready for review February 3, 2026 18:44

slokesha added 4 commits February 3, 2026 10:46

Merge branch 'vllm-project:main' into slokesha/Update_qwen_from_v0.14.1

6376480

Merge branch 'libinta/remove_gather_scatter' into slokesha/Update_qwe…

8d92d88

…n_from_v0.14.1

Merge branch 'vllm-project:main' into slokesha/Update_qwen_from_v0.14.1

78168e6

Merge branch 'libinta/remove_gather_scatter' into slokesha/Update_qwe…

29aa3d3

…n_from_v0.14.1

slokesha merged commit 816ac11 into libinta:libinta/remove_gather_scatter Feb 9, 2026

slokesha merged 8 commits into libinta:libinta/remove_gather_scatter from slokesha:slokesha/Update_qwen_from_v0.14.1

3 participants