Qwen3vl accuracy fixes #884

Merged
wpyszka merged 40 commits into vllm-project:releases/v0.14.1 from libinta:libinta/add_mask on Jan 30, 2026

libinta (Contributor) commented Jan 27, 2026

For Qwen3-VL, there is an accuracy issue when a single request contains multiple images; this PR fixes it. After the fix, vision attention takes one of 3 paths, depending on the number of images in the request:

  1. Single image: use FusedSDPA without an attention mask.
  2. Multiple images, subject to a threshold: use FusedSDPA without attn_mask, one image at a time.

This PR also enables Qwen3-VL MoE.
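
The path selection described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function names, the path labels, the `max_images` threshold value, and the use of a `cu_seqlens` list of cumulative per-image token counts are all assumptions. The description mentions 3 paths but spells out only two, so the remaining branch is left as an unnamed placeholder.

```python
from typing import List, Tuple


def split_by_image(cu_seqlens: List[int]) -> List[Tuple[int, int]]:
    """Return one (start, end) token range per image from cumulative lengths."""
    return list(zip(cu_seqlens[:-1], cu_seqlens[1:]))


def choose_vision_attention_path(cu_seqlens: List[int], max_images: int = 8) -> str:
    """Pick a path label from the image count (hypothetical labels and threshold)."""
    num_images = len(cu_seqlens) - 1
    if num_images < 1:
        raise ValueError("cu_seqlens must describe at least one image")
    if num_images == 1:
        # Path 1: a single image needs no mask, so attend over the whole sequence.
        return "single_image_no_mask"
    if num_images <= max_images:
        # Path for multiple images: run attention per (start, end) range,
        # one image at a time, so no cross-image mask is required.
        return "per_image_no_mask"
    # Remaining path: not described in the PR text, left unspecified here.
    return "unspecified_third_path"
```

Running attention separately on each range from `split_by_image` keeps tokens of different images from attending to each other without building a mask, which is presumably why the per-image path can skip `attn_mask`.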

github-actions commented

✅ CI Passed

All checks passed successfully against the following vllm commit:
d7de043d55d1dd629554467e23874097e1c48993

github-actions commented

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

libinta and others added 20 commits January 29, 2026 17:15
Signed-off-by: slokesha <slokeshappa@habana.ai>
…ct#885)

Due to MambaMixer2 implementation requirements, all buckets used for
mamba must be a multiple of mamba chunk size.

Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Signed-off-by: slokesha <slokeshappa@habana.ai>
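
As a rough illustration of the constraint in the commit message above (mamba buckets must be multiples of the mamba chunk size), the sketch below rounds candidate bucket sizes up to the next multiple. The commit text does not say how buckets are adjusted, so rounding up is an assumption, and both function names are hypothetical.

```python
from typing import Iterable, List


def round_up_to_multiple(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple of `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


def align_buckets(buckets: Iterable[int], chunk_size: int) -> List[int]:
    """Align every bucket to the chunk size and drop duplicates (assumed policy)."""
    return sorted({round_up_to_multiple(b, chunk_size) for b in buckets})
```

For example, `align_buckets([128, 300, 512], 256)` collapses the three candidates into two aligned buckets, 256 and 512.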
…roject#888)

Reverts vllm-project#780

---------

Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: slokesha <slokeshappa@habana.ai>
1. vllm-project#805
2. vllm-project#837
3. vllm-project#855
4. vllm-project#862

---------

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Signed-off-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Signed-off-by: Artur Fierka <artur.fierka@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Artur Fierka <artur.fierka@intel.com>
Signed-off-by: slokesha <slokeshappa@habana.ai>
Signed-off-by: slokesha <slokeshappa@habana.ai>

github-actions commented

✅ CI Passed

All checks passed successfully against the following vllm commit:
d7de043d55d1dd629554467e23874097e1c48993

wpyszka (Collaborator) left a comment

Approved for 0.14.1

@wpyszka wpyszka merged commit 20703dd into vllm-project:releases/v0.14.1 Jan 30, 2026
53 checks passed
mgawarkiewicz-intel pushed a commit that referenced this pull request Feb 2, 2026
* Prevent cu_seqlens/mask mix-ups that can trigger performance
regressions or incorrect attention behavior.
* Remove the lens = (cu_seqlens[1:] - cu_seqlens[:-1]).tolist()
computation from the Qwen2.5 path.

This calculation is not required for Qwen2.5 and was causing a
performance regression after PR
#884. Removing it
restores the previous performance without changing model behavior.
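
For context, the removed expression turns cumulative sequence offsets into per-segment lengths. The sketch below reproduces that arithmetic on a plain Python list; `segment_lengths` is a hypothetical name, not a function from the repository. In the tensor form quoted in the commit message, `.tolist()` also copies the values to host memory, which is general PyTorch behavior; whether that copy was the source of the regression is not stated in the commit message.

```python
from typing import List


def segment_lengths(cu_seqlens: List[int]) -> List[int]:
    """Per-segment lengths from cumulative offsets: cu[i + 1] - cu[i]."""
    return [end - start for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:])]


# Three segments ending at offsets 16, 48 and 64.
lengths = segment_lengths([0, 16, 48, 64])
```

Here `lengths` is `[16, 32, 16]`, one entry per segment.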
slokesha added a commit to slokesha/vllm-gaudi that referenced this pull request Feb 2, 2026
For Qwen3-VL, there is an accuracy issue when one request contains
multiple images; this PR fixes it. After the fix, vision attention takes
one of 3 paths, depending on the number of images in the request:
1. Single image: use FusedSDPA without an attention mask.
3. Multiple images, subject to a threshold: use FusedSDPA without
attn_mask, one image at a time.
This PR also enables Qwen3-VL MoE.

---------

Signed-off-by: slokesha <slokeshappa@habana.ai>
Signed-off-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>
Signed-off-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
Signed-off-by: Artur Fierka <artur.fierka@intel.com>
Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: Seunghyuk Park <separk@habana.ai>
Co-authored-by: Jakub Byczkowski <jbyczkowski@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Radosław Smyrek <radoslawx.smyrek@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Artur Fierka <artur.fierka@intel.com>
Co-authored-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: slokesha <slokeshappa@habana.ai>
Co-authored-by: Seunghyuk Park (shepark) <seunghyuk.h.park@intel.com>
Co-authored-by: Michał Kuligowski <michal.kuligowski@intel.com>
Co-authored-by: Katarzyna Fojcik <kfojcik@habana.ai>
Co-authored-by: Krzysztof Smusz <ksmusz@habana.ai>
Co-authored-by: Jozef Mamza <jmamzax@habana.ai>
slokesha pushed a commit to slokesha/vllm-gaudi that referenced this pull request Feb 2, 2026
* Prevent cu_seqlens/mask mix-ups that can trigger performance
regressions or incorrect attention behavior.
* Remove the lens = (cu_seqlens[1:] - cu_seqlens[:-1]).tolist()
computation from the Qwen2.5 path.

This calculation is not required for Qwen2.5 and was causing a
performance regression after PR
vllm-project#884. Removing it
restores the previous performance without changing model behavior.
slokesha added a commit to libinta/vllm-gaudi that referenced this pull request Feb 9, 2026
* Qwen3vl accuracy fixes (vllm-project#884)

For Qwen3-VL, there is an accuracy issue when one request contains
multiple images; this PR fixes it. After the fix, vision attention takes
one of 3 paths, depending on the number of images in the request:
1. Single image: use FusedSDPA without an attention mask.
3. Multiple images, subject to a threshold: use FusedSDPA without
attn_mask, one image at a time.
This PR also enables Qwen3-VL MoE.
wpyszka pushed a commit that referenced this pull request Feb 9, 2026
Cherry pick Llama4 missing fixes from #881 #862 #884 on releases/ branch

Signed-off-by: Luca Calabria <luca.calabria@intel.com>
wpyszka added a commit that referenced this pull request Feb 9, 2026
Added Llama4 missing fixes from #881 #862 #884 on main branch

---------

Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: Wojciech Pyszka <wpyszka@habana.ai>
adobrzyn pushed a commit that referenced this pull request Mar 31, 2026
Added Llama4 missing fixes from #881 #862 #884 on main branch

---------

Signed-off-by: Luca Calabria <luca.calabria@intel.com>
Co-authored-by: Wojciech Pyszka <wpyszka@habana.ai>
10 participants