[v1] Add PrefixLM support to FlexAttention backend by Isotr0py · Pull Request #27938 · vllm-project/vllm

Isotr0py · 2025-11-02T11:16:48Z

Purpose

Currently, no attention backend in vLLM supports image-bidirectional attention, so Gemma3 and PaliGemma can't generate correct outputs, and models like moondream are blocked by the missing backend support.
This PR adds image-bidirectional (PrefixLM) support to the FlexAttention backend to fill that gap.
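
To illustrate what "image-bidirectional" means here, below is a minimal pure-Python sketch of the masking rule. It is not the code from this PR. The helper name `build_prefix_lm_mask` and the `image_spans` argument are hypothetical, and the rule (causal attention everywhere, plus full attention between tokens of the same image) is an assumption about how such models mask image tokens. The inner `mask_mod` only mirrors the general shape of a FlexAttention mask function, which in PyTorch also receives batch and head indices.

```python
from typing import List, Tuple


def build_prefix_lm_mask(
    seq_len: int, image_spans: List[Tuple[int, int]]
) -> List[List[bool]]:
    """Return mask[q][kv], True where query position q may attend to kv.

    image_spans holds hypothetical (start, end) token ranges, end exclusive,
    one per image in the prompt.
    """
    # span_id[i] is the index of the image containing token i, or -1 for text.
    span_id = [-1] * seq_len
    for idx, (start, end) in enumerate(image_spans):
        for pos in range(start, end):
            span_id[pos] = idx

    def mask_mod(q_idx: int, kv_idx: int) -> bool:
        # Ordinary causal rule: a token sees itself and everything before it.
        causal = kv_idx <= q_idx
        # Assumed bidirectional rule: tokens of the same image see each other.
        same_image = span_id[q_idx] != -1 and span_id[q_idx] == span_id[kv_idx]
        return causal or same_image

    return [[mask_mod(q, kv) for kv in range(seq_len)] for q in range(seq_len)]


if __name__ == "__main__":
    # One text token, a three-token image, then two text tokens.
    for row in build_prefix_lm_mask(6, [(1, 4)]):
        print("".join("1" if allowed else "0" for allowed in row))
```

With a purely causal mask, the first image token could not see the later tokens of the same image; under the rule sketched here it can, while text tokens stay causal.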

Test Plan

pytest -s -v tests/models/multimodal/generation/test_common.py -k gemma3

Test Result

vLLM results (with both the native and Transformers backends) should now be consistent with the HF results.

PR results

tests/models/multimodal/generation/test_common.py::test_single_image_models[gemma3-test_case0]
  /home/mozf/develop-projects/vllm/tests/models/multimodal/generation/vlm_utils/core.py:157: UserWarning: Test0:
  hf:   "Here's what's in the center of the image:\n\nIt's a traditional Chinese gate or archway. It's red and gold, with Chinese characters written on it. It's a prominent feature of the street scene.<end_of_turn>"
  vllm: "Here's what's in the center of the image:\n\nIt's a traditional Chinese gate or archway. It's red and gold, with Chinese characters written on it. It's a prominent feature of the street scene."
    comparator(

tests/models/multimodal/generation/test_common.py::test_single_image_models[gemma3-test_case0]
  /home/mozf/develop-projects/vllm/tests/models/multimodal/generation/vlm_utils/core.py:157: UserWarning: Test1:
  hf:   'The center of the image features a vibrant Chinese-themed archway or gate. It\'s decorated with red and gold colors, traditional Chinese characters (likely meaning "Chinese Town"), and red lanterns.  There are also two white stone lion statues flanking the entrance.<end_of_turn>'
  vllm: 'The center of the image features a vibrant Chinese-themed archway or gate. It\'s decorated with red and gold colors, traditional Chinese characters (likely meaning "Chinese Town"), and red lanterns.  There are also two white stone lion statues flanking the entrance.'
    comparator(

Main branch

tests/models/multimodal/generation/test_common.py::test_single_image_models[gemma3-test_case0]
  /home/mozf/develop-projects/vllm/tests/models/multimodal/generation/vlm_utils/core.py:157: UserWarning: Test0:
  Matched tokens:       [8291, 236789, 236751, 1144, 236789, 236751, 528, 506, 3988, 529, 506, 2471, 236787, 108]
  hf:   "Here's what's in the center of the image:\n\nIt's a traditional Chinese gate or archway. It's red and gold, with Chinese characters written on it. It's a prominent feature of the street scene.<end_of_turn>"      {1509: -0.03537141531705856, 236776: -3.7853713035583496, 818: -4.53537130355835, 3810: -6.78537130355835, 236829: -8.535371780395508}
  vllm: 'Here\'s what\'s in the center of the image:\n\nA vibrant Chinese-themed archway with red and gold decorations, featuring the Chinese characters "中华" (Zhōnghuá - meaning "China"). It\'s part of a Chinatown area.'       {236776: Logprob(logprob=-0.6554893255233765, rank=1, decoded_token='A'), 818: Logprob(logprob=-1.1554893255233765, rank=2, decoded_token='The'), 1509: Logprob(logprob=-2.155489444732666, rank=3, decoded_token='It'), 236829: Logprob(logprob=-3.155489444732666, rank=4, decoded_token='*'), 3810: Logprob(logprob=-4.905489444732666, rank=5, decoded_token='There')}
    comparator(

tests/models/multimodal/generation/test_common.py::test_single_image_models[gemma3-test_case0]
  /home/mozf/develop-projects/vllm/tests/models/multimodal/generation/vlm_utils/core.py:157: UserWarning: Test1:
  Matched tokens:       []
  hf:   'The center of the image features a vibrant Chinese-themed archway or gate. It\'s decorated with red and gold colors, traditional Chinese characters (likely meaning "Chinese Town"), and red lanterns.  There are also two white stone lion statues flanking the entrance.<end_of_turn>'    {818: -0.5989671945571899, 8291: -0.8489671945571899, 117494: -4.5989670753479, 6481: -5.3489670753479, 19058: -5.5989670753479}
  vllm: 'Here\'s what\'s in the center of the image:\n\n*   **A large, ornate Chinese gate or archway.** It\'s painted red and features traditional Chinese characters ("中华" - meaning "China") and decorative elements.\n*   **Two white stone lion statues** flanking the gate.\n*   **A black SUV** is parked in front of the gate.\n\nLet me know if you want me to describe any other specific elements in the image!'     {8291: Logprob(logprob=-0.3420378267765045, rank=1, decoded_token='Here'), 818: Logprob(logprob=-1.3420377969741821, rank=2, decoded_token='The'), 117494: Logprob(logprob=-4.342037677764893, rank=3, decoded_token='Certainly'), 6481: Logprob(logprob=-4.842037677764893, rank=4, decoded_token='Let'), 19058: Logprob(logprob=-5.592037677764893, rank=5, decoded_token='Okay')}
    comparator(
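
The "Matched tokens" lines in the main-branch log look like the common prefix of the HF and vLLM token IDs, followed by each side's top logprobs at the first point of divergence. The sketch below shows that reading only. It is an assumption about the log format, not the test suite's actual comparator, and `matched_prefix` is a hypothetical helper. The token IDs are taken from the log above.

```python
from typing import List


def matched_prefix(hf_ids: List[int], vllm_ids: List[int]) -> List[int]:
    """Return the leading token IDs shared by both sequences."""
    matched: List[int] = []
    for hf_tok, vllm_tok in zip(hf_ids, vllm_ids):
        if hf_tok != vllm_tok:
            break  # first divergence: stop collecting
        matched.append(hf_tok)
    return matched


if __name__ == "__main__":
    # Test0 on main: both sides agree up to token 108, then HF picks
    # 1509 ('It') while vLLM picks 236776 ('A').
    print(matched_prefix([2471, 236787, 108, 1509], [2471, 236787, 108, 236776]))
    # Test1 on main: the very first tokens differ (818 'The' vs 8291 'Here').
    print(matched_prefix([818], [8291]))
```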

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify · 2025-11-21T17:03:19Z

Documentation preview: https://vllm--27938.org.readthedocs.build/en/27938/

DarkLight1337 · 2025-12-06T13:09:19Z

Let's get this merged then, can you fix the merge conflicts?

mergify · 2025-12-07T12:25:52Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Isotr0py.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

1. fix vllm-project/vllm#27938
2. fix vllm-project/vllm#27145: pooling models now support chunked prefill and prefix caching
3. fix vllm-project/vllm#30181: define the CPU fields in the field config where they really belong
4. fix vllm-project/vllm#28168: define the CPU fields in the field config where they really belong
5. fix vllm-project/vllm#30201: some module renames
6. fix vllm-project/vllm#29067: fusedmoe module refactor
7. fix vllm-project/vllm#29066: fusedmoe module refactor
8. fix vllm-project/vllm#29624

### How was this patch tested?

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

---------

Signed-off-by: wangli <wangli858794774@gmail.com>


Isotr0py added 2 commits November 2, 2025 16:29

init

1f9f474

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

clean

342ba6c

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify bot added the v1 label Nov 2, 2025

update

ad498e6

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify bot added the multi-modality Related to multi-modality (#4194) label Nov 3, 2025

Isotr0py mentioned this pull request Nov 4, 2025

[Model] Add Gemma3 GGUF multimodal support #27772

Merged

4 tasks

Isotr0py and others added 9 commits November 20, 2025 15:14

Merge remote-tracking branch 'upstream/main' into flex-prefixlm

8e2de9f

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix

a0e0d81

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

update

32cf7dc

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

clean

bf1328c

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

move

e42d8d2

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix batching

311b4e4

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Merge branch 'vllm-project:main' into flex-prefixlm

91c517f

update

d5279c9

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

auto select flex attn

3dd6aa1

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify bot added nvidia rocm Related to AMD ROCm labels Nov 21, 2025

github-project-automation bot added this to NVIDIA Nov 21, 2025

mergify bot added the tpu Related to Google TPUs label Nov 21, 2025

Isotr0py added 2 commits November 22, 2025 00:28

fix ima

6adc4d4

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

update doc and test

bf6b90d

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify bot added the documentation Improvements or additions to documentation label Nov 21, 2025

Isotr0py added 2 commits November 22, 2025 01:19

Merge branch 'main' into flex-prefixlm

ca1723a

code format

1df15b8

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py changed the title ~~[Draft][v1] Add PrefixLM support to FlexAttention backend~~ [v1] Add PrefixLM support to FlexAttention backend Nov 21, 2025

Isotr0py marked this pull request as ready for review November 21, 2025 17:29

Isotr0py requested review from NickLucche, jikunshang and tjtanaa as code owners November 21, 2025 17:29

Merge remote-tracking branch 'upstream/main' into flex-prefixlm

7788226

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 6, 2025

Isotr0py added 2 commits December 7, 2025 00:28

fix

128d655

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

fix

6100eac

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify bot added the needs-rebase label Dec 7, 2025

Merge remote-tracking branch 'upstream/main' into flex-prefixlm

080d8ad

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

mergify bot removed the needs-rebase label Dec 7, 2025

Isotr0py enabled auto-merge (squash) December 7, 2025 12:27

fix

cbdf718

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py merged commit b952f4d into vllm-project:main Dec 7, 2025
59 of 60 checks passed

github-project-automation bot moved this from In review to Done in NVIDIA Dec 7, 2025

Isotr0py deleted the flex-prefixlm branch December 7, 2025 15:58

penfree pushed a commit to penfree/vllm that referenced this pull request Dec 8, 2025

[v1] Add PrefixLM support to FlexAttention backend (vllm-project#27938)

f8766c0

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Potabk mentioned this pull request Dec 8, 2025

[Misc] Upgrade vllm commit to 12_08 vllm-project/vllm-ascend#4781

Closed

adobrzyn mentioned this pull request Dec 8, 2025

[FIX_FOR_VLLM_LATEST] Fix for hourly vllm-project/vllm-gaudi#697

Merged

iboiko-habana pushed a commit to vllm-project/vllm-gaudi that referenced this pull request Dec 8, 2025

[FIX_FOR_VLLM_LATEST] Fix for hourly (#697)

de92b87

Culprit: vllm-project/vllm#29665 and vllm-project/vllm#27938 --------- Signed-off-by: Dobrzyniewicz, Agata <agata.dobrzyniewicz@intel.com>

QiliangCui mentioned this pull request Dec 8, 2025

Add an argument to TpuPlatform.get_attn_backend_cls to adopt interfac… vllm-project/tpu-inference#1263

Merged

This was referenced Dec 11, 2025

[Misc] Upgrade vllm hash to 1210 vllm-project/vllm-ascend#4906

Closed

[Misc] Upgrade vllm hash to 12_14 vllm-project/vllm-ascend#5000

Merged

hmellor mentioned this pull request Dec 17, 2025

[Bug]: Gemma3 reporting low image accuracy with v1 engine #19763

Closed

1 task

dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

[v1] Add PrefixLM support to FlexAttention backend (vllm-project#27938)

331a58a

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

[v1] Add PrefixLM support to FlexAttention backend #27938
Isotr0py merged 25 commits into vllm-project:main from Isotr0py:flex-prefixlm

3 participants
