[Frontend][4/n] Improve pooling entrypoints (#39153)
noooop merged 29 commits into vllm-project:main
Conversation
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Code Review
This pull request refactors the pooling IO processor architecture by introducing task-specific processors and a plugin-based system, moving logic out of the main LLM engine. Key changes include the addition of token-level classification and embedding tasks, and the delegation of pre/post-processing to specialized classes. Feedback highlights several critical bugs, including a missing parameter in the offline context and an incorrect attribute access on a sequence object. Further improvements are needed to remove duplicate code blocks, fix malformed function signatures, and provide more descriptive error messages for better user feedback.
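The task-specific dispatch described above can be sketched roughly as follows. This is a hedged illustration only: the class names (`PoolingProcessor`, `EmbedProcessor`, `TokenClassifyProcessor`) and the registry helper are invented for this sketch and are not vLLM's actual API.

```python
# Hypothetical sketch of a plugin-style registry of task-specific pooling
# processors; names are illustrative, not vLLM's real classes.
from dataclasses import dataclass


@dataclass
class PoolingOutput:
    data: list[float]


class PoolingProcessor:
    task: str = "base"

    def postprocess(self, raw: list[float]) -> PoolingOutput:
        # Base post-processing: wrap the raw pooled values.
        return PoolingOutput(raw)


class EmbedProcessor(PoolingProcessor):
    task = "embed"


class TokenClassifyProcessor(PoolingProcessor):
    task = "token_classify"


# Registry maps a task name to its specialized processor class.
_REGISTRY = {cls.task: cls for cls in (EmbedProcessor, TokenClassifyProcessor)}


def get_processor(task: str) -> PoolingProcessor:
    # A descriptive error beats a bare KeyError for unsupported tasks,
    # matching the review's call for better user feedback.
    if task not in _REGISTRY:
        raise ValueError(
            f"Unsupported pooling task {task!r}; "
            f"expected one of {sorted(_REGISTRY)}")
    return _REGISTRY[task]()
```

With a registry like this, the engine only needs to look up the processor for the requested task rather than branching on task names inline.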
Hi @noooop, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
/gemini review
Code Review
This pull request refactors the pooling and IO processor architecture to improve modularity and consistency between offline and online entrypoints. Key changes include the introduction of specialized PoolingIOProcessor implementations and the removal of direct io_processor dependencies from core serving classes. The review feedback highlights several critical issues: the use of assert for public API input validation which could lead to regressions, a logic bug in the plugin processor that breaks deprecated compatibility by unconditionally overwriting responses, and a potential KeyError when handling unsupported pooling tasks in the serving layer.
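The assert-for-validation concern above can be shown with a minimal, hypothetical example (the function names are made up): Python strips `assert` statements when run with `python -O`, so a public API validated that way silently accepts bad input in optimized builds, whereas an explicit `ValueError` always fires.

```python
# Illustrative only: why `assert` is fragile for public API input
# validation. Under `python -O`, the assert below is stripped entirely.
def validate_dimensions_assert(dimensions):
    assert dimensions is None or dimensions > 0, "dimensions must be positive"
    return dimensions


def validate_dimensions(dimensions):
    # Explicit exception survives optimized mode and gives a clear message.
    if dimensions is not None and dimensions <= 0:
        raise ValueError(
            f"dimensions must be a positive integer, got {dimensions}")
    return dimensions
```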
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
/gemini review
Code Review
This pull request significantly refactors the handling of IO processors and pooling tasks within the vLLM serving infrastructure. Key changes include centralizing IO processor management within a new PoolingServing class hierarchy, removing direct io_processor attributes from core engine and rendering components, and introducing specialized PoolingIOProcessor implementations for different tasks, including a new 'plugin' task. The review highlights several critical issues: a regression in the offline encode API's inference of the 'plugin' task for {"data": ...} prompts, the bypassing of error checks for missing IOProcessor plugins due to a dummy class registration, and the presence of assertions that should be replaced with explicit ValueErrors for improved error handling in production.
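The `{"data": ...}` task-inference point above can be sketched as follows; `infer_pooling_task` is a hypothetical helper written for this illustration, not the PR's actual code.

```python
# Hypothetical helper illustrating the offline task inference the review
# discusses: dict prompts shaped like {"data": ...} should route to the
# "plugin" task instead of the default pooling task.
def infer_pooling_task(prompt, default_task: str = "embed") -> str:
    if isinstance(prompt, dict) and "data" in prompt:
        return "plugin"
    return default_task
```

The regression flagged in the review is precisely the case where a `{"data": ...}` prompt stops being routed to `"plugin"` in the offline `encode` path.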
- Replace _get_offline_token_limits with _params_to_single + _get_token_limits (compatible with upcoming _params_to_seq from vllm-project#39153)
- Remove duplicate validation from base/serving.py (now only in io_processor)
- Validate negative values (!= 0 check instead of > 0)
- Restore original comments for cross-encoder and LLM-as-reranker paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jesus Federico <jefp@amazon.com>
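The "!= 0 check instead of > 0" item presumably means the validation must admit a negative sentinel (vLLM uses -1 for "no truncation" in `truncate_prompt_tokens`) while still rejecting zero and other negatives; a hedged sketch under that assumption, with an invented helper name:

```python
# Hedged sketch, assuming -1 is a "no limit" sentinel as in vLLM's
# truncate_prompt_tokens; the helper name check_token_limit is invented.
def check_token_limit(limit):
    # Reject zero and any negative value other than the -1 sentinel;
    # a plain `> 0` check would wrongly reject the sentinel too.
    if limit is not None and (limit == 0 or limit < -1):
        raise ValueError(
            f"token limit must be positive or -1 (no limit), got {limit}")
    return limit
```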
…39153) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
_params_to_single was a bridge helper added before vllm-project#39153 landed. Now that vllm-project#39153 is merged, ctx.pooling_params is always a single PoolingParams in the offline path (enforced by assert). Removed the helper and simplified the _get_token_limits signature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jesus Federico <jefp@amazon.com>
Purpose
Improve pooling entrypoints
Test Plan
Keep CI green.
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.