
[Frontend][4/n] Improve pooling entrypoints | pooling. #39153

Merged
noooop merged 29 commits into vllm-project:main from noooop:Refactor_pooling_api on Apr 9, 2026


@noooop (Collaborator) commented Apr 7, 2026

Purpose

Improve pooling entrypoints

Test Plan

Keep CI green.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the pooling IO processor architecture by introducing task-specific processors and a plugin-based system, moving logic out of the main LLM engine. Key changes include the addition of token-level classification and embedding tasks, and the delegation of pre/post-processing to specialized classes. Feedback highlights several critical bugs, including a missing parameter in the offline context and an incorrect attribute access on a sequence object. Further improvements are needed to remove duplicate code blocks, fix malformed function signatures, and provide more descriptive error messages for better user feedback.

Comment thread vllm/entrypoints/llm.py Outdated
Comment thread vllm/entrypoints/pooling/pooling/io_processor.py
Comment thread vllm/entrypoints/llm.py Outdated
Comment thread vllm/entrypoints/pooling/base/io_processor.py Outdated
Comment thread vllm/entrypoints/pooling/io_processor_factories.py Outdated
Comment thread vllm/plugins/io_processors/__init__.py Outdated
noooop added 12 commits April 7, 2026 13:54
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
@noooop noooop marked this pull request as ready for review April 7, 2026 10:41
@noooop noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 7, 2026
@noooop noooop changed the title [Frontend][4/n] Improve pooling entrypoints | pooling. [WIP][Frontend][4/n] Improve pooling entrypoints | pooling. Apr 7, 2026
Comment thread vllm/entrypoints/pooling/base/serving.py
Comment thread vllm/entrypoints/pooling/embed/serving.py Outdated
Comment thread vllm/entrypoints/pooling/pooling/serving.py Outdated
Comment thread vllm/entrypoints/pooling/typing.py Outdated
Comment thread vllm/engine/protocol.py
@noooop noooop mentioned this pull request Apr 8, 2026
5 tasks
@mergify
Contributor

mergify bot commented Apr 9, 2026

Hi @noooop, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
@noooop
Collaborator Author

noooop commented Apr 9, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the pooling and IO processor architecture to improve modularity and consistency between offline and online entrypoints. Key changes include the introduction of specialized PoolingIOProcessor implementations and the removal of direct io_processor dependencies from core serving classes. The review feedback highlights several critical issues: the use of assert for public API input validation which could lead to regressions, a logic bug in the plugin processor that breaks deprecated compatibility by unconditionally overwriting responses, and a potential KeyError when handling unsupported pooling tasks in the serving layer.
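The first and third points in this review describe common Python pitfalls: `assert` statements are stripped when Python runs with `-O`, so they cannot be relied on to validate user input, and a bare dictionary lookup surfaces an unsupported task as an opaque `KeyError`. The following is a minimal sketch of the pattern the review asks for. It is not vLLM's code: `HANDLERS`, `validate_dimensions`, and `get_handler` are hypothetical names, and the `dimensions` check merely stands in for whichever input the review flagged.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry mapping pooling task names to handlers (illustrative only).
HANDLERS: Dict[str, Callable[[List[float]], List[float]]] = {
    "embed": lambda xs: xs,            # pass values through unchanged
    "classify": lambda xs: [max(xs)],  # toy stand-in for a classifier head
}


def validate_dimensions(dimensions: Optional[int]) -> None:
    # `assert dimensions > 0` would vanish under `python -O`; raise explicitly
    # so bad user input always produces a clear error.
    if dimensions is not None and dimensions <= 0:
        raise ValueError(f"dimensions must be a positive integer, got {dimensions}")


def get_handler(task: str) -> Callable[[List[float]], List[float]]:
    # HANDLERS[task] alone would raise a bare KeyError for an unsupported task;
    # translate it into a descriptive ValueError that lists the valid options.
    try:
        return HANDLERS[task]
    except KeyError:
        supported = ", ".join(sorted(HANDLERS))
        raise ValueError(
            f"Unsupported pooling task {task!r}; supported tasks: {supported}"
        ) from None
```

Raising `ValueError` keeps the check active regardless of interpreter flags and gives the caller an actionable message.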

Comment thread vllm/entrypoints/llm.py
Comment thread vllm/entrypoints/pooling/pooling/io_processor.py Outdated
Comment thread vllm/entrypoints/pooling/pooling/serving.py Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
@noooop noooop removed the ready ONLY add when PR is ready to merge/full CI is needed label Apr 9, 2026
noooop added 5 commits April 9, 2026 16:03
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
@noooop
Collaborator Author

noooop commented Apr 9, 2026

/gemini review

@noooop noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 9, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly refactors the handling of IO processors and pooling tasks within the vLLM serving infrastructure. Key changes include centralizing IO processor management within a new PoolingServing class hierarchy, removing direct io_processor attributes from core engine and rendering components, and introducing specialized PoolingIOProcessor implementations for different tasks, including a new 'plugin' task. The review highlights several critical issues: a regression in the offline encode API's inference of the 'plugin' task for {"data": ...} prompts, the bypassing of error checks for missing IOProcessor plugins due to a dummy class registration, and the presence of assertions that should be replaced with explicit ValueErrors for improved error handling in production.
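The first two issues in this review concern how a prompt of the form {"data": ...} is routed to the 'plugin' task and what should happen when no IOProcessor plugin is loaded. A minimal sketch of that behavior follows, assuming a hypothetical `infer_pooling_task` helper (not vLLM's actual function) and a placeholder fallback task name `"encode"`.

```python
from typing import Any, Optional


def infer_pooling_task(
    prompt: Any,
    pooling_task: Optional[str],
    io_processor: Optional[object],
) -> str:
    # Hypothetical helper: an explicitly requested task always wins.
    if pooling_task is None:
        if isinstance(prompt, dict) and "data" in prompt:
            # IOProcessor-plugin inputs arrive as {"data": ...}.
            pooling_task = "plugin"
        else:
            # Placeholder default for ordinary text or token prompts.
            pooling_task = "encode"

    # Check for a real plugin instead of silently falling back to a dummy one.
    if pooling_task == "plugin" and io_processor is None:
        raise ValueError(
            "The 'plugin' pooling task requires an IOProcessor plugin, "
            "but none is loaded."
        )
    return pooling_task
```

Raising at this point, rather than registering a placeholder class, keeps a missing plugin from failing later with a less obvious error.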

Comment thread vllm/entrypoints/llm.py
Comment thread vllm/entrypoints/pooling/io_processor_factories.py
Comment thread vllm/entrypoints/pooling/pooling/io_processor.py
Comment thread vllm/entrypoints/pooling/pooling/io_processor.py
Comment thread vllm/entrypoints/pooling/pooling/io_processor.py
@noooop noooop merged commit 66c079a into vllm-project:main Apr 9, 2026
62 checks passed
@noooop noooop deleted the Refactor_pooling_api branch April 9, 2026 10:17
jefp added a commit to jefp/vllm that referenced this pull request Apr 9, 2026
- Replace _get_offline_token_limits with _params_to_single + _get_token_limits
  (compatible with upcoming _params_to_seq from vllm-project#39153)
- Remove duplicate validation from base/serving.py (now only in io_processor)
- Validate negative values (!=0 check instead of >0)
- Restore original comments for cross-encoder and LLM-as-reranker paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jesus Federico <jefp@amazon.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
jefp added a commit to jefp/vllm that referenced this pull request Apr 10, 2026
_params_to_single was a bridge helper added before vllm-project#39153 landed.
Now that vllm-project#39153 is merged, ctx.pooling_params is always a single
PoolingParams in the offline path (enforced by assert). Removed
the helper and simplified _get_token_limits signature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jesus Federico <jefp@amazon.com>
jefp added a commit to jefp/vllm that referenced this pull request Apr 10, 2026
jefp added a commit to jefp/vllm that referenced this pull request Apr 10, 2026
lisp19 pushed a commit to lisp19/vllm that referenced this pull request Apr 20, 2026

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed), v1


2 participants