[Frontend][4/n] Improve pooling entrypoints (#39153)
noooop merged 29 commits into vllm-project:main
Conversation
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Code Review
This pull request refactors the pooling IO processor architecture by introducing task-specific processors and a plugin-based system, moving logic out of the main LLM engine. Key changes include the addition of token-level classification and embedding tasks, and the delegation of pre/post-processing to specialized classes. Feedback highlights several critical bugs, including a missing parameter in the offline context and an incorrect attribute access on a sequence object. Further improvements are needed to remove duplicate code blocks, fix malformed function signatures, and provide more descriptive error messages for better user feedback.
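The task-specific dispatch described above can be sketched roughly as follows. This is a hedged illustration only: the class names (`PoolingProcessor`, `EmbedProcessor`, `TokenClassifyProcessor`) and the registry helper are invented for this sketch and are not vLLM's actual API.

```python
# Hypothetical sketch of a plugin-style registry of task-specific pooling
# processors; names are illustrative, not vLLM's real classes.
from dataclasses import dataclass


@dataclass
class PoolingOutput:
    data: list[float]


class PoolingProcessor:
    task: str = "base"

    def postprocess(self, raw: list[float]) -> PoolingOutput:
        # Base post-processing: wrap the raw pooled values.
        return PoolingOutput(raw)


class EmbedProcessor(PoolingProcessor):
    task = "embed"


class TokenClassifyProcessor(PoolingProcessor):
    task = "token_classify"


# Registry maps a task name to its specialized processor class.
_REGISTRY = {cls.task: cls for cls in (EmbedProcessor, TokenClassifyProcessor)}


def get_processor(task: str) -> PoolingProcessor:
    # A descriptive error beats a bare KeyError for unsupported tasks,
    # matching the review's call for better user feedback.
    if task not in _REGISTRY:
        raise ValueError(
            f"Unsupported pooling task {task!r}; "
            f"expected one of {sorted(_REGISTRY)}")
    return _REGISTRY[task]()
```

With a registry like this, the engine only needs to look up the processor for the requested task rather than branching on task names inline.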
Hi @noooop, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
/gemini review
Code Review
This pull request refactors the pooling and IO processor architecture to improve modularity and consistency between offline and online entrypoints. Key changes include the introduction of specialized PoolingIOProcessor implementations and the removal of direct io_processor dependencies from core serving classes. The review feedback highlights several critical issues: the use of assert for public API input validation which could lead to regressions, a logic bug in the plugin processor that breaks deprecated compatibility by unconditionally overwriting responses, and a potential KeyError when handling unsupported pooling tasks in the serving layer.
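The assert-for-validation concern above can be shown with a minimal, hypothetical example (the function names are made up): Python strips `assert` statements when run with `python -O`, so a public API validated that way silently accepts bad input in optimized builds, whereas an explicit `ValueError` always fires.

```python
# Illustrative only: why `assert` is fragile for public API input
# validation. Under `python -O`, the assert below is stripped entirely.
def validate_dimensions_assert(dimensions):
    assert dimensions is None or dimensions > 0, "dimensions must be positive"
    return dimensions


def validate_dimensions(dimensions):
    # Explicit exception survives optimized mode and gives a clear message.
    if dimensions is not None and dimensions <= 0:
        raise ValueError(
            f"dimensions must be a positive integer, got {dimensions}")
    return dimensions
```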
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>
/gemini review
Code Review
This pull request significantly refactors the handling of IO processors and pooling tasks within the vLLM serving infrastructure. Key changes include centralizing IO processor management within a new PoolingServing class hierarchy, removing direct io_processor attributes from core engine and rendering components, and introducing specialized PoolingIOProcessor implementations for different tasks, including a new 'plugin' task. The review highlights several critical issues: a regression in the offline encode API's inference of the 'plugin' task for {"data": ...} prompts, the bypassing of error checks for missing IOProcessor plugins due to a dummy class registration, and the presence of assertions that should be replaced with explicit ValueErrors for improved error handling in production.
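The `{"data": ...}` task-inference point above can be sketched as follows; `infer_pooling_task` is a hypothetical helper written for this illustration, not the PR's actual code.

```python
# Hypothetical helper illustrating the offline task inference the review
# discusses: dict prompts shaped like {"data": ...} should route to the
# "plugin" task instead of the default pooling task.
def infer_pooling_task(prompt, default_task: str = "embed") -> str:
    if isinstance(prompt, dict) and "data" in prompt:
        return "plugin"
    return default_task
```

The regression flagged in the review is precisely the case where a `{"data": ...}` prompt stops being routed to `"plugin"` in the offline `encode` path.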
- Replace _get_offline_token_limits with _params_to_single + _get_token_limits (compatible with upcoming _params_to_seq from vllm-project#39153)
- Remove duplicate validation from base/serving.py (now only in io_processor)
- Validate negative values (!= 0 check instead of > 0)
- Restore original comments for cross-encoder and LLM-as-reranker paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jesus Federico <jefp@amazon.com>
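The "!= 0 check instead of > 0" item presumably means the validation must admit a negative sentinel (vLLM uses -1 for "no truncation" in `truncate_prompt_tokens`) while still rejecting zero and other negatives; a hedged sketch under that assumption, with an invented helper name:

```python
# Hedged sketch, assuming -1 is a "no limit" sentinel as in vLLM's
# truncate_prompt_tokens; the helper name check_token_limit is invented.
def check_token_limit(limit):
    # Reject zero and any negative value other than the -1 sentinel;
    # a plain `> 0` check would wrongly reject the sentinel too.
    if limit is not None and (limit == 0 or limit < -1):
        raise ValueError(
            f"token limit must be positive or -1 (no limit), got {limit}")
    return limit
```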
…39153) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
_params_to_single was a bridge helper added before vllm-project#39153 landed. Now that vllm-project#39153 is merged, ctx.pooling_params is always a single PoolingParams in the offline path (enforced by assert). Removed the helper and simplified the _get_token_limits signature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Jesus Federico <jefp@amazon.com>
Purpose
Improve pooling entrypoints
Test Plan
Keep CI green.
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.