
pass raw request to io_process_plugin#34419

Closed
staugust wants to merge 1 commit intovllm-project:mainfrom
staugust:io_process_plugin_parse_data

Conversation

@staugust
Contributor

@staugust staugust commented Feb 12, 2026

Purpose

parse_request assumes parameter is request before parse_data is introduced. When parsing data, io_process_plugin may need to get extra params from raw request, e.g. truncate_prompt_tokens.
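As an illustration of the motivation, a plugin that receives the raw request could read truncate_prompt_tokens directly. This is a minimal sketch with stand-in classes, not vLLM's actual IOProcessorRequest:

```python
from dataclasses import dataclass
from typing import Optional


# Stand-in for vLLM's IOProcessorRequest; the fields used below and
# their shapes are assumptions made for this sketch.
@dataclass
class IOProcessorRequest:
    data: dict
    truncate_prompt_tokens: Optional[int] = None


class MyIOProcessor:
    def parse_data(self, request: IOProcessorRequest) -> dict:
        # With the raw request, the plugin sees extra params such as
        # truncate_prompt_tokens alongside the payload in request.data.
        prompt = dict(request.data)
        if request.truncate_prompt_tokens is not None:
            prompt["truncate_prompt_tokens"] = request.truncate_prompt_tokens
        return prompt


req = IOProcessorRequest(data={"text": "hello"}, truncate_prompt_tokens=8)
parsed = MyIOProcessor().parse_data(req)
```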


Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
@staugust staugust requested a review from noooop as a code owner February 12, 2026 10:06
@mergify mergify bot added the frontend label Feb 12, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request modifies the create_pooling function in vllm/entrypoints/pooling/pooling/serving.py to pass the entire request object to the io_processor.parse_data method, instead of just request.data. This change aims to enable IOProcessor plugins to access additional parameters from the raw request, such as truncate_prompt_tokens, thereby enhancing plugin flexibility.

)

-        validated_prompt = self.io_processor.parse_data(request.data)
+        validated_prompt = self.io_processor.parse_data(request)

Severity: high

This change passes the entire request object (which is an IOProcessorRequest) to self.io_processor.parse_data. This alters the expected input type for IOProcessor.parse_data implementations. Previously, plugins would receive request.data (type T from IOProcessorRequest[T]), but now they will receive the IOProcessorRequest object itself. This is a breaking change for existing plugins that implement IOProcessor.parse_data expecting the previous data type. To prevent runtime errors and ensure type safety, the IOProcessor.parse_data method's signature in vllm/plugins/io_processors/interface.py should be updated to parse_data(self, request: IOProcessorRequest) -> IOProcessorInput: to reflect this new expectation.
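The signature change the bot proposes could look roughly like this; the classes below are simplified stand-ins for the real interface in vllm/plugins/io_processors/interface.py:

```python
from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar

IOProcessorInput = TypeVar("IOProcessorInput")


# Simplified stand-in for vLLM's IOProcessorRequest.
class IOProcessorRequest:
    def __init__(self, data: Any) -> None:
        self.data = data


class IOProcessor(ABC, Generic[IOProcessorInput]):
    @abstractmethod
    def parse_data(self, request: IOProcessorRequest) -> IOProcessorInput:
        """Plugins now receive the full request, not just request.data."""


# A plugin written against the new signature unwraps the payload itself.
class EchoProcessor(IOProcessor[dict]):
    def parse_data(self, request: IOProcessorRequest) -> dict:
        return request.data


out = EchoProcessor().parse_data(IOProcessorRequest({"text": "hi"}))
```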

@noooop
Collaborator

noooop commented Feb 12, 2026

cc @christian-pinto @DarkLight1337

Please help review this.

@christian-pinto
Contributor

I believe this change was introduced so that in the function parse_request (now parse_data) we would not need to cater for the case when it is invoked via the offline interface (a data dict as input) and the case where it is invoked from the online API (with an IOProcessorRequest as input).

Personally, I can see the value of getting the full request, and I would be fine reverting to passing the full request to the plugin and providing a default implementation of parse_request in the interface that distinguishes between the two cases. People can then reimplement it as they please in their plugins.

@DarkLight1337 any thoughts?

@staugust do you have a specific use-case in mind?
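A default parse_request along the lines suggested above might dispatch on the input type; everything here is a hypothetical sketch, not vLLM's actual interface:

```python
from typing import Any, Union


# Stand-in for vLLM's IOProcessorRequest (online API input).
class IOProcessorRequest:
    def __init__(self, data: Any) -> None:
        self.data = data


class BaseIOProcessor:
    def parse_data(self, data: Any) -> Any:
        # Plugins would override this with real validation.
        return data

    def parse_request(self, request: Union[dict, IOProcessorRequest]) -> Any:
        # Default implementation: accept either a raw dict (offline
        # llm.encode) or a full request object (online API).
        if isinstance(request, IOProcessorRequest):
            return self.parse_data(request.data)
        return self.parse_data(request)


proc = BaseIOProcessor()
online = proc.parse_request(IOProcessorRequest({"text": "hi"}))
offline = proc.parse_request({"text": "hi"})
```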

@staugust
Contributor Author

@christian-pinto I was using request to get truncated prompt tokens. I'd check API changes later to make sure no extra params need to be passed with the request object.

@DarkLight1337
Member

I am ok with having parse_request only handle the online case and call parse_data internally in the default implementation. But we need to be careful to keep backward compatibility.

@christian-pinto
Contributor

@christian-pinto I was using request to get truncated prompt tokens. I'd check API changes later to make sure no extra params need to be passed with the request object.

I just saw #34214; sorry for missing it, I was not included in the conversation. I am happy to see other people trying to use the IO Processor Plugin concept.

@noooop
Collaborator

noooop commented Feb 12, 2026

PTAL https://github.com/vllm-project/vllm/pull/28631/changes#r2798137823

I attempted to modify the pooling to utilize an IOProcessor (not “the” IOProcessor). While it is not yet compatible with “the” IOProcessor, we can gradually align the two.

@staugust
Contributor Author

staugust commented Feb 13, 2026

PR #34214 intends to add sparse embedding output for bge_m3 in the sparse format [{token_id, weight, token_text}]. It just updates IOProcessorRequest, calls engine_client.encode with pooling_task == token_classify, and updates PoolingOutput to the sparse format mentioned before. bge_m3_sparse_plugin reuses pooling-related parameters like model, user, request_id, embed_type, and encoding_format defined on IOProcessorRequest instead of parsing those params from the IOProcessorRequest.data field. Thus, I prefer to pass the raw IOProcessorRequest instead of request.data, so that we can reuse those parameters. It is also fine to just pass request.data, because we can parse those parameters from request.data as well.

Both approaches are feasible, so passing either request or request.data depends on our design of the pooling API and the io_processor_plugin. Which one do you prefer to pass: request, or just request.data?

And I have a few questions/suggestions around current io_processor_plugin design.

Background: Right now there is a noticeable behavioral difference between:

  • Online pooling endpoint
    It does not run _preprocess_chat or _preprocess_completion for IOProcessorRequest.
  • Offline: llm.encode
    It does run _preprocess_completion for prompts in a unified way.
    For prompts that require io_processor_plugin, the flow is roughly:
    io_processor.parse_data, io_processor.pre_process, io_processor.merge_pooling_params.
    Later, in _run_completion, llm._preprocess_completion is called and render_cmpl is applied uniformly to the prompts.
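The offline flow described above can be sketched with stub plugin hooks; the hook names follow the description, but their bodies are purely illustrative:

```python
# Stub IOProcessor whose hooks mirror the offline llm.encode flow
# described above; all bodies are illustrative assumptions.
class StubIOProcessor:
    def parse_data(self, data):
        # Fail fast on malformed input.
        assert isinstance(data, dict) and "text" in data
        return data

    def pre_process(self, validated):
        # Turn arbitrary plugin input into an engine prompt.
        return {"prompt": validated["text"]}

    def merge_pooling_params(self, params):
        params.setdefault("task", "encode")
        return params


def run_offline(data, pooling_params):
    proc = StubIOProcessor()
    validated = proc.parse_data(data)
    prompt = proc.pre_process(validated)
    params = proc.merge_pooling_params(dict(pooling_params))
    return prompt, params


prompt, params = run_offline({"text": "hello"}, {})
```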

This leads to another question: when we design the io_processor_plugin API, what are the intended semantics and expected return formats of parse_data and pre_process? And for plugin authors, it would help to know:
Is the plugin expected to be mode-agnostic (i.e. the same logic for online/offline), with the framework ensuring consistent preprocessing paths?
Or should the plugin explicitly be aware of the mode (e.g. via a flag or context) and adapt behavior accordingly?

Right now, without a clear contract, it’s tricky to implement an io_processor_plugin that behaves correctly and consistently across both online and offline code paths. Maybe we could write an RFC for the io_processor_plugin to clarify the design? Perhaps the preprocessing of multimodal data could also reuse the io_processor_plugin mechanism.
@noooop @christian-pinto @DarkLight1337

@DarkLight1337
Member

parse_data is intended to be mode-agnostic, yes. I think we can add a TokenizeParams parameter to it so you can get the truncate_prompt_tokens information regardless of whether you use the offline or online API.
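A TokenizeParams threaded into parse_data could look like this; the class name comes from the comment above, while its fields are assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenizeParams:
    # Hypothetical field; carries tokenization options in both modes.
    truncate_prompt_tokens: Optional[int] = None


def parse_data(data: dict, tok_params: TokenizeParams) -> dict:
    # Mode-agnostic: the same params object is passed whether the call
    # comes from the offline or the online API.
    out = dict(data)
    if tok_params.truncate_prompt_tokens is not None:
        out["truncate_prompt_tokens"] = tok_params.truncate_prompt_tokens
    return out


parsed = parse_data({"text": "hi"}, TokenizeParams(truncate_prompt_tokens=4))
```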

@DarkLight1337
Member

DarkLight1337 commented Feb 13, 2026

Online pooling endpoint
It does not run _preprocess_chat or _preprocess_completion for IOProcessorRequest.

This is an oversight, let me fix that

@christian-pinto
Contributor

This leads to another question: when we design the io_processor_plugin API, what are the intended semantics and expected return formats of parse_data and pre_process? And for plugin authors, it would help to know:
Is the plugin expected to be mode-agnostic (i.e. the same logic for online/offline), with the framework ensuring consistent preprocessing paths?

The way it is right now, we expect pre_process to generate prompts to be fed to the engine's encode. The assumption when building this was that we wanted to enable people to pass virtually anything to their model and let pre_process handle the conversion, since vLLM itself only understands prompts.

On parse_data, +1 to @DarkLight1337: it's expected to be mode-agnostic and it's thought of as a way for the plugin to make sure the user has sent properly formatted data. A sort of fail fast and fail early.
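The fail-fast idea can be sketched as a parse_data that validates the payload shape before any engine work; the expected schema here is purely illustrative:

```python
def parse_data(data):
    # Validate up front and raise before any engine work starts.
    if not isinstance(data, dict) or "text" not in data:
        raise ValueError("expected a dict with a 'text' field")
    if not isinstance(data["text"], str):
        raise ValueError("'text' must be a string")
    return data


ok = parse_data({"text": "hello"})
try:
    parse_data({"wrong": 1})
    failed_fast = False
except ValueError:
    failed_fast = True
```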

@staugust
Contributor Author

As discussed above, both online serving mode and offline mode pass request.data as the input of parse_data; the issue is fixed in #34618.

@staugust staugust closed this Feb 25, 2026
