
pass raw request to io_process_plugin#34419

Closed
staugust wants to merge 1 commit intovllm-project:mainfrom
staugust:io_process_plugin_parse_data

Conversation

@staugust
Contributor

@staugust staugust commented Feb 12, 2026

Purpose

parse_request assumes parameter is request before parse_data is introduced. When parsing data, io_process_plugin may need to get extra params from raw request, e.g. truncate_prompt_tokens.
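As an illustration of the motivation, a plugin that receives the raw request could read truncate_prompt_tokens directly. This is a minimal sketch with stand-in classes, not vLLM's actual IOProcessorRequest:

```python
from dataclasses import dataclass
from typing import Optional


# Stand-in for vLLM's IOProcessorRequest; the fields used below and
# their shapes are assumptions made for this sketch.
@dataclass
class IOProcessorRequest:
    data: dict
    truncate_prompt_tokens: Optional[int] = None


class MyIOProcessor:
    def parse_data(self, request: IOProcessorRequest) -> dict:
        # With the raw request, the plugin sees extra params such as
        # truncate_prompt_tokens alongside the payload in request.data.
        prompt = dict(request.data)
        if request.truncate_prompt_tokens is not None:
            prompt["truncate_prompt_tokens"] = request.truncate_prompt_tokens
        return prompt


req = IOProcessorRequest(data={"text": "hello"}, truncate_prompt_tokens=8)
parsed = MyIOProcessor().parse_data(req)
```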


Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
@staugust staugust requested a review from noooop as a code owner February 12, 2026 10:06
@mergify mergify bot added the frontend label Feb 12, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request modifies the create_pooling function in vllm/entrypoints/pooling/pooling/serving.py to pass the entire request object to the io_processor.parse_data method, instead of just request.data. This change aims to enable IOProcessor plugins to access additional parameters from the raw request, such as truncate_prompt_tokens, thereby enhancing plugin flexibility.

)

-        validated_prompt = self.io_processor.parse_data(request.data)
+        validated_prompt = self.io_processor.parse_data(request)

Severity: high

This change passes the entire request object (which is an IOProcessorRequest) to self.io_processor.parse_data. This alters the expected input type for IOProcessor.parse_data implementations. Previously, plugins would receive request.data (type T from IOProcessorRequest[T]), but now they will receive the IOProcessorRequest object itself. This is a breaking change for existing plugins that implement IOProcessor.parse_data expecting the previous data type. To prevent runtime errors and ensure type safety, the IOProcessor.parse_data method's signature in vllm/plugins/io_processors/interface.py should be updated to parse_data(self, request: IOProcessorRequest) -> IOProcessorInput: to reflect this new expectation.
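The signature change the bot proposes could look roughly like this; the classes below are simplified stand-ins for the real interface in vllm/plugins/io_processors/interface.py:

```python
from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar

IOProcessorInput = TypeVar("IOProcessorInput")


# Simplified stand-in for vLLM's IOProcessorRequest.
class IOProcessorRequest:
    def __init__(self, data: Any) -> None:
        self.data = data


class IOProcessor(ABC, Generic[IOProcessorInput]):
    @abstractmethod
    def parse_data(self, request: IOProcessorRequest) -> IOProcessorInput:
        """Plugins now receive the full request, not just request.data."""


# A plugin written against the new signature unwraps the payload itself.
class EchoProcessor(IOProcessor[dict]):
    def parse_data(self, request: IOProcessorRequest) -> dict:
        return request.data


out = EchoProcessor().parse_data(IOProcessorRequest({"text": "hi"}))
```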

@noooop
Collaborator

noooop commented Feb 12, 2026

cc @christian-pinto @DarkLight1337

Please help review this.

@christian-pinto
Contributor

I believe this change was introduced so that in the function parse_request (now parse_data) we would not need to cater for the case when it is invoked via the offline interface (a data dict as input) and the case where it is invoked from the online API (with an IOProcessorRequest as input).

Personally, I can see the value of getting the full request, and I would be fine reverting to passing the full request to the plugin and providing a default implementation of parse_request in the interface that distinguishes between the two cases. People can then reimplement it as they please in their plugins.

@DarkLight1337 any thoughts?

@staugust do you have a specific use-case in mind?
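A default parse_request along the lines suggested above might dispatch on the input type; everything here is a hypothetical sketch, not vLLM's actual interface:

```python
from typing import Any, Union


# Stand-in for vLLM's IOProcessorRequest (online API input).
class IOProcessorRequest:
    def __init__(self, data: Any) -> None:
        self.data = data


class BaseIOProcessor:
    def parse_data(self, data: Any) -> Any:
        # Plugins would override this with real validation.
        return data

    def parse_request(self, request: Union[dict, IOProcessorRequest]) -> Any:
        # Default implementation: accept either a raw dict (offline
        # llm.encode) or a full request object (online API).
        if isinstance(request, IOProcessorRequest):
            return self.parse_data(request.data)
        return self.parse_data(request)


proc = BaseIOProcessor()
online = proc.parse_request(IOProcessorRequest({"text": "hi"}))
offline = proc.parse_request({"text": "hi"})
```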

@staugust
Contributor Author

@christian-pinto I was using request to get truncated prompt tokens. I'd check API changes later to make sure no extra params need to be passed with the request object.

@DarkLight1337
Member

I am ok with having parse_request only handle the online case and call parse_data internally in the default implementation. But we need to be careful to keep backward compatibility.

@christian-pinto
Contributor

@christian-pinto I was using request to get truncated prompt tokens. I'd check API changes later to make sure no extra params need to be passed with the request object.

I just saw #34214; sorry for missing it, I was not included in the conversation. I am happy to see other people trying to use the IO Processor Plugin concept.

@noooop
Collaborator

noooop commented Feb 12, 2026

PTAL https://github.com/vllm-project/vllm/pull/28631/changes#r2798137823

I attempted to modify the pooling to utilize an IOProcessor (not “the” IOProcessor). While it is not yet compatible with “the” IOProcessor, we can gradually align the two.

@staugust
Contributor Author

staugust commented Feb 13, 2026

PR #34214 intends to add sparse embedding output for bge_m3 in the sparse format [{token_id, weight, token_text}]. It just updates IOProcessorRequest, calls engine_client.encode with pooling_task == token_classify, and updates PoolingOutput to the sparse format mentioned before. bge_m3_sparse_plugin reuses pooling-related parameters like model, user, request_id, embed_type, and encoding_format defined on IOProcessorRequest instead of parsing those params from the IOProcessorRequest.data field. Thus, I prefer to pass the raw IOProcessorRequest instead of request.data, so that we can reuse those parameters. It is also fine to just pass request.data, because we can parse those parameters from request.data as well.

Both approaches are feasible, so passing either request or request.data depends on our design of the pooling API and the io_processor_plugin. Which one do you prefer to pass: request, or just request.data?

And I have a few questions/suggestions around current io_processor_plugin design.

Background: Right now there is a noticeable behavioral difference between:

  • Online pooling endpoint
    It does not run _preprocess_chat or _preprocess_completion for IOProcessorRequest.
  • Offline: llm.encode
    It does run _preprocess_completion for prompts in a unified way.
    For prompts that require io_processor_plugin, the flow is roughly:
    io_processor.parse_data, io_processor.pre_process, io_processor.merge_pooling_params.
    Later, in _run_completion, llm._preprocess_completion is called and render_cmpl is applied uniformly to the prompts.
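The offline flow described above can be sketched with stub plugin hooks; the hook names follow the description, but their bodies are purely illustrative:

```python
# Stub IOProcessor whose hooks mirror the offline llm.encode flow
# described above; all bodies are illustrative assumptions.
class StubIOProcessor:
    def parse_data(self, data):
        # Fail fast on malformed input.
        assert isinstance(data, dict) and "text" in data
        return data

    def pre_process(self, validated):
        # Turn arbitrary plugin input into an engine prompt.
        return {"prompt": validated["text"]}

    def merge_pooling_params(self, params):
        params.setdefault("task", "encode")
        return params


def run_offline(data, pooling_params):
    proc = StubIOProcessor()
    validated = proc.parse_data(data)
    prompt = proc.pre_process(validated)
    params = proc.merge_pooling_params(dict(pooling_params))
    return prompt, params


prompt, params = run_offline({"text": "hello"}, {})
```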

This leads to another question: when we design the io_processor_plugin API, what are the intended semantics and expected return formats of parse_data and pre_process? And for plugin authors, it would help to know:
Is the plugin expected to be mode-agnostic (i.e. the same logic for online/offline), with the framework ensuring consistent preprocessing paths?
Or should the plugin explicitly be aware of the mode (e.g. via a flag or context) and adapt behavior accordingly?

Right now, without a clear contract, it’s tricky to implement an io_processor_plugin that behaves correctly and consistently across both online and offline code paths. Maybe we could write an RFC for the io_processor_plugin to clarify the design? Perhaps the preprocessing of multimodal data could also reuse the io_processor_plugin mechanism.
@noooop @christian-pinto @DarkLight1337

@DarkLight1337
Member

parse_data is intended to be mode-agnostic, yes. I think we can add a TokenizeParams parameter to it so you can get the truncate_prompt_tokens information regardless of whether you use the offline or online API.
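A TokenizeParams threaded into parse_data could look like this; the class name comes from the comment above, while its fields are assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenizeParams:
    # Hypothetical field; carries tokenization options in both modes.
    truncate_prompt_tokens: Optional[int] = None


def parse_data(data: dict, tok_params: TokenizeParams) -> dict:
    # Mode-agnostic: the same params object is passed whether the call
    # comes from the offline or the online API.
    out = dict(data)
    if tok_params.truncate_prompt_tokens is not None:
        out["truncate_prompt_tokens"] = tok_params.truncate_prompt_tokens
    return out


parsed = parse_data({"text": "hi"}, TokenizeParams(truncate_prompt_tokens=4))
```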

@DarkLight1337
Member

DarkLight1337 commented Feb 13, 2026

Online pooling endpoint
It does not run _preprocess_chat or _preprocess_completion for IOProcessorRequest.

This is an oversight, let me fix that

@christian-pinto
Contributor

This leads to another question: when we design the io_processor_plugin API, what are the intended semantics and expected return formats of parse_data and pre_process? And for plugin authors, it would help to know:
Is the plugin expected to be mode-agnostic (i.e. the same logic for online/offline), with the framework ensuring consistent preprocessing paths?

The way it is right now, we expect pre_process to generate prompts to be fed to the engine's encode. The assumption when building this was that we wanted to enable people to pass virtually anything to their model and let pre_process handle the conversion, since vLLM itself only understands prompts.

On parse_data, +1 to @DarkLight1337: it's expected to be mode-agnostic and it's thought of as a way for the plugin to make sure the user has sent properly formatted data. A sort of fail fast and fail early.
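The fail-fast idea can be sketched as a parse_data that validates the payload shape before any engine work; the expected schema here is purely illustrative:

```python
def parse_data(data):
    # Validate up front and raise before any engine work starts.
    if not isinstance(data, dict) or "text" not in data:
        raise ValueError("expected a dict with a 'text' field")
    if not isinstance(data["text"], str):
        raise ValueError("'text' must be a string")
    return data


ok = parse_data({"text": "hello"})
try:
    parse_data({"wrong": 1})
    failed_fast = False
except ValueError:
    failed_fast = True
```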

@staugust
Contributor Author

As discussed above, both online serving mode and offline mode pass request.data as the input of parse_data; the issue is fixed in #34618.

@staugust staugust closed this Feb 25, 2026
