pass raw request to io_process_plugin #34419
staugust wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Code Review
The pull request modifies the create_pooling function in vllm/entrypoints/pooling/pooling/serving.py to pass the entire request object to the io_processor.parse_data method, instead of just request.data. This change aims to enable IOProcessor plugins to access additional parameters from the raw request, such as truncate_prompt_tokens, thereby enhancing plugin flexibility.
```diff
-        validated_prompt = self.io_processor.parse_data(request.data)
+        validated_prompt = self.io_processor.parse_data(request)
```
This change passes the entire request object (which is an IOProcessorRequest) to self.io_processor.parse_data. This alters the expected input type for IOProcessor.parse_data implementations. Previously, plugins would receive request.data (type T from IOProcessorRequest[T]), but now they will receive the IOProcessorRequest object itself. This is a breaking change for existing plugins that implement IOProcessor.parse_data expecting the previous data type. To prevent runtime errors and ensure type safety, the IOProcessor.parse_data method's signature in vllm/plugins/io_processors/interface.py should be updated to parse_data(self, request: IOProcessorRequest) -> IOProcessorInput: to reflect this new expectation.
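A minimal sketch of the signature change the review suggests, using illustrative stand-ins for the vLLM types (the real definitions live in `vllm/plugins/io_processors/interface.py`; field names here are assumptions):

```python
from dataclasses import dataclass
from typing import Any, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class IOProcessorRequest(Generic[T]):
    # Illustrative stand-in for the real vLLM request type.
    data: T
    truncate_prompt_tokens: Optional[int] = None


class IOProcessor(Generic[T]):
    # Suggested signature: the plugin now receives the whole request,
    # not just request.data as before.
    def parse_data(self, request: IOProcessorRequest[T]) -> Any:
        raise NotImplementedError


class EchoProcessor(IOProcessor[str]):
    def parse_data(self, request: IOProcessorRequest[str]) -> str:
        return request.data
```

Existing plugins typed against the old `request.data` payload would need to be updated to match.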
cc @christian-pinto @DarkLight1337 Please help review this.
I believe this change was introduced so that the full request is available inside the function. Personally, I can see the value of getting the full request, and I would be fine reverting to passing the full request to the plugin, providing a default implementation of parse_request in the interface that distinguishes between the two cases. People can then reimplement it as they please in their plugins. @DarkLight1337 any thoughts? @staugust do you have a specific use-case in mind?
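One possible shape for such a default implementation (the class layout and the `parse_request` hook name are assumptions for illustration, not the current interface):

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class IOProcessorRequest:
    # Illustrative stand-in for the real request type.
    data: Any


class IOProcessor:
    def parse_request(self, request: IOProcessorRequest) -> Any:
        # Plugins override this hook; the default returns the payload.
        return request.data

    def parse_data(self, data_or_request: Any) -> Any:
        # Backward-compatible default: accept either the full request
        # or just its .data payload, and normalize before dispatching.
        if isinstance(data_or_request, IOProcessorRequest):
            return self.parse_request(data_or_request)
        return self.parse_request(IOProcessorRequest(data=data_or_request))
```

This way callers that still pass `request.data` keep working, while plugins that want the extra request parameters can override `parse_request`.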
@christian-pinto I was using the request to get truncated prompt tokens. I'll check the API changes later to make sure no extra params need to be passed with the request object.
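For example, a plugin could read `truncate_prompt_tokens` off the full request roughly like this (the request shape and the whitespace "tokenizer" are illustrative assumptions; a real plugin would use the model tokenizer):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IOProcessorRequest:
    # Illustrative stand-in for the real request type.
    data: str
    truncate_prompt_tokens: Optional[int] = None


class TruncatingPlugin:
    def parse_data(self, request: IOProcessorRequest) -> str:
        prompt = request.data
        limit = request.truncate_prompt_tokens
        if limit is not None:
            # Toy truncation on whitespace tokens; real code would
            # count model tokens instead.
            prompt = " ".join(prompt.split()[:limit])
        return prompt
```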
I am ok with having
I just saw #34214, sorry for missing it; I was not included in the conversation. I am happy to see other people trying to use the IO Processor Plugin concept.
PTAL https://github.com/vllm-project/vllm/pull/28631/changes#r2798137823 I attempted to modify the pooling to utilize an IOProcessor (not “the” IOProcessor). While it is not yet compatible with “the” IOProcessor, we can gradually align the two.
PR #34214 intends to add sparse embedding output. Both approaches are feasible, so passing either the full request or just its data would work. And I have a few questions/suggestions around the current design.

Background: right now there is a noticeable behavioral difference between:
This leads to another question: when we design the interface, what contract should plugins implement against? Right now, without a clear contract, it’s tricky to implement an IOProcessor plugin.
This is an oversight, let me fix that
The way it is right now is that we expect pre_process to generate prompts to be fed to the engine's encode. The assumption when building this was that we wanted to enable people to pass virtually anything to their model and let pre_process handle the conversion, while vLLM only understands prompts. On parse_data, +1 to @DarkLight1337: it's expected to be mode-agnostic and is thought of as a way for the plugin to make sure the user has sent properly formatted data. A sort of fail fast and fail early.
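Under that contract, `parse_data` only validates the incoming payload while `pre_process` converts arbitrary user input into prompts the engine understands. A hedged sketch of a plugin following that split (method bodies and the prompt dict shape are illustrative assumptions):

```python
from types import SimpleNamespace


class ExamplePlugin:
    def parse_data(self, request):
        # Fail fast and fail early: validate the payload shape only,
        # staying mode-agnostic.
        if not isinstance(request.data, list):
            raise ValueError("expected a list of inputs")
        return request.data

    def pre_process(self, parsed):
        # Convert plugin-specific input into prompts vLLM understands.
        return [{"prompt": str(item)} for item in parsed]
```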
As discussed above, both
Purpose
`parse_request` assumed its parameter was `request` before `parse_data` was introduced. When parsing data, an `io_process_plugin` may need to get extra params from the raw request, e.g. `truncate_prompt_tokens`.

Test Plan
Test Result