feat(pooling): Add dedicated async preprocessing support to PluginWithIOProcessorPlugins #40030
mgazz wants to merge 5 commits into vllm-project:main from
Conversation
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Code Review
This pull request introduces asynchronous preprocessing support by adding pre_process_async to the processor and pre_process_online_async to the IO processor. While the changes aim to improve concurrency, the current implementation of pre_process_online_async blocks the asyncio event loop by calling synchronous rendering logic, which negates the benefits of an async entry point. Additionally, the new async method is missing a request type assertion present in the synchronous version.
```python
async def pre_process_online_async(self, ctx: PoolingServeContext):
    validated_prompt = self.io_processor.parse_data(ctx.request.data)

    raw_prompts = await self.io_processor.pre_process_async(
        prompt=validated_prompt, request_id=ctx.request_id
    )

    self._set_engine_inputs_and_params(raw_prompts, ctx)
```
The pre_process_online_async method is missing the assert isinstance(ctx.request, IOProcessorRequest) check that is present in pre_process_online. More importantly, it calls _set_engine_inputs_and_params, which is a synchronous helper that performs blocking operations like tokenization and multimodal processing via self.renderer.render_cmpl. This blocks the asyncio event loop, defeating the purpose of an async entry point. To maintain responsiveness, you should use self.renderer.render_cmpl_async in the async path. Consider refactoring the shared logic into smaller helpers to avoid duplication while allowing both sync and async rendering.
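A minimal sketch of the refactor suggested above: share the non-blocking helpers between the sync and async paths, but have the async path await the async renderer so the event loop is never blocked. The `Renderer`, `Ctx`, and `Serving` classes here are hypothetical stand-ins for illustration, not vLLM's real implementations; only the method names (`render_cmpl`, `render_cmpl_async`) follow the review comment.

```python
import asyncio
from dataclasses import dataclass, field


class Renderer:
    def render_cmpl(self, prompts):
        # stand-in for blocking work (tokenization, multimodal processing)
        return [f"rendered:{p}" for p in prompts]

    async def render_cmpl_async(self, prompts):
        # stand-in async renderer: here it just offloads the blocking call
        return await asyncio.to_thread(self.render_cmpl, prompts)


@dataclass
class Ctx:
    request_id: str
    engine_inputs: list = field(default_factory=list)


class Serving:
    def __init__(self):
        self.renderer = Renderer()

    # shared, non-blocking helper used by both paths
    def _store_engine_inputs(self, rendered, ctx):
        ctx.engine_inputs = rendered

    def pre_process_online(self, raw_prompts, ctx):
        rendered = self.renderer.render_cmpl(raw_prompts)
        self._store_engine_inputs(rendered, ctx)

    async def pre_process_online_async(self, raw_prompts, ctx):
        # async path awaits the async renderer instead of blocking the loop
        rendered = await self.renderer.render_cmpl_async(raw_prompts)
        self._store_engine_inputs(rendered, ctx)


ctx = Ctx("r1")
asyncio.run(Serving().pre_process_online_async(["hello"], ctx))
print(ctx.engine_inputs)  # ['rendered:hello']
```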
Thanks for your contribution!

Do you have any test data showing that using pre_process_async is faster than offloading blocking preprocessing and postprocessing ops to a thread pool? Also try increasing renderer_num_workers and api-server-count together. After all, async only uses a single thread, while renderer_num_workers uses multiple threads, and api-server-count uses multiple processes, which can effectively bypass the GIL.
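For reference, a minimal sketch of the thread-pool alternative the reviewer mentions: keep the preprocessing function synchronous and offload it from the async server path, so the event loop stays responsive. `blocking_pre_process` is a hypothetical stand-in, not a vLLM API.

```python
import asyncio
import time


def blocking_pre_process(data: str) -> str:
    time.sleep(0.01)  # stand-in for CPU-bound work (tokenization, decoding)
    return data.upper()


async def handle_request(data: str) -> str:
    # runs the blocking work in the default thread pool, keeping the
    # event loop free to serve other requests meanwhile
    return await asyncio.to_thread(blocking_pre_process, data)


async def main():
    results = await asyncio.gather(*(handle_request(d) for d in ["a", "b"]))
    print(results)  # ['A', 'B']


asyncio.run(main())
```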
Thank you for the reference. I see the PR is merged; I will test with main and check whether we still have issues loading Terratorch IOProcessors.
PTAL #34789 (comment). Because the GIL makes it nearly impossible for Python multi-threading to speed up preprocessing, ultimately we need api-server-count to use multi-processing to accelerate preprocessing.
Can you also fix https://buildkite.com/vllm/ci/builds/61639/steps/table?sid=019d96c9-ca44-4bed-92d4-140c1581a99e&tab=output in this PR? Thanks
Please help fix plugins_tests/test_terratorch_io_processor_plugins.py; it was caused by #39763, and the main branch is now failing.

Otherwise @noooop can fix it tomorrow; whoever has time first.
You could try having the entrypoint directly call pre_process_async again. Since pre_process essentially runs loop.run_until_complete(self.pre_process_async(prompt, request_id, **kwargs)), I don't think it's very compatible with the current thread pool.
…mpl_async Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Good question. We did some performance testing with the previous IOProcessor implementation, but we have not evaluated the thread pool approach yet. This PR aims at extending the current IOProcessor wrapping layer to support the async interface that plugins like the Terratorch one already implement, so existing code continues to work properly.

Terratorch IOProcessors are designed for I/O-bound operations: downloading satellite imagery over the network and reading large geospatial files. For these workloads, async was the natural fit, because when most time is spent downloading data (not CPU processing), async can handle many more concurrent operations efficiently. That said, our performance evaluation showed that, when downloads are fast, we experienced diminishing returns with an async implementation. In this case, using a thread pool should allow us to get better performance, so it's great to have it already as an option. Another important aspect is that TerraTorch IOProcessors implement the plugin
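A toy illustration of why async helps when most of the time is spent waiting on I/O: ten simulated downloads overlap under asyncio.gather instead of running back to back. The 50 ms sleep is a stand-in for network latency, not anything from the Terratorch code.

```python
import asyncio
import time


async def fake_download(i: int) -> int:
    await asyncio.sleep(0.05)  # simulated network wait, not CPU work
    return i


async def main() -> float:
    start = time.perf_counter()
    # ten overlapped 50 ms waits take roughly 50 ms total, not ~500 ms
    await asyncio.gather(*(fake_download(i) for i in range(10)))
    return time.perf_counter() - start


elapsed = asyncio.run(main())
print(f"{elapsed:.2f}s")
```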
Thank you for the suggestion, I'm happy to look into it. FYI @christian-pinto
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Hi @mgazz, the pre-commit checks have failed. Please run:

```shell
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
```

Then commit the changes and push to your branch.
I cherry-picked your last commit to unblock the CI.
vllm/vllm/entrypoints/pooling/base/serving.py, lines 72 to 101 in 4c47710: Currently, only the sync interface (pre_process_online) is used in _preprocessing, and the async interface (pre_process_async) is not used. Therefore, you need to modify vllm/entrypoints/pooling/base/serving.py to re-enable this path. We will accept this if you find through testing that calling the async interface is faster. We are optimizing the vLLM observability system to make this kind of performance comparison easier.
I tested the main branch. Why does it work?

```python
def pre_process(
    self,
    prompt: IOProcessorInput,
    request_id: str | None = None,
    **kwargs,
) -> PromptType | Sequence[PromptType]:
    # loop = asyncio.get_event_loop()
    # return loop.run_until_complete(self.pre_process_async(prompt, request_id, **kwargs))
    return asyncio.run(self.pre_process_async(prompt, request_id, **kwargs))
```

With the thread pool,
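One likely reason asyncio.run works here while the loop-reuse approach does not: asyncio.run creates a brand-new event loop in the calling thread and closes it afterwards, so it needs no pre-existing loop and is safe inside a thread-pool worker, where there is no current event loop. A minimal self-contained sketch (the function names are illustrative only):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def preprocess() -> str:
    return "ok"


def sync_wrapper() -> str:
    # safe in any thread: asyncio.run spins up a fresh event loop per call,
    # whereas asyncio.get_event_loop would fail in a worker thread that has
    # no current loop
    return asyncio.run(preprocess())


with ThreadPoolExecutor(max_workers=2) as pool:
    print(pool.submit(sync_wrapper).result())  # ok
```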
This is a feature that was also on our roadmap, and I am happy to help.
Purpose
Adds asynchronous preprocessing support to PluginWithIOProcessorPlugins to enable IOProcessor plugins that perform async operations, such as asynchronous data loading in Terratorch plugins. Here is an example of a plugin using pre_process_async.

Test Plan
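A hypothetical sketch of what such a plugin might look like: the method names (pre_process / pre_process_async) follow this discussion, but the class and its download helper are illustrative stand-ins, not the real Terratorch plugin or vLLM base class.

```python
import asyncio


class MyGeoIOProcessor:
    async def _download(self, url: str) -> str:
        # stand-in for an I/O-bound fetch (e.g. satellite imagery)
        await asyncio.sleep(0)
        return f"data:{url}"

    async def pre_process_async(self, prompt, request_id=None, **kwargs):
        # I/O-bound work is awaited concurrently without blocking the loop
        payloads = await asyncio.gather(*(self._download(u) for u in prompt))
        return list(payloads)

    def pre_process(self, prompt, request_id=None, **kwargs):
        # sync entry point delegates to the async one, as discussed above
        return asyncio.run(self.pre_process_async(prompt, request_id, **kwargs))


print(MyGeoIOProcessor().pre_process(["u1", "u2"]))  # ['data:u1', 'data:u2']
```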
The test is the same as before
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.