[Performance] implement async_scheduling in single process mode by Ronald1995 · Pull Request #23914 · vllm-project/vllm

Ronald1995 · 2025-08-29T08:16:49Z

Purpose

async_scheduling does not currently support UniProcExecutor or ExecutorWithExternalLauncher. This PR implements async_scheduling in single-process mode.
I create a new thread to run the execute_model task. Copying sample_token_ids from device to host blocks that thread, so the new thread notifies the main thread to schedule the next step. This overlaps the scheduling work with the data copy.
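
The overlap described above can be illustrated with a toy sketch. It is not the code in this PR: `run_steps`, `execute_model_loop`, and `copy_started` are hypothetical names, the forward pass is omitted, and the blocking device-to-host copy is simulated with `time.sleep`.

```python
import queue
import threading
import time


def run_steps(num_steps: int) -> list[str]:
    """Toy model of the overlap; returns the main thread's event log."""
    inputs: queue.Queue = queue.Queue()
    outputs: queue.Queue = queue.Queue()
    copy_started = threading.Event()

    def execute_model_loop() -> None:
        while True:
            batch = inputs.get()
            if batch is None:  # sentinel: stop the worker
                return
            # (the forward pass would run here)
            copy_started.set()  # tell the main thread the copy is starting
            time.sleep(0.01)    # stand-in for the blocking D2H copy
            outputs.put(f"tokens for {batch}")

    worker = threading.Thread(target=execute_model_loop, daemon=True)
    worker.start()

    log = ["schedule batch 0"]
    batch = "batch 0"
    for step in range(num_steps):
        inputs.put(batch)
        copy_started.wait()   # worker is now blocked in the copy
        copy_started.clear()
        if step + 1 < num_steps:
            # Scheduling of the next step overlaps with the copy.
            batch = f"batch {step + 1}"
            log.append(f"schedule {batch}")
        log.append(f"output: {outputs.get()}")

    inputs.put(None)
    worker.join()
    return log
```

In this sketch the log for `run_steps(3)` shows each "schedule" entry for step N+1 appearing before the "output" entry for step N, which is the overlap the description refers to.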

Test Plan

The test script is based on: https://github.com/vllm-project/vllm-ascend/blob/main/examples/offline_external_launcher.py
On an A100, we use the external_launcher method to initialize the LLM, with the following configuration:
TP=2, max_num_seqs=512, max_model_len=4096, max_tokens=512, ignore_eos=True, with prm800k_500.jsonl as the dataset.

Test Result

gbs	sync_scheduling(tps)	async_scheduling(tps)	speedup(%)
32	1014	1025	1.1%
64	1853	1896	2.3%
128	3048	3177	4.2%
256	4039	4258	5.4%
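
As a quick check of the speedup column, the percentages can be recomputed from the two throughput columns. The helper name `speedup_percent` is only illustrative, and the rounding to one decimal place is an assumption about how the table was produced.

```python
# (sync tps, async tps) per gbs value, copied from the table above.
results = {
    32: (1014, 1025),
    64: (1853, 1896),
    128: (3048, 3177),
    256: (4039, 4258),
}


def speedup_percent(sync_tps: int, async_tps: int) -> float:
    """Relative throughput gain of async over sync, in percent."""
    return round((async_tps / sync_tps - 1) * 100, 1)


speedups = {gbs: speedup_percent(*tps) for gbs, tps in results.items()}
```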

gemini-code-assist

Code Review

This pull request introduces asynchronous scheduling for the single-process executor, which is a great step towards improving performance. The core idea of using a separate thread for model execution and overlapping D2H copy with scheduling is sound.

However, I've identified a critical issue with the implementation in vllm/executor/uniproc_executor.py. The new background thread for _execute_model_loop lacks proper exception handling. If an error occurs during model execution, the thread will die silently, causing the main thread to hang in a deadlock. I've provided suggestions to propagate exceptions to the main thread.
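
One common way to propagate such failures is to pass the exception object through the output queue and re-raise it on the consumer side. The sketch below shows that general pattern, not the suggestion actually posted on this PR; `failing_model`, `execute_model_loop`, and `get_output` are hypothetical names.

```python
import queue
import threading


def failing_model(batch):
    """Stand-in model that always fails, to exercise the error path."""
    raise RuntimeError(f"model failed on {batch}")


def execute_model_loop(model, inputs: queue.Queue, outputs: queue.Queue) -> None:
    while True:
        batch = inputs.get()
        if batch is None:  # sentinel: stop the worker
            return
        try:
            outputs.put(model(batch))
        except Exception as exc:
            # Hand the error to the main thread instead of dying silently.
            outputs.put(exc)


def get_output(outputs: queue.Queue, timeout: float = 5.0):
    """Called on the main thread; re-raises any error from the worker."""
    result = outputs.get(timeout=timeout)
    if isinstance(result, Exception):
        raise result
    return result
```

With this shape, a failure inside the worker surfaces as an exception in the caller of `get_output` rather than as a main thread that waits forever on an empty queue.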

Additionally, for a robust implementation, a graceful shutdown mechanism for the new thread should be added to the shutdown method of UniProcExecutor. This would involve sending a sentinel value to the input queue and joining the thread.
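
A minimal sketch of the sentinel-and-join shutdown, assuming a queue-fed worker thread. `ToyExecutor` and `_SHUTDOWN` are hypothetical and do not mirror the real UniProcExecutor interface.

```python
import queue
import threading

_SHUTDOWN = object()  # sentinel understood only by the worker loop


class ToyExecutor:
    def __init__(self) -> None:
        self._inputs: queue.Queue = queue.Queue()
        self._outputs: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        while True:
            item = self._inputs.get()
            if item is _SHUTDOWN:
                return
            self._outputs.put(item * 2)  # stand-in for model execution

    def execute(self, item: int) -> int:
        self._inputs.put(item)
        return self._outputs.get()

    def shutdown(self, timeout: float = 5.0) -> bool:
        """Ask the worker to exit and wait for it; safe to call twice."""
        if self._thread.is_alive():
            self._inputs.put(_SHUTDOWN)
            self._thread.join(timeout=timeout)
        return not self._thread.is_alive()
```

The sentinel is enqueued behind any pending work, so requests already in the queue are processed before the thread exits.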

vllm/executor/uniproc_executor.py

Ronald1995 · 2025-08-30T10:25:18Z

@WoosukKwon @njhill would you please review this PR

mergify · 2025-09-01T01:54:44Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Ronald1995.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Ronald1995 · 2025-09-02T11:21:12Z

@njhill Hello, I'm looking forward to your review, thanks!

njhill · 2025-09-03T14:57:13Z

Thanks for this @Ronald1995, I will review today

njhill · 2025-09-04T06:56:16Z

@Ronald1995 I think it would make most sense to implement this on top of #23569, where we can have a unified approach and make similar adjustments to the uniproc executor as that PR has for the multiproc executor. I opened a draft PR #24219 with that (this is the commit). Let me know what you think!

However, it looks like there may not be much advantage any longer to using uniproc over multiproc anyhow so these may have limited benefit.

Ronald1995 · 2025-09-05T02:05:32Z

@njhill I'm fine with combining my PR with #23569. The main purpose of my PR is to support async_scheduling in the RL training scenario, which uses the external_launcher method. The external launcher executor is a subclass of UniProcExecutor and also runs in a single process. In RL training the batch size may be large, where this gives a worthwhile speedup (about 5%), so please also support the external launcher executor in your draft PR. Thanks.

Ronald1995 requested review from WoosukKwon, alexm-redhat, comaniac, njhill, robertgshaw2-redhat and ywang96 as code owners August 29, 2025 08:16

mergify bot added the v1 label Aug 29, 2025

Ronald1995 changed the title ~~implement async_scheduling in single process mode~~ [Performance] implement async_scheduling in single process mode Aug 29, 2025

gemini-code-assist bot reviewed Aug 29, 2025

Ronald1995 force-pushed the feature_async_inproc branch from b1ca2b2 to e0f563e Compare August 30, 2025 10:15

njhill self-assigned this Aug 30, 2025

Ronald1995 force-pushed the feature_async_inproc branch from e0f563e to af35fd0 Compare September 1, 2025 01:54

mergify bot added the needs-rebase label Sep 1, 2025

Ronald1995 force-pushed the feature_async_inproc branch 4 times, most recently from 3e2c78b to 8a59758 Compare September 2, 2025 11:18

Ronald1995 force-pushed the feature_async_inproc branch from 8a59758 to a8123e9 Compare September 3, 2025 01:07

mergify bot removed the needs-rebase label Sep 3, 2025

Ronald1995 force-pushed the feature_async_inproc branch 2 times, most recently from 7a6c74c to 926e024 Compare September 3, 2025 06:56

Ronald1995 added 3 commits September 3, 2025 16:55

implement async_scheduling in single process mode

5f560ee

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>

fix errors after local validation

fe3c277

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>

fix error of lint

313b583

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>

Ronald1995 force-pushed the feature_async_inproc branch from ebc7207 to a6ddcbb Compare September 3, 2025 08:55

fix typehint error of batch_queue

1552e60

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>

Ronald1995 force-pushed the feature_async_inproc branch from a6ddcbb to 1552e60 Compare September 3, 2025 08:57

fix error of mypy

f06a2b6

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>

Ronald1995 force-pushed the feature_async_inproc branch from 8ae32b3 to f06a2b6 Compare September 4, 2025 03:13

Ronald1995 closed this Sep 6, 2025

Ronald1995 wants to merge 5 commits into vllm-project:main from Ronald1995:feature_async_inproc
