Skip to content

[Performance] implement async_scheduling in single process mode#23914

Closed
Ronald1995 wants to merge 5 commits intovllm-project:mainfrom
Ronald1995:feature_async_inproc
Closed

[Performance] implement async_scheduling in single process mode#23914
Ronald1995 wants to merge 5 commits intovllm-project:mainfrom
Ronald1995:feature_async_inproc

Conversation

@Ronald1995
Copy link
Contributor

@Ronald1995 Ronald1995 commented Aug 29, 2025

Purpose

async_scheduling don't support UniProcExecutor and ExecutorWithExternalLauncher, this PR aims to implement async_scheduling in single process mode.
i make a new thread to do the execute_model task, when copy sample_token_ids from device to host, it will block the new thread, so let the new thread notify main thread to make scheduling of next step, it can overlap schedule operations with the data copy operations.

Test Plan

test script refer to:https://github.com/vllm-project/vllm-ascend/blob/main/examples/offline_external_launcher.py
In A100, we use exeternal_launcher method to initialize LLM, make configurations as:
TP=2, max_num_seqs=512,max_model_len=4096,max_tokens=512,ignore_eos=True,prm800k_500.jsonl as dataset.

Test Result

gbs sync_scheduling(tps) async_scheduling(tps) speedup(%)
32 1014 1025 1.1%
64 1853 1896 2.3%
128 3048 3177 4.2%
256 4039 4258 5.4%

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Aug 29, 2025
@Ronald1995 Ronald1995 changed the title implement async_scheduling in single process mode [Performance] implement async_scheduling in single process mode Aug 29, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces asynchronous scheduling for the single-process executor, which is a great step towards improving performance. The core idea of using a separate thread for model execution and overlapping D2H copy with scheduling is sound.

However, I've identified a critical issue with the implementation in vllm/executor/uniproc_executor.py. The new background thread for _execute_model_loop lacks proper exception handling. If an error occurs during model execution, the thread will die silently, causing the main thread to hang in a deadlock. I've provided suggestions to propagate exceptions to the main thread.

Additionally, for a robust implementation, a graceful shutdown mechanism for the new thread should be added to the shutdown method of UniProcExecutor. This would involve sending a sentinel value to the input queue and joining the thread.

@Ronald1995 Ronald1995 force-pushed the feature_async_inproc branch from b1ca2b2 to e0f563e Compare August 30, 2025 10:15
@Ronald1995
Copy link
Contributor Author

@WoosukKwon @njhill would you please review this PR

@njhill njhill self-assigned this Aug 30, 2025
@mergify
Copy link

mergify bot commented Sep 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Ronald1995.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 1, 2025
@Ronald1995 Ronald1995 force-pushed the feature_async_inproc branch 4 times, most recently from 3e2c78b to 8a59758 Compare September 2, 2025 11:18
@Ronald1995
Copy link
Contributor Author

@njhill Hello, I'm looking forward for your review, thanks!

@mergify mergify bot removed the needs-rebase label Sep 3, 2025
@Ronald1995 Ronald1995 force-pushed the feature_async_inproc branch 2 times, most recently from 7a6c74c to 926e024 Compare September 3, 2025 06:56
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
@njhill
Copy link
Member

njhill commented Sep 3, 2025

Thanks for this @Ronald1995, I will review today

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
@njhill
Copy link
Member

njhill commented Sep 4, 2025

@Ronald1995 I think it would make most sense to implement this on top of #23569, where we can have a unified approach and make similar adjustments to the uniproc executor as that PR has for the multiproc executor. I opened a draft PR #24219 with that (this is the commit). Let me know what you think!

However, it looks like there may not be much advantage any longer to using uniproc over multiproc anyhow so these may have limited benefit.

@Ronald1995
Copy link
Contributor Author

Ronald1995 commented Sep 5, 2025

@Ronald1995 I think it would make most sense to implement this on top of #23569, where we can have a unified approach and make similar adjustments to the uniproc executor as that PR has for the multiproc executor. I opened a draft PR #24219 with that (this is the commit). Let me know what you think!

However, it looks like there may not be much advantage any longer to using uniproc over multiproc anyhow so these may have limited benefit.

@njhill it's ok for me to combine my PR with 23569. The main purpose of my PR is to support async_scheduling in RL trainning scenario which use extenal_launcher method, external launcher executor is a subclass of uniExcutor, which is also a single process. in RL trainning scenario, the batch_size maybe large, it will get a ideal speedup performance(5%), so please also support external launcher executor in your draft PR. Thanks.

@Ronald1995 Ronald1995 closed this Sep 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants