[Performance] implement async_scheduling in single process mode #23914
Ronald1995 wants to merge 5 commits into vllm-project:main
Code Review
This pull request introduces asynchronous scheduling for the single-process executor, which is a great step towards improving performance. The core idea of using a separate thread for model execution and overlapping D2H copy with scheduling is sound.
However, I've identified a critical issue with the implementation in vllm/executor/uniproc_executor.py. The new background thread for _execute_model_loop lacks proper exception handling. If an error occurs during model execution, the thread will die silently, causing the main thread to hang in a deadlock. I've provided suggestions to propagate exceptions to the main thread.
Additionally, for a robust implementation, a graceful shutdown mechanism for the new thread should be added to the shutdown method of UniProcExecutor. This would involve sending a sentinel value to the input queue and joining the thread.
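The two suggestions above (propagating exceptions and shutting down through a sentinel) can be illustrated with a minimal, standard-library-only sketch. It is not the PR's code: `BackgroundExecutor`, `_SHUTDOWN`, and `execute_fn` are hypothetical names, and the real `UniProcExecutor` may be structured differently.

```python
import queue
import threading

_SHUTDOWN = object()  # hypothetical sentinel telling the worker loop to exit


class BackgroundExecutor:
    """Illustrative stand-in for an executor with a worker thread (not vLLM code)."""

    def __init__(self, execute_fn):
        self._execute_fn = execute_fn
        self._inputs = queue.Queue()
        self._outputs = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            item = self._inputs.get()
            if item is _SHUTDOWN:
                return
            try:
                self._outputs.put((True, self._execute_fn(item)))
            except Exception as exc:
                # Hand the error to the caller instead of letting the thread
                # die silently, which would leave the caller blocked forever.
                self._outputs.put((False, exc))

    def execute(self, item):
        self._inputs.put(item)
        ok, payload = self._outputs.get()
        if not ok:
            raise payload  # re-raise the worker's exception in the main thread
        return payload

    def shutdown(self):
        # Wake the worker with the sentinel, then wait for it to finish.
        self._inputs.put(_SHUTDOWN)
        self._thread.join()
```

In this sketch the worker stays alive after a failed step, so the caller decides whether to retry or shut down; a real executor might instead treat any failure as fatal.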
@WoosukKwon @njhill would you please review this PR?
This pull request has merge conflicts that must be resolved before it can be
@njhill Hello, I'm looking forward to your review, thanks!
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Thanks for this @Ronald1995, I will review today.
@Ronald1995 I think it would make most sense to implement this on top of #23569, where we can have a unified approach and make similar adjustments to the uniproc executor as that PR has for the multiproc executor. I opened a draft PR #24219 with that (this is the commit). Let me know what you think! However, it looks like there may not be much advantage any longer to using uniproc over multiproc anyhow, so these may have limited benefit.
@njhill It's OK with me to combine my PR with #23569. The main purpose of my PR is to support async_scheduling in the RL training scenario, which uses the external_launcher method. The external launcher executor is a subclass of UniProcExecutor, so it is also single-process. In RL training the batch size may be large, and async scheduling then gives a worthwhile speedup (about 5%), so please also support the external launcher executor in your draft PR. Thanks.
Purpose
async_scheduling does not currently support UniProcExecutor or ExecutorWithExternalLauncher. This PR implements async_scheduling in single-process mode.
I create a new thread to run the execute_model task. Copying sample_token_ids from device to host blocks that thread, so it notifies the main thread to schedule the next step at that point. This lets the scheduling work overlap with the data copy.
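The overlap described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's implementation: `worker`, `run`, and the `copy_started` event are hypothetical names, and `time.sleep` stands in for the blocking device-to-host copy.

```python
import queue
import threading
import time


def worker(inputs, outputs, copy_started, trace):
    """Runs steps on a background thread (hypothetical stand-in for execute_model)."""
    while True:
        step = inputs.get()
        if step is None:  # sentinel: stop the thread
            return
        trace.append(f"forward {step}")
        copy_started.set()  # the main thread may now schedule the next step
        time.sleep(0.05)    # stands in for the blocking device-to-host copy
        trace.append(f"copy done {step}")
        outputs.put(f"tokens for step {step}")


def run(num_steps):
    inputs, outputs = queue.Queue(), queue.Queue()
    copy_started = threading.Event()
    trace = []  # list.append is thread-safe in CPython
    thread = threading.Thread(
        target=worker, args=(inputs, outputs, copy_started, trace))
    thread.start()

    results = []
    inputs.put(0)
    for step in range(num_steps):
        copy_started.wait()   # the worker is about to block on the copy
        copy_started.clear()
        if step + 1 < num_steps:
            # Scheduling happens here, while the worker is busy copying.
            trace.append(f"schedule {step + 1}")
        results.append(outputs.get())  # now wait for the copied tokens
        if step + 1 < num_steps:
            inputs.put(step + 1)

    inputs.put(None)
    thread.join()
    return results, trace
```

Calling `run(3)` returns the three step outputs together with a trace in which each `schedule N` entry follows `forward N-1`. Whether it also precedes `copy done N-1` depends on timing, which is the overlap the description aims for.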
Test Plan
Test script: see https://github.com/vllm-project/vllm-ascend/blob/main/examples/offline_external_launcher.py
On A100, we use the external_launcher method to initialize LLM, with the following configuration:
TP=2, max_num_seqs=512, max_model_len=4096, max_tokens=512, ignore_eos=True, with prm800k_500.jsonl as the dataset.
Test Result