[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations (#36924)
Conversation
…can be extended for different Platforms Signed-off-by: Guangxiang Du <gxd@google.com>
Code Review
This pull request introduces a new executors_supports_async_scheduling() method to the Platform interface, which allows different platforms to specify which distributed executor backends are compatible with asynchronous scheduling. This change replaces a hardcoded list of executors in VllmConfig with a call to this new method, making the system more extensible for platforms like TPU. The default implementation preserves the existing behavior, and a unit test has been added to verify this. The changes are well-structured and correctly implemented.
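The pattern the review describes — a platform-level method returning the executor backends compatible with async scheduling, replacing a hardcoded list in VllmConfig — can be sketched roughly as follows. This is an illustrative sketch only: the class names, the `tpu_custom` backend string, and the helper `async_scheduling_allowed` are hypothetical stand-ins, not vLLM's actual API; only the `"uni"` and `"external_launcher"` backend names come from the PR's diff.

```python
# Hypothetical sketch of the extension point described in the review.
# Platform, TpuPlatform, "tpu_custom", and async_scheduling_allowed are
# illustrative names, not vLLM's real classes.
from typing import Tuple


class Platform:
    """Base platform: the default preserves the previously hardcoded
    tuple of backends that support async scheduling."""

    def executors_supports_async_scheduling(self) -> Tuple[str, ...]:
        return ("uni", "external_launcher")


class TpuPlatform(Platform):
    """A platform like TPU can extend the default with its own backend."""

    def executors_supports_async_scheduling(self) -> Tuple[str, ...]:
        return super().executors_supports_async_scheduling() + ("tpu_custom",)


def async_scheduling_allowed(platform: Platform, backend: str) -> bool:
    # VllmConfig-style check: consult the platform instead of a
    # hardcoded list of executor backends.
    return backend in platform.executors_supports_async_scheduling()
```

Under this sketch, the default platform rejects unknown backends while a TPU platform can opt its custom executor in without touching VllmConfig.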
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically, and you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
I understand your requirement and what you want; it just feels a little strange to me.
Thank you, Kunshang, for the quick review!
…ew comment suggested Signed-off-by: Guangxiang Du <gxd@google.com>
Force-pushed from 29a415c to 2e1ca56
vllm/config/vllm.py
Outdated
    "uni",
    "external_launcher",
)
executor_supports_async_sched = Executor.get_class(
I am not certain whether we should do this here.
cc @njhill @LucasWilkinson PTAL
Yeah it is kind of circular but I'm not sure what the alternative is.
Hi @mgoin @robertgshaw2-redhat! Could you take a look? This is blocking TPU multi-host perf optimization.
Signed-off-by: Guangxiang Du <gxd@google.com>
Thanks, Nick, for the review!
I added the run-full-CI label, in case any corner case breaks.
…terface so that it can be extended for Executor implementations. (vllm-project#36924) Signed-off-by: Guangxiang Du <gxd@google.com>
[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations
Purpose
TPU-inference has a custom Executor class that we want to make compatible with async scheduling.
In TPU-inference's custom Executor implementation, we plan to override supports_async_scheduling() to return True.
Test Plan
This PR should have no behavior change.
Added a unit test.
Test Result
Unit test passed.
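The override the description plans for TPU-inference's custom Executor can be sketched as below. This is a minimal sketch under assumptions: the base `Executor` shown here is a stand-in (its default of False is illustrative, not necessarily vLLM's actual default), and `TpuCustomExecutor` is a hypothetical name; only the method name `supports_async_scheduling()` comes from the PR.

```python
# Illustrative stand-in for the Executor interface; not vLLM's real class.
class Executor:
    @classmethod
    def supports_async_scheduling(cls) -> bool:
        # Default shown here is conservative (False) for illustration only.
        return False


class TpuCustomExecutor(Executor):
    """TPU-inference's custom Executor would override the method to opt in
    to async scheduling, as the PR description plans."""

    @classmethod
    def supports_async_scheduling(cls) -> bool:
        return True
```

Because the check is a classmethod on the Executor subclass, a platform's custom executor opts in without any change to core config code.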