[TPU] Use Ray for default distributed backend #8389

Merged · 2 commits into main · Sep 12, 2024

Conversation

WoosukKwon (Collaborator)

No description provided.

WoosukKwon added the tpu (Related to Google TPUs) label on Sep 12, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run the other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

njhill (Member) commented on Sep 12, 2024

@WoosukKwon I'm curious about the reason not to use the multiprocessing distributed backend for this?

WoosukKwon (Collaborator, Author)

@njhill Good question. Actually, the MP backend would also work for TPUs. However, I think users such as GKE prefer Ray because (1) they are interested in multi-host inference (which TPUs are quite good at), and (2) they are already familiar with Ray.
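
To make the user-facing effect concrete, here is a minimal usage sketch (not part of this PR): with this change, a multi-chip TPU run picks Ray automatically, so passing distributed_executor_backend="ray" explicitly becomes optional. The model name and parallel size below are placeholders, and it is assumed that LLM forwards distributed_executor_backend to the engine arguments like other engine options.

```python
from vllm import LLM, SamplingParams

# Placeholder model and parallel size; a 4-way tensor-parallel run on a TPU slice.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B",   # placeholder model name
    tensor_parallel_size=4,
    distributed_executor_backend="ray",   # now the default on multi-chip TPU
)

outputs = llm.generate(["Hello, TPU!"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```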

@@ -869,6 +869,13 @@ def __init__(
                 f"distributed executor backend "
                 f"'{self.distributed_executor_backend}'.")

+        if current_platform.is_tpu() and self.world_size > 1:
+            if self.distributed_executor_backend is None:
+                self.distributed_executor_backend = "ray"
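
For readers skimming the diff, here is a self-contained sketch of the check this hunk introduces, under the assumption (based on the discussion below) that explicitly requesting a non-Ray backend on multi-chip TPU raises NotImplementedError; the function name and error message are illustrative, not copied from the PR.

```python
from typing import Optional

def resolve_tpu_backend(is_tpu: bool, world_size: int,
                        backend: Optional[str]) -> Optional[str]:
    """Default to Ray on multi-chip TPU and reject unsupported backends (sketch)."""
    if is_tpu and world_size > 1:
        if backend is None:
            backend = "ray"  # Ray becomes the implicit default
        if backend != "ray":
            # Only a Ray executor exists for TPU, so fail fast on e.g. "mp"
            # instead of hitting a bare assert later. (Assumed error type.)
            raise NotImplementedError(
                f"Backend {backend!r} is not supported on TPU; use 'ray'.")
    return backend

assert resolve_tpu_backend(True, 8, None) == "ray"  # default picked on multi-chip TPU
```
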
Member

I mean, I think this line alone should be enough to change the default backend to Ray in the TPU case.

WoosukKwon (Collaborator, Author) commented on Sep 12, 2024

Oh, the error is for those who use distributed_executor_backend="mp".

Member

Why do we need to raise an error if users explicitly specify the MP backend?

WoosukKwon (Collaborator, Author)

The MP backend is not supported for TPUs at the moment. Without this line, the user will get the error:

"/vllm/engine/llm_engine.py", line 505, in _get_executor_cls
    assert distributed_executor_backend is None
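
For context, here is a simplified illustration (not vLLM's actual code) of the executor selection the traceback points at: on TPU only a Ray executor and a single-process executor exist, so an explicit "mp" reaches the bare assert unless the config check added in this PR rejects it earlier with a clearer message. The executor names below are placeholders.

```python
def get_executor_cls_for_tpu(distributed_executor_backend):
    if distributed_executor_backend == "ray":
        return "RayTPUExecutor"   # placeholder name for the Ray-based TPU executor
    # Any other explicit value (e.g. "mp") reaches this bare assert, which is the
    # opaque failure the new config check is meant to replace with a clear error.
    assert distributed_executor_backend is None
    return "TPUExecutor"          # placeholder name for the single-process executor

print(get_executor_cls_for_tpu("ray"))   # RayTPUExecutor
print(get_executor_cls_for_tpu(None))    # TPUExecutor
# get_executor_cls_for_tpu("mp")         # AssertionError, as in the traceback above
```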

Member

I'm confused by "Actually, the MP backend would also work for TPUs".

So the MP backend for TPU is actually not implemented yet?

WoosukKwon (Collaborator, Author)

Yes. We don't have an executor for TPU + MP.

Member

cc @njhill in case there is any misunderstanding: it is because we currently only have the Ray backend supported on TPU.

WoosukKwon merged commit b71c956 into main on Sep 12, 2024
28 of 29 checks passed
WoosukKwon deleted the tpu-ray branch on September 12, 2024 at 03:31
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
LeiWang1999 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Mar 26, 2025