[Feature] Support Pipeline Parallelism in torchrun SPMD offline inference for V1 #17827
simon-mo merged 10 commits into vllm-project:main from
Conversation
comaniac
left a comment
Overall LGTM. The main question: I'm not sure why we need pipeline_parallel_broadcast_output instead of making it always true.
youkaichao
left a comment
For efficient PP, we would want different pipeline stages to run different batches concurrently. Would this broadcast serialize them, so that only one batch and one stage are running across the full engine?
vllm/v1/worker/gpu_model_runner.py
Outdated
Should we return earlier if it's not the first rank?
We still need it for a complete sync in the current solution.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Lucia Fang <fanglu@fb.com>
Thanks @youkaichao. Right now torchrun runs in a purely synced way to align with SPMD, so this is not the ideal solution for efficient PP: only one batch and one stage are active at a time. As a follow-up, I will think about how to design and improve it so that multiple batches can overlap while staying compatible with torchrun.
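The serialization concern above can be made concrete with a toy cost model (the numbers and the `total_time` helper are illustrative, not measured from vLLM): a fully synced pipeline pays `num_batches * num_stages` stage-times, while an overlapped microbatch schedule pays only the classic fill/drain cost.

```python
def total_time(num_stages: int, num_batches: int, overlapped: bool,
               t_stage: float = 1.0) -> float:
    """Toy cost model for a pipeline with equal per-stage time t_stage.

    Fully serialized (one batch, one stage active at a time, as in the
    current synced solution): num_batches * num_stages * t_stage.
    Overlapped microbatches (classic pipeline fill/drain):
    (num_batches + num_stages - 1) * t_stage.
    """
    if overlapped:
        return (num_batches + num_stages - 1) * t_stage
    return num_batches * num_stages * t_stage


# With 4 stages and 8 microbatches:
serialized = total_time(4, 8, overlapped=False)   # 32.0 stage-times
overlapped = total_time(4, 8, overlapped=True)    # 11.0 stage-times
print(serialized, overlapped)
```

This is why overlapping microbatches is the natural follow-up: the gap between the two schedules grows with both the batch count and the stage count.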
houseroad
left a comment
As discussed with @luccafong, landing this first to unblock some use cases.
Will enhance the perf in a follow-up PR.
Signed-off-by: Lucia Fang <fanglu@fb.com>
ruisearch42
left a comment
LG with a couple of comments.
last_rank_in_group = pp_group_ranks - 1
if self.parallel_config.distributed_executor_backend \
        == "external_launcher" and len(get_pp_group().ranks) > 0:
    model_output_broadcast_data = get_pp_group().broadcast_tensor_dict(
Could you explain why we need this broadcast?
Yes, added as comments. For now we enable it by syncing all ranks; we will improve this to reduce PP bubbles in a follow-up PR.
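The broadcast under discussion can be illustrated with a minimal, framework-free sketch. The `ToyPPGroup` class and the mailbox-style rank simulation below are hypothetical stand-ins (not vLLM's or torch.distributed's API); they only show why, in SPMD external_launcher mode, the last pipeline stage must broadcast its output so every rank returns the same result.

```python
# In SPMD-style external_launcher mode, every rank runs the same driver
# code and must return the same output, but only the last pipeline stage
# actually computes it. Hence the broadcast from the last PP rank.

class ToyPPGroup:
    """Hypothetical stand-in for a pipeline-parallel process group."""

    def __init__(self, world_size: int):
        self.world_size = world_size
        self._mailbox = None  # simulates the collective's shared channel

    def broadcast_tensor_dict(self, data, src):
        # Rank `src` publishes its output; every rank reads the same value.
        if data is not None:
            self._mailbox = data
        return self._mailbox


def run_rank(rank: int, group: ToyPPGroup):
    last_rank = group.world_size - 1
    # Only the last stage produces a real model output.
    local_output = {"tokens": [1, 2, 3]} if rank == last_rank else None
    # All ranks participate in the broadcast, so all return the same dict.
    return group.broadcast_tensor_dict(local_output, src=last_rank)


group = ToyPPGroup(world_size=4)
# Run the src rank first so the mailbox is populated (a real collective
# synchronizes this for us).
outputs = {3: run_rank(3, group)}
for r in range(3):
    outputs[r] = run_rank(r, group)

assert all(outputs[r] == {"tokens": [1, 2, 3]} for r in range(4))
print("all ranks agree:", outputs[0])
```

The cost of this scheme is the full-sync behavior discussed above: every rank waits on the broadcast, which is what the follow-up PR aims to relax.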
Signed-off-by: Lucia Fang <fanglu@fb.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Lucia Fang <fanglu@fb.com>
CI build failure is not related; also pulled the latest changes. @houseroad
Actually the v1 test failure may be related: E AssertionError: pipeline model parallel group is not initialized. cc: @luccafong
This is a trunk failure. Added a fix here:
…ce for V1 (vllm-project#17827) Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
This PR adds support for PP in torchrun offline inference. Note that it does not yet support overlapping microbatches, so it is not the most efficient way to unblock PP use cases; will improve in follow-ups.
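For context, torchrun SPMD offline inference would be launched with every rank running the same script; the script name and rank count below are illustrative (check the vLLM offline-inference examples for the exact invocation):

```shell
# Hypothetical launch sketch: torchrun spawns one process per rank on a
# node, and each process runs the identical SPMD driver script that
# constructs the engine with the external_launcher backend.
torchrun --nproc-per-node=4 offline_inference_torchrun.py
```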