[BugFix] Make PD work with Ray #21072
Conversation
Code Review
This pull request introduces a fix to make distributed KV cache transfer work correctly with Ray. The changes in vllm/v1/executor/ray_distributed_executor.py add logic to aggregate finished_sending and finished_recving statuses from all workers, which is crucial for tracking request completion in a distributed setup. The implementation correctly handles both synchronous and asynchronous execution paths. The corresponding change in vllm/executor/ray_utils.py ensures this information is propagated to the model output. The debugging log statements should be removed from production code.
vllm/executor/ray_utils.py
Outdated
    output = self.worker.model_runner.execute_model(
        scheduler_output, intermediate_tensors)
    logger.info(f"in the ray_utils.py ...")
vllm/executor/ray_utils.py
Outdated
    output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
    output.finished_sending = finished_sending
    output.finished_recving = finished_recving
    logger.info(f"Have successfully set finished_sending: {finished_sending}, finished_recving: {finished_recving}")
    # Block and get results from all workers
    outputs = [ref.get() for ref in refs]
    return self._aggregate_workers_output(outputs)
    else:
    return 2
    return self.parallel_config.pipeline_parallel_size

    def _aggregate_workers_output(
Can we move these to utility functions (shared with multiproc executor), or maybe a mixin superclass also containing the _send_remaining_count and _recv_remaining_count fields?
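One possible shape for the suggested mixin is sketched below. Only the two count-field names (`_send_remaining_count`, `_recv_remaining_count`) come from the comment above; the class name, method names, and bodies are illustrative assumptions, not the merged design.

```python
# Hypothetical sketch of a mixin shared by the Ray and multiproc
# executors. Field names follow the review comment; everything else
# is illustrative.
from typing import Optional


class KVTransferAggregationMixin:

    def _init_kv_counts(self, world_size: int) -> None:
        # Per-request countdowns: how many ranks have not yet reported
        # the request as finished sending / receiving.
        self._send_remaining_count: dict[str, int] = {}
        self._recv_remaining_count: dict[str, int] = {}
        self._world_size = world_size

    def _aggregate(
        self,
        counts: dict[str, int],
        per_worker: list[Optional[set[str]]],
    ) -> Optional[set[str]]:
        # A request counts as globally finished only once every rank
        # has reported it.
        done: set[str] = set()
        for finished in per_worker:
            for req_id in finished or ():
                counts.setdefault(req_id, self._world_size)
                counts[req_id] -= 1
                if counts[req_id] == 0:
                    del counts[req_id]
                    done.add(req_id)
        # Callers expect None, not an empty set, when nothing has
        # fully finished this step.
        return done or None
```

Both executors would then call `_aggregate` on the per-worker `finished_sending` / `finished_recving` sets instead of duplicating the countdown logic.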
    if finished_sending:
        output.finished_sending = finished_sending
    if finished_recving:
        output.finished_recving = finished_recving
See also #21048 which has a fix for this that we should get in.
    hidden_states: torch.Tensor,
    num_scheduled_tokens: int,
    num_scheduled_tokens_np: np.ndarray,
    finished_sending: Optional[set[str]],
Why are all of these changes to gpu_model_runner.py and gpu_worker.py needed? I don't think they affect what's being fixed by this PR and so would be best to revert (unless I'm missing something).
The Ray distributed executor runs the runner directly (not gpu_worker), so it's better for the model_runner to own the logic of filling in the finished_sending and finished_recving fields in the output. I basically reverted the changes to gpu_model_runner introduced in #19555; in that PR, for a reason I don't understand, the logic was moved out of the model_runner and into the worker.
This change was made to support PP.
Notice these lines in GPUModelRunner::execute_model:
if not get_pp_group().is_last_rank:
# For mid-pipeline stages, return the hidden states.
if not broadcast_pp_output:
return hidden_states
so ray distributed executor runs the runner directly
@kouroshHakha could you point to where this is the case? From a quick look it appears that it's still a bit entangled with V0 logic but ultimately uses a RayWorkerWrapper which wraps a WorkerBase which should in this case resolve to a vllm.v1.worker.gpu_worker.Worker.
So here is the flow with Ray as the distributed backend in v1:
v1.RayDistributedExecutor.execute_model() --> RayWrapper.execute_model_ray() (or RayWrapper.execute_model_spmd()) --> worker.model_runner.execute_model()
Therefore the worker.execute_model() logic is skipped and it directly interacts with model_runner.execute_model().
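The call path described above can be sketched with stand-in classes. The class names and bodies below are illustrative, not vLLM's actual implementation; only the call shape (wrapper → runner, bypassing the worker) mirrors the comment.

```python
# Illustrative sketch of the v1 Ray call path described above.
# _Runner / _Worker are stand-ins for vLLM's real classes.
class _Runner:
    def execute_model(self, scheduler_output, intermediate_tensors=None):
        return f"ran {scheduler_output}"


class _Worker:
    def __init__(self):
        self.model_runner = _Runner()


class RayWrapperSketch:
    def __init__(self):
        self.worker = _Worker()

    def execute_model_ray(self, scheduler_output,
                          intermediate_tensors=None):
        # Calls the runner directly, bypassing Worker.execute_model()
        # entirely. Any logic that lives only in the worker (e.g.
        # filling finished_sending / finished_recving) never runs on
        # this path — which is the crux of the bug being discussed.
        return self.worker.model_runner.execute_model(
            scheduler_output, intermediate_tensors)
```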
code for execute_model_ray: https://github.com/vllm-project/vllm/blob/main/vllm/executor/ray_utils.py#L135
@orozery Is there a test / script that I should target to make sure the intended pp logic still works with my changes?
FWIW, there should be no regression for test_pipeline_parallel.py; I'm not sure if there is a test for PP + P/D.
@orozery I see what you are saying. Basically for PP, each pp rank (except the last one) will forward an empty output back to the scheduler, and you want those empty outputs to carry the correct finished_sending/finished_recving attributes. I added that logic back while leaving the main model_runner logic intact. Let's get this merged ASAP so that it can catch the 0.10.0 train, since it's a massive regression in the Ray behavior. We can follow up with better solutions after.
not sure if there is a test for PP + P/D
Based on my research there is none. I also couldn't get master to run PP + P/D, so I'm not sure whether that combo was ever supported and has now regressed, or was never supported at all.
    # When PP is not used, we block here until the result is available.
    if self.max_concurrent_batches == 1:
        return refs[0].get()
    if not self.has_connector:
I think the changes below here in this file should be all that's needed in this PR (apart from moving those other two functions so that they can be shared).
kouroshHakha
left a comment
cc @robertgshaw2-redhat @njhill I think it's ready for review now.
    output = EMPTY_MODEL_RUNNER_OUTPUT

    assert isinstance(output, ModelRunnerOutput)
    if has_kv_transfer_group():
The changes in gpu_model_runner.py and gpu_worker.py are a revert of what was done in #19555.
    @@ -0,0 +1,108 @@
    # SPDX-License-Identifier: Apache-2.0
Adding the tests and bug fix from #21048
That's now been merged to main so can rebase.
ruisearch42
left a comment
ray executor part LGTM, haven't looked at the KV part in detail
This pull request has merge conflicts that must be resolved before it can be merged.
    return "NHD"

    class KVOutputAggregator:
…ix-ray-pd Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
njhill
left a comment
Thanks @kouroshHakha, looks ok to me as an immediate fix but there may be a couple of things we can look at streamlining after (like the mock import thing and some more consolidation of the executor/worker/runner structure when untangling some v0 parts).
@njhill Yes, 100%. I just discussed this with @ruisearch42. He will own the consolidation effort; basically we want to re-architect the Ray implementation so that it inherits the shared logic to the maximum extent possible, and increase the test coverage in a systematic way. We should target those for the next release.
    # set the aggregated finished_sending / finished_recving
    # if output.finished_sending/recving is not empty, but the other ranks
    # still have unfinished send/recv, we want to set the aggregated
    # finished_sending/recving to None until all ranks have finished
Any reason why it's set to None instead of an empty set?
I think it's imposed by higher-level logic. FWIW, this part of the PR inherits the logic that was on master at the time.
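A tiny illustration of the distinction being discussed: to a consumer, `None` means "no update this step", while an empty set would mean "an update arrived, and nothing finished". The function below is a hypothetical consumer, not vLLM code.

```python
# Hypothetical consumer showing why None and set() are not
# interchangeable for the aggregated finished_sending/recving value.
def apply_update(in_flight: set, finished):
    if finished is None:
        return in_flight             # no news from the workers this step
    return in_flight - finished      # drop requests reported finished
```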
…nch (#2122)

### What this PR does / why we need it?
We noticed that vLLM's main branch merged PRs vllm-project/vllm#21072 and vllm-project/vllm#21473 to support the Ray backend and fix some rebase bugs from previous changes. Those changes break disaggregated PD in vLLM Ascend in some scenarios. In this PR, we adopt those changes to make sure the `llmdatddist_c_mgr_connector` works fine on the newest vLLM main branch.

### Does this PR introduce _any_ user-facing change?
No user-facing change.

### How was this patch tested?
Relevant UTs will be added to verify the functionality of those changes.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@ad57f23

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: x22x22 <wadeking@qq.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: Paul Pak <paulpak58@gmail.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
    output.finished_recving = finished_recving

    # Clear KVConnector state for this step.
    get_kv_transfer_group().clear_connector_metadata()
Why was clear_connector_metadata removed here?
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.
Fixes #21070
Test Plan
Test Result
(Optional) Documentation Update