
[BugFix] Make PD work with Ray#21072

Merged
vllm-bot merged 16 commits into vllm-project:main from kouroshHakha:kh/fix-ray-pd
Jul 19, 2025
Conversation

@kouroshHakha
Collaborator

@kouroshHakha kouroshHakha commented Jul 16, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Fixes #21070

Test Plan

Test Result

(Optional) Documentation Update

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Jul 16, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a fix to make distributed KV cache transfer work correctly with Ray. The changes in vllm/v1/executor/ray_distributed_executor.py add logic to aggregate finished_sending and finished_recving statuses from all workers, which is crucial for tracking request completion in a distributed setup. The implementation correctly handles both synchronous and asynchronous execution paths. The corresponding change in vllm/executor/ray_utils.py ensures this information is propagated to the model output. The debugging log statements should be removed from production code.
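The aggregation described above can be sketched as follows. This is a hedged illustration of the idea, not the PR's actual code: `WorkerOut`, `aggregate_finished_sending`, and the countdown dict are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WorkerOut:
    # Hypothetical stand-in for a worker's ModelRunnerOutput.
    finished_sending: Optional[set] = None
    finished_recving: Optional[set] = None

def aggregate_finished_sending(outputs, world_size, send_remaining):
    """Merge finished_sending across workers: a request counts as
    finished only once every rank has reported it (countdown dict)."""
    done = set()
    for out in outputs:
        for req_id in out.finished_sending or ():
            send_remaining[req_id] = send_remaining.get(req_id, world_size) - 1
            if send_remaining[req_id] == 0:
                del send_remaining[req_id]
                done.add(req_id)
    # None (rather than an empty set) signals "nothing newly finished".
    return done or None
```

With two workers, a request reported by only one rank stays pending; once the second rank reports it, it appears in the merged result.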

output = self.worker.model_runner.execute_model(
scheduler_output, intermediate_tensors)

logger.info(f"in the ray_utils.py ...")
Contributor


high

This logger.info call is likely for debugging and should be removed before merging to production. Leaving it in will add noise to the logs.

output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
output.finished_sending = finished_sending
output.finished_recving = finished_recving
logger.info(f"Have succesfully set finished_sending: {finished_sending}, finished_recving: {finished_recving}")
Contributor


high

This logger.info call is likely for debugging and should be removed before merging to production. Leaving it in will add noise to the logs.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Member

@njhill njhill left a comment


Thanks @kouroshHakha

# Block and get results from all workers
outputs = [ref.get() for ref in refs]
return self._aggregate_workers_output(outputs)
else:
Member


redundant else

return 2
return self.parallel_config.pipeline_parallel_size

def _aggregate_workers_output(
Member


Can we move these to utility functions (shared with multiproc executor), or maybe a mixin superclass also containing the _send_remaining_count and _recv_remaining_count fields?
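One way to read this suggestion is a small mixin owning the countdown state; this is a sketch under assumptions — the class and method names here are hypothetical, not vLLM's actual API.

```python
from collections import defaultdict

class KVOutputAggregatorMixin:
    """Hypothetical mixin shared by the Ray and multiproc executors,
    holding the per-request countdown fields mentioned above."""

    def init_kv_aggregation(self, world_size):
        self._world_size = world_size
        # Each request starts with world_size ranks still to report.
        self._send_remaining_count = defaultdict(lambda: self._world_size)
        self._recv_remaining_count = defaultdict(lambda: self._world_size)

    def _merge(self, counts, reported_sets):
        # Collect request ids that every rank has now reported.
        done = set()
        for per_worker in reported_sets:
            for req_id in per_worker or ():
                counts[req_id] -= 1
                if counts[req_id] == 0:
                    del counts[req_id]
                    done.add(req_id)
        return done or None
```

Both executors would then call `self._merge(self._send_remaining_count, ...)` on their per-worker outputs instead of duplicating the loop.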

Comment on lines +87 to +90
if finished_sending:
output.finished_sending = finished_sending
if finished_recving:
output.finished_recving = finished_recving
Member


Suggested change
if finished_sending:
output.finished_sending = finished_sending
if finished_recving:
output.finished_recving = finished_recving

See also #21048 which has a fix for this that we should get in.

hidden_states: torch.Tensor,
num_scheduled_tokens: int,
num_scheduled_tokens_np: np.ndarray,
finished_sending: Optional[set[str]],
Member


Why are all of these changes to gpu_model_runner.py and gpu_worker.py needed? I don't think they affect what's being fixed by this PR and so would be best to revert (unless I'm missing something).

Collaborator Author

@kouroshHakha kouroshHakha Jul 16, 2025


The Ray distributed executor runs the runner directly (not gpu_worker), so it's better for the model_runner to own the logic of filling in the finished_sending and finished_recving fields in the output. I basically reverted the changes to gpu_model_runner introduced in #19555. In that PR, for some reason I don't understand, the logic was taken out of the model_runner and put into the worker.

Collaborator


This change was made to support PP.
Notice these lines in GPUModelRunner::execute_model:

        if not get_pp_group().is_last_rank:
            # For mid-pipeline stages, return the hidden states.
            if not broadcast_pp_output:
                return hidden_states

Member


so ray distributed executor runs the runner directly

@kouroshHakha could you point to where this is the case? From a quick look it appears that it's still a bit entangled with V0 logic but ultimately uses a RayWorkerWrapper which wraps a WorkerBase which should in this case resolve to a vllm.v1.worker.gpu_worker.Worker.

Collaborator Author


So here is the flow with Ray as the distributed backend in v1:

v1.RayDistributedExecutor.execute_model() --> RayWrapper.execute_model_ray() (or RayWrapper.execute_model_spmd()) --> worker.model_runner.execute_model()

Therefore the worker.execute_model() logic is skipped and it directly interacts with model_runner.execute_model().
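The bypass described in this flow can be illustrated schematically; the classes below are minimal hypothetical stand-ins, not vLLM's real implementations.

```python
class ModelRunner:
    def execute_model(self, scheduler_output):
        return "runner-output"

class Worker:
    def __init__(self):
        self.model_runner = ModelRunner()

    def execute_model(self, scheduler_output):
        # Worker-level post-processing (e.g. filling the KV-transfer
        # fields) lives here -- the Ray path never runs it.
        return self.model_runner.execute_model(scheduler_output) + "+worker-step"

class RayWorkerWrapper:
    def __init__(self, worker):
        self.worker = worker

    def execute_model_ray(self, scheduler_output):
        # Calls the runner directly, bypassing Worker.execute_model.
        return self.worker.model_runner.execute_model(scheduler_output)
```

This is why logic placed only in `Worker.execute_model` is invisible to the Ray executor path.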


Collaborator Author


@orozery Is there a test / script that I should target to make sure the intended pp logic still works with my changes?

Collaborator


FWIW, there should be no regression for test_pipeline_parallel.py; not sure if there is a test for PP + P/D.

Collaborator Author


@orozery I see what you are saying. Basically, for PP, each PP rank (except the last one) forwards an empty output back to the scheduler, and you want those empty outputs to carry the correct finished_sending/finished_recving attributes. I added that logic back while leaving the main model_runner logic intact. Let's get this merged ASAP so that it can catch the 0.10.0 train, since this is a massive regression in the Ray behavior. We can follow up with better solutions after.
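The behavior being restored here can be sketched like this; field names mirror the discussion, but `ModelRunnerOutput` and `make_mid_rank_output` as written are assumptions for illustration, not exact vLLM code.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelRunnerOutput:
    req_ids: list = field(default_factory=list)
    finished_sending: Optional[set] = None
    finished_recving: Optional[set] = None

EMPTY_MODEL_RUNNER_OUTPUT = ModelRunnerOutput()

def make_mid_rank_output(finished_sending, finished_recving):
    # Mid-pipeline ranks produce no sampled tokens, but the scheduler
    # still needs their KV-transfer completion info.
    output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
    if finished_sending:
        output.finished_sending = finished_sending
    if finished_recving:
        output.finished_recving = finished_recving
    return output
```

The shallow copy keeps the shared empty template untouched while letting each step's output carry its own finished sets.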

Collaborator Author


not sure if there is a test for PP + P/D

Based on my research, there is none. I also couldn't get master to run PP + P/D, so I'm not sure whether that combination was ever supported and has since regressed, or was never supported at all.

# When PP is not used, we block here until the result is available.
if self.max_concurrent_batches == 1:
return refs[0].get()
if not self.has_connector:
Member


I think the changes below here in this file should be all that's needed in this PR (apart from moving those other two functions so that they can be shared).

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Collaborator Author

@kouroshHakha kouroshHakha left a comment


cc @robertgshaw2-redhat @njhill I think it's ready for review now.

output = EMPTY_MODEL_RUNNER_OUTPUT

assert isinstance(output, ModelRunnerOutput)
if has_kv_transfer_group():
Collaborator Author


The changes in gpu_model_runner.py and gpu_worker.py are a revert of what was done in #19555

@@ -0,0 +1,108 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator Author


Adding the tests and bug fix from #21048

Member


That's now been merged to main, so you can rebase.

@kouroshHakha kouroshHakha marked this pull request as ready for review July 16, 2025 22:16
@kouroshHakha kouroshHakha added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 16, 2025
Collaborator

@ruisearch42 ruisearch42 left a comment


Ray executor part LGTM; haven't looked at the KV part in detail.

@simon-mo simon-mo added this to the v0.10.0 milestone Jul 17, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@mergify

mergify bot commented Jul 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kouroshHakha.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 17, 2025
Member

@njhill njhill left a comment


thanks @kouroshHakha


return "NHD"


class KVOutputAggregator:
Member


This utility class LGTM

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
…ix-ray-pd

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@mergify mergify bot added the needs-rebase label Jul 19, 2025
…ix-ray-pd

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@mergify mergify bot removed the needs-rebase label Jul 19, 2025
@kouroshHakha kouroshHakha requested a review from njhill July 19, 2025 01:16
Member

@njhill njhill left a comment


Thanks @kouroshHakha, looks ok to me as an immediate fix but there may be a couple of things we can look at streamlining after (like the mock import thing and some more consolidation of the executor/worker/runner structure when untangling some v0 parts).

@vllm-bot vllm-bot merged commit 9f414a1 into vllm-project:main Jul 19, 2025
69 of 72 checks passed
@kouroshHakha
Collaborator Author

@njhill Yes, 100%. I just discussed this with @ruisearch42. He will own the consolidation effort: basically we want to re-architect the Ray implementation so that it inherits the changes to the maximum extent possible, and increase test coverage in a systematic way. We should target those for the next release.

# set the aggregated finished_sending / finished_recving
# if output.finished_sending/recving is not empty, but the other ranks
# still have unfinished send/recv, we want to set the aggregated
# finished_sending/recving to None until all ranks have finished
Collaborator


Any reason why it's set to None instead of an empty set?

Collaborator Author


I think it's imposed by higher-level logic. FWIW, this part of the PR inherits the logic that existed on master at the time.

wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Aug 4, 2025
…nch (#2122)

### What this PR does / why we need it?
We noticed that vLLM's main branch merged PRs
vllm-project/vllm#21072 and
vllm-project/vllm#21473 to support the Ray backend
and fix some rebase bugs from a previous change. Those changes break
disaggregated PD in vllm-ascend in some scenarios.

In this PR, we adopt those changes to make sure the
`llmdatddist_c_mgr_connector` works fine on the newest vLLM main branch.

### Does this PR introduce _any_ user-facing change?

No user-facing change.

### How was this patch tested?

Relevant unit tests will be added to verify the functionality of these
changes.

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@ad57f23

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: x22x22 <wadeking@qq.com>
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
zzhx1 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Aug 11, 2025
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
output.finished_recving = finished_recving

# Clear KVConnector state for this step.
get_kv_transfer_group().clear_connector_metadata()
Contributor


Why was clear_connector_metadata removed here?

@mergify mergify bot added the kv-connector label Nov 4, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: PD does not work with ray distributed backend

8 participants