
[Feat][Executor] Introduce RayExecutorV2 #36836

Merged
njhill merged 31 commits into vllm-project:main from jeffreywang-anyscale:ray
Apr 1, 2026

Conversation

@jeffreywang-anyscale
Contributor

@jeffreywang-anyscale jeffreywang-anyscale commented Mar 12, 2026

Purpose

  • Implement RayExecutorV2, a new Ray-based distributed executor that uses MessageQueue (shared memory + TCP fallback) for the control plane instead of Ray compiled graphs. It reuses MultiprocExecutor's MQ-based RPC and NCCL data plane while spawning workers as Ray actors into placement group bundles.
  • Workers on the same node as the driver communicate via shared memory; cross-node workers automatically fall back to ZMQ TCP transport. Bundle assignments are sorted driver-node-first to ensure rank 0 is co-located with the executor.
  • Add VLLM_USE_RAY_V2_EXECUTOR_BACKEND env var feature flag (default off) to opt into the new executor when distributed_executor_backend="ray". Enable async scheduling support for the new backend.
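The driver-node-first bundle ordering mentioned above can be sketched as follows. This is a hedged illustration only: `sort_bundles_driver_first` and the bundle dicts are hypothetical stand-ins for the sorting logic covered by `tests/utils_/test_ray_utils.py`, not vLLM's actual API.

```python
def sort_bundles_driver_first(bundles: list[dict], driver_node_id: str) -> list[dict]:
    """Order placement-group bundles so those on the driver's node come first.

    Rank 0 is spawned into the first bundle, so after sorting it is
    co-located with the executor and can use the shared-memory transport
    instead of TCP.
    """
    # sorted() is stable: False (driver node) sorts before True (other
    # nodes), and bundles on the same node keep their relative order.
    return sorted(bundles, key=lambda b: b["node_id"] != driver_node_id)


bundles = [
    {"node_id": "node-b", "gpu_id": 0},
    {"node_id": "driver-node", "gpu_id": 0},
    {"node_id": "node-b", "gpu_id": 1},
    {"node_id": "driver-node", "gpu_id": 1},
]
ordered = sort_bundles_driver_first(bundles, driver_node_id="driver-node")
print([b["node_id"] for b in ordered])
# → ['driver-node', 'driver-node', 'node-b', 'node-b']
```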

For more details, please refer to RFC: #35848.

EEP support is out of scope for this PR and is tracked here: #38164.
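The control-plane transport choice described above (shared memory for workers on the driver's node, ZMQ TCP fallback for remote workers) boils down to a per-worker decision along these lines. A minimal sketch with hypothetical names, not the actual MessageQueue implementation:

```python
def choose_transport(worker_node_id: str, driver_node_id: str) -> str:
    """Pick how one worker receives control-plane broadcasts.

    Workers co-located with the driver can read from the MessageQueue's
    shared-memory ring buffer directly; workers on other nodes fall back
    to a ZMQ TCP socket carrying the same messages.
    """
    if worker_node_id == driver_node_id:
        return "shared_memory"
    return "zmq_tcp"


# A TP=4 deployment split across two nodes would mix both transports:
for node in ["driver-node", "driver-node", "node-b", "node-b"]:
    print(node, "->", choose_transport(node, "driver-node"))
```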

Test Plan

Unit tests

  • pytest tests/distributed/test_ray_v2_executor.py: executor init, TP/PP combos, placement groups, RPC, worker death, shutdown
  • pytest tests/utils_/test_ray_utils.py: bundle sorting logic
  • Validate cross-node TCP path for MessageQueue with test_mq_tcp_multinode.py

Integration tests

  • pytest tests/distributed/test_ray_v2_executor_e2e.py: Creates Ray actors that initialize AsyncLLMEngine internally and verifies that they can serve requests.
  • pytest tests/distributed/test_pipeline_parallel.py -k "ray": PP correctness with the new backend
  • pytest tests/basic_correctness/test_basic_correctness.py -k "ray": basic correctness

Test Result

Benchmark results (Qwen/Qwen3-8B on L4)

Server:

# MP backend
vllm serve Qwen/Qwen3-8B --tensor-parallel-size 4 --distributed-executor-backend mp --port 8000

# Existing Ray backend
VLLM_USE_RAY_V2_EXECUTOR_BACKEND=0 vllm serve Qwen/Qwen3-8B --tensor-parallel-size 4 --distributed-executor-backend ray --port 8000

# Ray V2 backend
VLLM_USE_RAY_V2_EXECUTOR_BACKEND=1 vllm serve Qwen/Qwen3-8B --tensor-parallel-size 4 --distributed-executor-backend ray --port 8000

Client:

vllm bench serve --model Qwen/Qwen3-8B --dataset-name random --input-len 512 --output-len 128 --num-prompts 500 --request-rate 10 --port 8000
  • TP=4; MP backend (async scheduling is on by default)
============ Serving Benchmark Result ============
Successful requests:                     500       
Failed requests:                         0         
Request rate configured (RPS):           10.00     
Benchmark duration (s):                  53.64     
Total input tokens:                      256000    
Total generated tokens:                  64000     
Request throughput (req/s):              9.32      
Output token throughput (tok/s):         1193.20   
Peak output token throughput (tok/s):    1475.00   
Peak concurrent requests:                82.00     
Total token throughput (tok/s):          5965.99   
---------------Time to First Token----------------
Mean TTFT (ms):                          117.12    
Median TTFT (ms):                        117.26    
P99 TTFT (ms):                           156.28    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          40.95     
Median TPOT (ms):                        41.81     
P99 TPOT (ms):                           46.68     
---------------Inter-token Latency----------------
Mean ITL (ms):                           40.95     
Median ITL (ms):                         40.80     
P99 ITL (ms):                            54.51     
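As a quick sanity check, the headline throughput figures in the MP-backend run above follow directly from the reported token counts and duration; the small residual differences come from the benchmark tool using the unrounded duration internally.

```python
# Reported inputs from the MP-backend run (TP=4, Qwen/Qwen3-8B on L4).
duration_s = 53.64
num_requests = 500
input_tokens = 256_000
output_tokens = 64_000

req_throughput = num_requests / duration_s                     # reported: 9.32 req/s
output_tok_per_s = output_tokens / duration_s                  # reported: 1193.20 tok/s
total_tok_per_s = (input_tokens + output_tokens) / duration_s  # reported: 5965.99 tok/s

print(f"{req_throughput:.2f} req/s, "
      f"{output_tok_per_s:.2f} out tok/s, "
      f"{total_tok_per_s:.2f} total tok/s")
```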
  • TP=4; Ray backend
============ Serving Benchmark Result ============
Successful requests:                     500       
Failed requests:                         0         
Request rate configured (RPS):           10.00     
Benchmark duration (s):                  53.93     
Total input tokens:                      256000    
Total generated tokens:                  64000     
Request throughput (req/s):              9.27      
Output token throughput (tok/s):         1186.80   
Peak output token throughput (tok/s):    1464.00   
Peak concurrent requests:                84.00     
Total token throughput (tok/s):          5934.02   
---------------Time to First Token----------------
Mean TTFT (ms):                          86.00     
Median TTFT (ms):                        86.32     
P99 TTFT (ms):                           120.62    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.88     
Median TPOT (ms):                        47.14     
P99 TPOT (ms):                           51.94     
---------------Inter-token Latency----------------
Mean ITL (ms):                           45.88     
Median ITL (ms):                         47.21     
P99 ITL (ms):                            58.59     
  • TP=4; Ray V2 backend w/ async scheduling
============ Serving Benchmark Result ============
Successful requests:                     500       
Failed requests:                         0         
Request rate configured (RPS):           10.00     
Benchmark duration (s):                  53.67     
Total input tokens:                      256000    
Total generated tokens:                  64000     
Request throughput (req/s):              9.32      
Output token throughput (tok/s):         1192.53   
Peak output token throughput (tok/s):    1442.00   
Peak concurrent requests:                82.00     
Total token throughput (tok/s):          5962.65   
---------------Time to First Token----------------
Mean TTFT (ms):                          119.11    
Median TTFT (ms):                        120.43    
P99 TTFT (ms):                           154.20    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          41.11     
Median TPOT (ms):                        42.06     
P99 TPOT (ms):                           46.64     
---------------Inter-token Latency----------------
Mean ITL (ms):                           41.11     
Median ITL (ms):                         40.82     
P99 ITL (ms):                            54.10 

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@khluu khluu added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 12, 2026
@jeffreywang-anyscale jeffreywang-anyscale marked this pull request as ready for review March 12, 2026 20:54
@mergify
Contributor

mergify bot commented Mar 12, 2026

Hi @jeffreywang-anyscale, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@jeffreywang-anyscale
Contributor Author

@njhill FYI this PR is not ready for review yet as I'm iterating on the CI. Will let you know once it's in a good shape for review!

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@mergify
Contributor

mergify bot commented Mar 16, 2026

Hi @jeffreywang-anyscale, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@mergify
Contributor

mergify bot commented Mar 17, 2026

Hi @jeffreywang-anyscale, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Collaborator

@kouroshHakha kouroshHakha left a comment


ok beautiful. some broad comments after the first pass.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale
Contributor Author

Ray LLM release tests and premerge tests both pass with the latest non-rebase commit ad8f6d0.

Collaborator

@kouroshHakha kouroshHakha left a comment


Looks good overall — the round-1/round-2 feedback has been well addressed. The two-phase worker init and MQ transport selection are clean. A few remaining items below, mostly minor.

Note

This review was co-written with AI assistance (Claude Code).

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Collaborator

@kouroshHakha kouroshHakha left a comment


Great. LGTM

Member

@njhill njhill left a comment


LGTM, thanks @jeffreywang-anyscale @kouroshHakha

Disclaimer: I mainly reviewed the integration surfaces and changes to common code. I didn't review the ray executor v2 and ray utils impl/changes in detail, but @kouroshHakha has already reviewed those.

@njhill njhill merged commit de5e6c4 into vllm-project:main Apr 1, 2026
71 checks passed
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 3, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
HenryTangDev pushed a commit to HenryTangMain/vllm that referenced this pull request Apr 6, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
rishitdholakia13 pushed a commit to rishitdholakia13/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request Apr 7, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Rishi Puri <riship@nvidia.com>
big-yellow-duck pushed a commit to EmbeddedLLM/vllm that referenced this pull request Apr 8, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 10, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
aidendle94 pushed a commit to aidendle94/vllm that referenced this pull request Apr 11, 2026
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
