[Feat][RL][1/2] Native Weight Syncing API: NCCL #31943
robertgshaw2-redhat merged 56 commits into vllm-project:main
Conversation
…gines, DP fixes. Squashed from 11 commits including:
- weight transfer init
- ipc support
- dataclasses for weight transfer config
- async + new weight update APIs
- incremental weight loading, dp fixes, http example
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Documentation preview: https://vllm--31943.org.readthedocs.build/en/31943/
Code Review
This pull request introduces a significant and well-designed feature for native weight transfer in vLLM, which will be very useful for RLHF workflows. The abstraction of a WeightTransferEngine with different backends like NCCL and IPC is clean, and the inclusion of multiple examples is very helpful. The overall implementation is strong, but I've identified a couple of critical issues that need to be addressed: a NameError due to a typo in an __init__.py file's __all__ list, and a bug in the IPC engine where an incorrect index is used for tensor reconstruction, which could lead to memory corruption. I also found some minor code duplication in the GPU worker initialization and shutdown logic. After these fixes, this will be a great addition to vLLM.
/gemini review
Code Review
This pull request introduces a significant and well-designed feature for native weight syncing to support reinforcement learning workflows. The abstraction of a WeightTransferEngine with NCCL and IPC backends is clean and extensible. The new APIs are consistently exposed through the LLM, AsyncLLM, and HTTP server interfaces, and the provided examples and tests are comprehensive. My feedback focuses on improving implementation robustness, specifically by adding safeguards to the IPC engine's dependency on PyTorch internals and ensuring consistent, strict type checking in the API entrypoints for better maintainability and correctness.
This pull request has merge conflicts that must be resolved before it can be merged.
/gemini review
Code Review
This pull request introduces a comprehensive and well-designed set of native weight syncing APIs to support reinforcement learning workflows. The changes are extensive, adding a pluggable WeightTransferEngine abstraction with NCCL and IPC backends, corresponding configurations, new API endpoints in LLM, AsyncLLM, and the HTTP server, along with thorough examples and tests. The architecture is modular, and the inclusion of packed tensor transfer for NCCL is a great performance optimization. The unit and integration tests are robust and provide good confidence in the implementation. My review focuses on improving API consistency and the robustness of the provided examples. Overall, this is an excellent contribution that will significantly benefit users building RL-based applications on top of vLLM.
Hi @ahao-anyscale, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
Some comments on integration with #32133, and online quantization:
# Use layerwise reload pattern for checkpoint format weights
with torch.device(self.device):
    initialize_layerwise_reload(model)
I think that this should go in initialize weight transfer, and that finalize_layerwise_reload should go in finalize weight transfer.
The layerwise api assumes that all weights are updated between initialize_layerwise_reload and finalize_layerwise_reload.
Whereas I think your api assumes that update_weights can be called with a small subset of weights, right?
There have been some changes in ideas about integration: now we are trying to have each call to update_weights contain all of the weights, which allows us to remove finalize_weight_transfer. Also, the problem with including initialize_layerwise_reload in init_weight_transfer is that init_weight_transfer is only called once at the beginning, to set up process groups, rather than every time we do a weight transfer. Confusing name, so it's called init_weight_transfer_engine now.
Got it! In that case, this looks good to me.
@kouroshHakha thanks for the great work! Question: do you think we could add support for tracking weight versions? E.g., how many times the weights were updated since the vLLM instance started. Many implementations track this externally, but we still think there's value in tracking this in vLLM (and possibly tagging LLM responses with the corresponding version).
@h-avsha thanks for the suggestion, can add it to the follow-up PR
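For reference, the version tracking suggested above could start from a simple monotonic counter that is bumped once per completed weight update. This is a hypothetical sketch, not part of vLLM's API; the class and method names are made up for illustration.

```python
import threading


class WeightVersionTracker:
    """Hypothetical sketch: count how many times weights were updated
    since the instance started, so responses could be tagged with it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0

    def bump(self) -> int:
        # Called once per completed weight update; thread-safe because
        # update requests may arrive concurrently with generation.
        with self._lock:
            self._version += 1
            return self._version

    @property
    def version(self) -> int:
        return self._version


tracker = WeightVersionTracker()
tracker.bump()
tracker.bump()
```

Tagging each response with `tracker.version` at sampling time would then let callers attribute outputs to a specific training step.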
@config
@dataclass
Soon config will be a dataclass_transform, so it will be unnecessary to decorate with dataclass as well. Just something to be aware of.
rank_within_dp = self.parallel_config.rank

# Unique rank across all DP groups
worker_rank = dp_rank * world_size_per_dp + rank_within_dp
This is fine for this PR, but we actually have a few more parallelism schemes, so we should probably reject if something we don't expect is selected. Can be a follow-up.
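The quoted rank computation can be checked standalone: with DP groups laid out contiguously, each group containing world_size_per_dp workers, every (dp_rank, rank_within_dp) pair maps to a unique global rank. The function below mirrors the diff's names; only the enumeration is added for illustration.

```python
def global_worker_rank(dp_rank: int, world_size_per_dp: int,
                       rank_within_dp: int) -> int:
    # Unique rank across all DP groups: group `dp_rank` occupies the
    # contiguous range [dp_rank * world_size_per_dp, (dp_rank + 1) * world_size_per_dp).
    return dp_rank * world_size_per_dp + rank_within_dp


# e.g. 2 DP groups with 4 workers each enumerate ranks 0..7 exactly once
ranks = [global_worker_rank(dp, 4, r) for dp in range(2) for r in range(4)]
```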
return JSONResponse(content={"is_paused": paused})

@router.post("/init_weight_transfer_engine")
for follow up: do we need a way to delete a weight transfer engine?
like clean up the process group between trainer and inference?
failing test unrelated, failing on main as well: https://buildkite.com/vllm/ci/builds/49871/steps/canvas?sid=019c2586-7e43-4591-bb82-dae7ad53b155
Args:
    init_info: Dictionary containing backend-specific initialization info
"""
if self.weight_transfer_engine is None:
another follow up: this could crash the server. We should probably just "do nothing" if this happens rather than crash
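The "do nothing" behavior suggested here could look roughly like the sketch below. The function name and return convention are illustrative, not vLLM's actual code; the point is just to log and return early instead of raising.

```python
import logging

logger = logging.getLogger(__name__)


def safe_update_weights(engine, metadata) -> bool:
    """Sketch of the suggested guard: no-op instead of crashing the
    server when update_weights arrives before the engine is initialized."""
    if engine is None:
        logger.warning(
            "update_weights called before init_weight_transfer_engine; ignoring"
        )
        return False
    engine.update_weights(metadata)
    return True
```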
robertgshaw2-redhat left a comment: thanks for the hard work on this!
Purpose
This PR introduces native weight syncing APIs for vLLM to support reinforcement learning post-training workflows (RLHF, PPO, etc.).
Currently, open-source projects like SkyRL, VeRL, and TRL must implement their own weight syncing infrastructure to use vLLM as an inference server during training. This leads to duplicated effort and requires users to version-lock to specific implementations. See RFC #31848 for full motivation.
New APIs
Three new endpoints are exposed through LLM, AsyncLLM, and the OpenAI-compatible HTTP server:

- POST /init_weight_transfer
- POST /update_weights
- POST /finalize_weight_update - run post-processing for quantized/custom kernel formats (removed due to #32133)
- GET /get_world_size

Architecture
- WeightTransferEngine abstraction separates transport logic from the worker implementation
- WeightTransferConfig dataclass with backend selection (only nccl for now)
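A minimal sketch of how such a pluggable abstraction could fit together: the class names follow the PR description, but every field and method signature below is an assumption for illustration, not vLLM's actual interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class WeightTransferConfig:
    # Backend selection (only nccl for now, per the PR description)
    backend: str = "nccl"
    init_info: dict = field(default_factory=dict)


class WeightTransferEngine(ABC):
    """Separates transport logic from the worker implementation."""

    def __init__(self, config: WeightTransferConfig) -> None:
        self.config = config

    @abstractmethod
    def update_weights(self, weights: dict) -> None:
        """Receive a new set of weights from the trainer side."""


# Simple registry keyed by backend name (illustrative, mirroring the
# "engine registry" that the PR's unit tests exercise).
ENGINE_REGISTRY: dict = {}


def register_engine(name: str):
    def decorator(cls):
        ENGINE_REGISTRY[name] = cls
        return cls
    return decorator


@register_engine("nccl")
class NCCLWeightTransferEngine(WeightTransferEngine):
    def update_weights(self, weights: dict) -> None:
        pass  # a real implementation would broadcast tensors over NCCL
```

The registry pattern keeps worker code backend-agnostic: the worker looks up `ENGINE_REGISTRY[config.backend]` and never touches NCCL or IPC details directly.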
Examples

New examples added in examples/offline_inference/new_weight_syncing/:

- rlhf.py
- rlhf_async_new_apis.py - uses pause_generation/resume_generation for in-flight weight updates
- rlhf_http.py - drives a vllm serve instance

Usage
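As a rough illustration of the trainer side of the rlhf_http.py flow, a client might construct requests to the new endpoints like this. The payload schema here is a guess from the endpoint names and the init_info docstring quoted earlier in the review, not the actual request format; treat every field name as hypothetical.

```python
import json

BASE_URL = "http://localhost:8000"  # address of the vllm serve instance


def init_engine_request(master_address: str, master_port: int, world_size: int):
    """Build the (url, body) pair for initializing the transfer engine.
    Field names inside init_info are hypothetical."""
    body = {
        "init_info": {
            "master_address": master_address,
            "master_port": master_port,
            "world_size": world_size,
        }
    }
    return f"{BASE_URL}/init_weight_transfer_engine", json.dumps(body)


def update_weights_request(named_shapes):
    """Build the (url, body) pair announcing which tensors will be sent
    (the tensor data itself would travel over NCCL, not HTTP)."""
    body = {"weights": [{"name": n, "shape": list(s)} for n, s in named_shapes]}
    return f"{BASE_URL}/update_weights", json.dumps(body)


url, body = init_engine_request("10.0.0.1", 29500, 9)
```

A trainer would POST these payloads (e.g. with an HTTP client) and then perform the collective transfer out of band.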
Test Plan
Unit tests:
- tests/distributed/test_packed_tensor.py - tests packed tensor batching for efficient NCCL broadcast
- tests/distributed/test_weight_transfer.py - tests engine registry, parsing, validation, and NCCL transfer between Ray tasks
- tests/entrypoints/weight_transfer/test_weight_transfer_llm.py - tests LLM class weight transfer APIs with a mock engine

E2E tests:
The examples in examples/offline_inference/new_weight_syncing/ serve as true end-to-end tests, exercising the full trainer→vLLM weight sync flow with actual NCCL communication.

Test Result
WIP
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.