
[Feat][RL][1/2] Native Weight Syncing API: NCCL#31943

Merged
robertgshaw2-redhat merged 56 commits into vllm-project:main from
hao-aaron:weight_transfer
Feb 5, 2026
Conversation

@hao-aaron
Contributor

@hao-aaron hao-aaron commented Jan 8, 2026

Purpose

This PR introduces native weight syncing APIs for vLLM to support reinforcement learning post-training workflows (RLHF, PPO, etc.).

Currently, open-source projects like SkyRL, VeRL, and TRL must implement their own weight syncing infrastructure to use vLLM as an inference server during training. This leads to duplicated effort and requires users to version-lock to specific implementations. See RFC #31848 for full motivation.

New APIs

New endpoints exposed through LLM, AsyncLLM, and the OpenAI-compatible HTTP server:

| Endpoint | Description |
| --- | --- |
| `POST /init_weight_transfer` | Initialize the process group between trainer and vLLM workers |
| `POST /update_weights` | Receive weight tensors from the trainer and load them incrementally |
| `POST /finalize_weight_update` | ~~Run post-processing for quantized/custom kernel formats~~ (removed due to #32133) |
| `GET /get_world_size` | Get the total worker world size (TP × PP × DP) for rank calculation |
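The `/get_world_size` endpoint exists so the trainer can size the shared process group: the trainer typically takes rank 0 and the vLLM workers take the ranks after `rank_offset`. A minimal sketch of that rank arithmetic (the helper name is hypothetical, not part of this PR's API):

```python
def weight_sync_group(tp: int, pp: int, dp: int, rank_offset: int = 1):
    """Compute the total process-group size and the worker ranks for
    weight syncing. The trainer occupies ranks [0, rank_offset) and
    each vLLM worker gets a unique rank after the offset."""
    worker_world_size = tp * pp * dp  # what GET /get_world_size returns
    total_world_size = worker_world_size + rank_offset
    worker_ranks = list(range(rank_offset, total_world_size))
    return total_world_size, worker_ranks

# Matches the usage example in this PR: TP=2 inference, trainer as rank 0.
total, ranks = weight_sync_group(tp=2, pp=1, dp=1)
# total == 3, ranks == [1, 2]
```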

Architecture

  • Pluggable WeightTransferEngine abstraction separates transport logic from worker implementation
  • Supported backends: NCCL (distributed); a CUDA IPC backend is planned for a follow-up PR
  • Configuration via WeightTransferConfig dataclass with backend selection (only nccl for now)
  • Packed tensors - tensors are bucketed and packed for improved transfer efficiency, credit to https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/utils/packed_tensor.py for the packed tensor implementation
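The packing idea can be sketched without NCCL: tensors are greedily grouped into size-limited buckets so each bucket needs only one broadcast instead of one per tensor. A simplified, torch-free sketch of the bucketing step (byte sizes and the helper name are illustrative, not the actual implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Bucket:
    names: list = field(default_factory=list)
    nbytes: int = 0

def pack_into_buckets(tensors, bucket_bytes):
    """Greedily group (name, nbytes) pairs into buckets of at most
    bucket_bytes. A tensor larger than the limit gets its own bucket."""
    buckets = []
    current = Bucket()
    for name, nbytes in tensors:
        if current.names and current.nbytes + nbytes > bucket_bytes:
            buckets.append(current)
            current = Bucket()
        current.names.append(name)
        current.nbytes += nbytes
    if current.names:
        buckets.append(current)
    return buckets
```

With a 200-byte limit, `[("a", 100), ("b", 100), ("c", 150)]` packs into two buckets: `["a", "b"]` and `["c"]`, so three broadcasts collapse into two.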

Examples

New examples added in examples/offline_inference/new_weight_syncing/:

| File | Description |
| --- | --- |
| `rlhf.py` | Basic RLHF with Ray + NCCL backend; training on GPU 0, TP=2 inference on GPUs 1-2 |
| `rlhf_async_new_apis.py` | Async RLHF with `pause_generation`/`resume_generation` for in-flight weight updates |
| `rlhf_http.py` | HTTP API-based weight syncing with a running `vllm serve` instance |
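For the HTTP path, the same init/update flow is driven by POSTing JSON to a running `vllm serve` instance. A hedged sketch of the request bodies only; field names mirror the offline example in this PR, the addresses and shapes are placeholders, and the endpoint name follows the later rename to `init_weight_transfer_engine` discussed in the review:

```python
import json

# Body for POST /init_weight_transfer_engine (placeholder values).
init_body = {
    "init_info": {
        "master_address": "10.0.0.1",
        "master_port": 29500,
        "rank_offset": 1,  # trainer holds rank 0
        "world_size": 3,   # trainer + TP=2 workers
    }
}

# Body for POST /update_weights, sent alongside the NCCL broadcast.
update_body = {
    "update_info": {
        "names": ["model.embed_tokens.weight"],
        "dtype_names": ["torch.bfloat16"],
        "shapes": [[32000, 4096]],
    }
}

# e.g. requests.post(f"{base_url}/update_weights", json=update_body)
payload = json.dumps(update_body)
```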

Usage

# Launch vLLM as a Ray actor with weight transfer enabled
llm = ray.remote(LLM).remote(
    model=MODEL_NAME,
    tensor_parallel_size=2,
    weight_transfer_config=WeightTransferConfig(backend="nccl"),
)

# === Initialize process group (runs concurrently) ===
# [vLLM] Start joining the process group
handle = llm.init_weight_transfer.remote(
    WeightTransferInitRequest(init_info={
        "master_address": master_address,
        "master_port": master_port,
        "rank_offset": 1,
        "world_size": world_size,
    })
)
# ... [Trainer] simultaneously joins the process group as rank 0
ray.get(handle)  # Wait for vLLM workers to join

# === Weight sync (runs concurrently) ===
handle = llm.update_weights.remote(
    WeightUpdateRequest(update_info={
        "names": names, "dtype_names": dtypes, "shapes": shapes
    })
)
# ... [Trainer] simultaneously broadcasts weights via NCCL
ray.get(handle)  # Wait for vLLM to finish loading
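On the trainer side, the `names`/`dtype_names`/`shapes` metadata in `WeightUpdateRequest` can be derived from the model's state dict before broadcasting. A sketch under stated assumptions: `build_update_info` is a hypothetical helper, the field names follow the request above, and a namedtuple stands in for a real torch tensor so the snippet runs without torch:

```python
from collections import namedtuple

def build_update_info(state_dict):
    """Collect the names/dtype_names/shapes metadata that accompanies
    the NCCL broadcast. In real code, state_dict maps parameter names
    to torch tensors."""
    names, dtype_names, shapes = [], [], []
    for name, tensor in state_dict.items():
        names.append(name)
        dtype_names.append(str(tensor.dtype))
        shapes.append(list(tensor.shape))
    return {"names": names, "dtype_names": dtype_names, "shapes": shapes}

# Stand-in for a torch tensor, just for the sketch.
FakeTensor = namedtuple("FakeTensor", ["dtype", "shape"])
sd = {"model.embed_tokens.weight": FakeTensor("torch.bfloat16", (32000, 4096))}
info = build_update_info(sd)
```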

Test Plan

Unit tests:

  • tests/distributed/test_packed_tensor.py - Tests packed tensor batching for efficient NCCL broadcast
  • tests/distributed/test_weight_transfer.py - Tests engine registry, parsing, validation, and NCCL transfer between Ray tasks
  • tests/entrypoints/weight_transfer/test_weight_transfer_llm.py - Tests LLM class weight transfer APIs with mock engine

E2E tests:
The examples in examples/offline_inference/new_weight_syncing/ serve as true end-to-end tests, exercising the full trainer→vLLM weight sync flow with actual NCCL communication.

Test Result

WIP


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


hao-aaron and others added 2 commits January 7, 2026 16:36
…gines, DP fixes

Squashed from 11 commits including:
- weight transfer init
- ipc support
- dataclasses for weight transfer config
- async + new weight update APIs
- incremental weight loading, dp fixes, http example

Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@mergify

mergify bot commented Jan 8, 2026

Documentation preview: https://vllm--31943.org.readthedocs.build/en/31943/

@mergify mergify bot added documentation Improvements or additions to documentation frontend v1 labels Jan 8, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-designed feature for native weight transfer in vLLM, which will be very useful for RLHF workflows. The abstraction of a WeightTransferEngine with different backends like NCCL and IPC is clean, and the inclusion of multiple examples is very helpful. The overall implementation is strong, but I've identified a couple of critical issues that need to be addressed: a NameError due to a typo in an __init__.py file's __all__ list, and a bug in the IPC engine where an incorrect index is used for tensor reconstruction, which could lead to memory corruption. I also found some minor code duplication in the GPU worker initialization and shutdown logic. After these fixes, this will be a great addition to vLLM.

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@hao-aaron
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant and well-designed feature for native weight syncing to support reinforcement learning workflows. The abstraction of a WeightTransferEngine with NCCL and IPC backends is clean and extensible. The new APIs are consistently exposed through the LLM, AsyncLLM, and HTTP server interfaces, and the provided examples and tests are comprehensive. My feedback focuses on improving implementation robustness, specifically by adding safeguards to the IPC engine's dependency on PyTorch internals and ensuring consistent, strict type checking in the API entrypoints for better maintainability and correctness.

@mergify

mergify bot commented Jan 10, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ahao-anyscale.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 10, 2026
@hao-aaron hao-aaron changed the title Weight transfer [WIP] Native Weight Syncing APIs Jan 12, 2026
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@mergify mergify bot added the ci/build label Jan 13, 2026
@hao-aaron
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive and well-designed set of native weight syncing APIs to support reinforcement learning workflows. The changes are extensive, adding a pluggable WeightTransferEngine abstraction with NCCL and IPC backends, corresponding configurations, new API endpoints in LLM, AsyncLLM, and the HTTP server, along with thorough examples and tests. The architecture is modular, and the inclusion of packed tensor transfer for NCCL is a great performance optimization. The unit and integration tests are robust and provide good confidence in the implementation. My review focuses on improving API consistency and the robustness of the provided examples. Overall, this is an excellent contribution that will significantly benefit users building RL-based applications on top of vLLM.

@mergify

mergify bot commented Jan 30, 2026

Hi @ahao-anyscale, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@hao-aaron hao-aaron requested a review from 22quinn as a code owner January 30, 2026 21:46

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@hao-aaron
Contributor Author

hao-aaron commented Jan 30, 2026

Some comments on integration with #32133, and online quantization:


# Use layerwise reload pattern for checkpoint format weights
with torch.device(self.device):
    initialize_layerwise_reload(model)
Contributor

@kylesayrs kylesayrs Jan 31, 2026


I think that this should go in initialize weight transfer, and that finalize_layerwise_reload should go in finalize weight transfer.

The layerwise api assumes that all weights are updated between initialize_layerwise_reload and finalize_layerwise_reload.

Whereas I think your api assumes that update_weights can be called with a small subset of weights, right?

Contributor Author

@hao-aaron hao-aaron Jan 31, 2026


There have been some changes in the integration plan: now each call to update_weights contains all of the weights, which lets us remove finalize_weight_transfer. Also, the problem with including initialize_layerwise_reload in init_weight_transfer is that init_weight_transfer is only called once at the beginning, to set up process groups, rather than once per weight transfer. The name was confusing, so it's now called init_weight_transfer_engine.

Contributor


Got it! In that case, this looks good to me.

@h-avsha
Contributor

h-avsha commented Feb 2, 2026

@kouroshHakha thanks for the great work! Question: do you think we could add support for tracking weight versions, e.g. how many times the weights have been updated since the vLLM instance started? Many implementations track this externally, but we still think there's value in tracking it in vLLM (and possibly tagging LLM responses with the corresponding version).

@hao-aaron
Contributor Author

@h-avsha thanks for the suggestion, we can add it to the follow-up PR

@mergify

mergify bot commented Feb 3, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ahao-anyscale.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 3, 2026
Comment on lines +9 to +10
@config
@dataclass
Member


Soon config will be a dataclass_transform, so it will be unnecessary to decorate with dataclass as well. Just something to be aware of.

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@mergify mergify bot removed the needs-rebase label Feb 3, 2026
rank_within_dp = self.parallel_config.rank

# Unique rank across all DP groups
worker_rank = dp_rank * world_size_per_dp + rank_within_dp
Collaborator


This is fine for this PR, but we actually have a few more parallelism schemes, so we should probably reject anything we don't expect. Can be a follow-up.

return JSONResponse(content={"is_paused": paused})


@router.post("/init_weight_transfer_engine")
Collaborator


for follow up: do we need a way to delete a weight transfer engine?

Collaborator


i.e., clean up the process group between the trainer and inference?

@hao-aaron
Contributor Author

hao-aaron commented Feb 3, 2026

failing test unrelated, failing on main as well: https://buildkite.com/vllm/ci/builds/49871/steps/canvas?sid=019c2586-7e43-4591-bb82-dae7ad53b155

Args:
    init_info: Dictionary containing backend-specific initialization info
"""
if self.weight_transfer_engine is None:
Collaborator


Another follow-up: this could crash the server. We should probably just do nothing if this happens rather than crash.

Collaborator

@robertgshaw2-redhat robertgshaw2-redhat left a comment


thanks for the hard work on this!

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
@robertgshaw2-redhat robertgshaw2-redhat merged commit c1858b7 into vllm-project:main Feb 5, 2026
60 checks passed
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>

Labels

ci/build documentation Improvements or additions to documentation frontend ready ONLY add when PR is ready to merge/full CI is needed v1
