[Feat][RL][1/2] Native Weight Syncing API: NCCL #31943
robertgshaw2-redhat merged 56 commits into vllm-project:main
Conversation
…gines, DP fixes. Squashed from 11 commits including:
- weight transfer init
- ipc support
- dataclasses for weight transfer config
- async + new weight update APIs
- incremental weight loading, dp fixes, http example
Co-authored-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Documentation preview: https://vllm--31943.org.readthedocs.build/en/31943/
Code Review
This pull request introduces a significant and well-designed feature for native weight transfer in vLLM, which will be very useful for RLHF workflows. The abstraction of a WeightTransferEngine with different backends like NCCL and IPC is clean, and the inclusion of multiple examples is very helpful. The overall implementation is strong, but I've identified a couple of critical issues that need to be addressed: a NameError due to a typo in an __init__.py file's __all__ list, and a bug in the IPC engine where an incorrect index is used for tensor reconstruction, which could lead to memory corruption. I also found some minor code duplication in the GPU worker initialization and shutdown logic. After these fixes, this will be a great addition to vLLM.
/gemini review
Code Review
This pull request introduces a significant and well-designed feature for native weight syncing to support reinforcement learning workflows. The abstraction of a WeightTransferEngine with NCCL and IPC backends is clean and extensible. The new APIs are consistently exposed through the LLM, AsyncLLM, and HTTP server interfaces, and the provided examples and tests are comprehensive. My feedback focuses on improving implementation robustness, specifically by adding safeguards to the IPC engine's dependency on PyTorch internals and ensuring consistent, strict type checking in the API entrypoints for better maintainability and correctness.
This pull request has merge conflicts that must be resolved before it can be merged.
/gemini review
Code Review
This pull request introduces a comprehensive and well-designed set of native weight syncing APIs to support reinforcement learning workflows. The changes are extensive, adding a pluggable WeightTransferEngine abstraction with NCCL and IPC backends, corresponding configurations, new API endpoints in LLM, AsyncLLM, and the HTTP server, along with thorough examples and tests. The architecture is modular, and the inclusion of packed tensor transfer for NCCL is a great performance optimization. The unit and integration tests are robust and provide good confidence in the implementation. My review focuses on improving API consistency and the robustness of the provided examples. Overall, this is an excellent contribution that will significantly benefit users building RL-based applications on top of vLLM.
Hi @ahao-anyscale, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
Some comments on integration with #32133, and online quantization:
# Use layerwise reload pattern for checkpoint format weights
with torch.device(self.device):
    initialize_layerwise_reload(model)
I think that this should go in initialize weight transfer, and that finalize_layerwise_reload should go in finalize weight transfer.
The layerwise api assumes that all weights are updated between initialize_layerwise_reload and finalize_layerwise_reload.
Whereas I think your api assumes that update_weights can be called with a small subset of weights, right?
There have been some changes in ideas about integration: now we are trying to have each call to update_weights contain all of the weights, which allows us to remove finalize_weight_transfer. Also, the problem with including initialize_layerwise_reload in init_weight_transfer is that init_weight_transfer is only called once at the beginning, to set up process groups, rather than every time we do a weight transfer. Confusing name, so it's called init_weight_transfer_engine now.
Got it! In that case, this looks good to me.
@kouroshHakha thanks for the great work! Question: do you think we could add support for tracking weight versions? E.g., how many times the weights were updated since the vLLM instance started. Many implementations track this externally, but we still think there's value in tracking this in vLLM (and possibly tagging LLM responses with the corresponding version).
@h-avsha thanks for the suggestion, can add it to the follow-up PR
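For reference, the version tracking suggested above could start from a simple monotonic counter that is bumped once per completed weight update. This is a hypothetical sketch, not part of vLLM's API; the class and method names are made up for illustration.

```python
import threading


class WeightVersionTracker:
    """Hypothetical sketch: count how many times weights were updated
    since the instance started, so responses could be tagged with it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0

    def bump(self) -> int:
        # Called once per completed weight update; thread-safe because
        # update requests may arrive concurrently with generation.
        with self._lock:
            self._version += 1
            return self._version

    @property
    def version(self) -> int:
        return self._version


tracker = WeightVersionTracker()
tracker.bump()
tracker.bump()
```

Tagging each response with `tracker.version` at sampling time would then let callers attribute outputs to a specific training step.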
@config
@dataclass
Soon config will be a dataclass_transform, so it will be unnecessary to decorate with dataclass as well. Just something to be aware of.
rank_within_dp = self.parallel_config.rank

# Unique rank across all DP groups
worker_rank = dp_rank * world_size_per_dp + rank_within_dp
This is fine for this PR, but we actually have a few more parallelism schemes, so we should probably reject if something we don't expect is selected. Can be a follow-up.
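The quoted rank computation can be checked standalone: with DP groups laid out contiguously, each group containing world_size_per_dp workers, every (dp_rank, rank_within_dp) pair maps to a unique global rank. The function below mirrors the diff's names; only the enumeration is added for illustration.

```python
def global_worker_rank(dp_rank: int, world_size_per_dp: int,
                       rank_within_dp: int) -> int:
    # Unique rank across all DP groups: group `dp_rank` occupies the
    # contiguous range [dp_rank * world_size_per_dp, (dp_rank + 1) * world_size_per_dp).
    return dp_rank * world_size_per_dp + rank_within_dp


# e.g. 2 DP groups with 4 workers each enumerate ranks 0..7 exactly once
ranks = [global_worker_rank(dp, 4, r) for dp in range(2) for r in range(4)]
```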
return JSONResponse(content={"is_paused": paused})

@router.post("/init_weight_transfer_engine")
for follow up: do we need a way to delete a weight transfer engine?
like clean up the process group between trainer and inference?
failing test unrelated, failing on main as well: https://buildkite.com/vllm/ci/builds/49871/steps/canvas?sid=019c2586-7e43-4591-bb82-dae7ad53b155
Args:
    init_info: Dictionary containing backend-specific initialization info
"""
if self.weight_transfer_engine is None:
another follow up: this could crash the server. We should probably just "do nothing" if this happens rather than crash
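The "do nothing" behavior suggested here could look roughly like the sketch below. The function name and return convention are illustrative, not vLLM's actual code; the point is just to log and return early instead of raising.

```python
import logging

logger = logging.getLogger(__name__)


def safe_update_weights(engine, metadata) -> bool:
    """Sketch of the suggested guard: no-op instead of crashing the
    server when update_weights arrives before the engine is initialized."""
    if engine is None:
        logger.warning(
            "update_weights called before init_weight_transfer_engine; ignoring"
        )
        return False
    engine.update_weights(metadata)
    return True
```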
robertgshaw2-redhat left a comment: thanks for the hard work on this!
Purpose
This PR introduces native weight syncing APIs for vLLM to support reinforcement learning post-training workflows (RLHF, PPO, etc.).
Currently, open-source projects like SkyRL, VeRL, and TRL must implement their own weight syncing infrastructure to use vLLM as an inference server during training. This leads to duplicated effort and requires users to version-lock to specific implementations. See RFC #31848 for full motivation.
New APIs
Three new endpoints are exposed through LLM, AsyncLLM, and the OpenAI-compatible HTTP server:

- POST /init_weight_transfer
- POST /update_weights
- POST /finalize_weight_update - run post-processing for quantized/custom kernel formats (removed due to #32133)
- GET /get_world_size

Architecture
- WeightTransferEngine abstraction separates transport logic from the worker implementation
- WeightTransferConfig dataclass with backend selection (only nccl for now)
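A minimal sketch of how such a pluggable abstraction could fit together: the class names follow the PR description, but every field and method signature below is an assumption for illustration, not vLLM's actual interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class WeightTransferConfig:
    # Backend selection (only nccl for now, per the PR description)
    backend: str = "nccl"
    init_info: dict = field(default_factory=dict)


class WeightTransferEngine(ABC):
    """Separates transport logic from the worker implementation."""

    def __init__(self, config: WeightTransferConfig) -> None:
        self.config = config

    @abstractmethod
    def update_weights(self, weights: dict) -> None:
        """Receive a new set of weights from the trainer side."""


# Simple registry keyed by backend name (illustrative, mirroring the
# "engine registry" that the PR's unit tests exercise).
ENGINE_REGISTRY: dict = {}


def register_engine(name: str):
    def decorator(cls):
        ENGINE_REGISTRY[name] = cls
        return cls
    return decorator


@register_engine("nccl")
class NCCLWeightTransferEngine(WeightTransferEngine):
    def update_weights(self, weights: dict) -> None:
        pass  # a real implementation would broadcast tensors over NCCL
```

The registry pattern keeps worker code backend-agnostic: the worker looks up `ENGINE_REGISTRY[config.backend]` and never touches NCCL or IPC details directly.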
Examples

New examples added in examples/offline_inference/new_weight_syncing/:

- rlhf.py
- rlhf_async_new_apis.py - uses pause_generation/resume_generation for in-flight weight updates
- rlhf_http.py - drives a vllm serve instance

Usage
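As a rough illustration of the trainer side of the rlhf_http.py flow, a client might construct requests to the new endpoints like this. The payload schema here is a guess from the endpoint names and the init_info docstring quoted earlier in the review, not the actual request format; treat every field name as hypothetical.

```python
import json

BASE_URL = "http://localhost:8000"  # address of the vllm serve instance


def init_engine_request(master_address: str, master_port: int, world_size: int):
    """Build the (url, body) pair for initializing the transfer engine.
    Field names inside init_info are hypothetical."""
    body = {
        "init_info": {
            "master_address": master_address,
            "master_port": master_port,
            "world_size": world_size,
        }
    }
    return f"{BASE_URL}/init_weight_transfer_engine", json.dumps(body)


def update_weights_request(named_shapes):
    """Build the (url, body) pair announcing which tensors will be sent
    (the tensor data itself would travel over NCCL, not HTTP)."""
    body = {"weights": [{"name": n, "shape": list(s)} for n, s in named_shapes]}
    return f"{BASE_URL}/update_weights", json.dumps(body)


url, body = init_engine_request("10.0.0.1", 29500, 9)
```

A trainer would POST these payloads (e.g. with an HTTP client) and then perform the collective transfer out of band.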
Test Plan
Unit tests:
- tests/distributed/test_packed_tensor.py - tests packed tensor batching for efficient NCCL broadcast
- tests/distributed/test_weight_transfer.py - tests engine registry, parsing, validation, and NCCL transfer between Ray tasks
- tests/entrypoints/weight_transfer/test_weight_transfer_llm.py - tests LLM class weight transfer APIs with a mock engine

E2E tests:
The examples in examples/offline_inference/new_weight_syncing/ serve as true end-to-end tests, exercising the full trainer→vLLM weight sync flow with actual NCCL communication.

Test Result
WIP
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.