[docs] Add docs for new RL flows #36188
Documentation preview: https://vllm--36188.org.readthedocs.build/en/36188/
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @hao-aaron, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Code Review
This pull request introduces comprehensive documentation for new Reinforcement Learning (RL) features, including asynchronous RL flows and weight transfer mechanisms. The new documentation is well-structured, covering the overview, API usage, and different backends like NCCL and IPC. The PR also reorganizes the example files by moving older examples to a legacy directory. The documentation additions are clear and provide important details, such as required environment variables and API behavior, which will be very helpful for users. I have reviewed the changes and found no issues.
Note: Security Review is unavailable for this PR.
See [`NCCLTrainerSendWeightsArgs`](https://github.com/vllm-project/vllm/blob/main/vllm/distributed/weight_transfer/nccl_engine.py) for the full list of configurable fields.
You could instead link to the API docs with [`NCCLTrainerSendWeightsArgs`][vllm.distributed.weight_transfer.nccl_engine.NCCLTrainerSendWeightsArgs]
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
I just pushed a couple of changes to make the documentation generation a bit better. Is there any reason to keep the legacy examples? They will still be present in older versions of the docs.
They could be good to have, so people have an idea of how to use the worker extension in case the custom weight sync doesn't work for their use case, but otherwise it's not necessary. I can remove it.
Made-with: Cursor # Conflicts: # .buildkite/test_areas/distributed.yaml Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Hi @hao-aaron, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
```bash
vllm serve my-model \
    --weight-transfer-config '{"backend": "nccl"}'
```
@hao-aaron I just wanted to give you the option to recommend this. I think it looks nicer, but it's up to you.
Suggested change:
-     --weight-transfer-config '{"backend": "nccl"}'
+     --weight-transfer-config.backend nccl
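For readers comparing the two spellings in the suggestion above, here is a minimal sketch of why they would describe the same setting, assuming (as the suggestion implies) that the dotted flag is just another way to write the JSON object. The helper `dotted_to_nested` is hypothetical and written only for illustration; it is not vLLM's argument parser.

```python
import json


def dotted_to_nested(pairs):
    """Fold ("a.b", value) pairs into a nested dict.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    result = {}
    for dotted_key, value in pairs:
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate dicts as needed while walking the path.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# JSON spelling: --weight-transfer-config '{"backend": "nccl"}'
json_form = {"weight-transfer-config": json.loads('{"backend": "nccl"}')}

# Dotted spelling: --weight-transfer-config.backend nccl
dotted_form = dotted_to_nested([("weight-transfer-config.backend", "nccl")])

# Under the stated assumption, both spellings yield the same nested config.
assert json_form == dotted_form
```

The dotted form avoids shell quoting around the JSON string, which is likely part of why it "looks nicer" on the command line.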
hmellor left a comment
This is a really nice improvement!
I'll start merging now and any tweaks to syntax/API reference linking can be follow ups if we want.
Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This reverts commit 47a1f11.
Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>

Purpose
Introduce new docs detailing weight sync and the new pause/resume flow, and move the old RLHF examples to legacy.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.