[EPLB] Add alternative communication for EPLB weight exchange #33176

Merged

tlrmchlsmth merged 48 commits into vllm-project:main on Mar 31, 2026
Conversation
Signed-off-by: ilmarkov <markovilya197@gmail.com>
SageMoore (Contributor) reviewed on Feb 4, 2026:
This looks like a good change. I have two minor nits, but otherwise LGTM.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: ilmarkov <markovilya197@gmail.com>
tlrmchlsmth approved these changes on Feb 11, 2026
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: ilmarkov <markovilya197@gmail.com>
tlrmchlsmth requested changes on Mar 25, 2026
Signed-off-by: ilmarkov <markovilya197@gmail.com>
tlrmchlsmth reviewed on Mar 26, 2026
Signed-off-by: ilmarkov <markovilya197@gmail.com>
tlrmchlsmth approved these changes on Mar 26, 2026
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: ilmarkov <markovilya197@gmail.com>
auto-merge was automatically disabled on March 27, 2026 at 10:25: the head branch was pushed to by a user without write access.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
puririshi98 pushed a commit to puririshi98/vllm that referenced this pull request on Apr 7, 2026:
[EPLB] Add alternative communication for EPLB weight exchange (vllm-project#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Signed-off-by: Rishi Puri <riship@nvidia.com>
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request on Apr 9, 2026:
[EPLB] Add alternative communication for EPLB weight exchange (vllm-project#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
TomerBN-Nvidia pushed a commit to TomerBN-Nvidia/vllm that referenced this pull request on Apr 20, 2026:
[EPLB] Add alternative communication for EPLB weight exchange (vllm-project#33176)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
SandishKumarHN added a commit to SandishKumarHN/vllm that referenced this pull request on Apr 30, 2026:

vllm/distributed/eplb/rebalance_execute.py:586 had a device-wide GPU sync with a NOTE(bowen) comment admitting the original author didn't know why it was needed. After investigation, the line is dead code in the SYNC path (rearrange_expert_weights_inplace).

Why it's safe
-------------
The SYNC path runs entirely on the default CUDA stream end-to-end — torch.empty_like, move_to_buffer's b.copy_(w, non_blocking=True), and NCCL Send/Recv (default stream=None -> current_stream()) all share it. No cross-stream hazard exists. PyTorch's ProcessGroupNCCL correctly calls record_stream() on input/output tensors, so the caching allocator is also safe across iterations. The ASYNC path (transfer_layer + async_worker) uses its own design — cuda_stream.synchronize() (async_worker.py:134) plus CpuGpuEvent for thread handoff (eplb_utils.py) — and is unaffected by this change.

Likely historical reason
------------------------
The original EPLB PR (vllm-project#18343) used torch.distributed.batch_isend_irecv directly. req.wait() on those work objects only guarantees the NCCL collective has been enqueued, NOT that the underlying tensors are safe to free/reuse — there is no record_stream() linkage to the caching allocator. torch.cuda.synchronize() was a hammer to flush all work before the next iteration's torch.empty_like allocations. The communicator refactor (vllm-project#33176) replaced batch_isend_irecv with ProcessGroupNCCL-based send/recv, which calls record_stream() correctly. The sync became dead code at that point but was never removed.

Verification
------------
- Bytecode: 'synchronize' in rearrange_expert_weights_inplace.__code__.co_names -> False
- Stress: 50 runs x 2000 iter x hidden=[1024,2048] on 2x A100 (torch_nccl) -> 50/50 race-clean (~100k effective sync-path iterations).
- Larger-scale (4-rank A100) re-validation in progress.
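To make the contrast in that commit message concrete, here is a minimal sketch of the two P2P patterns it describes, assuming an already-initialized NCCL process group and CUDA tensors. The function names are invented and this is not the vLLM code path; the comments simply restate the commit message's claims.

```python
import torch
import torch.distributed as dist


def exchange_with_batch_isend_irecv(send_buf, recv_buf, peer):
    # Historical pattern (original EPLB PR): batched async P2P ops.
    # req.wait() only guarantees the NCCL work has been enqueued; it does not
    # tie the tensors' lifetimes to the communication stream, which is why a
    # trailing device-wide synchronize was used as a safety hammer.
    ops = [
        dist.P2POp(dist.isend, send_buf, peer),
        dist.P2POp(dist.irecv, recv_buf, peer),
    ]
    for req in dist.batch_isend_irecv(ops):
        req.wait()
    torch.cuda.synchronize()  # the sync the commit above removes


def exchange_with_process_group_p2p(send_buf, recv_buf, peer):
    # Pattern after the communicator refactor: plain ProcessGroup send/recv.
    # Per the commit message, ProcessGroupNCCL records the tensors on the
    # communication stream, so the caching allocator will not reuse them
    # before the kernels finish and no device-wide sync is needed.
    if dist.get_rank() < peer:
        dist.send(send_buf, dst=peer)
        dist.recv(recv_buf, src=peer)
    else:
        dist.recv(recv_buf, src=peer)
        dist.send(send_buf, dst=peer)
```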
SandishKumarHN added a second commit to SandishKumarHN/vllm referencing this pull request on Apr 30, 2026, with the same message as above plus Signed-off-by: SandishKumarHN <sandish@fb.com>.
Purpose
This PR adds a communicator option to eplb_config ([torch_nccl|torch_gloo|pynccl]) and isolates the expert-weight exchange communication from the routing logic.
torch_gloo and nixl avoid the async EPLB hangs that occur when NCCL is also used in the all2all backend, so in this PR we force these EPLB communicators for async EPLB (instead of falling back to sync EPLB, as main currently does).
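As a rough illustration of how the new option might be selected, here is a hedged sketch of an offline-inference setup. The communicator field name and its allowed values come from this PR's description; the import path, the other EPLBConfig fields, and the model name are assumptions and may not match the merged code.

```python
from vllm import LLM
from vllm.config import EPLBConfig  # assumed import path

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",   # illustrative MoE model
    enable_expert_parallel=True,
    enable_eplb=True,
    eplb_config=EPLBConfig(
        window_size=1000,            # assumed pre-existing field
        step_interval=3000,          # assumed pre-existing field
        communicator="torch_gloo",   # new option: torch_nccl | torch_gloo | pynccl
    ),
)
```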
Validation
Validated on a server. Sync EPLB: gsm8k results match main, using pynccl. Async EPLB was validated as well.
Added tests for all communicators in test_eplb_execute.py and updated the corresponding timings in .buildkite (a hypothetical sketch of such a test follows below).
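For context, here is a hypothetical sketch of how a communicator-parametrized exchange test could be structured; this is not the actual test_eplb_execute.py, and every name below is made up. It uses the gloo backend on CPU so it can run without GPUs.

```python
import os

import pytest
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def _exchange_worker(rank: int, world_size: int, backend: str) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29511"
    dist.init_process_group(backend, rank=rank, world_size=world_size)

    # Each rank "owns" a distinct expert weight and swaps it with its peer.
    weight = torch.full((4, 4), float(rank))
    recv = torch.empty_like(weight)
    peer = (rank + 1) % world_size
    if rank < peer:
        dist.send(weight, dst=peer)
        dist.recv(recv, src=peer)
    else:
        dist.recv(recv, src=peer)
        dist.send(weight, dst=peer)

    assert torch.equal(recv, torch.full((4, 4), float(peer)))
    dist.destroy_process_group()


@pytest.mark.parametrize("backend", ["gloo"])  # e.g. add "nccl" where GPUs are available
def test_weight_exchange_roundtrip(backend):
    world_size = 2
    mp.spawn(_exchange_worker, args=(world_size, backend), nprocs=world_size)
```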