[BugFix] Fix sparse NCCL weight transfer test construction #44345

Merged
robertgshaw2-redhat merged 1 commit into vllm-project:main from bedeks:fix/sparse-nccl-test-engine-model
Jun 2, 2026

@bedeks bedeks commented Jun 2, 2026

Purpose

Forward fix for #44272, covering the tests that failed in the nightly CI.

The production path already passes the model through WeightTransferEngineFactory.create_engine(...). The breakage was in stale test and example-style call sites that still constructed NCCLWeightTransferEngine(config, parallel_config) directly. Those call sites run only when >2 GPUs are available.

This updates:

  • tests/distributed/test_weight_transfer.py
  • docs/training/weight_transfer/base.md
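
The failure mode described above can be sketched with stand-in classes. This is a minimal illustration, not vLLM's code: `ToyNCCLEngine`, `ToyEngineFactory`, `TransferConfig`, and `ParallelConfig` are hypothetical names, and the assumption that the engine constructor takes a required `model` argument is inferred from the description rather than confirmed by it. The real `WeightTransferEngineFactory.create_engine(...)` signature is not shown in this PR and may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TransferConfig:
    """Hypothetical stand-in for a weight transfer config."""
    backend: str = "nccl"


@dataclass
class ParallelConfig:
    """Hypothetical stand-in for a parallel config."""
    world_size: int = 2


class ToyNCCLEngine:
    """Stand-in engine; assumes the constructor requires `model`."""

    def __init__(self, config: TransferConfig,
                 parallel_config: ParallelConfig, model: Any) -> None:
        self.config = config
        self.parallel_config = parallel_config
        self.model = model


class ToyEngineFactory:
    """Stand-in factory that always forwards the model to the engine."""

    _registry = {"nccl": ToyNCCLEngine}

    @classmethod
    def create_engine(cls, config: TransferConfig,
                      parallel_config: ParallelConfig, model: Any):
        engine_cls = cls._registry[config.backend]
        return engine_cls(config, parallel_config, model)


def stale_call_site_fails() -> bool:
    """Mimic an old two-argument call site; True if it raises TypeError."""
    try:
        ToyNCCLEngine(TransferConfig(), ParallelConfig())  # model omitted
    except TypeError:
        return True
    return False


# Going through the factory keeps the call site in sync with the constructor.
engine = ToyEngineFactory.create_engine(
    TransferConfig(), ParallelConfig(), model={"w": 1}
)
```

Under these assumptions, a call site that goes through the factory keeps working when the constructor changes, while a direct two-argument construction raises `TypeError` only when that code path actually executes. That matches the pattern of a breakage that stays hidden until a GPU-gated test runs.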

Test Plan

  • Ran targeted sparse NCCL regression tests on a two-GPU L40 environment.
  • Ran related sparse worker/model-runner tests.
  • Ran the full distributed weight transfer suite covering NCCL dense, NCCL sparse, and IPC paths.
  • Ran the sparse NCCL example to validate the real-model dense-vs-sparse update flow.

Test Result

  • Targeted sparse NCCL regression tests passed.
  • Related sparse worker/model-runner tests passed.
  • Full distributed weight transfer suite passed.
  • Sparse NCCL example completed successfully with matching dense and sparse update behavior.

Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) June 2, 2026 17:28
@github-actions github-actions Bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 2, 2026
@njhill njhill changed the title Fix sparse NCCL weight transfer test construction [BugFix] Fix sparse NCCL weight transfer test construction Jun 2, 2026

mergify Bot commented Jun 2, 2026

Documentation preview: https://vllm--44345.org.readthedocs.build/en/44345/

@mergify mergify Bot added the documentation (Improvements or additions to documentation) and bug (Something isn't working) labels Jun 2, 2026
@robertgshaw2-redhat robertgshaw2-redhat merged commit 0917a00 into vllm-project:main Jun 2, 2026
22 of 24 checks passed
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Jun 4, 2026
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
andakai pushed a commit to andakai/vllm that referenced this pull request Jun 4, 2026
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Signed-off-by: JisoLya <523420504@qq.com>