[2/N] Elastic EP Milestone 2: Integrating NIXL-EP#29630
libertyeagle wants to merge 2 commits into vllm-project:main
Conversation
Force-push history (commit messages, each Signed-off-by: Yongji Wu <wuyongji317@gmail.com>):
- support request serving during scaling up/down
- misc fixes
- minor fix (×2)
- scaling test: 2->4->2
- tiny fix
- rebase fix (×10)
- small fix (×3)
Force-pushed from 8ba94c2 to 297bec9.
💡 Codex Review: https://github.com/vllm-project/vllm/blob/8ba94c2ec34f40b9b03752287e21c0e6baec2d00/vllm/distributed/stateless_coordinator.py#L285-L289 — When a stateless group receives tensors on the CPU path, the data is dropped.
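The issue flagged above (received data being dropped on the CPU path) follows a common bug pattern: deserializing into a temporary that is never copied back into the caller-visible buffer. The sketch below is a minimal illustration with hypothetical names; it is not the actual `stateless_coordinator.py` code.

```python
import struct


def recv_into(out: list[float], payload: bytes) -> None:
    """Deserialize received bytes into the caller's buffer.

    The bug pattern is to unpack into a local temporary and return,
    leaving `out` untouched. The fix is the in-place copy below.
    """
    values = struct.unpack(f"{len(out)}d", payload)
    out[:] = values  # copy into the caller-visible buffer (the fix)
```

Without the `out[:] = values` line, the function would succeed silently while the caller's tensor stays zero-filled, which matches the symptom described in the review.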
Code Review
This pull request introduces a significant and complex feature: elastic scaling for expert parallelism (EP) by integrating the NIXL-EP kernel. The changes are extensive, touching many core components of vLLM's distributed infrastructure, including communication primitives, model execution, and configuration management. The core of this feature is the introduction of stateless communication groups, which allows for dynamic reconfiguration of the cluster topology without requiring a full restart. A state machine has been implemented to orchestrate the scaling operations (both up and down), which is a robust approach for such a complex distributed process. The implementation also includes optimizations for new worker startup, where they receive model weights from peers instead of loading from disk. Overall, the changes appear well-architected and the logic is consistent across the various components. I have found one high-severity issue related to a debug print statement that should be removed.
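The state machine mentioned in the review is not shown in this conversation; as a rough illustration of the pattern (the states, transitions, and class names below are hypothetical, not vLLM's actual implementation):

```python
from enum import Enum, auto


class ScaleState(Enum):
    STABLE = auto()         # normal serving, fixed world size
    PREPARING = auto()      # joining/departing ranks announced, weights transferring
    DRAINING = auto()       # in-flight requests finish on the old topology
    RECONFIGURING = auto()  # stateless groups rebuilt for the new topology


# Allowed transitions for one scale-up or scale-down cycle (hypothetical).
TRANSITIONS = {
    ScaleState.STABLE: {ScaleState.PREPARING},
    ScaleState.PREPARING: {ScaleState.DRAINING},
    ScaleState.DRAINING: {ScaleState.RECONFIGURING},
    ScaleState.RECONFIGURING: {ScaleState.STABLE},
}


class ScalingCoordinator:
    """Tracks the cluster's scaling phase and rejects illegal transitions."""

    def __init__(self) -> None:
        self.state = ScaleState.STABLE

    def advance(self, target: ScaleState) -> None:
        if target not in TRANSITIONS[self.state]:
            raise RuntimeError(f"illegal transition {self.state} -> {target}")
        self.state = target
```

A full scaling cycle walks STABLE → PREPARING → DRAINING → RECONFIGURING → STABLE; encoding the legal transitions explicitly is what makes the orchestration robust against out-of-order events.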
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @libertyeagle, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
Purpose
This is the 2nd PR towards milestone 2 of elastic EP. The 1st PR is #26278.
This PR integrates the NIXL-EP kernel.
NIXL-EP is an EP kernel built on NIXL's device API. It provides elastic scaling: processes (ranks) can be added or removed dynamically at runtime, without destroying and recreating communicators during scale-up/down.
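To see why elasticity matters for EP, note that when the world size changes (e.g. 2 → 4 → 2 ranks, as in the test below), each rank's expert assignment must be recomputed even though, with an elastic kernel, the surviving ranks keep their communicator alive. The helper below is a simplified illustration of such a remapping, not the NIXL-EP API.

```python
def assign_experts(num_experts: int, world_size: int) -> dict[int, list[int]]:
    """Contiguous expert-to-rank mapping (illustrative; real EP layouts vary).

    Assumes num_experts is divisible by world_size.
    """
    per_rank = num_experts // world_size
    return {
        rank: list(range(rank * per_rank, (rank + 1) * per_rank))
        for rank in range(world_size)
    }
```

For 8 experts, scaling from 2 to 4 ranks halves each rank's share from 4 experts to 2; the elastic kernel's job is to apply such a remapping in place rather than tearing the whole group down.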
Test Plan
Performance testing script:
Qwen/Qwen3-30B-A3B-Thinking-2507-FP8 model on 8xH100 with EP=8.

```
vllm bench serve \
  --model $MODEL_NAME \
  --host $HOST \
  --port $PORT \
  --dataset-name random \
  --random-input-len 128 \
  --random-output-len 512 \
  --num-prompts 512
```

Test Result
CC List
@ruisearch42 @tlrmchlsmth @kouroshHakha