[2/N] Elastic EP Milestone 2: Integrating NIXL-EP #35627
tlrmchlsmth merged 6 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a subset of tests runs automatically. You can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Hi @itayalroy, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch. Once installed, pre-commit will run automatically on future commits.
Code Review
This pull request integrates NIXL-EP kernels for elastic expert parallelism, which is a significant enhancement. The changes are mostly about adding the new nixl_ep backend and its related logic. The implementation also includes some important fixes, such as properly destroying NCCL communicators to prevent resource leaks.
My review focuses on ensuring thread safety in the new NixlEPAll2AllManager. I've identified a potential race condition when accessing the shared buffer and suggested a fix using a lock to ensure robustness in a multi-threaded environment.
```python
def get_handle(self, kwargs):
    if (
        NixlEPAll2AllManager._buffer is not None
        and NixlEPAll2AllManager._buffer[1] == self.cpu_group.size()
    ):
        return NixlEPAll2AllManager._buffer[0]

    num_experts_per_rank = kwargs["num_global_experts"] // kwargs["num_ep_ranks"]
    nixl_kwargs = dict(
        max_num_tokens_per_dp_rank=kwargs["max_num_tokens_per_dp_rank"],
        token_hidden_size=kwargs["token_hidden_size"],
        num_experts_per_rank=num_experts_per_rank,
    )
    if NixlEPAll2AllManager._buffer is None:
        self._init_buffer(**nixl_kwargs)
    else:
        self._update_buffer()

    assert NixlEPAll2AllManager._buffer is not None
    handle = NixlEPAll2AllManager._buffer[0]
    return handle
```
The _buffer class attribute is a shared mutable state. The get_handle method reads and writes to this shared state without any synchronization, which can lead to a race condition if called from multiple threads concurrently. This could happen, for example, during dynamic LoRA loading, leading to incorrect behavior or crashes.
To prevent this, a lock should be used to protect access to _buffer.
First, please add a lock to the NixlEPAll2AllManager class:
```python
class NixlEPAll2AllManager(All2AllManagerBase):
    ...
    _lock = threading.Lock()
    ...
```

Then, wrap the `get_handle` method's logic with this lock as suggested below.
```python
def get_handle(self, kwargs):
    with NixlEPAll2AllManager._lock:
        if (
            NixlEPAll2AllManager._buffer is not None
            and NixlEPAll2AllManager._buffer[1] == self.cpu_group.size()
        ):
            return NixlEPAll2AllManager._buffer[0]
        num_experts_per_rank = kwargs["num_global_experts"] // kwargs["num_ep_ranks"]
        nixl_kwargs = dict(
            max_num_tokens_per_dp_rank=kwargs["max_num_tokens_per_dp_rank"],
            token_hidden_size=kwargs["token_hidden_size"],
            num_experts_per_rank=num_experts_per_rank,
        )
        if NixlEPAll2AllManager._buffer is None:
            self._init_buffer(**nixl_kwargs)
        else:
            self._update_buffer()
        assert NixlEPAll2AllManager._buffer is not None
        handle = NixlEPAll2AllManager._buffer[0]
        return handle
```
I don't think this can actually happen, since get_handle() only appears to be called from a single thread during initial setup or elastic EP reconfiguration. In any case, this isn't on the data path, and the cost of adding a lock here is negligible, so I added it to be safe.
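For illustration, the check-then-act hazard the reviewer describes can be reproduced with a generic cached-singleton sketch. This is hypothetical stand-in code, not the vLLM implementation: `BufferCache` and its members mirror the shape of `NixlEPAll2AllManager._buffer` but nothing else.

```python
import threading

class BufferCache:
    """Shared class-level buffer guarded by a lock, mirroring the
    reviewer's suggested pattern for NixlEPAll2AllManager._buffer."""

    _buffer = None           # shared mutable state: (handle, world_size)
    _lock = threading.Lock()
    init_count = 0           # counts how many times the buffer was built

    @classmethod
    def get_handle(cls, world_size):
        # Without the lock, two threads could both observe _buffer is None
        # and initialize it twice (a check-then-act race). The lock makes
        # the read-check-write sequence atomic.
        with cls._lock:
            if cls._buffer is not None and cls._buffer[1] == world_size:
                return cls._buffer[0]
            cls.init_count += 1
            cls._buffer = (object(), world_size)
            return cls._buffer[0]

threads = [threading.Thread(target=BufferCache.get_handle, args=(8,))
           for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(BufferCache.init_count)  # with the lock held, the buffer is built once
```

Since this path runs only during setup or reconfiguration, the lock's overhead is irrelevant, which matches the author's reasoning for adding it defensively.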
tlrmchlsmth left a comment
One question: does NIXL-EP use NVLink at all for intranode traffic? Is it suitable for MNNVL systems? And are there any de-duplication optimizations?
vllm/model_executor/layers/fused_moe/nixl_ep_prepare_finalize.py
This file is extremely similar to the DeepEP LL prepare_finalize implementation. Should we consolidate these?
@bnellnm WDYT?
It might be good to factor out some of the common utilities, e.g. dequant_fp8, maybe_roundup_layer_hidden_size (and maybe _do_quant?), but I think it might be good to keep the main implementations separate in case either backend changes its API.
We expect nixl_ep_prepare_finalize.py and deepep_ll_prepare_finalize.py to diverge fairly quickly as NIXL-EP progresses, and possibly on the DeepEP side too, so we preferred to keep them separate.
NVLink is used for intranode traffic, support for MNNVL is in review, and there are currently no de-duplication optimizations (though this might change; we are still evaluating the tradeoffs of that approach).
rebase fix
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Scale-up after scale-down would hang indefinitely in the ZMQ poll loop waiting for engine identity messages. Without ROUTER_HANDOVER enabled on the ZMQ ROUTER socket, engines reconnecting with previously-used identities had their messages silently dropped, because the ROUTER still held stale routing entries from the dead connections. Signed-off-by: Itay Alroy <ialroy@nvidia.com>
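The fix described above amounts to enabling the `ZMQ_ROUTER_HANDOVER` socket option on the ROUTER socket. A minimal pyzmq sketch (not the vLLM code itself; socket names and the "engine" framing are illustrative):

```python
import zmq

ctx = zmq.Context.instance()
router = ctx.socket(zmq.ROUTER)
# With ROUTER_HANDOVER enabled, a peer reconnecting with a previously-used
# identity takes over the stale routing entry instead of having its
# messages silently dropped.
router.setsockopt(zmq.ROUTER_HANDOVER, 1)
router.setsockopt(zmq.RCVTIMEO, 5000)  # fail fast instead of hanging forever
port = router.bind_to_random_port("tcp://127.0.0.1")

def connect_engine(identity: bytes) -> zmq.Socket:
    s = ctx.socket(zmq.DEALER)
    s.setsockopt(zmq.IDENTITY, identity)
    s.connect(f"tcp://127.0.0.1:{port}")
    return s

# First "engine" registers its identity with the ROUTER.
engine = connect_engine(b"engine-0")
engine.send(b"hello")
ident, msg = router.recv_multipart()

# Simulate a crash: the engine dies and a replacement reconnects with the
# same identity (as happens after elastic EP scale-down then scale-up).
engine.close(linger=0)
engine = connect_engine(b"engine-0")
engine.send(b"hello again")
# With ROUTER_HANDOVER=1 this message is delivered; without it, the
# ROUTER's stale routing entry would cause it to be dropped and the
# poll loop to hang waiting for an identity message that never arrives.
ident, msg = router.recv_multipart()
print(ident, msg)
engine.close(linger=0)
router.close(linger=0)
```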
The last thing missing is better integration testing, which is tricky given the heavy dependency. Per @itayalroy, we should be able to pip install nixl in either the next NIXL release or the one after, which will make this easier to manage in the test image.
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
Signed-off-by: whycoming <120623296@qq.com>
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
This PR is a rebase of #29630, originally authored by @libertyeagle, that integrates NIXL-EP kernels into vLLM.
NIXL-EP is an implementation of expert-parallel communication kernels over NIXL's device API. It provides elastic scaling capabilities, enabling dynamic addition and removal of processes (ranks) during runtime, without the need to destroy and recreate communicators during scaling up/down.
This PR also includes a few small fixes to vLLM Elastic EP (#34861) that we found while thoroughly testing vLLM with NIXL-EP.