EXAMPLES: Introduce NIXL-EP example #1043
Merged
ovidiusm merged 1 commit into ai-dynamo:main on Dec 8, 2025
👋 Hi itayalroy! Thank you for contributing to ai-dynamo/nixl. Your PR reviewers will review your contribution then trigger the CI to test your changes. 🚀
itayalroy
commented
Nov 20, 2025
697d7a4 to e6a29b7
eaf37f0 to b59b1ce
fa5dbc9 to 3658e83
ovidiusm
approved these changes
Dec 5, 2025
Contributor
/build
Contributor
/ok to test 3658e83
brminich
approved these changes
Dec 5, 2025
Contributor
/build
dmitry-tokarev-nv
approved these changes
Dec 5, 2025
Contributor
dmitry-tokarev-nv
left a comment
reviewed file headers wrt licenses and copyright notices. LGTM!
Contributor
/build
Contributor
/build
Contributor
@itayalroy pls use GPG-signed commits so the gitlab, AWS CI will run automatically
Contributor
/build
Add an example implementation of expert-parallel dispatch and combine operations using the NIXL device API.
Co-authored-by: Roey Azran <roeya@nvidia.com>
Co-authored-by: Micha Dery <mdery@nvidia.com>
Co-authored-by: Michal Shalev <mshalev@nvidia.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Contributor
/build
Contributor
/ok to test efb2bf0
Contributor
Please use normal commits once the code is in review, I cannot see the diff for the changes
Contributor
/build
This was referenced Dec 12, 2025
NIXL EP: Expert-Parallel Communication Example
Overview
NIXL EP is a complete example implementation of expert-parallel communication for Mixture of Experts (MoE) models, built on top of NIXL's device API. It supports elastic scaling: processes (ranks) can be added and removed at runtime without disrupting existing connections. Data transfers use NIXL's RDMA and NVLink support for performance.
Features
Buffer Initialization
NIXL EP provides a flexible buffer initialization pattern that supports dynamic rank management:
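The README's own snippet is not reproduced on this page. The following is a minimal sketch of the call order implied by the Key APIs listed below, not the project's actual example. `RecordingBuffer` is a hypothetical stand-in that only logs calls so the sketch runs without NIXL installed. The argument values (including the `nvlink_backend` string) are placeholders, and the assumption that buffers are sized once before any connections are made is inferred from the "up to num_ranks" wording.

```python
class RecordingBuffer:
    """Hypothetical stand-in for the NIXL EP Buffer; it only records calls."""

    def __init__(self, rank_id, nvlink_backend, explicitly_destroy):
        self.rank_id = rank_id
        self.nvlink_backend = nvlink_backend
        self.explicitly_destroy = explicitly_destroy
        self.calls = []

    def update_memory_buffers(self, num_ranks, num_experts_per_rank, num_rdma_bytes):
        self.calls.append(
            ("update_memory_buffers", num_ranks, num_experts_per_rank, num_rdma_bytes)
        )

    def connect_ranks(self, remote_ranks):
        self.calls.append(("connect_ranks", list(remote_ranks)))

    def disconnect_ranks(self, remote_ranks):
        self.calls.append(("disconnect_ranks", list(remote_ranks)))


def elastic_lifecycle(buffer, max_ranks, experts_per_rank, rdma_bytes,
                      initial_peers, joining_peers, leaving_peers):
    """Walk a buffer through one expand-then-shrink cycle."""
    # Assumption: buffers are sized for the largest rank count expected later.
    buffer.update_memory_buffers(max_ranks, experts_per_rank, rdma_bytes)
    # Connect to the peers that exist at startup.
    buffer.connect_ranks(initial_peers)
    # connect_ranks can be called again when new peers join.
    buffer.connect_ranks(joining_peers)
    # Clean up connections to peers that are leaving.
    buffer.disconnect_ranks(leaving_peers)
    return buffer


# Placeholder values throughout; none of them come from the NIXL EP sources.
buf = RecordingBuffer(rank_id=0, nvlink_backend="placeholder", explicitly_destroy=True)
elastic_lifecycle(buf, max_ranks=8, experts_per_rank=2, rdma_bytes=1 << 20,
                  initial_peers=[1, 2, 3], joining_peers=[4, 5], leaving_peers=[1])
for call in buf.calls:
    print(call)
```

Passing the buffer in as a parameter keeps the sketch independent of how the real `Buffer` is imported or constructed.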
Key APIs
Buffer(rank_id, nvlink_backend, explicitly_destroy): Initialize the NIXL communication buffer
update_memory_buffers(num_ranks, num_experts_per_rank, num_rdma_bytes): Prepare buffers for up to num_ranks ranks and num_experts_per_rank experts
connect_ranks(remote_ranks): Establish NIXL connections to new peers (can be called multiple times)
disconnect_ranks(remote_ranks): Clean up connections to departing peers
Testing
The elastic test suite in tests/elastic/ validates dynamic scaling capabilities.
Example Plan (expansion_contraction.json)
This plan defines three phases.
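The contents of the plan file are not shown on this page, so the sketch below does not reproduce its format. As a rough illustration of what moving between phases involves, it assumes each phase is simply a list of active rank IDs (a made-up representation) and computes which ranks join and which leave at each transition. These are the sets a rank would pass to connect_ranks and disconnect_ranks.

```python
def rank_changes(phases):
    """For each pair of consecutive phases, return (joining, leaving) rank lists."""
    changes = []
    for prev, curr in zip(phases, phases[1:]):
        prev_set, curr_set = set(prev), set(curr)
        joining = sorted(curr_set - prev_set)  # present now, absent before
        leaving = sorted(prev_set - curr_set)  # present before, absent now
        changes.append((joining, leaving))
    return changes


# Hypothetical three-phase plan: start, expansion, contraction.
example_phases = [
    [0, 1, 2, 3],
    [0, 1, 2, 3, 4, 5, 6, 7],
    [0, 1, 2, 3, 4, 5],
]

for step, (joining, leaving) in enumerate(rank_changes(example_phases), start=1):
    print(f"transition {step}: connect {joining}, disconnect {leaving}")
```

Using set differences keeps each transition independent of the order in which ranks are listed within a phase.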
Getting Started
Build NIXL with NIXL EP:
First, configure the pkg-config paths (only needed when dependencies are installed to non-default paths):
Then, configure the NIXL plugin directory so that NIXL can find the UCX plugin, and set LD_LIBRARY_PATH so that UCX can find rdma-core:
Build and install:
meson setup build \
    -Ducx_path=<path to UCX install> \
    -Dprefix=<path to NIXL install directory> \
    -Dbuildtype=release \
    -Dbuild_nixl_ep=true
cd build
ninja install
Finally, configure PYTHONPATH to use NIXL EP:
Refer to tests/elastic/README.md for detailed instructions on how to run the elastic test suite.
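The exact environment commands are not shown on this page. As a hedged sketch of the path setup described in the steps above, the helper below prepends a directory to a path-like variable such as PKG_CONFIG_PATH, LD_LIBRARY_PATH or PYTHONPATH without discarding what is already there. Every directory used here is a hypothetical placeholder, and the helper name `prepend_path` is made up for illustration.

```python
import os


def prepend_path(env, var, directory):
    """Return a copy of env with directory placed first in the path-like variable var."""
    updated = dict(env)
    existing = updated.get(var, "")
    # Keep any existing entries after the new one, separated by os.pathsep.
    updated[var] = directory + os.pathsep + existing if existing else directory
    return updated


# Placeholder locations; substitute the real install directories.
env = dict(os.environ)
env = prepend_path(env, "PKG_CONFIG_PATH", "/opt/example/ucx/lib/pkgconfig")
env = prepend_path(env, "LD_LIBRARY_PATH", "/opt/example/rdma-core/lib")
env = prepend_path(env, "PYTHONPATH", "/opt/example/nixl/python")

for name in ("PKG_CONFIG_PATH", "LD_LIBRARY_PATH", "PYTHONPATH"):
    print(name, "=", env[name])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the build or the tests from Python, leaving the parent process environment untouched.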