NIXL EP: Use VMM API for device memory allocation.#1415

Open
ofirfarjun7 wants to merge 29 commits into ai-dynamo:main from ofirfarjun7:topic/nixl-ep-use-vmm-api

Conversation

@ofirfarjun7
Contributor

@ofirfarjun7 ofirfarjun7 commented Mar 8, 2026

What?

Use the VMM API for device memory allocation in nixl_ep.

Why?

To support multi-node NVLink.

How?

  • Create a CUDA allocator wrapper.
  • Replace calls to cudaMalloc with the new allocator.
  • Fall back to cudaMalloc if fabric handles are not supported.

Summary by CodeRabbit

  • Refactor

    • Endpoint memory management now uses virtual-memory-backed allocation throughout, replacing direct device allocations for more robust initialization, teardown, and automatic cleanup.
  • New Features

    • Added support for virtual memory / RDMA-backed device regions, enabling larger, more flexible buffers and improved interoperability with advanced device capabilities.

@ofirfarjun7 ofirfarjun7 requested review from a team, ebarilanM and itayalroy as code owners March 8, 2026 16:45
@copy-pr-bot

copy-pr-bot bot commented Mar 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

github-actions bot commented Mar 8, 2026

👋 Hi ofirfarjun7! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution and then trigger the CI to test your changes.

🚀

@ofirfarjun7 ofirfarjun7 changed the title Topic/nixl ep use vmm api NIXL EP: Use VMM API for device memory allocation. Mar 8, 2026
@coderabbitai

coderabbitai bot commented Mar 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds CUDA Driver VMM-backed allocation support: introduces vmm_region, vmm_init(size_t, CUdevice) and vmm_free(vmm_region&), replaces direct cudaMalloc/cudaFree with VMM-backed allocations in Buffer (workspace, RDMA, mask, sync, sync-count), updates init/destroy flows and memory-view integration.

Changes

Cohort / File(s) — Summary

  • Header: types & Buffer fields — examples/device/ep/csrc/nixl_ep.hpp
    Adds struct vmm_region { CUdeviceptr ptr; size_t size; CUmemGenericAllocationHandle handle; bool is_cuda_malloc; }; adds private vmm_region members to Buffer for workspace, rdma, mask, sync, and sync_count; adds <cuda.h>, <cuda_runtime.h>, and <stdexcept> includes.
  • Source: VMM helpers & allocator logic — examples/device/ep/csrc/nixl_ep.cpp
    Introduces vmm_init(size_t, CUdevice) and vmm_free(vmm_region&), an internal cuda_alloc_ctx, device capability/granularity checks, and a fallback to cudaMalloc; replaces prior cudaMalloc/cudaFree with VMM-backed allocations and stores regions in new m_*_alloc members.
  • Buffer lifecycle & memory-view integration — examples/device/ep/csrc/...
    Updates Buffer::init and Buffer::destroy to allocate/free via vmm_init/vmm_free, assign pointer fields from m_*_alloc.ptr, reset pointers to nullptr, and adapt calls to _nixl_ep_memory_views_create / _nixl_ep_memory_views_destroy to use VMM regions.
  • Includes & compilation — examples/device/ep/csrc/...
    Adds the CUDA Driver API and runtime headers required for VMM and driver calls.

Sequence Diagram(s)

mermaid
sequenceDiagram
rect rgba(200,200,255,0.5)
participant App as Buffer (app)
end
rect rgba(200,255,200,0.5)
participant Driver as CUDA Driver
end
rect rgba(255,200,200,0.5)
participant Device as GPU/Device
end
rect rgba(255,255,200,0.5)
participant Fallback as cudaMalloc/runtime
end

App->>Driver: vmm_init(size, device)
Driver->>Driver: check device attributes & granularity
alt VMM supported
Driver->>Device: cuMemAddressReserve / cuMemCreate / cuMemMap
Device-->>Driver: allocation handle & device pointer
Driver-->>App: vmm_region {ptr, size, handle, is_cuda_malloc=false}
else Fallback
Driver->>Fallback: cudaMalloc(size)
Fallback-->>Driver: device pointer
Driver-->>App: vmm_region {ptr, size, handle=0, is_cuda_malloc=true}
end
App->>Driver: use ptr for buffers / create memory views
App->>Driver: vmm_free(vmm_region)
Driver->>Device: cuMemUnmap / cuMemRelease / cuMemAddressFree (or cudaFree if fallback)
Driver-->>App: freed

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I swapped my mallocs for mapped terrain,
Handles hug regions, pointers train,
Granularity snug, no stray pain,
Views aligned along the lane,
Hop—VMM carrots in my brain 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Title check ✅ Passed — The title clearly and concisely describes the main change: replacing direct CUDA allocations with the VMM API for device memory allocation in NIXL EP.
  • Description check ✅ Passed — The PR description includes all required sections (What, Why, How) and provides sufficient detail about the changes and their purpose.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/device/ep/csrc/nixl_ep.hpp`:
- Around line 66-77: The two calls to cuDeviceGetAttribute (checking
CU_DEVICE_ATTRIBUTE_GPU_DIRECT_RDMA_WITH_CUDA_VMM_SUPPORTED and
CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_FABRIC_SUPPORTED) do not check their CUresult
return values; update the code around the variables rdma_vmm_supported and
fabric_supported to capture the CUresult, test it against CUDA_SUCCESS, and on
failure throw or log a runtime_error that includes the cuGetErrorString result
and context (which attribute failed and for which device); ensure you only rely
on rdma_vmm_supported/fabric_supported after the call succeeds so you don't act
on zero-initialized values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8bc85385-7d61-405a-90d0-e86c5ca8956c

📥 Commits

Reviewing files that changed from the base of the PR and between 1870127 and 07674ee.

📒 Files selected for processing (2)
  • examples/device/ep/csrc/nixl_ep.cpp
  • examples/device/ep/csrc/nixl_ep.hpp


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/device/ep/csrc/nixl_ep.hpp`:
- Around line 102-110: The destructor ~cuda_allocator() currently unmaps and
releases VMM state without waiting for GPU work; fix by calling
cudaDeviceSynchronize() at the start of ~cuda_allocator() before any
cuMemUnmap/cuMemAddressFree/cuMemRelease calls so all in-flight
kernels/transfers are fenced; additionally, ensure allocator creation paths
cannot bypass that fence on exception by either making explicitly_destroy
default to false or wrapping allocator construction in the init paths
(_nixl_agent_init(), _nixl_ep_init(), or any init() that creates the allocator)
with a try/catch that calls cudaDeviceSynchronize() before rethrowing so
stack-unwound destructor runs safe; keep references to the methods destroy() and
~cuda_allocator() when making changes.
- Around line 56-64: The allocator currently queries the ambient CUDA context
via cuCtxGetDevice() which is unsafe; change cuda_allocator to accept an
explicit CUdevice (or device_id) parameter, set the context explicitly inside
the constructor using cuCtxSetCurrent() and handle errors (throw on failure),
then update Buffer::init call sites to pass the tracked device_id into
cuda_allocator so allocations are bound to the correct device regardless of
external context switches; ensure you remove the cuCtxGetDevice() usage in
cuda_allocator and propagate failures with clear runtime_error messages.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d14696d0-b771-4297-bed3-a7596c6c4643

📥 Commits

Reviewing files that changed from the base of the PR and between 07674ee and 7b60106.

📒 Files selected for processing (1)
  • examples/device/ep/csrc/nixl_ep.hpp

throw std::runtime_error("Failed to get CUDA allocation granularity");
}

init_vmm(size, device, prop, granularity);
Contributor


This seems to be the only thing we need to do per allocation, which makes me believe we don't need this cuda_allocator class. We can just verify once that we can allocate using VMM, get the granularity, etc., and then call init_vmm directly.

Contributor Author


Can change it.
Do you want to get rid of the class, or make it static?

@ofirfarjun7
Contributor Author

/build

@ofirfarjun7 ofirfarjun7 marked this pull request as draft March 9, 2026 16:33

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/device/ep/csrc/nixl_ep.cpp`:
- Around line 70-127: The static cuda_alloc_ctx ctx binds device-specific fields
(prop.location.id, granularity, fabric support) to whichever device was active
at first cuCtxGetDevice() call, causing wrong-device VMM allocations; change
initialization so driver/version checks remain global but device-specific
queries (cuCtxGetDevice(), cuDeviceGetAttribute(),
cuMemGetAllocationGranularity()) are performed per-call or cached per-device
(keyed by device ID) instead of in the static cuda_alloc_ctx constructor—either
remove static cuda_alloc_ctx ctx and build a ctx per vmm_init()/allocation (with
a device-ID cache), or split cuda_alloc_ctx into a static global verifier and a
per-device struct populated on each allocation using the current context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 120c00f8-eacc-4630-9658-e92390364a9c

📥 Commits

Reviewing files that changed from the base of the PR and between 7b60106 and 25b2e5b.

📒 Files selected for processing (2)
  • examples/device/ep/csrc/nixl_ep.cpp
  • examples/device/ep/csrc/nixl_ep.hpp

@ofirfarjun7 ofirfarjun7 marked this pull request as ready for review March 11, 2026 09:51
@ofirfarjun7 ofirfarjun7 requested a review from itayalroy March 11, 2026 09:51
@ofirfarjun7
Contributor Author

/build


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
examples/device/ep/csrc/nixl_ep.cpp (1)

70-121: ⚠️ Potential issue | 🟠 Major

Do not cache device-specific VMM state in a function-local static.

static cuda_alloc_ctx ctx(device); is initialized only on the first vmm_init() call, so every later allocation reuses that first device's prop.location.id, granularity, and fallback decision. In a multi-GPU process, buffers allocated on GPU 1 can end up using GPU 0's VMM properties, which defeats the multi-device support this change is introducing.

Suggested direction
-    static cuda_alloc_ctx ctx(device);
+    const cuda_alloc_ctx ctx(device);

If the repeated driver/version probe is a concern, keep that part in a separate one-time helper and build/cache the device-specific state per CUdevice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/device/ep/csrc/nixl_ep.cpp` around lines 70 - 121, The
device-specific VMM context is incorrectly cached in a function-local static
(static cuda_alloc_ctx ctx(device);) causing all subsequent calls to reuse the
first device's prop.location.id, granularity and fallback decision; change this
by removing the function-local static and either (a) create a per-call
cuda_alloc_ctx instance (e.g., cuda_alloc_ctx ctx(device);) so each device is
probed correctly, or (b) implement a per-device cache keyed by CUdevice (e.g.,
std::unordered_map<CUdevice,cuda_alloc_ctx>) and look up/create the
cuda_alloc_ctx for the specific device, while extracting any global-only
driver/version probe into a separate one-time helper function to avoid repeated
work.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/device/ep/csrc/nixl_ep.cpp`:
- Around line 124-129: The code incorrectly passes a CUdeviceptr* to cudaMalloc;
change the allocation to use a temporary void* (e.g., void* tmp = nullptr), call
cudaMalloc(&tmp, size), check the return, and then assign region.ptr =
reinterpret_cast<CUdeviceptr>(tmp) (or static_cast if appropriate) so that
vmm_region.region.ptr receives the allocated device pointer without violating
the CUDA Runtime API contract.
- Around line 101-104: Replace the throw when rdma_vmm_supported is false with
an early return so the function can continue to the existing fallback path (the
later check of fabric_supported that falls back to cudaMalloc); specifically,
remove the std::runtime_error throw and return (keeping the function's normal
flow) when rdma_vmm_supported == false to match the behavior used for the CUDA
version and fabric support checks and allow the cudaMalloc fallback to execute.

In `@examples/device/ep/csrc/nixl_ep.hpp`:
- Around line 53-58: The vmm_region fields are left uninitialized causing
vmm_free() to operate on garbage values; update the vmm_region definition so its
members are zero-initialized by default (e.g., initialize CUdeviceptr ptr,
size_t size, and CUmemGenericAllocationHandle handle to zero or provide a
default ctor that sets them to 0) so that Buffer's members (m_rdma_alloc,
m_mask_alloc, m_sync_alloc, m_sync_count_alloc, m_workspace_alloc) are safe if
Buffer::~Buffer()/destroy() runs before init(); ensure the guard in vmm_free()
will reliably detect an unused region.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: cb93105b-51b9-4789-b86a-242031b99df3

📥 Commits

Reviewing files that changed from the base of the PR and between 25b2e5b and ee2fd64.

📒 Files selected for processing (2)
  • examples/device/ep/csrc/nixl_ep.cpp
  • examples/device/ep/csrc/nixl_ep.hpp


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/device/ep/csrc/nixl_ep.hpp`:
- Around line 53-58: The functions vmm_init and vmm_free currently have external
linkage but no header declarations; decide intended scope and fix accordingly:
if they are internal implementation details, add the static keyword to their
definitions in nixl_ep.cpp (i.e., mark vmm_init and vmm_free static) to restrict
linkage; if they are part of the API, add forward declarations for vmm_init and
vmm_free to the header alongside vmm_region so callers can see the prototypes
and linkage is explicit. Ensure the chosen change is applied consistently for
both functions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7af83afb-0d9f-4ed5-bc2c-d16193eb3605

📥 Commits

Reviewing files that changed from the base of the PR and between ee2fd64 and 7677e59.

📒 Files selected for processing (1)
  • examples/device/ep/csrc/nixl_ep.hpp

size_t size_ = 0;
CUmemGenericAllocationHandle handle_ = 0;
bool is_cuda_malloc_ = false;
bool vmm_addr_reserved_ = false;
Contributor

@rakhmets rakhmets Mar 24, 2026


vmm_addr_reserved_ can be removed.

if (!ctx.fabric_supported) {
size_ = size;
is_cuda_malloc_ = true;
if (cudaMalloc(reinterpret_cast<void **>(&ptr_), size) != cudaSuccess) {
Contributor


cudaMalloc -> cuMemAlloc
cudaFree -> cuMemFree

This avoids the cast, and #include <cuda_runtime.h> can then be removed from vmm.hpp.

Comment on lines +42 to +50
[[nodiscard]] size_t
size() const noexcept {
return size_;
}

[[nodiscard]] CUmemGenericAllocationHandle
handle() const noexcept {
return handle_;
}
Contributor


Can be removed as they are unused.


[[nodiscard]] CUdeviceptr
ptr() const noexcept {
return ptr_;
Contributor


Maybe do the reinterpret_cast here and return void *, as it is only used to get the pointer.

Comment on lines +147 to +149
access_desc.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
access_desc.location.id = device;
access_desc.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
Contributor


It seems this should also be done only once, in cuda_alloc_ctx.


prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
prop.location.id = dev;
Contributor


In this implementation cuda_alloc_ctx is initialized only once.
So you can remove CUdevice device from the parameters list, call cuCtxGetDevice, and throw if it returns an error (which means that device should be set before constructing a vmm_region). And remove const CUdevice cu_dev = static_cast<CUdevice>(device_id); from nixl_ep.cpp.

Comment on lines +61 to +63
if (size == 0) {
throw std::invalid_argument("vmm_region: size must be non-zero");
}
Contributor

@rakhmets rakhmets Mar 24, 2026


Is it really needed? I guess it can be removed.

Contributor Author


I think cudaMalloc returns success even for size == 0 (and sets the pointer to nullptr), so we would need to check for that if we remove this.

Contributor


But it's safe to call cudaFree with 0 / NULL / nullptr. So I think it's not really an exceptional case for this class.

Contributor Author


I see this class as an abstraction; don't you think we should hint to the user if they call the ctor with zero?

Contributor


In my opinion, no, we shouldn't. Since this is an abstraction over various methods of memory allocation. And in general, it is not forbidden to pass the zero size to allocators.

Contributor Author


AFAIK cuMemCreate will fail if we pass zero.
If that's true, don't you think it is strange that VMM allocation would fail with zero on some systems and not on others?
I don't think the user should care which API vmm used; they should get the same behavior either way.

@@ -0,0 +1,151 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Contributor


Suggested change
* SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

@@ -0,0 +1,62 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Contributor


Suggested change
* SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

@@ -0,0 +1,48 @@
/*
* SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Contributor


Suggested change
* SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

#include <cuda_runtime.h>
#include <cstddef>

class vmm_region {
Contributor


Please use the existing nixl_ep namespace.
