Implement/migrate Arena allocators to cuda plugin ep. #27931
yuslepukhin merged 38 commits into `main`
Conversation
Pull request overview
Adds a design document describing how to integrate arena-backed allocators (BFCArena + CUDA mempool arena) into the CUDA Plugin Execution Provider to close the performance gap vs the in-tree CUDA EP.
Changes:
- Document current allocator behavior differences between in-tree CUDA EP vs CUDA plugin EP.
- Propose an ORT-core-side BFCArena wrapping approach for plugin-provided allocators (shared + per-session paths).
- Outline a plan to migrate `CudaMempoolArena` into the plugin build, including logging and `OrtAllocator` wrapper considerations.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
tianleiwu left a comment
1. Problem Statement & Arena Mode Taxonomy (Sections 1–2)
Positive:
- Clear side-by-side table showing the in-tree vs. plugin allocator gap. This immediately justifies the work.
- The three arena modes (Default BFC, CUDA Mempool, No Arena) are accurately described. Correctly noting that `DisableCpuMemArena()` only affects the CPU EP — not the CUDA EP — prevents a common misconception.
2. Part A — BFCArena Integration via Option B (Section 3)
Positive:
- Option B (wrapping at the two ORT core entry points) is clearly superior to Option A (wrapping at all callers). The comparison matrix in Section 3.6 makes the tradeoff transparent.
- The allocator lifecycle diagrams for both Path 1 (shared) and Path 2 (per-session) are accurate when compared against the actual code in `environment.cc` (lines 567–582) and `ep_plugin_provider_interfaces.cc` (lines 666–705).
- The observation that `allocator_options` is always `nullptr` today in both paths is confirmed by the code. This is the correct gap to fix.
- The recommendation to extract a shared `MaybeWrapInArena(...)` helper (Section 3.7, Decided 4) is sound — the two sites need identical logic and must stay in sync.
Concern:
- ⚠️ `CreateAllocator(AllocatorCreationInfo)` requires an `AllocatorFactory` function, not an existing `IAllocator`: The design says to "wrap the returned `IAllocator` in BFCArena via `CreateAllocator(AllocatorCreationInfo{...})`." However, `AllocatorCreationInfo` takes a `device_alloc_factory` (a `std::function<std::unique_ptr<IAllocator>(OrtDevice::DeviceId)>`) — it constructs a new allocator, not wrapping an existing one. The proposed `MaybeWrapInArena` helper will need to either (a) directly construct `BFCArena`/`StreamAwareBFCArena` from the existing `IAllocator` (bypassing `CreateAllocator`), or (b) wrap the existing allocator in a factory lambda. The design should clarify which approach is intended, since option (a) duplicates the config-resolution logic in `CreateAllocator`, and option (b) has an awkward move-into-lambda lifecycle. Suggested clarification: construct `BFCArena`/`StreamAwareBFCArena` directly, reusing the config-resolution logic from `CreateAllocator` inside the helper, or factor the config-resolution out into a separate function.
- ⚠️ Stream-awareness determination assumes memory info alone is sufficient: Decided 1 says the helper determines stream-awareness from `OrtMemoryInfo` — "if the memory is on a GPU device, create `StreamAwareBFCArena`; if it is host-accessible (pinned), create `BFCArena`." But `OrtMemoryInfo` doesn't have an explicit "is GPU vs. pinned" flag. The code would need to compare `OrtDevice::Type()` (GPU) and `OrtDevice::MemType()` (DEFAULT vs. CUDA_PINNED). This is workable, but the design should be explicit about which fields are used so the helper doesn't silently pick the wrong arena type for future EPs with different device topologies (e.g., a unified-memory accelerator).
- ⚠️ `RegisterExecutionProviderLibrary` shared allocators may be created too early for meaningful arena config: The design acknowledges this in Section 3.3 — shared allocator creation happens at environment level before any session. The proposed fix requires the user to put arena config in `CreateEnvWithOptions` config entries with `ep_factory.<name>.arena.*` keys. This is a new usage pattern that doesn't exist today. The document should call out that this is a user-facing API change requiring documentation, even though it reuses the existing `OrtEnvCreationOptions::config_entries` mechanism. If no env-level arena config is provided, the fallback to the default `OrtArenaCfg{0, -1, -1, -1, -1, -1L}` means all plugin EPs that register shared allocators will get BFCArena wrapping by default. Confirm this is intentional — today plugin EPs get raw allocators, and some may rely on that behavior.
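To make the option (b) lifecycle concern concrete, here is a minimal sketch using stand-in types (`MiniAllocator`, `MiniArena`, `MakeFactoryFromExisting` are all hypothetical names, not ORT API; the real helper would deal with `IAllocator` and `BFCArena`). The point it illustrates is the move-into-lambda awkwardness: `std::function` requires a copyable callable, so a moved-in `std::unique_ptr` has to be promoted to `std::shared_ptr` before capture.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-ins for IAllocator / BFCArena to keep the sketch self-contained.
struct MiniAllocator {
  virtual ~MiniAllocator() = default;
};

struct MiniArena : MiniAllocator {
  // The arena owns the underlying device allocator it sub-allocates from.
  explicit MiniArena(std::shared_ptr<MiniAllocator> upstream)
      : upstream_(std::move(upstream)) {}
  std::shared_ptr<MiniAllocator> upstream_;
};

using MiniFactory = std::function<std::unique_ptr<MiniAllocator>()>;

// Option (b): wrap an *existing* allocator in a factory callable.
// std::function must be copyable, so the unique_ptr is promoted to shared_ptr
// before capture -- this is the "awkward move-into-lambda lifecycle".
MiniFactory MakeFactoryFromExisting(std::unique_ptr<MiniAllocator> existing) {
  std::shared_ptr<MiniAllocator> shared = std::move(existing);
  return [shared]() -> std::unique_ptr<MiniAllocator> {
    // Calling this factory twice would alias the same upstream allocator,
    // which a real helper must guard against (e.g., assert single use).
    return std::make_unique<MiniArena>(shared);
  };
}
```

This is only a shape sketch; the real decision between option (a) and (b) depends on how much of `CreateAllocator`'s config resolution can be factored out.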
3. Arena-Already-Handled Signal — Name Comparison (Section 3.5)
Positive:
- Listing four candidate signal mechanisms and evaluating their tradeoffs is thorough.
- Option (c) (allowing `OrtArenaAllocator` type) is correctly rejected — it reverses an explicitly enforced invariant in both `CreateSharedAllocatorImpl` and `CreatePreferredAllocators`.
Concern:
- ⚠️ Option (d) — name comparison — is fragile and has an inversion risk: The signal is "if the returned allocator's memory info name matches the `OrtEpDevice`'s memory info name, it's a raw allocator → wrap in BFCArena; otherwise it's a self-contained arena → skip." This means:
  - Any future factory that intentionally returns an allocator with a different name (e.g., for a debug allocator, a profiling wrapper, or a sub-device allocator) would be silently treated as "already arena-wrapped" and never get BFCArena.
  - Conversely, if a mempool implementation accidentally uses the same name as the device's baseline, it would get double-wrapped.
  - The convention is implicit — there's no compile-time or runtime assertion that the factory intentionally chose a different name.

  Consider adding a comment/requirement in the EP factory API documentation that the name mismatch is the arena signal. Better yet, add an explicit `bool is_arena` field to `OrtAllocator` or a new optional API `OrtAllocator::GetTraits()` that can be checked. This would be a small, forward-compatible API addition that prevents silent misclassification.

  Alternate approach: use option (c) but gate it differently — instead of rejecting `OrtArenaAllocator` outright, change the two call sites to skip BFCArena wrapping when the type is `OrtArenaAllocator` instead of failing. This is more explicit and less fragile than name comparison. The check is already there in both functions; it just needs to branch differently.
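The branch-instead-of-fail alternative amounts to a two-way decision on the reported allocator type. A minimal sketch with stand-in names (`AllocatorType`, `WrapDecision`, `DecideWrapping` are hypothetical; the real code would test the ORT C API's `OrtAllocatorType` at the two call sites):

```cpp
#include <cassert>

// Stand-in for OrtAllocatorType; values mirror the device/arena distinction.
enum class AllocatorType { kDevice, kArena };

enum class WrapDecision { kWrapInBfcArena, kSkipAlreadyArena };

// Current behavior at the two call sites: an arena-typed allocator is rejected
// with an error. Proposed behavior: treat it as "arena already handled" and
// skip wrapping, instead of inferring this from a memory-info name mismatch.
WrapDecision DecideWrapping(AllocatorType reported_type) {
  return reported_type == AllocatorType::kArena
             ? WrapDecision::kSkipAlreadyArena
             : WrapDecision::kWrapInBfcArena;
}
```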
4. Config Conflict & Prefix Mismatch (Sections 3.7–3.8)
Positive:
- The "conflict blindness" analysis is honest and correct. The EP only sees one config source at each lifecycle point, so conflicts are inherently undetectable.
- The acceptance rationale — shared and per-session allocators are independent and don't compete — is valid for the arena use case.
Concern:
⚠️ Prefix mismatch (`ep_factory.cuda.*` vs `ep.cudapluginexecutionprovider.*`) is user-hostile: The document correctly identifies this as "a guaranteed source of user confusion" but defers to "documentation and examples." This is the kind of inconsistency that causes support burden. Consider adding a validation helper at environment creation time that warns if arena keys appear under only one prefix but not the other, or if the registration name vs. EP type name can be mapped automatically. Even a log-level warning would significantly reduce debugging time.
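The suggested validation pass could look roughly like this (a sketch; `FindMismatchedArenaKeys` is a hypothetical helper, and the prefixes follow the document's examples): collect the `arena.*` suffixes seen under each prefix and report suffixes present under one but not the other.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical validation pass run at environment creation: a user who set
// ep_factory.cuda.arena.max_mem but put extend_strategy only under the
// ep.cudapluginexecutionprovider.* prefix gets a warning instead of silence.
std::vector<std::string> FindMismatchedArenaKeys(
    const std::set<std::string>& config_keys,
    const std::string& factory_prefix,    // e.g. "ep_factory.cuda.arena."
    const std::string& session_prefix) {  // e.g. "ep.cudapluginexecutionprovider.arena."
  auto suffixes_under = [&](const std::string& prefix) {
    std::set<std::string> out;
    for (const auto& key : config_keys) {
      if (key.rfind(prefix, 0) == 0) out.insert(key.substr(prefix.size()));
    }
    return out;
  };
  const auto factory_keys = suffixes_under(factory_prefix);
  const auto session_keys = suffixes_under(session_prefix);

  std::vector<std::string> mismatched;
  for (const auto& s : factory_keys)
    if (!session_keys.count(s)) mismatched.push_back(factory_prefix + s);
  for (const auto& s : session_keys)
    if (!factory_keys.count(s)) mismatched.push_back(session_prefix + s);
  return mismatched;  // caller emits a log-level warning per entry
}
```

Note this would warn on legitimately asymmetric configs too, so a log-level warning (not an error) is the right severity.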
5. Part B — CudaMempoolArena Migration (Section 4)
Positive:
- The dependency audit table in Section 4.1 is precise. The Logger dependency is correctly identified as the primary blocker — all other dependencies are plugin-safe.
- Option 1 (conditional logger with macro bridge) is the right recommendation. The 6 logging sites genuinely aid debugging and shouldn't be stripped.
- The `CudaMempoolOrtAllocator` wrapper design is clean and follows the same pattern as the existing `CudaDeviceAllocator`/`CudaPinnedAllocator`.
Concern:
- ⚠️ `CudaMempoolOrtAllocator` must report `OrtDeviceAllocator`, not `OrtArenaAllocator` — but Section 3.5 relies on a name difference to detect it as an arena: These two constraints are consistent in the current design (Section 4.4 says "alloc_type must be `OrtDeviceAllocator`" and Section 3.5 uses name comparison to detect arenas). But Section 4.4's phrasing "ORT must not double-wrap it" could confuse a reader into thinking `OrtArenaAllocator` is the mechanism — add a forward reference to Section 3.5's decided signal mechanism.
- ⚠️ `OrtSyncStream*` → `cudaStream_t` in `AllocOnStreamImpl`: The design mentions `OrtEpApi::SyncStream_GetHandle()` for resolving stream handles. Confirm that this API is available from within the plugin factory's allocator context (i.e., that the `OrtEpApi` function table is accessible when `AllocOnStream` is called, not just during factory initialization). In the current plugin architecture, `OrtEpApi` is typically stored on the factory. The allocator wrapper would need a back-pointer to the factory or a cached function pointer.
- ⚠️ `#ifdef BUILD_CUDA_EP_AS_PLUGIN` in shared source — maintenance cost: Introducing build-conditional code in `cuda_mempool_arena.h/cc` adds maintenance risk. Every future change to the logging in this file must consider both branches. Consider whether a thin `ICudaMempoolLogger` interface (virtual, two impls) would be cleaner than preprocessor branching, at the cost of one virtual dispatch per log call (acceptable — logging is inherently slow-path).
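A sketch of the suggested seam (`ICudaMempoolLogger` is the reviewer's proposed name; the impl names and the stderr forwarding are hypothetical — the in-tree impl would forward to ORT's logger macros, and the plugin impl could bridge to `OrtLogger` or stay a no-op):

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Thin logging seam: cuda_mempool_arena.cc calls only this interface, so the
// in-tree build and the plugin build differ in which impl they inject, not in
// preprocessor branches scattered through the allocator code.
class ICudaMempoolLogger {
 public:
  virtual ~ICudaMempoolLogger() = default;
  virtual void Info(const std::string& msg) = 0;
};

// In-tree impl would forward to the ORT logger; sketched here as a stream sink.
class StreamLogger : public ICudaMempoolLogger {
 public:
  explicit StreamLogger(std::ostream& os) : os_(os) {}
  void Info(const std::string& msg) override { os_ << "[mempool] " << msg << '\n'; }
 private:
  std::ostream& os_;
};

// Plugin build can start with a no-op impl until the OrtLogger bridge exists.
class NullLogger : public ICudaMempoolLogger {
 public:
  void Info(const std::string&) override {}
};
```

With 6 log sites on the slow path, one virtual dispatch per call is a negligible cost compared with maintaining two preprocessor branches.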
6. Phased Plan (Section 5)
Positive:
- The three-phase plan is well-sequenced — Phase 1 (BFCArena infrastructure) is a prerequisite for Phase 2 (CudaMempoolArena), and Phase 3 (validation) ensures parity.
- Phase 3's checklist items are the right ones: mode selection, benchmarking, `DisableCpuMemArena` isolation, and shared-allocator replacement.
Concern:
- ⚠️ Missing: rollback/fallback plan if BFCArena wrapping causes regressions: Since Phase 1 changes ORT core behavior for all plugin EPs (shared allocators get BFCArena by default), there should be a session option or environment config key to disable arena wrapping for debugging/rollback — something like `ep_factory.<name>.arena.enable=0` that the `MaybeWrapInArena` helper checks before wrapping.
- ⚠️ Missing: thread-safety of shared allocator arena wrapping: `RegisterExecutionProviderLibrary` can be called from multiple threads (the environment is shared). BFCArena's construction is not inherently thread-safe if multiple sessions start simultaneously. The existing `shared_allocators_` access is presumably mutex-protected, but the design should confirm that arena wrapping doesn't introduce additional synchronization requirements (e.g., that BFCArena's internal state initialization is thread-safe if the allocator is shared across sessions).
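One way to make the once-only wrapping race-free is a `std::call_once` guard on the slot holding the shared allocator. A sketch with stand-in types (`SharedAllocatorSlot`, `MiniAllocator`, `MiniArena` are hypothetical; real code would live behind the environment's existing mutex):

```cpp
#include <cassert>
#include <memory>
#include <mutex>

struct MiniAllocator { virtual ~MiniAllocator() = default; };
struct MiniArena : MiniAllocator {
  explicit MiniArena(std::shared_ptr<MiniAllocator> up) : up_(std::move(up)) {}
  std::shared_ptr<MiniAllocator> up_;
};

// Hypothetical slot in the environment's shared-allocator table. std::call_once
// guarantees the arena wrapper is constructed exactly once even if two threads
// race through RegisterExecutionProviderLibrary / session creation.
class SharedAllocatorSlot {
 public:
  explicit SharedAllocatorSlot(std::shared_ptr<MiniAllocator> raw)
      : raw_(std::move(raw)) {}

  std::shared_ptr<MiniAllocator> GetOrWrap() {
    std::call_once(wrap_once_, [this] {
      wrapped_ = std::make_shared<MiniArena>(raw_);
    });
    return wrapped_;
  }

 private:
  std::shared_ptr<MiniAllocator> raw_;
  std::shared_ptr<MiniAllocator> wrapped_;
  std::once_flag wrap_once_;
};
```

This covers construction only; BFCArena's own `Alloc`/`Free` paths are internally locked, so sharing the wrapped instance across sessions should then be safe — the point the design needs to confirm.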
7. Decisions & Open Questions (Section 6)
Positive:
- The four "Decided" items are clear and well-reasoned.
- Decided 2's pinned-allocator exception (always default config, never user-configurable) matches the in-tree EP behavior and avoids unnecessary complexity.
- The "Needs validation" note about `max_mem=0` upfront allocation behavior is a critical item — BFCArena with `DEFAULT_MAX_MEM = std::numeric_limits<size_t>::max()` doesn't pre-allocate at construction (it grows on demand starting with `DEFAULT_INITIAL_CHUNK_SIZE_BYTES = 1MB`), so this should be safe. But worth confirming.
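The grow-on-demand claim can be checked with the doubling arithmetic alone (a sketch, not BFCArena code; only the 1MB starting constant is taken from the note above):

```cpp
#include <cassert>
#include <cstddef>

// With max_mem left at DEFAULT_MAX_MEM, nothing is reserved up front; each
// extension doubles the next chunk starting from 1MB until the request fits.
constexpr size_t kDefaultInitialChunkBytes = 1 << 20;  // 1MB

// Chunk size a power-of-two-doubling arena would allocate for the *first*
// request of `bytes`, having allocated nothing at construction.
size_t FirstChunkFor(size_t bytes) {
  size_t chunk = kDefaultInitialChunkBytes;
  while (chunk < bytes) chunk *= 2;  // real code must overflow-check here
  return chunk;
}
```

So under these assumptions, constructing the arena costs nothing, and the first allocation reserves only the rounded-up request — consistent with "should be safe."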
Concern:
⚠️ Missing open question: what happens to existing plugin EP instances that call `CreateSharedAllocator` after `RegisterExecutionProviderLibrary`? The public API `OrtApi::CreateSharedAllocator` (not from `RegisterExecutionProviderLibrary`) also flows through `CreateSharedAllocatorImpl`. If arena wrapping is added there, user-created shared allocators would also get wrapped. Is this intended? The helper should only wrap when the allocator comes from a plugin factory's `CreateAllocator`, not when the user provides a custom allocator via the public API.
Summary of Concerns
| # | Severity | Component | Issue |
|---|---|---|---|
| 1 | High | Arena Signal (§3.5) | Name-based arena detection is fragile; consider OrtArenaAllocator type as a skip-wrapping signal or an explicit trait |
| 2 | High | Phase 1 Rollback | No fallback mechanism to disable BFCArena wrapping if it causes regressions for plugin EPs |
| 3 | Suggestion | BFCArena Wrapping (§3.2) | CreateAllocator(AllocatorCreationInfo) takes a factory, not an existing allocator — design should clarify the wrapping mechanics |
| 4 | Suggestion | Shared Allocator Default (§3.2) | All plugin EPs get BFCArena by default — confirm this is intentional and won't break existing plugins |
| 5 | Suggestion | Config Prefix (§3.8) | Add validation/warning for mismatched ep_factory.* vs ep.* arena keys |
| 6 | Suggestion | Stream Handle (§4.4) | Confirm OrtEpApi::SyncStream_GetHandle() accessibility in allocator callback context |
| 7 | Suggestion | Public API (§6) | Distinguish plugin-factory allocators from user-provided allocators in CreateSharedAllocatorImpl to avoid unintended wrapping |
| 8 | Nitpick | Logger (§4.3) | Consider virtual ICudaMempoolLogger interface over #ifdef branching for maintainability |
| 9 | Nitpick | Stream-awareness (§3.2, Decided 1) | Document which OrtMemoryInfo/OrtDevice fields determine GPU-vs-pinned for the stream-awareness decision |
Verdict
COMMENT — The design is sound and Option B is the right architectural choice. The two high-priority items (fragile name-based arena signal and missing rollback mechanism) should be resolved before implementation begins. The remaining suggestions are refinements that can be addressed during implementation PRs.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
tianleiwu left a comment
Review Summary
This PR implements a complete arena allocator system for the CUDA plugin EP, closing the allocation-performance gap with the in-tree CUDA EP. The work is well-structured: thorough design document (12 review iterations), BFC arena adapted from the example plugin EP, native CUDA mempool allocator, per-device shared arena lifecycle with ref counting, session/environment-level config plumbing, and a new Shrink API on OrtAllocator.
Code quality is high overall — exception handling is disciplined at the C ABI boundary, overflow checks are present, and pointer stability guarantees are well-documented.
2 high-priority concerns (use-after-free risk in arena pointer access; empty non-MSVC GetEnvironmentVar branch), 3 suggestions, and 3 nitpicks below.
Positives
- `AllocGuard` RAII in `Extend()` prevents memory leaks on throw
- Overflow protection in power-of-two doubling path
- Config validation: negative-value rejection for `std::stoull`, `ORT_TRY`/`ORT_CATCH` around every parser, `IsValid()` final check
- Per-device `DeviceCacheEntry` with documented lock ordering (`device_cache_mutex_` before `arena_mutex`)
- `strtol()` with full bounds checking replaces `atoi()`
- Version-gated `Shrink`/`Reserve` C++ wrappers prevent UB on older allocators
- Comprehensive test coverage: config parsing, chunk splitting/coalescing, OOM, shrink, mempool, defensive tests
Pull request overview
Copilot reviewed 45 out of 45 changed files in this pull request and generated 2 comments.
Good progress — the arena migration architecture is sound, the per-device cache keyed by CUDA ordinal is correct, and many earlier review items have been addressed. Three blocking issues remain.
Addressed since last review
- ✅ `Extend()` memory leak — fixed with `AllocGuard` RAII
- ✅ Refcount underflow — now has `LogWarning` + early return instead of assert-only
- ✅ `provider_api_shims.cc` empty non-MSVC branch — fixed
- ✅ Use-after-free in stream plugin — fixed via `ResetDeviceArenaChunksUsingStream` holding `arena_mutex`
- ✅ `IndexFor` return type — changed to `size_t`
- ✅ `max_mem` negative value wrapping — now rejects leading `-`
- ✅ `ToKeyValuePairs` always emits stats (unconditional)
- ✅ Dead `total_allocated_` member removed from mempool allocator
- ✅ Redundant inner `kNextPowerOfTwo` check removed from `Extend`
- ✅ Lock ordering + pointer stability comments added
- ✅ Duplicate `DEFAULT_*` constants removed from `ArenaImpl`
Summary of remaining concerns
| # | Severity | Component | Issue |
|---|---|---|---|
| 1 | High | `CudaArenaAllocator` / `CudaMempoolOrtAllocator` / `EXCEPTION_TO_STATUS_END` | `ORT_HANDLE_EXCEPTION` discards lambda return value — `OrtStatus*` leaks, errors silently swallowed |
| 2 | High | `cuda_arena.cc` `SplitChunk` / `ResetChunksUsingStream` | Split remainder chunks keep stale stream pointers; `ResetChunksUsingStream` only clears tracked handles |
| 3 | High | `ep_plugin_provider_interfaces.cc` | Per-session plugin arenas always wrapped as `IAllocatorImplWrapping` (not `IArenaImplWrapping`), so `ShrinkMemoryArenas` cannot find them |
| 4 | Suggestion | `cuda_ep_factory.cc` `CreateAllocatorImpl` | Session arena options silently ignored after default shared allocator is created |
| 5 | Suggestion | `cuda_arena.h` config parsing | `std::stoi` silently rejects valid >2GB chunk sizes; invalid `extend_strategy` silently defaults to power-of-two |
| 6 | Suggestion | `cuda_mempool_allocator_plugin.cc` `AllocInternal` | `ORT_THROW` on non-OOM CUDA error → `abort()` under `ORT_NO_EXCEPTIONS` |
| 7 | Suggestion | Test coverage | No tests for `CudaMempoolOrtAllocator` or multi-GPU arena isolation |
Pull request overview
Copilot reviewed 47 out of 47 changed files in this pull request and generated 2 comments.
tianleiwu left a comment
APPROVE — This is a high-quality, well-designed implementation that brings the CUDA plugin EP to feature parity with the in-tree EP's arena allocation. The ORT core changes (AsArena(), IArenaImplWrappingOrtAllocator, SafeArenaCast simplification) are clean improvements that benefit the entire plugin EP ecosystem. The concerns are all suggestions for additional documentation/comments rather than correctness issues. The test coverage is excellent.
| # | Severity | Component | Issue |
|---|---|---|---|
| 1 | Suggestion | `IArenaImplWrappingOrtAllocator` | `AllocOnStream` casts `Stream*` → `OrtSyncStream*` without precondition comment |
| 2 | Suggestion | `ArenaImpl::SplitChunk` | Stream-cleared remainder chunks may prevent coalescing, causing fragmentation over many runs |
| 3 | Suggestion | `CudaMempoolOrtAllocator::AllocImpl` | Default-stream semantics with per-thread streams could cause cross-thread ordering issues (caller responsibility, but worth documenting) |
| 4 | Suggestion | `CudaMempoolOrtAllocator` destructor | Flow between `cudaFreeAsync` loop → `SyncAllKnownStreams` → clear is correct but could use a clarifying comment |
| 5 | Nitpick | `CudaEpFactory` | Lock ordering between `CreateAllocatorImpl` and `ReleaseAllocatorImpl` is subtly different — correct but deserves a cross-reference comment |
This pull request makes several improvements to the CUDA plugin Execution Provider (EP) in ONNX Runtime, with a major focus on integrating advanced memory arena support, improving CMake build logic, and updating documentation to reflect these enhancements. The most important changes are summarized below.

**Memory Arena Integration and Documentation:**
* Added a full BFC-style arena (`CudaArenaAllocator` / `ArenaImpl`) and a CUDA native mempool allocator (`CudaMempoolOrtAllocator`) to the plugin, with detailed documentation of their design, integration with the ONNX Runtime core, and file layout. This enables stream-aware, shrinkable memory allocation for the CUDA plugin EP, matching ORT core capabilities.
* Updated the file layout and design documentation to reflect the new arena and allocator classes, and documented how plugin allocators are now visible to session-level arena management via the standard `IArena` interface.
* Removed the previous "future work" section about memory arena parity, as it is now implemented, and updated the "future work" section to focus on remaining tasks such as profiling, stream/adapter parity, and kernel registration validation.

**CMake Build Improvements:**
* Refactored force-include logic for adapter headers in `cmake/onnxruntime_providers_cuda_plugin.cmake` to use generator expressions, removing redundant MSVC/GCC/Clang blocks and simplifying the build configuration.
* Mirrored the source directory structure in the Visual Studio solution for better organization, and set the solution folder for the CUDA plugin target. [[1]](diffhunk://#diff-38eb6ad4f3ce15c7a2395d8eb4edcf95f415e3e557eae50da2590e9e0bbccf8fR114-R117) [[2]](diffhunk://#diff-38eb6ad4f3ce15c7a2395d8eb4edcf95f415e3e557eae50da2590e9e0bbccf8fL290-R294)
* Added compile options to force `/permissive` and include `iso646.h` for MSVC builds, ensuring compatibility with CUTLASS headers and alternative C++ tokens.
* Updated unit test CMake to include plugin-specific CUDA test sources when the plugin is enabled.

**Documentation Fixes:**
* Fixed heading levels and updated descriptions for device synchronization and NHWC layout transformation support in the CUDA plugin EP design document. [[1]](diffhunk://#diff-7bfdbe8f16ed58d5800f488aeb1f7882d96827ea3a2c429b58a743811f9d371cL273-R273) [[2]](diffhunk://#diff-7bfdbe8f16ed58d5800f488aeb1f7882d96827ea3a2c429b58a743811f9d371cL324-R324)
* Clarified which files have been replaced or refactored in the plugin (e.g., `cuda_mempool_arena.cc` is now replaced by plugin-native arena allocator files).

These changes collectively improve the maintainability, feature parity, and documentation quality of the CUDA plugin EP in ONNX Runtime.
…cherry picks (#28035)

### Description
FF release cut to last merge prior to version bump. This also cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|--------------|
| 4e1c42e | #27931 | Implement/migrate Arena allocators to cuda plugin ep. |
| fbba40a | #27786 | Model Package Support |
| c159603 | #28015 | Remove unnecessary model package test |
| 87b0643 | #27997 | centralise feed authentication for ADO pipelines |

### Motivation and Context
Minimise # of cherry picks.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ti-Tai Wang <titaiwang@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: eserscor <erscor@microsoft.com>