CUDA Plugin Cleanup for Shared Kernel Helpers #27915
Conversation
Pull request overview
This PR reduces CUDA plugin-specific code paths by extracting shared validation/attribute-parsing helpers into adapter-safe headers, and by extending the EP adapter API to expose allocators needed by CUDA kernels—improving parity between bundled and plugin CUDA EP behavior.
Changes:
- Added `ep::adapter::OpKernelInfo::GetAllocator(OrtMemType)` to support allocator-dependent kernel initialization in plugin builds.
- Extracted shared helpers for `ScatterND` shape validation and `SpaceToDepth`/`DepthToSpace` attribute parsing/dimension validation, then reused them from CUDA/shared-provider code.
- Removed several plugin-only CUDA fallbacks and aligned cuDNN frontend diagnostics between plugin and bundled builds; updated docs and added plugin regression tests.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| include/onnxruntime/ep/adapter/op_kernel_info.h | Exposes allocator access in the adapter OpKernelInfo to enable shared CUDA initialization paths in plugin builds. |
| onnxruntime/core/providers/cpu/tensor/scatter_nd.h | Introduces the scatter_nd_internal::ValidateShapes helper and routes ScatterND::ValidateShapes through it. |
| onnxruntime/core/providers/cuda/tensor/scatter_nd.cc | Switches CUDA ScatterND shape validation to the shared helper and removes plugin-only duplicate validation. |
| onnxruntime/core/providers/cuda/tensor/scatter_nd.h | Removes the conditional CPU include split now that validation is shared via the helper. |
| onnxruntime/core/providers/cpu/cpu_provider_shared.cc | Routes shared-provider ScatterND validation through the extracted helper. |
| onnxruntime/core/providers/shared_library/provider_bridge_provider.cc | Updates bridge-side ScatterND validation to call the shared helper directly. |
| onnxruntime/core/providers/cpu/tensor/space_depth_ops.h | Extracts ReadBlocksize, ReadIsDCR, and dimension validation into space_depth_internal for reuse. |
| onnxruntime/core/providers/cuda/tensor/space_depth_ops.h | Removes the plugin-only reimplementation and uses shared CPU base/helper logic for all builds. |
| onnxruntime/core/providers/cuda/tensor/upsample.h | Keeps the persistent antialias lookup-table device buffer member available in plugin builds. |
| onnxruntime/core/providers/cuda/tensor/upsample.cc | Unifies plugin/bundled antialias lookup-table allocation/caching using allocator access. |
| onnxruntime/core/providers/cuda/cudnn_common.h | Preserves batch-norm epsilon warning behavior in plugin builds for parity. |
| onnxruntime/core/providers/cuda/nn/conv.cc | Aligns cuDNN frontend failure messages to always include the frontend JSON (plugin parity). |
| onnxruntime/core/providers/cuda/nn/conv_transpose.cc | Extends cuDNN frontend failure diagnostics to include the frontend JSON (matching Conv). |
| docs/cuda_plugin_ep/cuda_plugin_ep_design.md | Updates the plugin design documentation to reflect the new shared-helper model and reduced #ifdef usage. |
| onnxruntime/test/python/transformers/test_cuda_plugin_ep.py | Adds targeted plugin regression tests for antialias Resize and ScatterND behavior. |
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
- space_depth_ops.h: Add `#include <string>` and relax the include guard to `#ifndef SHARED_PROVIDER` (matching the scatter_nd.h pattern) so that plugin builds get op_kernel.h directly instead of relying on transitive includes from the adapter force-include chain.
- op_kernel_info.h: Change the adapter `GetAllocator()` to return nullptr when the EP is not associated or the OrtMemType is unsupported, matching the core `OpKernelInfo::GetAllocator` behavior and avoiding unexpected termination in ORT_NO_EXCEPTIONS builds.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
…dd missing <string> include
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
/azp run Linux QNN CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).
Description
This PR reduces the amount of CUDA plugin-specific compatibility code by moving reusable validation and attribute-reading logic into shared helper paths that work for both bundled and plugin builds. It also fills in a missing allocator hook in the EP adapter so plugin kernels can reuse the same initialization path as the in-tree CUDA EP, which simplifies maintenance and improves behavior parity. The follow-up changes update the CUDA plugin design doc to reflect the new shared-helper model and add focused plugin regression tests for the two runtime paths that changed most materially.
Summary of Changes
EP adapter and shared helper extraction
- ep/adapter/op_kernel_info.h: Adds OpKernelInfo::GetAllocator(OrtMemType) so adapter-based kernels can request device or CPU temp allocators in plugin builds.
- cpu/tensor/scatter_nd.h: Extracts scatter_nd_internal::ValidateShapes so the same logic can be reused outside the CPU ScatterND class.
- cpu/tensor/space_depth_ops.h: Extracts space_depth_internal helpers that can be shared by CUDA kernels.
CUDA kernel cleanup and plugin parity
- cuda/tensor/scatter_nd.cc: Removes the duplicated ScatterND validation and reuses the shared helper implementation.
- cuda/tensor/scatter_nd.h: Drops the conditional CPU include split now that validation is shared via the helper.
- cuda/tensor/space_depth_ops.h: Removes the plugin-only SpaceToDepth/DepthToSpace reimplementation and inherits from the shared base/helper logic in all builds.
- cuda/tensor/upsample.cc: Unifies plugin/bundled antialias lookup-table allocation/caching using the new allocator access.
- cuda/tensor/upsample.h: Keeps the persistent antialias lookup-table device buffer member available in plugin builds.
Shared-provider and diagnostics alignment
- cpu/cpu_provider_shared.cc: Routes shared-provider ScatterND shape validation through the extracted helper.
- provider_bridge_provider.cc: Updates the bridge-side ScatterND::ValidateShapes implementation to call the shared helper directly.
- cuda/cudnn_common.h: Preserves the batch-norm epsilon warning behavior in plugin builds for parity.
- cuda/nn/conv.cc: Aligns cuDNN frontend failure messages to always include the frontend JSON.
- cuda/nn/conv_transpose.cc: Extends cuDNN frontend failure diagnostics to include the frontend JSON, matching the Conv behavior.
Documentation and regression coverage
- cuda_plugin_ep_design.md: Documents that ScatterND, SpaceDepth, and Upsample now use shared adapter-safe helper paths instead of plugin-only fallback branches.
- test_cuda_plugin_ep.py: Adds plugin regression tests for Resize/Upsample and ScatterND, covering the new allocator-backed lookup-table path and the shared ScatterND validation helper.
Testing
- Build with onnxruntime_BUILD_CUDA_EP_AS_PLUGIN=ON and verify the affected CUDA provider sources compile without the removed plugin-only fallback paths.
- Run the kernel tests for ScatterND, SpaceToDepth/DepthToSpace, Resize/Upsample, Conv, and ConvTranspose in both plugin and bundled CUDA configurations.
- Run the new plugin regression tests for Resize and ScatterND in onnxruntime/test/python/transformers/test_cuda_plugin_ep.py.
Motivation and Context
The initial CUDA plugin enablement introduced several localized #ifdef BUILD_CUDA_EP_AS_PLUGIN branches and helper copies to get kernels compiling under the adapter path. This cleanup pays down that compatibility debt by extracting the truly shared pieces into reusable helpers and by teaching the adapter OpKernelInfo how to provide the allocators those kernels already expect. The result is less duplicated logic, fewer plugin-only code paths to keep in sync, and better debugging consistency between the plugin EP and the built-in CUDA EP.
Checklist