CUDA Plugin Cleanup for Shared Kernel Helpers by tianleiwu · Pull Request #27915 · microsoft/onnxruntime

tianleiwu · 2026-03-31T20:39:55Z

Description

This PR reduces the amount of CUDA plugin-specific compatibility code by moving reusable validation and attribute-reading logic into shared helper paths that work for both bundled and plugin builds. It also fills in a missing allocator hook in the EP adapter so plugin kernels can reuse the same initialization path as the in-tree CUDA EP, which simplifies maintenance and improves behavior parity. The follow-up changes update the CUDA plugin design doc to reflect the new shared-helper model and add focused plugin regression tests for the two runtime paths that changed most materially.

Summary of Changes

EP adapter and shared helper extraction

| File | Change |
| --- | --- |
| `ep/adapter/op_kernel_info.h` | Adds `OpKernelInfo::GetAllocator(OrtMemType)` so adapter-based kernels can request device or CPU temp allocators in plugin builds. |
| `cpu/tensor/scatter_nd.h` | Extracts shape validation into `scatter_nd_internal::ValidateShapes` so the same logic can be reused outside the CPU `ScatterND` class. |
| `cpu/tensor/space_depth_ops.h` | Moves blocksize parsing, mode parsing, and dimension validation into `space_depth_internal` helpers that can be shared by CUDA kernels. |
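
For readers unfamiliar with what a shared `ScatterND` shape check has to cover, here is a minimal Python sketch of the shape rules from the ONNX `ScatterND` operator definition. It is not the C++ `scatter_nd_internal::ValidateShapes` helper from this PR; the function name `validate_scatter_nd_shapes` and the error messages are hypothetical and exist only for illustration.

```python
def validate_scatter_nd_shapes(data_shape, indices_shape, updates_shape):
    """Check ScatterND input shapes against the ONNX operator rules.

    Hypothetical illustration only; raises ValueError on a mismatch.
    """
    data_rank = len(data_shape)
    indices_rank = len(indices_shape)

    # Both data and indices must have rank >= 1.
    if data_rank == 0 or indices_rank == 0:
        raise ValueError("data and indices must each have rank >= 1")

    # The last dimension of indices is the length of each index tuple.
    index_depth = indices_shape[-1]
    if index_depth > data_rank:
        raise ValueError("last dimension of indices exceeds the rank of data")

    # updates must have shape indices.shape[:-1] + data.shape[index_depth:].
    expected = tuple(indices_shape[:-1]) + tuple(data_shape[index_depth:])
    if tuple(updates_shape) != expected:
        raise ValueError(
            f"updates shape {tuple(updates_shape)} does not match expected {expected}"
        )


# Element-wise update of a 1-D tensor: 4 index tuples of length 1.
validate_scatter_nd_shapes((8,), (4, 1), (4,))
# Slice update of a 3-D tensor: each index selects a 4x4 slice.
validate_scatter_nd_shapes((4, 4, 4), (2, 1), (2, 4, 4))
```

Keeping a check like this in one place is the point of the extraction: the CPU kernel, the CUDA kernel, and the provider bridge can all reject malformed inputs with the same rules and messages.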

CUDA kernel cleanup and plugin parity

| File | Change |
| --- | --- |
| `cuda/tensor/scatter_nd.cc` | Removes the plugin-only `ScatterND` validation duplicate and reuses the shared helper implementation. |
| `cuda/tensor/scatter_nd.h` | Drops the old conditional include split now that validation is shared through the common helper path. |
| `cuda/tensor/space_depth_ops.h` | Deletes the plugin-only `SpaceToDepth`/`DepthToSpace` reimplementation and inherits from the shared base/helper logic in all builds. |
| `cuda/tensor/upsample.cc` | Reuses the normal antialias lookup-table allocation/caching path in plugin builds via the new allocator adapter support. |
| `cuda/tensor/upsample.h` | Keeps the persistent device lookup-table member available in plugin builds as well. |

Shared-provider and diagnostics alignment

| File | Change |
| --- | --- |
| `cpu/cpu_provider_shared.cc` | Routes shared-provider `ScatterND` shape validation through the extracted helper. |
| `provider_bridge_provider.cc` | Updates the bridge-side `ScatterND::ValidateShapes` implementation to call the shared helper directly. |
| `cuda/cudnn_common.h` | Preserves the batch-norm epsilon warning path in plugin builds instead of suppressing it. |
| `cuda/nn/conv.cc` | Removes plugin-specific shortened cuDNN frontend errors so bundled and plugin builds both include frontend JSON in failures. |
| `cuda/nn/conv_transpose.cc` | Extends cuDNN frontend failures to include frontend JSON for easier debugging, matching the `Conv` behavior. |

Documentation and regression coverage

| File | Change |
| --- | --- |
| `cuda_plugin_ep_design.md` | Updates the design doc to reflect that `ScatterND`, `SpaceDepth`, and `Upsample` now use shared adapter-safe helper paths instead of plugin-only fallback branches. |
| `test_cuda_plugin_ep.py` | Adds plugin regression coverage for antialias `Resize`/`Upsample` and `ScatterND`, covering the new allocator-backed lookup-table path and the shared `ScatterND` validation helper. |

Testing

- Build with `onnxruntime_BUILD_CUDA_EP_AS_PLUGIN=ON` and verify the affected CUDA provider sources compile without the removed plugin-only fallback paths.
- Run targeted CUDA provider coverage for `ScatterND`, `SpaceToDepth`/`DepthToSpace`, `Resize`/`Upsample`, `Conv`, and `ConvTranspose` in both plugin and bundled CUDA configurations.
- Confirm antialias upsample still initializes and uses the shared lookup table correctly in plugin builds.
- Run the new plugin tests for antialias `Resize` and `ScatterND` in `onnxruntime/test/python/transformers/test_cuda_plugin_ep.py`.
- Confirm cuDNN frontend failure paths now emit the same diagnostic detail in plugin and non-plugin builds.
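
When checking `ScatterND` results by hand, a small reference implementation can provide the expected output to compare against. The sketch below is a pure-Python reference for `ScatterND` with the default `reduction="none"`, operating on nested lists. It is not the test added in `test_cuda_plugin_ep.py`; the name `scatter_nd_reference` is hypothetical, and the sample values are the first example from the ONNX operator documentation.

```python
import copy


def scatter_nd_reference(data, indices, updates):
    """Pure-Python ScatterND (reduction='none') over nested lists.

    Hypothetical reference for comparing against a kernel's output.
    """
    output = copy.deepcopy(data)

    def walk(index_node, update_node):
        # A list of ints is one index tuple: write the matching update there.
        if index_node and isinstance(index_node[0], int):
            target = output
            for i in index_node[:-1]:
                target = target[i]
            target[index_node[-1]] = copy.deepcopy(update_node)
        else:
            # Otherwise descend one batch dimension of indices and updates.
            for sub_index, sub_update in zip(index_node, update_node):
                walk(sub_index, sub_update)

    walk(indices, updates)
    return output


data = [1, 2, 3, 4, 5, 6, 7, 8]
indices = [[4], [3], [1], [7]]
updates = [9, 10, 11, 12]
result = scatter_nd_reference(data, indices, updates)
print(result)  # [1, 11, 3, 10, 9, 6, 7, 12]
```

A regression test would typically run the same inputs through an inference session configured with the plugin EP and compare the output against a reference like this one.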

Motivation and Context

The initial CUDA plugin enablement introduced several localized `#ifdef BUILD_CUDA_EP_AS_PLUGIN` branches and helper copies to get kernels compiling under the adapter path. This cleanup pays down that compatibility debt by extracting the truly shared pieces into reusable helpers and by teaching the adapter `OpKernelInfo` how to provide the allocators those kernels already expect. The result is less duplicated logic, fewer plugin-only code paths to keep in sync, and better debugging consistency between the plugin EP and the built-in CUDA EP.

Checklist

- Tests added/updated
- Documentation updated (if applicable)
- No breaking changes (or documented in description)

Copilot

Pull request overview

This PR reduces CUDA plugin-specific code paths by extracting shared validation/attribute-parsing helpers into adapter-safe headers, and by extending the EP adapter API to expose allocators needed by CUDA kernels—improving parity between bundled and plugin CUDA EP behavior.

Changes:

- Added `ep::adapter::OpKernelInfo::GetAllocator(OrtMemType)` to support allocator-dependent kernel initialization in plugin builds.
- Extracted shared helpers for `ScatterND` shape validation and `SpaceToDepth`/`DepthToSpace` attribute parsing/dimension validation, then reused them from CUDA/shared-provider code.
- Removed several plugin-only CUDA fallbacks and aligned cuDNN frontend diagnostics between plugin and bundled builds; updated docs and added plugin regression tests.
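
To make the attribute parsing and dimension validation concrete, here is a rough Python sketch of the checks implied by the ONNX `SpaceToDepth` and `DepthToSpace` definitions. The function names loosely mirror the `ReadBlocksize` and `ReadIsDCR` helpers named in this PR, but the signatures, the dictionary-based attribute lookup, and the error messages are assumptions for illustration, not the C++ code in `space_depth_internal`.

```python
def read_blocksize(attrs):
    """Return the required positive 'blocksize' attribute."""
    if "blocksize" not in attrs:
        raise ValueError("attribute 'blocksize' is required")
    blocksize = int(attrs["blocksize"])
    if blocksize <= 0:
        raise ValueError("attribute 'blocksize' must be positive")
    return blocksize


def read_is_dcr(attrs):
    """Return True for DepthToSpace mode 'DCR' (the ONNX default), False for 'CRD'."""
    mode = attrs.get("mode", "DCR")
    if mode == "DCR":
        return True
    if mode == "CRD":
        return False
    raise ValueError(f"unsupported DepthToSpace mode: {mode!r}")


def space_to_depth_output_shape(input_shape, blocksize):
    """Validate an NCHW shape for SpaceToDepth and return the output shape."""
    if len(input_shape) != 4:
        raise ValueError("input must be 4-D (N, C, H, W)")
    n, c, h, w = input_shape
    if h % blocksize != 0 or w % blocksize != 0:
        raise ValueError("H and W must be divisible by blocksize")
    return (n, c * blocksize * blocksize, h // blocksize, w // blocksize)


def depth_to_space_output_shape(input_shape, blocksize):
    """Validate an NCHW shape for DepthToSpace and return the output shape."""
    if len(input_shape) != 4:
        raise ValueError("input must be 4-D (N, C, H, W)")
    n, c, h, w = input_shape
    if c % (blocksize * blocksize) != 0:
        raise ValueError("C must be divisible by blocksize squared")
    return (n, c // (blocksize * blocksize), h * blocksize, w * blocksize)


b = read_blocksize({"blocksize": 2})
print(space_to_depth_output_shape((1, 3, 4, 6), b))  # (1, 12, 2, 3)
print(depth_to_space_output_shape((1, 8, 2, 3), b))  # (1, 2, 4, 6)
```

None of this logic depends on the device, which is why it can live in helpers shared by the CPU and CUDA kernels.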

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `include/onnxruntime/ep/adapter/op_kernel_info.h` | Exposes allocator access in adapter `OpKernelInfo` to enable shared CUDA initialization paths in plugin builds. |
| `onnxruntime/core/providers/cpu/tensor/scatter_nd.h` | Introduces `scatter_nd_internal::ValidateShapes` helper and routes `ScatterND::ValidateShapes` through it. |
| `onnxruntime/core/providers/cuda/tensor/scatter_nd.cc` | Switches CUDA ScatterND shape validation to shared helper and removes plugin-only duplicate validation. |
| `onnxruntime/core/providers/cuda/tensor/scatter_nd.h` | Removes conditional CPU include split now that validation is shared via helper. |
| `onnxruntime/core/providers/cpu/cpu_provider_shared.cc` | Routes shared-provider ScatterND validation through the extracted helper. |
| `onnxruntime/core/providers/shared_library/provider_bridge_provider.cc` | Updates bridge-side ScatterND validation to call the shared helper directly. |
| `onnxruntime/core/providers/cpu/tensor/space_depth_ops.h` | Extracts `ReadBlocksize`, `ReadIsDCR`, and dimension validation into `space_depth_internal` for reuse. |
| `onnxruntime/core/providers/cuda/tensor/space_depth_ops.h` | Removes plugin-only reimplementation and uses shared CPU base/helper logic for all builds. |
| `onnxruntime/core/providers/cuda/tensor/upsample.h` | Keeps persistent antialias lookup-table device buffer member available in plugin builds. |
| `onnxruntime/core/providers/cuda/tensor/upsample.cc` | Unifies plugin/bundled antialias lookup-table allocation/caching using allocator access. |
| `onnxruntime/core/providers/cuda/cudnn_common.h` | Preserves batch-norm epsilon warning behavior in plugin builds for parity. |
| `onnxruntime/core/providers/cuda/nn/conv.cc` | Aligns cuDNN frontend failure messages to always include frontend JSON (plugin parity). |
| `onnxruntime/core/providers/cuda/nn/conv_transpose.cc` | Extends cuDNN frontend failure diagnostics to include frontend JSON (matching Conv). |
| `docs/cuda_plugin_ep/cuda_plugin_ep_design.md` | Updates plugin design documentation to reflect the new shared-helper model and reduced `#ifdef` usage. |
| `onnxruntime/test/python/transformers/test_cuda_plugin_ep.py` | Adds targeted plugin regression tests for antialias Resize and ScatterND behavior. |

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

- `space_depth_ops.h`: Add `#include <string>` and relax the include guard to `#ifndef SHARED_PROVIDER` (matching the `scatter_nd.h` pattern) so that plugin builds get `op_kernel.h` directly instead of relying on transitive includes from the adapter force-include chain.
- `op_kernel_info.h`: Change the adapter `GetAllocator()` to return `nullptr` when the EP is not associated or the `OrtMemType` is unsupported, matching the core `OpKernelInfo::GetAllocator` behavior and avoiding unexpected termination in `ORT_NO_EXCEPTIONS` builds.

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

tianleiwu · 2026-04-06T17:11:39Z

/azp run Linux QNN CI Pipeline

azure-pipelines · 2026-04-06T17:11:50Z

Azure Pipelines successfully started running 1 pipeline(s).

CUDA Plugin Cleanup for Shared Kernel Helpers

c565653

tianleiwu requested a review from Copilot March 31, 2026 20:40

Copilot started reviewing on behalf of tianleiwu March 31, 2026 20:41

Copilot AI reviewed Mar 31, 2026

Comment thread onnxruntime/core/providers/cpu/tensor/scatter_nd.h Outdated

comments

bcf51a4

tianleiwu added the cuda_plugin label Apr 2, 2026

tianleiwu requested review from Copilot, kunal-vaishnavi and yuslepukhin April 2, 2026 03:59

Copilot started reviewing on behalf of tianleiwu April 2, 2026 04:01

Copilot AI reviewed Apr 2, 2026

Comment thread onnxruntime/core/providers/cpu/tensor/space_depth_ops.h

Comment thread include/onnxruntime/ep/adapter/op_kernel_info.h Outdated

tianleiwu requested a review from Copilot April 4, 2026 22:05

Copilot started reviewing on behalf of tianleiwu April 4, 2026 22:06

Copilot AI reviewed Apr 4, 2026

Comment thread include/onnxruntime/ep/adapter/op_kernel_info.h Outdated

Address review: delegate adapter GetAllocator to core OpKernelInfo, add missing <string> include

3e7776a

tianleiwu requested a review from Copilot April 5, 2026 03:34

Copilot started reviewing on behalf of tianleiwu April 5, 2026 03:35

Copilot AI reviewed Apr 5, 2026

Comment thread onnxruntime/core/providers/cpu/tensor/space_depth_ops.h Outdated

Comment thread onnxruntime/core/providers/cuda/tensor/scatter_nd.cc

Comment thread onnxruntime/core/providers/cuda/tensor/upsample.cc Outdated

Address review: guard op_kernel.h for plugin builds, use synchronous cudaMemcpy

85d9e7e

kunal-vaishnavi approved these changes Apr 6, 2026

tianleiwu enabled auto-merge (squash) April 6, 2026 17:07

Merge remote-tracking branch 'origin/main' into tlwu/20260331/cuda_plugin_ops_clean_up

897c214

tianleiwu merged commit 8c4245e into main Apr 7, 2026
105 of 108 checks passed

tianleiwu deleted the tlwu/20260331/cuda_plugin_ops_clean_up branch April 7, 2026 05:46

tianleiwu merged 6 commits into main from tlwu/20260331/cuda_plugin_ops_clean_up

3 participants
