[CUDA EP] Add pad op version from 19 to 23 support for CUDA#27416

Closed

ShirasawaSama wants to merge 1 commit into microsoft:main from ShirasawaSama:feature/add-pad-op-version-19-to-23-support-for-CUDA

Conversation

@ShirasawaSama
Contributor

Description

Add pad op version from 19 to 23 support for CUDA

Motivation and Context

The CUDA execution provider currently does not register the Pad operator for opsets 19 through 23. When an ONNX model exported with one of these opsets is run on the CUDA execution provider, the Pad operation is forcibly offloaded to the CPU, resulting in significant performance degradation.


@ShirasawaSama ShirasawaSama changed the title Add pad op version from 19 to 23 support for CUDA [CUDA EP] Add pad op version from 19 to 23 support for CUDA Feb 23, 2026
@tianleiwu tianleiwu requested a review from Copilot February 24, 2026 19:05
Contributor

Copilot AI left a comment

Pull request overview

Adds CUDA Execution Provider coverage for ONNX Pad in opsets 19–23 (previously registered only up to opset 18), including a wrap mode implementation, so models exported with newer opsets no longer force a CPU fallback for Pad.

Changes:

  • Register CUDA Pad kernels for opset 19–20, 21–22, and 23 (and make opset 18 explicitly versioned).
  • Add CUDA kernel support for wrap mode, including handling negative pads via slicing metadata.
  • Update an existing wrap padding test comment now that CUDA is expected to support opset 19.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| `onnxruntime/test/providers/cpu/tensor/pad_test.cc` | Updates wrap-mode test context now that CUDA can register opset 19+ Pad. |
| `onnxruntime/core/providers/cuda/tensor/pad_impl.h` | Extends CUDA pad kernel APIs to accept slice/effective-dim metadata needed for wrap + negative pads. |
| `onnxruntime/core/providers/cuda/tensor/pad_impl.cu` | Implements wrap mode in CUDA kernels and wires new parameters through launch paths. |
| `onnxruntime/core/providers/cuda/tensor/pad.cc` | Adds CUDA kernel registrations for opset 19–23 and passes slice/effective dims into CUDA implementations. |
| `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` | Declares/registers the additional versioned CUDA Pad kernels in the EP registry. |
Comments suppressed due to low confidence (1)

onnxruntime/test/providers/cpu/tensor/pad_test.cc:1401

  • This test previously avoided CUDA by using an opset version CUDA didn’t register for. Now that CUDA is expected to support opset 19+, it would be good to make the test actually fail if Pad falls back to CPU (otherwise a future regression could silently reintroduce CPU offload while still passing). Consider running this case with session.disable_cpu_ep_fallback=1 and restricting execution providers to CUDA for this test so it validates the new CUDA registration/support for opset 19–23.
  OpTester test("Pad", 19);
  test.AddInput<float>("data", input_shape, input_data);
  test.AddInput<int64_t>("pads", {static_cast<int64_t>(pads.size())}, pads, true);
  test.AddOutput<float>("output", expected_shape, expected_data);
  test.AddAttribute("mode", "wrap");
  test.ConfigExcludeEps({kDmlExecutionProvider, kQnnExecutionProvider,
                         kTensorrtExecutionProvider, kWebGpuExecutionProvider});
  test.RunWithConfig();



// CUDA registers only up to 18 and does not impl wrap mode
// so we force version to 19 to automatically exclude EPs that do not
// implement wrap mode similar to the above tests.
Member

I am guessing there are wrap mode Pad tests in already?

Contributor Author

yes

@hariharans29
Member

Can you please resolve the conflicts?

@ShirasawaSama
Contributor Author

ShirasawaSama commented Feb 24, 2026

Sorry, I think I found some errors in my math formula during my final code review. I'll try adding more unit tests to cover them.

@ShirasawaSama

This comment was marked as outdated.

@ShirasawaSama ShirasawaSama force-pushed the feature/add-pad-op-version-19-to-23-support-for-CUDA branch 3 times, most recently from 3b7e80a to 37eabee Compare March 2, 2026 19:23
@ShirasawaSama
Contributor Author

The algorithm has now been modified to use the same formula as the CPU implementation and has passed local testing with no noticeable performance degradation.

@ShirasawaSama ShirasawaSama force-pushed the feature/add-pad-op-version-19-to-23-support-for-CUDA branch from 37eabee to 3dc473e Compare March 9, 2026 14:36
@tianleiwu
Contributor

This PR is superseded by #27774

@tianleiwu tianleiwu closed this Mar 19, 2026
tianleiwu added a commit that referenced this pull request Mar 23, 2026
### Description

This PR consolidates PRs #27416 and #27708 to extend CUDA Pad kernel
support through opset 25, including wrap mode implementation.

### Motivation and Context

The CUDA execution provider previously only registered the Pad kernel up
to opset 18 and did not implement wrap mode. When an ONNX model exported
with opset 19+ was run on the CUDA executor, the Pad operation was
forced to fall back to CPU, resulting in significant performance
degradation. This PR aligns CUDA Pad registration with the ONNX Pad
schema evolution through opset 25 and provides a correct wrap mode
implementation.

Related issues: #26393
Related PRs: #27416, #27708

### Summary of Changes

#### Kernel registration and opset coverage

| File | Change |
|------|--------|
| `onnxruntime/core/providers/cuda/tensor/pad.cc` | Adds CUDA Pad kernel registrations for opset ranges 18, 19-20, 21-22, 23, 24, and 25. |
| `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` | Registers the new Pad kernel versions in the CUDA EP registry under the existing per-opset sections. |

#### CUDA Pad implementation

| File | Change |
|------|--------|
| `onnxruntime/core/providers/cuda/tensor/pad_impl.h` | Extends the Pad kernel interface to pass effective sliced extents and per-axis input offsets. |
| `onnxruntime/core/providers/cuda/tensor/pad_impl.cu` | Adds CUDA wrap mode using a `WrapCoordinate` device helper with `if constexpr` compile-time specialization. Removes dead wrap code from the NCHW-specialized kernel path. |
| `onnxruntime/core/providers/cuda/tensor/pad.cc` | Computes effective sliced input extents/offsets for wrap behavior with negative pads. Bypasses the NCHW fast-path for wrap mode and routes through the generic implementation. |

#### Documentation

| File | Change |
|------|--------|
| `docs/OperatorKernels.md` | Updates the CUDA Pad kernel opset coverage to reflect the new version splits (25+, 24, 23, [21,22], [19,20], 18) up to opset 25. |

#### Test coverage

| File | Change |
|------|--------|
| `onnxruntime/test/providers/cpu/tensor/pad_test.cc` | Adds CUDA-only Pad coverage for `edge` across opsets 18-25 and `wrap` across opsets 19-25. Updates existing wrap test comment. |

### Checklist

- [x] Tests added/updated
- [x] No breaking changes

---------

Co-authored-by: Shirasawa <764798966@qq.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>