[CUDA] Update Pad kernel to support versioning and add tests for Opset 19 by lukas-folle-snkeos · Pull Request #27701 · microsoft/onnxruntime

lukas-folle-snkeos · 2026-03-17T15:53:36Z

Description

This pull request updates the CUDA kernel registration for the Pad operator in ONNX Runtime to use versioned kernel registration for opsets 18 through 23, and adds test coverage for CUDA-specific behavior.

Motivation and Context

Fixes #26393

…t 19

tianleiwu · 2026-03-17T18:17:11Z

Summary

The direction is good: moving CUDA Pad away from a single opset-18 registration and adding a CUDA-focused regression test is the right shape of change. However, the current registration range is not safe yet because it makes CUDA eligible for opset-19+ Pad(mode="wrap") nodes even though the CUDA kernels still implement only constant/reflect/edge, and it also stops short of the newer Pad schemas that already exist in-tree.

Review

1. CUDA Pad Versioning (`[onnxruntime/core/providers/cuda/tensor/pad.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cuda/tensor/pad.cc)`)

Positive:

Switching Pad from a single opset-18 registration to a versioned registration is the right mechanism for broadening CUDA coverage, and the matching updates in [onnxruntime/core/providers/cuda/cuda_execution_provider.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cuda/cuda_execution_provider.cc) keep the registry declarations consistent.

Concern:

⚠️ wrap mode is now advertised without a CUDA implementation: PadBase parses mode="wrap" into Mode::Wrap in [onnxruntime/core/providers/cpu/tensor/padbase.h](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cpu/tensor/padbase.h), but the CUDA kernels only define three modes and only dispatch pad_mode values 0, 1, and 2 in [onnxruntime/core/providers/cuda/tensor/pad_impl.cu](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cuda/tensor/pad_impl.cu). With this PR, opset-19 through opset-23 Pad nodes become eligible for CUDA placement, which includes mode="wrap" per the in-tree ONNX schema. The added test change only excludes CUDA from wrap tests; it does not stop real models from being assigned to CUDA here. That means a wrap Pad can now be placed on CUDA and hit an unsupported mode path at runtime.
```
// pad_impl.cu currently only dispatches 3 modes.
switch (pad_mode) {
  case 0: /* constant */ break;
  case 1: /* reflect */ break;
  case 2: /* edge */ break;
}
```

2. Schema Coverage And Tests (`[onnxruntime/test/providers/cpu/tensor/pad_test.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/test/providers/cpu/tensor/pad_test.cc)`)

Positive:

The new CUDA-only MLFloat16 test is useful and directly exercises the motivating scenario instead of relying on broader typed coverage to catch the registration change indirectly.

Concern:

⚠️ Coverage still stops before the latest Pad schemas: even if wrap placement is fixed, the new CUDA registration ends at opset 23 while this tree already carries Pad schemas for opsets 24 and 25 in [cmake/external/onnx/onnx/defs/operator_sets.h](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/cmake/external/onnx/onnx/defs/operator_sets.h) and [cmake/external/onnx/onnx/defs/tensor/defs.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/cmake/external/onnx/onnx/defs/tensor/defs.cc). CPU already registers Pad separately for 24 and 25 in [onnxruntime/core/providers/cpu/tensor/pad.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cpu/tensor/pad.cc), so a non-wrap float/double/MLFloat16/bool Pad in an opset-24 or opset-25 model would still miss CUDA after this PR. I’d expect the CUDA registration and tests to cover the newer schema bands too once the wrap-mode issue is addressed.

Summary of Concerns

#	Severity	Component	Issue
1	High	CUDA Pad versioning	Registers opset-19+ Pad on CUDA even though `wrap` is part of those schemas and is not implemented in the CUDA kernels.
2	Suggestion	Schema coverage and tests	Registration only goes through opset 23, leaving opset 24/25 Pad models without CUDA coverage.

Verdict

REQUEST CHANGES — the current versioned registration would expose unsupported wrap Pad nodes to CUDA placement, so the PR is not safe to merge as-is.

tianleiwu · 2026-03-17T19:00:32Z

@lukas-folle-snkeos, I added a new PR #27708 to extend Pad up to opset 25. Your commit and contribution is kept. Thanks.

lukas-folle-snkeos · 2026-03-18T07:25:51Z

@tianleiwu thanks for getting to this so quickly. I am closing this PR in favor of your #27708
Thanks for keeping my contribution!

[CUDA] Update Pad kernel to support versioning and add tests for Opse…

8107200

…t 19

lukas-folle-snkeos closed this Mar 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CUDA] Update Pad kernel to support versioning and add tests for Opset 19#27701

[CUDA] Update Pad kernel to support versioning and add tests for Opset 19#27701
lukas-folle-snkeos wants to merge 1 commit intomicrosoft:mainfrom
lukas-folle-snkeos:main

lukas-folle-snkeos commented Mar 17, 2026

Uh oh!

tianleiwu commented Mar 17, 2026

Uh oh!

tianleiwu commented Mar 17, 2026

Uh oh!

lukas-folle-snkeos commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lukas-folle-snkeos commented Mar 17, 2026

Description

Motivation and Context

Uh oh!

tianleiwu commented Mar 17, 2026

Summary

Review

1. CUDA Pad Versioning ([onnxruntime/core/providers/cuda/tensor/pad.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cuda/tensor/pad.cc))

2. Schema Coverage And Tests ([onnxruntime/test/providers/cpu/tensor/pad_test.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/test/providers/cpu/tensor/pad_test.cc))

Summary of Concerns

Verdict

Uh oh!

tianleiwu commented Mar 17, 2026

Uh oh!

lukas-folle-snkeos commented Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. CUDA Pad Versioning (`[onnxruntime/core/providers/cuda/tensor/pad.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cuda/tensor/pad.cc)`)

2. Schema Coverage And Tests (`[onnxruntime/test/providers/cpu/tensor/pad_test.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/test/providers/cpu/tensor/pad_test.cc)`)