Declare Shape, Reshape, Transpose, Squeeze, Unsqueeze for opsets 21, 23 on CUDA by xadupre · Pull Request #26075 · microsoft/onnxruntime

xadupre · 2025-09-18T08:58:36Z

Description

Copilot

Pull Request Overview

This PR adds support for Shape, Reshape, Transpose, Squeeze, and Unsqueeze operations for ONNX opsets 21 and 23 on the CUDA execution provider. This addresses issue #26065 by declaring these tensor operations for the newer opset versions.

Key changes include:

- Adding versioned kernel declarations for opsets 21-22 and new opset 23 kernel declarations
- Updating existing opset 13 and 19 kernels to be versioned up to opset 20
- Adding comprehensive test coverage for the new opset versions

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `onnxruntime/core/providers/cuda/tensor/*.cc` | Add CUDA kernel declarations for Shape, Reshape, Transpose, Squeeze, Unsqueeze ops for opsets 21-23 |
| `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` | Update kernel class declarations and registrations to support versioned kernels and new opsets |
| `onnxruntime/test/providers/cpu/tensor/*.cc` | Add test cases for opsets 21 and 23 |
| `onnxruntime/test/optimizer/transpose_optimizer_test.cc` | Update transpose optimizer tests to include opsets 21 and 23 |


yuslepukhin · 2025-09-19T01:53:22Z

Does it fix it?

xadupre · 2025-09-22T15:12:23Z

It works fine on the model I used to test. I did not check the whole list of missing kernel declarations. We could also change the logic behind the kernel selection. An operator is often upgraded because it supports more types, but onnxruntime does not always implement the kernel for the new types.
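One hypothetical reading of the "change the logic behind the kernel selection" idea, sketched in Python for illustration only. It is not onnxruntime code and not a proposal made in this thread: `KernelDecl`, `relaxed_match`, and the type names are invented here. The sketch lets an open-ended kernel serve a newer schema version, but only when the node's input type is one the kernel actually declares, which is the risk the comment points at.

```python
from dataclasses import dataclass

# Stand-in for INT_MAX, used here to mark a kernel with no end version.
OPEN_ENDED = 2**31 - 1


@dataclass(frozen=True)
class KernelDecl:
    """Hypothetical kernel declaration: a version range plus supported types."""
    start: int
    end: int
    types: frozenset


def relaxed_match(kernel: KernelDecl, since_ver: int, input_type: str) -> bool:
    """Accept any schema version inside the range, then gate on the input type."""
    in_range = kernel.start <= since_ver <= kernel.end
    return in_range and input_type in kernel.types


# An open-ended kernel declared at opset 19 with two illustrative types.
kernel = KernelDecl(19, OPEN_ENDED, frozenset({"float", "int64"}))
print(relaxed_match(kernel, 21, "float"))
print(relaxed_match(kernel, 21, "int4"))
```

Under this assumed policy the version bump alone no longer forces a fallback, and a type the kernel never declared is still refused.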


…Loop, Scan, ConstantOfShape, Size (microsoft#27102)

When ONNX introduces a new version of an operator in opset 21, the kernel registry's VerifyVersion rejects non-versioned (open-ended) CUDA kernels because kernel_start_version != since_ver while kernel_end_version == INT_MAX. This causes those operators to fall back from CUDA to CPU, introducing unnecessary host↔device copies that can lead to value corruption on Windows.

PR microsoft#26075 previously fixed this for Shape, Reshape, Transpose, Squeeze, and Unsqueeze. This commit extends the same fix to the remaining affected operators: Flatten, Identity, If, Loop, Scan, ConstantOfShape, and Size.

For each operator:

- Cap existing non-versioned kernel to opset 20 (VERSIONED)
- Add VERSIONED(21, 22) kernel with identical type constraints
- Add non-versioned opset 23 kernel for forward compatibility
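The version check described in this commit message can be modelled in a few lines of Python. This is a sketch of the rule as the text states it, not a copy of `VerifyVersion`: the function name `version_matches` is made up here, and the detail that a bounded range must also reach `since_ver` is an assumption.

```python
# Stand-in for INT_MAX, which the text uses to mark an open-ended kernel.
OPEN_ENDED = 2**31 - 1


def version_matches(kernel_start: int, kernel_end: int, since_ver: int) -> bool:
    """Return True when a kernel's version range accepts a schema version.

    Either the start version equals since_ver exactly, or the kernel has a
    bounded range that starts earlier and still reaches since_ver (assumed).
    """
    exact = kernel_start == since_ver
    bounded = (
        kernel_start < since_ver
        and kernel_end != OPEN_ENDED
        and kernel_end >= since_ver
    )
    return exact or bounded


# (start, end, since_ver) cases taken from the situation described above.
cases = [
    (19, OPEN_ENDED, 21),  # open-ended opset 19 kernel, opset 21 schema
    (19, 20, 19),          # same kernel capped at opset 20, opset 19 schema
    (21, 22, 21),          # new versioned kernel
    (23, OPEN_ENDED, 23),  # new open-ended kernel
]
for start, end, since_ver in cases:
    print(start, end, since_ver, version_matches(start, end, since_ver))
```

In this model the open-ended opset 19 kernel is rejected for a schema version of 21, which is the fallback case the fix targets, while the three ranges produced by the fix each match their own schema version.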

…Loop, Scan, ConstantOfShape, Size (#27728)

## Summary

- Extend CUDA EP opset 21/23 kernel registrations to 7 additional operators that were updated in ONNX opset 21 but lacked proper CUDA kernel version declarations
- Operators fixed: **Flatten**, **Identity**, **If**, **Loop**, **Scan**, **ConstantOfShape**, **Size**
- Follows the identical pattern established in PR #26075 for Shape, Reshape, Transpose, Squeeze, Unsqueeze

## Motivation

Fixes #27102.

When ONNX introduces a new operator version in opset 21, ORT's `VerifyVersion` function in `kernel_registry.cc` rejects non-versioned (open-ended) CUDA kernels. The check at [kernel_registry.cc:L126-L133](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/kernel_registry.cc#L126) requires either an exact version match or a bounded version range: a kernel registered as `since_version=N, end_version=INT_MAX` fails when `since_ver` (from the opset 21 schema) differs from `N`.

This causes the affected operators to fall back from CUDA to CPU, introducing unnecessary host↔device memory copies. On Windows with CUDA EP, this fallback path can produce corrupted shape computation values (e.g., `124647109376` instead of `6`), leading to downstream Reshape failures.

PR #26075 fixed this for Shape, Reshape, Transpose, Squeeze, and Unsqueeze. This PR extends the same fix to the 7 remaining operators that were updated in ONNX opset 21 and had non-versioned CUDA kernels.

## Changes

For each of the 7 operators:

1. **Cap existing non-versioned kernel** to opset 20 (`ONNX_OPERATOR_KERNEL` → `ONNX_OPERATOR_VERSIONED_KERNEL`)
2. **Add VERSIONED(21, 22) kernel** with identical type constraints
3. **Add non-versioned opset 23 kernel** for forward compatibility (opset 23 introduced another schema update for these operators)

Files modified:

- `onnxruntime/core/providers/cuda/cuda_execution_provider.cc`: forward declarations + `BuildKernelCreateInfo` registration
- `onnxruntime/core/providers/cuda/tensor/flatten.cc`
- `onnxruntime/core/providers/cuda/tensor/identity_op.cc`
- `onnxruntime/core/providers/cuda/tensor/size.cc`
- `onnxruntime/core/providers/cuda/generator/constant_of_shape.cc`
- `onnxruntime/core/providers/cuda/controlflow/if.cc`
- `onnxruntime/core/providers/cuda/controlflow/loop.cc`
- `onnxruntime/core/providers/cuda/controlflow/scan.cc`

## Test Plan

- [ ] Verify CUDA EP build compiles successfully (CI)
- [ ] Existing opset 21 tests for Shape/Reshape/Squeeze/Unsqueeze pass (validates the pattern)
- [ ] Verify operators are no longer falling back to CPU when running opset 21 models on CUDA
- [ ] No regression in existing CUDA EP tests
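The three steps listed under Changes amount to a rewrite of each operator's list of kernel version ranges. The Python sketch below shows that rewrite on plain `(start, end)` tuples. It only illustrates the pattern: the real change swaps C++ registration macros, `split_open_ended` is a name invented here, and the starting ranges in the example are hypothetical rather than any operator's actual registrations.

```python
# Stand-in for INT_MAX, marking a non-versioned (open-ended) kernel.
OPEN_ENDED = 2**31 - 1


def split_open_ended(registrations):
    """Apply the three-step pattern to a list of (start, end) version ranges."""
    updated = []
    for start, end in registrations:
        if end == OPEN_ENDED and start < 21:
            # Step 1: cap the existing non-versioned kernel at opset 20.
            updated.append((start, 20))
        else:
            updated.append((start, end))
    # Step 2: add a versioned kernel covering opsets 21 and 22.
    updated.append((21, 22))
    # Step 3: add a non-versioned kernel starting at opset 23.
    updated.append((23, OPEN_ENDED))
    return updated


# Hypothetical starting point: two bounded kernels and one open-ended kernel.
before = [(13, 13), (14, 18), (19, OPEN_ENDED)]
after = split_open_ended(before)
for start, end in after:
    label = "open" if end == OPEN_ENDED else str(end)
    print(f"{start}..{label}")
```

Keeping the newest range open-ended means the next schema update will need the same treatment again, which is consistent with the follow-up issues linked from this PR.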

xadupre added 2 commits September 18, 2025 10:58

Declare Shape, Reshape, Transpose for opsets 21, 23 on CUDA

d87421c

add missing declaration

e1e7e1b

xadupre changed the title ~~Declare Shape, Reshape, Transpose for opsets 21, 23 on CUDA~~ Declare Shape, Reshape, Transpose, Squeeze, Unsqueeze for opsets 21, 23 on CUDA Sep 18, 2025

xadupre added 3 commits September 18, 2025 15:14

unit test

21ce4d8

fix inplace

33662d5

fixing unittest

004b671

yuslepukhin requested a review from Copilot September 19, 2025 01:51

Copilot AI reviewed Sep 19, 2025


docs

081ac15

xadupre added 2 commits September 22, 2025 17:15

fix docs

446a201

doc

0138c0a

xadupre marked this pull request as ready for review September 23, 2025 11:29

justinchuby approved these changes Sep 23, 2025


xadupre merged commit 6650e07 into main Sep 25, 2025
97 of 98 checks passed

xadupre deleted the xadupre/missingcudaop branch September 25, 2025 15:39

AzHicham mentioned this pull request Oct 23, 2025

[Performance] Missing CUDA Kernal for Pad, MaxPool, ConvTranspose opset 19-23 #26393

Open

fs-eire pushed a commit that referenced this pull request Oct 24, 2025

Declare Shape, Reshape, Transpose, Squeeze, Unsqueeze for opsets 21, …

a13a580

…23 on CUDA (#26075) ### Description Fixes #26065.

naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request Nov 2, 2025

Declare Shape, Reshape, Transpose, Squeeze, Unsqueeze for opsets 21, …

48bb713

…23 on CUDA (microsoft#26075) ### Description Fixes microsoft#26065.

Rishi-Dave mentioned this pull request Mar 18, 2026

Add opset 21/23 CUDA kernel registrations for Flatten, Identity, If, Loop, Scan, ConstantOfShape, Size #27728

Merged

4 tasks

xadupre merged 8 commits into main from xadupre/missingcudaop

4 participants
