Declare Shape, Reshape, Transpose, Squeeze, Unsqueeze for opsets 21, 23 on CUDA#26075
Pull Request Overview
This PR adds support for Shape, Reshape, Transpose, Squeeze, and Unsqueeze operations for ONNX opsets 21 and 23 on the CUDA execution provider. This addresses issue #26065 by declaring these tensor operations for the newer opset versions.
Key changes include:
- Adding versioned kernel declarations for opsets 21-22 and new opset 23 kernel declarations
- Updating existing opset 13 and 19 kernels to be versioned up to opset 20
- Adding comprehensive test coverage for the new opset versions
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cuda/tensor/*.cc | Add CUDA kernel declarations for Shape, Reshape, Transpose, Squeeze, Unsqueeze ops for opsets 21-23 |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Update kernel class declarations and registrations to support versioned kernels and new opsets |
| onnxruntime/test/providers/cpu/tensor/*.cc | Add test cases for opsets 21 and 23 |
| onnxruntime/test/optimizer/transpose_optimizer_test.cc | Update transpose optimizer tests to include opsets 21 and 23 |
Does it fix it?

It works fine on the model I used to test. I did not check the whole list of missing kernel declarations. We could also change the logic behind kernel selection: an operator is often upgraded because it supports more types, but onnxruntime does not always implement the kernel for the new types.
…23 on CUDA (microsoft#26075) ### Description Fixes microsoft#26065.
…Loop, Scan, ConstantOfShape, Size (microsoft#27102)

When ONNX introduces a new version of an operator in opset 21, the kernel registry's VerifyVersion rejects non-versioned (open-ended) CUDA kernels because kernel_start_version != since_ver while kernel_end_version == INT_MAX. This causes those operators to fall back from CUDA to CPU, introducing unnecessary host↔device copies that can lead to value corruption on Windows.

PR microsoft#26075 previously fixed this for Shape, Reshape, Transpose, Squeeze, and Unsqueeze. This commit extends the same fix to the remaining affected operators: Flatten, Identity, If, Loop, Scan, ConstantOfShape, and Size.

For each operator:
- Cap existing non-versioned kernel to opset 20 (VERSIONED)
- Add VERSIONED(21, 22) kernel with identical type constraints
- Add non-versioned opset 23 kernel for forward compatibility
…Loop, Scan, ConstantOfShape, Size (#27728)

## Summary
- Extend CUDA EP opset 21/23 kernel registrations to 7 additional operators that were updated in ONNX opset 21 but lacked proper CUDA kernel version declarations
- Operators fixed: **Flatten**, **Identity**, **If**, **Loop**, **Scan**, **ConstantOfShape**, **Size**
- Follows the identical pattern established in PR #26075 for Shape, Reshape, Transpose, Squeeze, Unsqueeze

## Motivation
Fixes #27102.

When ONNX introduces a new operator version in opset 21, ORT's `VerifyVersion` function in `kernel_registry.cc` rejects non-versioned (open-ended) CUDA kernels. The check at [kernel_registry.cc:L126-L133](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/kernel_registry.cc#L126) requires either an exact version match or a bounded version range: a kernel registered as `since_version=N, end_version=INT_MAX` fails when `since_ver` (from the opset 21 schema) differs from `N`.

This causes the affected operators to fall back from CUDA to CPU, introducing unnecessary host↔device memory copies. On Windows with CUDA EP, this fallback path can produce corrupted shape computation values (e.g., `124647109376` instead of `6`), leading to downstream Reshape failures.

PR #26075 fixed this for Shape, Reshape, Transpose, Squeeze, and Unsqueeze. This PR extends the same fix to the 7 remaining operators that were updated in ONNX opset 21 and had non-versioned CUDA kernels.

## Changes
For each of the 7 operators:
1. **Cap existing non-versioned kernel** to opset 20 (`ONNX_OPERATOR_KERNEL` → `ONNX_OPERATOR_VERSIONED_KERNEL`)
2. **Add VERSIONED(21, 22) kernel** with identical type constraints
3. **Add non-versioned opset 23 kernel** for forward compatibility (opset 23 introduced another schema update for these operators)

Files modified:
- `onnxruntime/core/providers/cuda/cuda_execution_provider.cc`: forward declarations + `BuildKernelCreateInfo` registration
- `onnxruntime/core/providers/cuda/tensor/flatten.cc`
- `onnxruntime/core/providers/cuda/tensor/identity_op.cc`
- `onnxruntime/core/providers/cuda/tensor/size.cc`
- `onnxruntime/core/providers/cuda/generator/constant_of_shape.cc`
- `onnxruntime/core/providers/cuda/controlflow/if.cc`
- `onnxruntime/core/providers/cuda/controlflow/loop.cc`
- `onnxruntime/core/providers/cuda/controlflow/scan.cc`

## Test Plan
- [ ] Verify CUDA EP build compiles successfully (CI)
- [ ] Existing opset 21 tests for Shape/Reshape/Squeeze/Unsqueeze pass (validates the pattern)
- [ ] Verify operators are no longer falling back to CPU when running opset 21 models on CUDA
- [ ] No regression in existing CUDA EP tests
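
For readers who want to see why an open-ended registration stops matching, the version rule described in the Motivation section can be sketched in standalone C++. This is a minimal illustration written from that description, not a copy of `kernel_registry.cc`; the `KernelVersionRange` struct and the `VersionMatches` function are hypothetical names, and the opset 19 starting point is only an example of a previously open-ended kernel.

```cpp
#include <climits>

// Hypothetical stand-in for the version range attached to a kernel definition.
struct KernelVersionRange {
  int start;  // first opset the kernel is declared for
  int end;    // last opset covered, or INT_MAX for an open-ended kernel
};

// Sketch of the rule described above (an assumption, not the ORT source):
// accept an exact match on the schema's since_version, or a bounded range
// that starts earlier and still covers since_version.
constexpr bool VersionMatches(int since_ver, KernelVersionRange k) {
  return k.start == since_ver ||
         (k.start < since_ver && k.end != INT_MAX && k.end >= since_ver);
}

// Before the fix: one open-ended kernel declared at opset 19.
// A node resolved against the opset 21 schema is rejected.
static_assert(!VersionMatches(21, {19, INT_MAX}),
              "open-ended opset 19 kernel does not match since_version 21");

// After the fix: capped old kernel, VERSIONED(21, 22), and a new open-ended
// opset 23 kernel. Each schema version now has an exact match.
static_assert(VersionMatches(19, {19, 20}), "capped kernel still serves 19");
static_assert(VersionMatches(21, {21, 22}), "versioned kernel serves 21");
static_assert(VersionMatches(23, {23, INT_MAX}), "open-ended kernel serves 23");
```

Under this sketch an open-ended kernel only ever matches the exact schema version it was declared at, which is why each new schema version needs its own declaration rather than relying on the old one to carry forward.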
Description
Fixes #26065.