Fill Squeeze and Unsqueeze CUDA opset gaps to opset 25 #27739
Merged
…pset 25 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot created this pull request from a session on behalf of tianleiwu on March 18, 2026 18:03
Contributor
/azp run Windows GPU Doc Gen CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Contributor
@copilot, please update docs/OperatorKernels.md, and merge latest main branch to this branch.
…e-unsqueeze-operators
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Contributor Author
tianleiwu approved these changes on Mar 19, 2026
Contributor
Pull request overview
Extends CUDA Execution Provider kernel registrations for Squeeze and Unsqueeze to cover ONNX opsets 23–25, aligning CUDA coverage with CPU so models exported at opset 24+ can resolve CUDA kernels.
Changes:
- Add CUDA kernel registrations for `Squeeze`/`Unsqueeze` opset 24 and opset 25+, and cap opset 23 to `23–23`.
- Update CUDA EP kernel registry declarations/registrations to match the new opset coverage.
- Update operator kernel documentation entries to reflect the new version coverage rows.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| onnxruntime/core/providers/cuda/tensor/unsqueeze.cc | Adds opset 23/24 versioned registrations and moves unversioned registration to opset 25+ |
| onnxruntime/core/providers/cuda/tensor/squeeze.cc | Adds opset 23/24 versioned registrations and moves unversioned registration to opset 25+ |
| onnxruntime/core/providers/cuda/cuda_execution_provider.cc | Updates forward declarations and kernel registry entries for opset 23–25 Squeeze/Unsqueeze |
| docs/OperatorKernels.md | Updates CUDA doc table rows to show Squeeze/Unsqueeze coverage through opset 25+ with explicit 24 and 23 rows |
nenad1002 approved these changes on Mar 19, 2026
Description
Extends CUDA EP Squeeze and Unsqueeze kernel registrations from opset 23 to opset 25, matching CPU provider coverage.
- `squeeze.cc`/`unsqueeze.cc`: Cap opset 23 to versioned `23–23`, add versioned `24–24`, and add non-versioned `25`.
- `cuda_execution_provider.cc`: Add the corresponding forward declarations and `BuildKernelCreateInfo` registry entries for opsets 23 (now versioned), 24, and 25.
- `docs/OperatorKernels.md`: Update the CUDA Squeeze and Unsqueeze entries to reflect `25+` coverage with individual `24` and `23` version rows.

No new computation logic is needed: these ops are shape-only (the data movement is a `cudaMemcpyAsync`), so the same kernel implementation covers all new opsets.

Motivation and Context
CUDA EP registered Squeeze/Unsqueeze only up to opset 23 while the ONNX spec defines them through opset 25. Models exported at opset 24+ would fail to find a matching CUDA kernel. Part of the broader opset gap audit tracked in #27729.
Limitation
This PR does not add new data types such as float8, float4, or int4; those will be added later if needed.