## Conversation

tianleiwu added a commit that referenced this pull request on Mar 20, 2026:
…1→22) (#27733)

### Description

Extends CUDA kernel registrations for `GlobalAveragePool` and `GlobalMaxPool` from opset 1 only to the full opset 1–22 range, following the same pattern used for `MaxPool` in #27715.

- **`core/providers/cuda/nn/pool.cc`** — Split the single opset-1 registrations into versioned 1–21 plus opset-22 registrations for both the NCHW and NHWC variants.
- **`core/providers/cuda/cuda_execution_provider.cc`** — Updated the class declarations and `BuildKernelCreateInfo` entries (versioned 1–21, added opset 22).
- **`core/providers/cuda/cuda_nhwc_kernels.cc`** — Made the same change for the NHWC kernel registrations.
- **`test/providers/cpu/nn/pool_op_test.cc`** — Added a `GlobalAveragePool_22_CUDA` test.
- **`docs/OperatorKernels.md`** — Updated the GlobalAveragePool and GlobalMaxPool entries from `1+` to `22+` / `[1, 21]` in both the ai.onnx and com.microsoft.internal.nhwc domains under CUDAExecutionProvider.

No functional changes to the kernel implementations: opsets 1 through 22 are spec-compatible for these ops.

### Motivation and Context

`GlobalAveragePool` and `GlobalMaxPool` were registered at opset 1 only in the CUDA provider, leaving a 21-version gap to the latest ONNX opset 22. Models exported at higher opsets would fail to find a matching CUDA kernel. Identified as P1 gaps in #27729.

### Limitations

BF16 support for GlobalAveragePool-22 and GlobalMaxPool-22 is not added in this PR.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
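For context, the spec-compatibility claim ("opsets 1 through 22 are spec-compatible for these ops") can be checked against plain reference semantics: both global pooling ops reduce all spatial positions of each channel to a single value, with no attributes that changed meaning between opsets. The sketch below is an illustrative NCHW float reference, not the CUDA kernel code from the PR:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Reference semantics of GlobalAveragePool / GlobalMaxPool on an NCHW tensor.
// Each (n, c) slice of H*W spatial values collapses to one output value, so
// the output shape is (N, C, 1, 1); the math is identical at opset 1 and 22.
std::vector<float> GlobalAveragePool(const std::vector<float>& x,
                                     int n, int c, int h, int w) {
  const int spatial = h * w;
  std::vector<float> y(static_cast<size_t>(n) * c, 0.0f);
  for (int i = 0; i < n * c; ++i) {
    float sum = 0.0f;
    for (int s = 0; s < spatial; ++s) sum += x[i * spatial + s];
    y[i] = sum / spatial;  // mean over the whole spatial plane
  }
  return y;
}

std::vector<float> GlobalMaxPool(const std::vector<float>& x,
                                 int n, int c, int h, int w) {
  const int spatial = h * w;
  std::vector<float> y(static_cast<size_t>(n) * c,
                       -std::numeric_limits<float>::infinity());
  for (int i = 0; i < n * c; ++i)
    for (int s = 0; s < spatial; ++s)
      y[i] = std::max(y[i], x[i * spatial + s]);  // max over the plane
  return y;
}
```

For a 1×2×2×2 input `{1,2,3,4, 5,6,7,8}`, the averages are `{2.5, 6.5}` and the maxima `{4, 8}`, regardless of which opset the model was exported at.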
### Description

This PR extends the CUDA `MaxPool` operator registration to support ONNX opset 22. The CUDA implementation already uses the same pooling semantics and attribute handling that applied at opset 12, so this change primarily exposes the existing kernel path for opset 22 and adds regression coverage to keep both the standard CUDA and CUDA NHWC paths working.

### Summary of Changes
#### CUDA Kernel Registration

- `onnxruntime/core/providers/cuda/nn/pool.cc` — Splits the `MaxPool` kernel registration into `12-21` and `22` for both standard and NHWC layouts so opset 22 can reuse the existing implementation.
- `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` — Registers `MaxPool(22)` on CUDA.
- `onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc` — Registers the NHWC `MaxPool(22)`.

#### Test Coverage
- `onnxruntime/test/providers/cpu/nn/pool_op_test.cc` — Adds an opset-22 `MaxPool` test with indices output.
- `onnxruntime/test/providers/cuda/nhwc/pool_test.cc` — Adds an NHWC `MaxPool` test at opset 22.

### Testing
- Verified `ninja` object builds for:
  - `onnxruntime/core/providers/cuda/cuda_execution_provider.cc`
  - `onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc`
  - `onnxruntime/core/providers/cuda/nn/pool.cc`
  - `onnxruntime/test/providers/cpu/nn/pool_op_test.cc`
  - `onnxruntime/test/providers/cuda/nhwc/pool_test.cc`
- Ran `git diff --check` to confirm the patch is formatting-clean.
- A full `onnxruntime_provider_test` build expanded into a broad rebuild; runtime verification of the new CUDA and NHWC tests should still be run in CI or in a focused local test build.

### Motivation and Context
Related issue: #26393
The ONNX `MaxPool-22` schema keeps the same core pooling behavior used by the existing CUDA-supported opsets for the current CUDA type set, but CUDA registration stopped at opset 12. That meant models using `MaxPool` at opset 22 could miss CUDA kernel assignment even though the implementation path was already compatible.

This PR closes that gap by updating kernel registration and test coverage without changing the underlying CUDA compute logic or broadening CUDA type support.
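The assignment gap comes from how execution providers match kernels by inclusive opset version ranges. The sketch below is a deliberately simplified model of that matching (illustrative names, not the actual ONNX Runtime registry code): it shows why a registration whose range effectively ends at opset 21 cannot serve a model whose `MaxPool` resolves to opset 22 until a `[22, max]` entry is added, which is exactly what this PR does:

```cpp
#include <cassert>
#include <climits>
#include <string>
#include <vector>

// Simplified model of versioned kernel registration: each entry covers an
// inclusive [since_version, end_version] range, mirroring how versioned and
// open-ended registrations partition the opset space.
struct KernelEntry {
  std::string op_type;
  int since_version;
  int end_version;  // INT_MAX for an open-ended "latest opset" registration
};

// Returns true if some registered kernel covers `op_type` at `opset`.
bool HasKernel(const std::vector<KernelEntry>& registry,
               const std::string& op_type, int opset) {
  for (const auto& k : registry) {
    if (k.op_type == op_type &&
        k.since_version <= opset && opset <= k.end_version) {
      return true;
    }
  }
  return false;
}
```

Under this model, the pre-PR CUDA registry behaves like a single `{"MaxPool", 12, 21}` entry (its effective range is capped when the opset-22 schema revision appears), so opset-22 lookups fail; appending `{"MaxPool", 22, INT_MAX}` closes the gap without touching the existing entry.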
### Checklist