
[CUDA] MaxPool-22 #27715

Draft
tianleiwu wants to merge 1 commit into main from tlwu/20260317/cuda_max_pool

Conversation

@tianleiwu (Contributor)

Description

This PR extends the CUDA MaxPool operator registration to support ONNX opset 22. The CUDA implementation already uses the same pooling semantics and attribute handling as the opset-12 registration, so this change primarily exposes the existing kernel path at opset 22 and adds regression coverage to keep both the standard CUDA and CUDA NHWC paths working.

Summary of Changes

CUDA Kernel Registration

| File | Change |
| --- | --- |
| `onnxruntime/core/providers/cuda/nn/pool.cc` | Split CUDA MaxPool kernel registration into 12–21 and 22 for both standard and NHWC layouts so opset 22 can reuse the existing implementation. |
| `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` | Added matching provider-side kernel declarations and registry entries for MaxPool(22) on CUDA. |
| `onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc` | Added matching NHWC CUDA declarations and registry entries for MaxPool(22). |

Test Coverage

| File | Change |
| --- | --- |
| `onnxruntime/test/providers/cpu/nn/pool_op_test.cc` | Added a CUDA-only opset 22 regression test that validates MaxPool with indices output. |
| `onnxruntime/test/providers/cuda/nhwc/pool_test.cc` | Updated NHWC comparison coverage to instantiate MaxPool at opset 22. |

Testing

  • Built the touched translation units successfully with targeted `ninja` object builds:
    • onnxruntime/core/providers/cuda/cuda_execution_provider.cc
    • onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc
    • onnxruntime/core/providers/cuda/nn/pool.cc
    • onnxruntime/test/providers/cpu/nn/pool_op_test.cc
    • onnxruntime/test/providers/cuda/nhwc/pool_test.cc
  • Ran `git diff --check` to confirm the patch is formatting-clean.
  • A focused runtime gtest pass was not completed locally because relinking `onnxruntime_provider_test` expanded into a broad rebuild; runtime verification of the new CUDA and NHWC tests should still be run in CI or in a focused local test build.

Motivation and Context

Related issue: #26393

For the current CUDA type set, the ONNX MaxPool-22 schema keeps the same core pooling behavior as the existing CUDA-supported opsets, but CUDA kernel registration stopped at opset 12. As a result, models using MaxPool at opset 22 could miss CUDA kernel assignment even though the implementation path was already compatible.

This PR closes that gap by updating kernel registration and test coverage without changing the underlying CUDA compute logic or broadening CUDA type support.

Checklist

  • Tests added/updated
  • Documentation updated (if applicable)
  • No breaking changes (or documented in description)
  • CI passes

@tianleiwu tianleiwu marked this pull request as draft March 17, 2026 21:30
tianleiwu added a commit that referenced this pull request Mar 20, 2026
…1→22) (#27733)

### Description

Extends CUDA kernel registrations for `GlobalAveragePool` and
`GlobalMaxPool` from opset 1 only to the full opset 1–22 range. Follows
the same pattern used for `MaxPool` in #27715.

- **`core/providers/cuda/nn/pool.cc`** — Split single opset-1
registrations into versioned 1–21 + opset 22 for both NCHW and NHWC
variants
- **`core/providers/cuda/cuda_execution_provider.cc`** — Updated class
declarations and `BuildKernelCreateInfo` entries (versioned 1–21, added
opset 22)
- **`core/providers/cuda/cuda_nhwc_kernels.cc`** — Same for NHWC kernel
registrations
- **`test/providers/cpu/nn/pool_op_test.cc`** — Added
`GlobalAveragePool_22_CUDA` test
- **`docs/OperatorKernels.md`** — Updated GlobalAveragePool and
GlobalMaxPool entries from `1+` to `22+` / `[1, 21]` in both the ai.onnx
and com.microsoft.internal.nhwc domains under CUDAExecutionProvider

No functional changes to the kernel implementations—opsets 1 through 22
are spec-compatible for these ops.

### Motivation and Context

`GlobalAveragePool` and `GlobalMaxPool` were registered at opset 1 only
in the CUDA provider, creating a 21-version gap to the latest ONNX opset
22. Models exported at higher opsets would fail to find a matching CUDA
kernel. Identified as P1 gaps in #27729.

### Limitations

BF16 support for GlobalAveragePool-22 and GlobalMaxPool-22 is not added
in this PR.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
