
[CUDA] MaxPool-22 #27715

Draft
tianleiwu wants to merge 1 commit into main from tlwu/20260317/cuda_max_pool

Conversation

@tianleiwu (Contributor)

Description

This PR extends the CUDA MaxPool operator registration to support ONNX opset 22. The CUDA implementation already uses the same pooling semantics and attribute handling as the opset-12 registration, so this change primarily exposes the existing kernel path at opset 22 and adds regression coverage to keep both the standard CUDA and CUDA NHWC paths working.

Summary of Changes

CUDA Kernel Registration

| File | Change |
| --- | --- |
| `onnxruntime/core/providers/cuda/nn/pool.cc` | Split CUDA MaxPool kernel registration into 12–21 and 22 for both standard and NHWC layouts so opset 22 can reuse the existing implementation. |
| `onnxruntime/core/providers/cuda/cuda_execution_provider.cc` | Added matching provider-side kernel declarations and registry entries for MaxPool(22) on CUDA. |
| `onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc` | Added matching NHWC CUDA declarations and registry entries for MaxPool(22). |

Test Coverage

| File | Change |
| --- | --- |
| `onnxruntime/test/providers/cpu/nn/pool_op_test.cc` | Added a CUDA-only opset 22 regression test that validates MaxPool with indices output. |
| `onnxruntime/test/providers/cuda/nhwc/pool_test.cc` | Updated NHWC comparison coverage to instantiate MaxPool at opset 22. |

Testing

  • Built the touched translation units successfully with targeted `ninja` object builds:
    • onnxruntime/core/providers/cuda/cuda_execution_provider.cc
    • onnxruntime/core/providers/cuda/cuda_nhwc_kernels.cc
    • onnxruntime/core/providers/cuda/nn/pool.cc
    • onnxruntime/test/providers/cpu/nn/pool_op_test.cc
    • onnxruntime/test/providers/cuda/nhwc/pool_test.cc
  • Ran `git diff --check` to confirm the patch is formatting-clean.
  • A focused runtime gtest pass was not completed locally because relinking `onnxruntime_provider_test` expanded into a broad rebuild; runtime verification of the new CUDA and NHWC tests should still be run in CI or in a focused local test build.

Motivation and Context

Related issue: #26393

For the current CUDA type set, the ONNX MaxPool-22 schema keeps the same core pooling behavior as the existing CUDA-supported opsets, but CUDA kernel registration stopped at opset 12. As a result, models using MaxPool at opset 22 could miss CUDA kernel assignment even though the implementation path was already compatible.

This PR closes that gap by updating kernel registration and test coverage without changing the underlying CUDA compute logic or broadening CUDA type support.

Checklist

  • Tests added/updated
  • Documentation updated (if applicable)
  • No breaking changes (or documented in description)
  • CI passes

@tianleiwu tianleiwu marked this pull request as draft March 17, 2026 21:30
tianleiwu added a commit that referenced this pull request Mar 20, 2026
…1→22) (#27733)

### Description

Extends CUDA kernel registrations for `GlobalAveragePool` and
`GlobalMaxPool` from opset 1 only to the full opset 1–22 range. Follows
the same pattern used for `MaxPool` in #27715.

- **`core/providers/cuda/nn/pool.cc`** — Split single opset-1
registrations into versioned 1–21 + opset 22 for both NCHW and NHWC
variants
- **`core/providers/cuda/cuda_execution_provider.cc`** — Updated class
declarations and `BuildKernelCreateInfo` entries (versioned 1–21, added
opset 22)
- **`core/providers/cuda/cuda_nhwc_kernels.cc`** — Same for NHWC kernel
registrations
- **`test/providers/cpu/nn/pool_op_test.cc`** — Added
`GlobalAveragePool_22_CUDA` test
- **`docs/OperatorKernels.md`** — Updated GlobalAveragePool and
GlobalMaxPool entries from `1+` to `22+` / `[1, 21]` in both the ai.onnx
and com.microsoft.internal.nhwc domains under CUDAExecutionProvider

No functional changes to the kernel implementations—opsets 1 through 22
are spec-compatible for these ops.

### Motivation and Context

`GlobalAveragePool` and `GlobalMaxPool` were registered at opset 1 only
in the CUDA provider, creating a 21-version gap to the latest ONNX opset
22. Models exported at higher opsets would fail to find a matching CUDA
kernel. Identified as P1 gaps in #27729.

### Limitations

BF16 support for GlobalAveragePool-22 and GlobalMaxPool-22 is not added
in this PR.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
