Skip to content

[CUDA] Update Pad kernel to support versioning and add tests for Opset 19#27701

Closed
lukas-folle-snkeos wants to merge 1 commit intomicrosoft:mainfrom
lukas-folle-snkeos:main
Closed

[CUDA] Update Pad kernel to support versioning and add tests for Opset 19#27701
lukas-folle-snkeos wants to merge 1 commit intomicrosoft:mainfrom
lukas-folle-snkeos:main

Conversation

@lukas-folle-snkeos
Copy link
Copy Markdown
Contributor

Description

This pull request updates the CUDA kernel registration for the Pad operator in ONNX Runtime to use versioned kernel registration for opsets 18 through 23, and adds test coverage for CUDA-specific behavior.

Motivation and Context

Fixes #26393

@tianleiwu
Copy link
Copy Markdown
Contributor

Summary

The direction is good: moving CUDA Pad away from a single opset-18 registration and adding a CUDA-focused regression test is the right shape of change. However, the current registration range is not safe yet because it makes CUDA eligible for opset-19+ Pad(mode="wrap") nodes even though the CUDA kernels still implement only constant/reflect/edge, and it also stops short of the newer Pad schemas that already exist in-tree.


Review

1. CUDA Pad Versioning ([onnxruntime/core/providers/cuda/tensor/pad.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cuda/tensor/pad.cc))

Positive:

  • Switching Pad from a single opset-18 registration to a versioned registration is the right mechanism for broadening CUDA coverage, and the matching updates in [onnxruntime/core/providers/cuda/cuda_execution_provider.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cuda/cuda_execution_provider.cc) keep the registry declarations consistent.

Concern:

  • ⚠️ wrap mode is now advertised without a CUDA implementation: PadBase parses mode="wrap" into Mode::Wrap in [onnxruntime/core/providers/cpu/tensor/padbase.h](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cpu/tensor/padbase.h), but the CUDA kernels only define three modes and only dispatch pad_mode values 0, 1, and 2 in [onnxruntime/core/providers/cuda/tensor/pad_impl.cu](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cuda/tensor/pad_impl.cu). With this PR, opset-19 through opset-23 Pad nodes become eligible for CUDA placement, which includes mode="wrap" per the in-tree ONNX schema. The added test change only excludes CUDA from wrap tests; it does not stop real models from being assigned to CUDA here. That means a wrap Pad can now be placed on CUDA and hit an unsupported mode path at runtime.
    // pad_impl.cu currently only dispatches 3 modes.
    switch (pad_mode) {
      case 0: /* constant */ break;
      case 1: /* reflect */ break;
      case 2: /* edge */ break;
    }

2. Schema Coverage And Tests ([onnxruntime/test/providers/cpu/tensor/pad_test.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/test/providers/cpu/tensor/pad_test.cc))

Positive:

  • The new CUDA-only MLFloat16 test is useful and directly exercises the motivating scenario instead of relying on broader typed coverage to catch the registration change indirectly.

Concern:

  • ⚠️ Coverage still stops before the latest Pad schemas: even if wrap placement is fixed, the new CUDA registration ends at opset 23 while this tree already carries Pad schemas for opsets 24 and 25 in [cmake/external/onnx/onnx/defs/operator_sets.h](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/cmake/external/onnx/onnx/defs/operator_sets.h) and [cmake/external/onnx/onnx/defs/tensor/defs.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/cmake/external/onnx/onnx/defs/tensor/defs.cc). CPU already registers Pad separately for 24 and 25 in [onnxruntime/core/providers/cpu/tensor/pad.cc](https://github.com/microsoft/onnxruntime/blob/8107200dfd6dede7b8284edcdef1cab17df68525/onnxruntime/core/providers/cpu/tensor/pad.cc), so a non-wrap float/double/MLFloat16/bool Pad in an opset-24 or opset-25 model would still miss CUDA after this PR. I’d expect the CUDA registration and tests to cover the newer schema bands too once the wrap-mode issue is addressed.

Summary of Concerns

# Severity Component Issue
1 High CUDA Pad versioning Registers opset-19+ Pad on CUDA even though wrap is part of those schemas and is not implemented in the CUDA kernels.
2 Suggestion Schema coverage and tests Registration only goes through opset 23, leaving opset 24/25 Pad models without CUDA coverage.

Verdict

REQUEST CHANGES — the current versioned registration would expose unsupported wrap Pad nodes to CUDA placement, so the PR is not safe to merge as-is.

@tianleiwu
Copy link
Copy Markdown
Contributor

@lukas-folle-snkeos, I added a new PR #27708 to extend Pad up to opset 25. Your commit and contribution is kept. Thanks.

@lukas-folle-snkeos
Copy link
Copy Markdown
Contributor Author

@tianleiwu thanks for getting to this so quickly. I am closing this PR in favor of your #27708
Thanks for keeping my contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Performance] Missing CUDA Kernal for Pad, MaxPool, ConvTranspose opset 19-23

2 participants