Skip to content

[CUDA] RoiAlign for opset versions 16 and 22#27646

Merged
tianleiwu merged 17 commits intomainfrom
tlwu/roialign_bf16
Mar 23, 2026
Merged

[CUDA] RoiAlign for opset versions 16 and 22#27646
tianleiwu merged 17 commits intomainfrom
tlwu/roialign_bf16

Conversation

@tianleiwu
Copy link
Copy Markdown
Contributor

Support RoiAlign for opset versions 16 and 22

@tianleiwu tianleiwu marked this pull request as draft March 13, 2026 18:01
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

Comment thread onnxruntime/core/providers/cuda/object_detection/roialign.cc Outdated
Comment thread onnxruntime/core/providers/cuda/object_detection/roialign.cc Outdated
Comment thread onnxruntime/core/providers/cuda/object_detection/roialign.cc Outdated
Comment thread onnxruntime/core/providers/cuda/object_detection/roialign.cc Outdated
@tianleiwu tianleiwu marked this pull request as ready for review March 13, 2026 18:13
@tianleiwu tianleiwu requested a review from Copilot March 19, 2026 17:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds CUDA kernel support for RoiAlign across ONNX opset versions 16 and 22, including additional datatype coverage.

Changes:

  • Register CUDA RoiAlign kernels for opset ranges 10–15, 16–21, and opset 22.
  • Extend CUDA implementation to support MLFloat16 and BFloat16 (with accumulation types).
  • Add CUDA tests covering Float16 (opset 16/22) and BFloat16 (opset 22), and update kernel documentation.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
onnxruntime/test/providers/cpu/object_detection/roialign_test.cc Adds CUDA EP tests for Float16/BFloat16 across opset 16 and 22.
onnxruntime/core/providers/cuda/object_detection/roialign_impl.cu Improves numeric handling via accumulation types; enables half/BFloat16 specializations.
onnxruntime/core/providers/cuda/object_detection/roialign.cc Adds versioned kernel registrations for opset 10–15, 16–21, and opset 22; enables MLFloat16/BFloat16 registrations.
onnxruntime/core/providers/cuda/cuda_execution_provider.cc Adds kernel class declarations and registration entries for new RoiAlign variants/opsets.
docs/OperatorKernels.md Updates documented opset/type support matrix for RoiAlign.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread onnxruntime/core/providers/cuda/object_detection/roialign.cc Outdated
Comment thread onnxruntime/core/providers/cuda/object_detection/roialign_impl.cu Outdated
Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc
Comment thread docs/OperatorKernels.md
Comment thread onnxruntime/test/providers/cpu/object_detection/roialign_test.cc Outdated
Comment thread onnxruntime/core/providers/cuda/object_detection/roialign.cc Outdated
Comment thread onnxruntime/core/providers/cuda/cuda_execution_provider.cc Outdated
Comment thread onnxruntime/core/providers/cuda/object_detection/roialign_impl.cu Outdated
Comment thread onnxruntime/test/providers/cpu/object_detection/roialign_test.cc
Comment thread onnxruntime/core/providers/cuda/object_detection/roialign_impl.cu
@yuslepukhin
Copy link
Copy Markdown
Member

yuslepukhin commented Mar 20, 2026

There is also an issue in the header which is a pre-existing issue but might as well be fixed now.

Default coordinate_transformation_mode for opset 16+:

roialign.h:129-131 — The attribute is only read inside the if (GetAttr(...).IsOK()) guard. If the attribute is absent (which is valid per the spec), the if body is skipped and half_pixel_ stays at its roialign.h:143. But the ONNX spec for opset 16+ says the default for coordinate_transformation_mode is "half_pixel", which should set half_pixel_ = true. So when the attribute is omitted, the kernel silently behaves as "output_half_pixel" instead of the spec-mandated "half_pixel".

Note: for opset 10 (which has no coordinate_transformation_mode attribute), the false default is correct — it matches the original opset 10 behavior. #Resolved

Copy link
Copy Markdown
Member

@yuslepukhin yuslepukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds CUDA Execution Provider support for RoiAlign across multiple ONNX operator versions, extending datatype coverage to FP16/BF16 where supported by the corresponding opset versions.

Changes:

  • Register CUDA RoiAlign kernels for opset ranges 10–15 and 16–21, and opset 22 (including MLFloat16/BFloat16 where applicable).
  • Update CUDA RoiAlign implementation to accumulate using a wider accumulation type for FP16/BF16 paths.
  • Add CUDA-focused unit tests for opset 16 and 22 (FP16/BF16) and adjust default coordinate_transformation_mode behavior for opset 16+.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
onnxruntime/test/providers/cpu/object_detection/roialign_test.cc Adds CUDA EP tests for opset 16/22 with FP16/BF16 and additional coordinate mode/sampling coverage.
onnxruntime/core/providers/cuda/object_detection/roialign_impl.cu Uses AccumulationType_t for safer FP16/BF16 accumulation; adds explicit instantiations for half/BFloat16.
onnxruntime/core/providers/cuda/object_detection/roialign.cc Updates CUDA kernel registrations to be versioned for opsets 10–15 and 16–21, plus opset 22 typed registrations.
onnxruntime/core/providers/cuda/cuda_execution_provider.cc Wires new RoiAlign kernel registrations into CUDA EP kernel registry and class declarations.
onnxruntime/core/providers/cpu/object_detection/roialign.h Sets default half_pixel_ behavior for opset 16+ when the attribute is absent, aligning with spec defaulting.
docs/OperatorKernels.md Updates generated kernel support table entries for CUDA RoiAlign across opset ranges and types.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cuda/object_detection/roialign_impl.cu Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@yuslepukhin
Copy link
Copy Markdown
Member

roialign.h:129-131 — The attribute is only read inside the if (GetAttr(...).IsOK()) guard.

If the attribute is absent (which is valid per the spec), the if body is skipped and half_pixel_ stays at its roialign.h:143.

But the ONNX spec for opset 16+ says the default for coordinate_transformation_mode is "half_pixel", which should set half_pixel_ = true. So when the attribute is omitted, the kernel silently behaves as "output_half_pixel" instead of the spec-mandated "half_pixel".

@tianleiwu
Copy link
Copy Markdown
Contributor Author

tianleiwu commented Mar 20, 2026

roialign.h:129-131 — The attribute is only read inside the if (GetAttr(...).IsOK()) guard.

If the attribute is absent (which is valid per the spec), the if body is skipped and half_pixel_ stays at its roialign.h:143.

But the ONNX spec for opset 16+ says the default for coordinate_transformation_mode is "half_pixel", which should set half_pixel_ = true. So when the attribute is omitted, the kernel silently behaves as "output_half_pixel" instead of the spec-mandated "half_pixel".

Current logic:

If there is coordinate_transformation_mode attribute
    set half_pixel_ according to whether the attribute value is "half_pixel". 
else
    use the default value according to spec. The default is "half_pixel" for opset 16+, and "output_half_pixel" for opset 10.
``` #Resolved

yuslepukhin
yuslepukhin previously approved these changes Mar 20, 2026
Copy link
Copy Markdown
Member

@yuslepukhin yuslepukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Copy Markdown
Member

@yuslepukhin yuslepukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@tianleiwu tianleiwu merged commit 45b5900 into main Mar 23, 2026
91 checks passed
@tianleiwu tianleiwu deleted the tlwu/roialign_bf16 branch March 23, 2026 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants