Fix RoiAlign heap out-of-bounds read via unchecked batch_indices #27543

Merged
tianleiwu merged 5 commits into main from tlwu/fix_roialign_oob
Mar 5, 2026

Conversation

@tianleiwu
Contributor

tianleiwu commented Mar 3, 2026

Fix RoiAlign heap out-of-bounds read via unchecked batch_indices

Description

Add value-range validation for batch_indices in the RoiAlign operator to prevent out-of-bounds heap reads from maliciously crafted ONNX models.

CheckROIAlignValidInput() previously validated tensor shapes but never checked that the values in batch_indices fall within [0, batch_size). An attacker could supply batch_indices containing values exceeding the batch dimension of the input tensor X, causing the kernel to read arbitrary heap memory at:

  • CPU: roialign.cc:212 — roi_batch_ind used as unchecked index into bottom_data
  • CUDA: roialign_impl.cu:109 — batch_indices_ptr[n] used as unchecked index into bottom_data on GPU

Impact

  • Vulnerability type: Heap out-of-bounds read
  • Impact: Arbitrary heap memory read, potential information disclosure, program crash
  • Trigger: Construct batch_indices with values ≥ batch_size or < 0
  • Affected providers: CPU and CUDA (both call CheckROIAlignValidInput())

Changes

onnxruntime/core/providers/cpu/object_detection/roialign.cc

  • Added per-element bounds check in CheckROIAlignValidInput(): each batch_indices[i] must satisfy 0 <= value < X.shape[0]
  • Returns INVALID_ARGUMENT with a descriptive error message on violation
  • Guarded by batch_indices_ptr->Location().device.Type() == OrtDevice::CPU so it only runs when the tensor data is host-accessible (CPU EP and CropAndResize). For the CUDA EP, batch_indices lives in GPU memory and cannot be safely dereferenced on the host.

onnxruntime/test/providers/cpu/object_detection/roialign_test.cc

  • Added BatchIndicesOutOfRange test: batch_indices={1} with batch_size=1 (exercises >= batch_size path)
  • Added BatchIndicesNegative test: batch_indices={-1} (exercises < 0 path)

Known Limitation

The CUDA execution path is not protected by this bounds check because batch_indices is a GPU tensor and cannot be read on the host. Adding a device-side bounds check would require passing batch_size into the CUDA kernel — this is tracked as a follow-up.

Note: Using .InputMemoryType(OrtMemTypeCPUInput, 2) was considered but rejected because it would force a GPU→CPU transfer of batch_indices, breaking CUDA graph capture for models like Mask R-CNN where batch_indices is produced by upstream GPU ops.

Validation

  • Full RoiAlignTest.* suite passes (12/12 tests) on CPU build
  • Full RoiAlignTest.* suite passes (12/12 tests) on CUDA build
  • No regressions in existing positive or negative tests

yuslepukhin previously approved these changes Mar 3, 2026
Member

yuslepukhin left a comment


LGTM

Contributor

Copilot AI left a comment


Pull request overview

Adds runtime validation to RoiAlign to prevent out-of-bounds reads caused by malicious batch_indices values, and introduces negative tests to cover the new validation behavior.

Changes:

  • Add per-element range checking for batch_indices values in CheckROIAlignValidInput().
  • Add CPU provider tests for out-of-range and negative batch_indices inputs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| onnxruntime/core/providers/cpu/object_detection/roialign.cc | Adds value-range validation for batch_indices during RoiAlign input validation. |
| onnxruntime/test/providers/cpu/object_detection/roialign_test.cc | Adds negative tests to verify invalid batch_indices values are rejected with an error. |


@yuslepukhin yuslepukhin dismissed their stale review March 4, 2026 00:04

Copilot comments

@tianleiwu tianleiwu requested a review from yuslepukhin March 4, 2026 19:09
@tianleiwu tianleiwu enabled auto-merge (squash) March 4, 2026 23:18
@tianleiwu tianleiwu merged commit 2da1a30 into main Mar 5, 2026
91 checks passed
@tianleiwu tianleiwu deleted the tlwu/fix_roialign_oob branch March 5, 2026 00:49
tianleiwu added a commit that referenced this pull request Mar 5, 2026
tianleiwu added a commit that referenced this pull request Mar 5, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|--------------|
| d5387d8 | #27192 | Avoid repetitive creation of fp4/fp8 native-custom-op domains for NvTensorRtRtx EP |
| 0b9906a | #27454 | Suppress spurious Array Out of Bounds warnings produced by GCC 14.2 compiler on Linux builds |
| 4a80b0b | #27471 | Fix double-free in TRT EP custom op domain Release functions |
| c7c939f | #27499 | Fix -Warray-bounds build error in MLAS on clang 17+ |
| f99dcca | #27514 | [Build] Fix pybind11 vcpkg configuration |
| ef04b10 | #27518 | [CXX Lora] Prevent heap OOB from maliciously crafted Lora Adapters. |
| 0b2b6d0 | #27288 | [NvTensorRTRTX EP]: Add missing override specifiers to suppress warnings |
| c1d8f5c | #27522 | Add "library_path" metadata entry to OrtEpDevice instances for plugin and provider bridge EPs |
| fdead1c | #27537 | Account for ORT_NO_EXCEPTIONS builds in Lora test |
| 3d1365e | #27521 | increase kMaxValueLength to 8192 |
| df8f4a7 | #27535 | Add OrtEnv.DisableDllImportResolver to prevent fatal error on resolver conflict |
| bdd672a | #27356 | Add/Update telemetry events |
| 2da1a30 | #27543 | Fix RoiAlign heap out-of-bounds read via unchecked batch_indices |
| 5c3f544 | #27466 | DQ→MatMulNBits fusion transformer for NvTensorRtRtx ep |

---------

Co-authored-by: vishalpandya1990 <vipandya@nvidia.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Stephan Seitz <sseitz@nvidia.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Darshak Bhatti <47045043+dabhattimsft@users.noreply.github.com>
Co-authored-by: Darshak Bhatti <dabhatti@micorsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: anujj <ajalota@nvidia.com>
Co-authored-by: praneshgo <227579474+praneshgo@users.noreply.github.com>
tianleiwu added a commit that referenced this pull request Mar 12, 2026
## Description
This PR implements a device-side bounds check for `batch_indices` in the
RoiAlign CUDA operator. This is a follow-up to
#27543, which fixed the
same vulnerability in the CPU implementation.

Previously, CheckROIAlignValidInput() only validated `batch_indices`
when they were accessible on the host (CPU). For the CUDA EP,
`batch_indices` reside in GPU memory, so host-side validation would
require an expensive GPU-to-CPU copy, which could also break CUDA graph
capture.

This change:
1. Passes `batch_size` from the host to the CUDA kernel.
2. Adds a check within the `RoIAlignForward` kernel to ensure `0 <= batch_index < batch_size`.
3. If an invalid `batch_index` is encountered, the kernel sets the output value for that specific RoI element to 0 and returns early for that thread.

## Impact
- **Vulnerability fixed:** Heap out-of-bounds read on GPU.
- **Performance:** Negligible impact, as it's a simple range check within the existing kernel.
- **Compatibility:** No changes to ONNX models or public APIs.

## Validation
- Existing `RoiAlignTest` suite passes.
- Added two new test cases, `BatchIndicesOutOfRange_CUDA` and `BatchIndicesNegative_CUDA`, to verify that the CUDA provider correctly handles out-of-range batch indices.
- Verified that the CUDA provider handles opset 10 without falling back to the CPU EP for these tests.