Fix RoiAlign heap out-of-bounds read via unchecked batch_indices #27543

Merged
tianleiwu merged 5 commits into main from tlwu/fix_roialign_oob
Mar 5, 2026

Conversation

@tianleiwu
Contributor

tianleiwu commented Mar 3, 2026

Fix RoiAlign heap out-of-bounds read via unchecked batch_indices

Description

Add value-range validation for batch_indices in the RoiAlign operator to prevent out-of-bounds heap reads from maliciously crafted ONNX models.

CheckROIAlignValidInput() previously validated tensor shapes but never checked that the values in batch_indices fall within [0, batch_size). An attacker could supply batch_indices containing values exceeding the batch dimension of the input tensor X, causing the kernel to read arbitrary heap memory at:

  • CPU: roialign.cc:212 — roi_batch_ind used as unchecked index into bottom_data
  • CUDA: roialign_impl.cu:109 — batch_indices_ptr[n] used as unchecked index into bottom_data on GPU

Impact

  • Vulnerability type: Heap out-of-bounds read
  • Impact: Arbitrary heap memory read, potential information disclosure, program crash
  • Trigger: Construct batch_indices with values ≥ batch_size or < 0
  • Affected providers: CPU and CUDA (both call CheckROIAlignValidInput())

Changes

onnxruntime/core/providers/cpu/object_detection/roialign.cc

  • Added per-element bounds check in CheckROIAlignValidInput(): each batch_indices[i] must satisfy 0 <= value < X.shape[0]
  • Returns INVALID_ARGUMENT with a descriptive error message on violation
  • Guarded by batch_indices_ptr->Location().device.Type() == OrtDevice::CPU so it only runs when the tensor data is host-accessible (CPU EP and CropAndResize). For the CUDA EP, batch_indices lives in GPU memory and cannot be safely dereferenced on the host.

onnxruntime/test/providers/cpu/object_detection/roialign_test.cc

  • Added BatchIndicesOutOfRange test: batch_indices={1} with batch_size=1 (exercises >= batch_size path)
  • Added BatchIndicesNegative test: batch_indices={-1} (exercises < 0 path)

Known Limitation

The CUDA execution path is not protected by this bounds check because batch_indices is a GPU tensor and cannot be read on the host. Adding a device-side bounds check would require passing batch_size into the CUDA kernel — this is tracked as a follow-up.

Note: Using .InputMemoryType(OrtMemTypeCPUInput, 2) was considered but rejected because it would force a GPU→CPU transfer of batch_indices, breaking CUDA graph capture for models like Mask R-CNN where batch_indices is produced by upstream GPU ops.

Validation

  • Full RoiAlignTest.* suite passes (12/12 tests) on CPU build
  • Full RoiAlignTest.* suite passes (12/12 tests) on CUDA build
  • No regressions in existing positive or negative tests

yuslepukhin previously approved these changes Mar 3, 2026
Member

yuslepukhin left a comment


LGTM

Contributor

Copilot AI left a comment


Pull request overview

Adds runtime validation to RoiAlign to prevent out-of-bounds reads caused by malicious batch_indices values, and introduces negative tests to cover the new validation behavior.

Changes:

  • Add per-element range checking for batch_indices values in CheckROIAlignValidInput().
  • Add CPU provider tests for out-of-range and negative batch_indices inputs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| onnxruntime/core/providers/cpu/object_detection/roialign.cc | Adds value-range validation for batch_indices during RoiAlign input validation. |
| onnxruntime/test/providers/cpu/object_detection/roialign_test.cc | Adds negative tests to verify invalid batch_indices values are rejected with an error. |


@yuslepukhin yuslepukhin dismissed their stale review March 4, 2026 00:04

Copilot comments

@tianleiwu tianleiwu requested a review from yuslepukhin March 4, 2026 19:09
@tianleiwu tianleiwu enabled auto-merge (squash) March 4, 2026 23:18
@tianleiwu tianleiwu merged commit 2da1a30 into main Mar 5, 2026
91 checks passed
@tianleiwu tianleiwu deleted the tlwu/fix_roialign_oob branch March 5, 2026 00:49
tianleiwu added a commit that referenced this pull request Mar 5, 2026
tianleiwu added a commit that referenced this pull request Mar 5, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|--------------|
| d5387d8 | #27192 | Avoid repetitive creation of fp4/fp8 native-custom-op domains for NvTensorRtRtx EP |
| 0b9906a | #27454 | Suppress spurious Array Out of Bounds warnings produced by GCC 14.2 compiler on Linux builds |
| 4a80b0b | #27471 | Fix double-free in TRT EP custom op domain Release functions |
| c7c939f | #27499 | Fix -Warray-bounds build error in MLAS on clang 17+ |
| f99dcca | #27514 | [Build] Fix pybind11 vcpkg configuration |
| ef04b10 | #27518 | [CXX Lora] Prevent heap OOB from maliciously crafted Lora Adapters. |
| 0b2b6d0 | #27288 | [NvTensorRTRTX EP]: Add missing override specifiers to suppress warnings |
| c1d8f5c | #27522 | Add "library_path" metadata entry to OrtEpDevice instances for plugin and provider bridge EPs |
| fdead1c | #27537 | Account for ORT_NO_EXCEPTIONS builds in Lora test |
| 3d1365e | #27521 | increase kMaxValueLength to 8192 |
| df8f4a7 | #27535 | Add OrtEnv.DisableDllImportResolver to prevent fatal error on resolver conflict |
| bdd672a | #27356 | Add/Update telemetry events |
| 2da1a30 | #27543 | Fix RoiAlign heap out-of-bounds read via unchecked batch_indices |
| 5c3f544 | #27466 | DQ→MatMulNBits fusion transformer for NvTensorRtRtx ep |

---------

Co-authored-by: vishalpandya1990 <vipandya@nvidia.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Stephan Seitz <sseitz@nvidia.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Darshak Bhatti <47045043+dabhattimsft@users.noreply.github.com>
Co-authored-by: Darshak Bhatti <dabhatti@micorsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: anujj <ajalota@nvidia.com>
Co-authored-by: praneshgo <227579474+praneshgo@users.noreply.github.com>
tianleiwu added a commit that referenced this pull request Mar 12, 2026
## Description
This PR implements a device-side bounds check for `batch_indices` in the
RoiAlign CUDA operator. This is a follow-up to
#27543, which fixed the
same vulnerability in the CPU implementation.

Previously, CheckROIAlignValidInput() only validated `batch_indices`
when they were accessible on the host (CPU). For the CUDA EP,
`batch_indices` reside in GPU memory, so host-side validation would
require an expensive GPU-to-CPU copy, which could also break CUDA graph
capture.

This change:
1. Passes `batch_size` from the host to the CUDA kernel.
2. Adds a check within the `RoIAlignForward` kernel to ensure `0 <= batch_index < batch_size`.
3. If an invalid `batch_index` is encountered, the kernel sets the output value for that specific RoI element to 0 and returns early for that thread.

## Impact
- **Vulnerability fixed:** Heap out-of-bounds read on GPU.
- **Performance:** Negligible impact, as it's a simple range check within the existing kernel.
- **Compatibility:** No changes to ONNX models or public APIs.

## Validation
- Existing `RoiAlignTest` suite passes.
- Added two new test cases, `BatchIndicesOutOfRange_CUDA` and `BatchIndicesNegative_CUDA`, to verify that the CUDA provider correctly handles out-of-range batch indices.
- Verified that the CUDA provider handles opset 10 without falling back to the CPU EP for these tests.