Fix RoiAlign heap out-of-bounds read via unchecked batch_indices#27543
Merged
Conversation
Contributor
Pull request overview
Adds runtime validation to RoiAlign to prevent out-of-bounds reads caused by malicious batch_indices values, and introduces negative tests to cover the new validation behavior.
Changes:
- Add per-element range checking for `batch_indices` values in `CheckROIAlignValidInput()`.
- Add CPU provider tests for out-of-range and negative `batch_indices` inputs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/core/providers/cpu/object_detection/roialign.cc | Adds value-range validation for batch_indices during RoiAlign input validation. |
| onnxruntime/test/providers/cpu/object_detection/roialign_test.cc | Adds negative tests to verify invalid batch_indices values are rejected with an error. |
hariharans29
reviewed
Mar 4, 2026
hariharans29
approved these changes
Mar 4, 2026
tianleiwu
added a commit
that referenced
this pull request
Mar 5, 2026
# Fix RoiAlign heap out-of-bounds read via unchecked batch_indices

## Description

Add value-range validation for `batch_indices` in the RoiAlign operator to prevent out-of-bounds heap reads from maliciously crafted ONNX models.

`CheckROIAlignValidInput()` previously validated tensor shapes but never checked that the **values** in `batch_indices` fall within `[0, batch_size)`. An attacker could supply `batch_indices` containing values exceeding the batch dimension of the input tensor `X`, causing the kernel to read arbitrary heap memory at:

- **CPU:** `roialign.cc:212` — `roi_batch_ind` used as an unchecked index into `bottom_data`
- **CUDA:** `roialign_impl.cu:109` — `batch_indices_ptr[n]` used as an unchecked index into `bottom_data` on GPU

## Impact

- **Vulnerability type:** Heap out-of-bounds read
- **Impact:** Arbitrary heap memory read, potential information disclosure, program crash
- **Trigger:** Construct `batch_indices` with values ≥ `batch_size` or < 0
- **Affected providers:** CPU and CUDA (both call `CheckROIAlignValidInput()`)

## Changes

### `onnxruntime/core/providers/cpu/object_detection/roialign.cc`

- Added a per-element bounds check in `CheckROIAlignValidInput()`: each `batch_indices[i]` must satisfy `0 <= value < X.shape[0]`
- Returns `INVALID_ARGUMENT` with a descriptive error message on violation
- Guarded by `batch_indices_ptr->Location().device.Type() == OrtDevice::CPU` so it only runs when the tensor data is host-accessible (CPU EP and CropAndResize). For the CUDA EP, `batch_indices` lives in GPU memory and cannot be safely dereferenced on the host.

### `onnxruntime/test/providers/cpu/object_detection/roialign_test.cc`

- Added a `BatchIndicesOutOfRange` test: `batch_indices={1}` with `batch_size=1` (exercises the `>= batch_size` path)
- Added a `BatchIndicesNegative` test: `batch_indices={-1}` (exercises the `< 0` path)

## Known Limitation

The CUDA execution path is **not** protected by this bounds check because `batch_indices` is a GPU tensor and cannot be read on the host. Adding a device-side bounds check would require passing `batch_size` into the CUDA kernel — this is tracked as a follow-up.

Note: Using `.InputMemoryType(OrtMemTypeCPUInput, 2)` was considered but rejected because it would force a GPU→CPU transfer of `batch_indices`, breaking CUDA graph capture for models like Mask R-CNN where `batch_indices` is produced by upstream GPU ops.

## Validation

- Full `RoiAlignTest.*` suite passes (12/12 tests) on the CPU build
- Full `RoiAlignTest.*` suite passes (12/12 tests) on the CUDA build
- No regressions in existing positive or negative tests
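The host-side check described above can be sketched as follows. This is a minimal illustration, not the actual ORT code: `ValidateBatchIndices` is a hypothetical stand-in for the logic added to `CheckROIAlignValidInput()` (which in reality returns an ORT `Status` rather than a string).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-element check added to
// CheckROIAlignValidInput(): every batch_indices[i] must lie in
// [0, batch_size). Returns an error message describing the first
// violation, or an empty string when all indices are valid.
std::string ValidateBatchIndices(const std::vector<int64_t>& batch_indices,
                                 int64_t batch_size) {
  for (size_t i = 0; i < batch_indices.size(); ++i) {
    const int64_t v = batch_indices[i];
    if (v < 0 || v >= batch_size) {
      return "batch_indices[" + std::to_string(i) + "] = " +
             std::to_string(v) + " is out of range [0, " +
             std::to_string(batch_size) + ")";
    }
  }
  return "";  // all indices within [0, batch_size)
}
```

The two negative tests in the PR correspond to the `>= batch_size` and `< 0` branches of this loop.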
tianleiwu
added a commit
that referenced
this pull request
Mar 5, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|--------------|
| d5387d8 | #27192 | Avoid repetitive creation of fp4/fp8 native-custom-op domains for NvTensorRtRtx EP |
| 0b9906a | #27454 | Suppress spurious Array Out of Bounds warnings produced by GCC 14.2 compiler on Linux builds |
| 4a80b0b | #27471 | Fix double-free in TRT EP custom op domain Release functions |
| c7c939f | #27499 | Fix -Warray-bounds build error in MLAS on clang 17+ |
| f99dcca | #27514 | [Build] Fix pybind11 vcpkg configuration |
| ef04b10 | #27518 | [CXX Lora] Prevent heap OOB from maliciously crafted Lora Adapters. |
| 0b2b6d0 | #27288 | [NvTensorRTRTX EP]: Add missing override specifiers to suppress warnings |
| c1d8f5c | #27522 | Add "library_path" metadata entry to OrtEpDevice instances for plugin and provider bridge EPs |
| fdead1c | #27537 | Account for ORT_NO_EXCEPTIONS builds in Lora test |
| 3d1365e | #27521 | increase kMaxValueLength to 8192 |
| df8f4a7 | #27535 | Add OrtEnv.DisableDllImportResolver to prevent fatal error on resolver conflict |
| bdd672a | #27356 | Add/Update telemetry events |
| 2da1a30 | #27543 | Fix RoiAlign heap out-of-bounds read via unchecked batch_indices |
| 5c3f544 | #27466 | DQ→MatMulNBits fusion transformer for NvTensorRtRtx ep |

---------

Co-authored-by: vishalpandya1990 <vipandya@nvidia.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Stephan Seitz <sseitz@nvidia.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Darshak Bhatti <47045043+dabhattimsft@users.noreply.github.com>
Co-authored-by: Darshak Bhatti <dabhatti@micorsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: anujj <ajalota@nvidia.com>
Co-authored-by: praneshgo <227579474+praneshgo@users.noreply.github.com>
tianleiwu
added a commit
that referenced
this pull request
Mar 12, 2026
## Description

This PR implements a device-side bounds check for `batch_indices` in the RoiAlign CUDA operator. It is a follow-up to #27543, which fixed the same vulnerability in the CPU implementation.

Previously, `CheckROIAlignValidInput()` only validated `batch_indices` when they were accessible on the host (CPU). For the CUDA EP, `batch_indices` reside in GPU memory, so host-side validation would require an expensive GPU-to-CPU copy, which could also break CUDA graph capture.

This change:

1. Passes `batch_size` from the host to the CUDA kernel.
2. Adds a check within the `RoIAlignForward` kernel to ensure `0 <= batch_index < batch_size`.
3. If an invalid `batch_index` is encountered, the kernel sets the output value for that specific RoI element to 0 and returns early for that thread.

## Impact

- **Vulnerability fixed:** Heap out-of-bounds read on GPU.
- **Performance:** Negligible impact; it is a simple range check within the existing kernel.
- **Compatibility:** No changes to ONNX models or public APIs.

## Validation

- Existing `RoiAlignTest` suite.
- Added two new test cases, `BatchIndicesOutOfRange_CUDA` and `BatchIndicesNegative_CUDA`, to verify that the CUDA provider correctly handles out-of-range batch indices.
- Verified that the CUDA provider handles opset 10 without falling back to the CPU EP for these tests.