Skip to content

Fix GatherCopyData Integer Truncation Leading to Heap Out-of-Bounds Read/Write #27444

Merged
chilo-ms merged 3 commits intomainfrom
chi/fix_oob_read_write
Feb 26, 2026
Merged

Fix GatherCopyData Integer Truncation Leading to Heap Out-of-Bounds Read/Write #27444
chilo-ms merged 3 commits intomainfrom
chi/fix_oob_read_write

Conversation

@chilo-ms
Copy link
Copy Markdown
Contributor

@chilo-ms chilo-ms commented Feb 24, 2026

Description

This pull request improves the robustness and correctness of the CPU implementation of the Gather operator in ONNX Runtime. The key changes focus on preventing integer overflow issues in parallel processing and output shape calculations, as well as enhancing test coverage to verify these safeguards.

Enhancements to overflow handling and parallel processing:

  • Changed the lambda function in GatherCopyData to use ptrdiff_t instead of int64_t for the index, and explicitly cast batch and i variables, ensuring safer arithmetic for large tensor sizes.
  • Updated the parallel loop in GatherCopyData to iterate using ptrdiff_t indices, preventing potential overflow when processing large tensors.

Testing improvements:

  • Added a new unit test Gather_overflow_check in gather_op_test.cc to verify that the Gather operator correctly handles very large output shapes without overflowing, specifically testing dimensions that exceed the 32-bit integer limit.

Motivation and Context

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

@adrianlizarraga
Copy link
Copy Markdown
Contributor

Thanks @chilo-ms . There's a linter error but otherwise looks good to me!

@chilo-ms chilo-ms merged commit 355278a into main Feb 26, 2026
89 of 93 checks passed
@chilo-ms chilo-ms deleted the chi/fix_oob_read_write branch February 26, 2026 16:46
tianleiwu pushed a commit that referenced this pull request Feb 26, 2026
…ead/Write (#27444)

### Description
This pull request improves the robustness and correctness of the CPU
implementation of the Gather operator in ONNX Runtime. The key changes
focus on preventing integer overflow issues in parallel processing and
output shape calculations, as well as enhancing test coverage to verify
these safeguards.

Enhancements to overflow handling and parallel processing:

* Changed the lambda function in `GatherCopyData` to use `ptrdiff_t`
instead of `int64_t` for the index, and explicitly cast batch and i
variables, ensuring safer arithmetic for large tensor sizes.
* Updated the parallel loop in `GatherCopyData` to iterate using
`ptrdiff_t` indices, preventing potential overflow when processing large
tensors.

Testing improvements:

* Added a new unit test `Gather_overflow_check` in `gather_op_test.cc`
to verify that the Gather operator correctly handles very large output
shapes without overflowing, specifically testing dimensions that exceed
the 32-bit integer limit.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
tianleiwu added a commit that referenced this pull request Feb 27, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|-------------|
| decd177 | #27090 | Fix GatherND division by zero when batch
dimensions mismatch |
| 55f8234 | #27360 | Fix QMoE CPU Operator |
| df9146f | #27403 | [MLAS] Adding DynamicQGemm function pointers and
ukernel interface |
| 0f93853 | #27318 | [js/web] Use embedded WASM module in Blob URL
workers when wasmBinary is provided |
| b2a6e69 | #27364 | QMoE CPU Performance Update (Up to 4x on 4-bit)
|
| f501e1d | #27413 | Fix refcount bug in map input conversion that
caused shutdown segfault |
| b32b205 | #27421 | Fix error where bytes is not assigned for
dynamic qgemm pack b size |
| 426b006 | #27397 | Fix DllImportResolver |
| 0982844 | #27412 | MatmulNBits prepacking scales fix |
| 9afb0d2 | #27430 | Fix validation for external data paths for
models loaded from bytes |
| 71d2cd0 | #27401 | Enable Python 3.14 CI and Upgrade Dependencies |
| 79e0676 | #27419 | fix: out of bounds access for resize operation |
| 82eb99c | #27459 | Fix SkipLayerNorm fusion incorrectly applied
when gamma/beta are not 1D |
| 355278a | #27444 | Fix GatherCopyData Integer Truncation Leading to
Heap Out-of-Bounds Read/Write |
| cf96123 | #27411 | [web] fix usage of wasmBinary together with a
blob URL for .mjs |
| 1131a86 | #27399 | [web] remove the unhelpful "Unknown CPU vendor"
warning. |
| ffbbc4f | #27316 | Build Windows ARM64X binaries as part of
packaging pipeline |

---------

Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>
Co-authored-by: patryk-kaiser-ARM <patryk.kaiser@arm.com>
Co-authored-by: don <70039285+0-don@users.noreply.github.com>
Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Lukas Folle <126877803+lukas-folle-snkeos@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Chaya <cha182350@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Erik <erscor@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
adrianlizarraga added a commit that referenced this pull request Feb 28, 2026
#27483)

### Description
This PR reduces the size of the memory allocation for expected outputs
from ~4GiB to ~2GiB in the Gather_overflow_check test. The updated test
still verifies that the integer overflow fix from PR
#27444 is valid. That is,
that the CPU Gather operator correctly handles output tensors with
element counts that exceed INT32_MAX.

Changes:

- Reduced test dimension from 65537 to 46341 (output shape from
65537×65537 to 46341×46341), which results in a total number of elements
that is just over INT32_MAX, which is required to test the bug fix.
  - The peak memory usage is reduced to ~4GiB + overhead.
- Increase Android emulator memory to 5GiB (from 4GiB) to be able to run
the test.

### Motivation
Android CI fails to run the unit test introduced in
#27444 due to memory usage
that exceeds the Android emulator's default memory of 4GiB. This PR
lowers the peak memory usage of the unit test and increases the Android
emulator's memory by 1GiB.
tianleiwu pushed a commit that referenced this pull request Feb 28, 2026
#27483)

### Description
This PR reduces the size of the memory allocation for expected outputs
from ~4GiB to ~2GiB in the Gather_overflow_check test. The updated test
still verifies that the integer overflow fix from PR
#27444 is valid. That is,
that the CPU Gather operator correctly handles output tensors with
element counts that exceed INT32_MAX.

Changes:

- Reduced test dimension from 65537 to 46341 (output shape from
65537×65537 to 46341×46341), which results in a total number of elements
that is just over INT32_MAX, which is required to test the bug fix.
  - The peak memory usage is reduced to ~4GiB + overhead.
- Increase Android emulator memory to 5GiB (from 4GiB) to be able to run
the test.

### Motivation
Android CI fails to run the unit test introduced in
#27444 due to memory usage
that exceeds the Android emulator's default memory of 4GiB. This PR
lowers the peak memory usage of the unit test and increases the Android
emulator's memory by 1GiB.
chilo-ms added a commit that referenced this pull request Mar 12, 2026
…rite (#27544)

### Description
<!-- Describe your changes. -->
This pull request refactors several tensor operation kernels
(`GatherND`, `ScatterND`, and `GatherGrad`) to improve type safety and
consistency in parallelized code execution. The main change is replacing
`int` loop indices with `ptrdiff_t` to avoid overflow.

### Parallelization and Type Safety Improvements

* Updated lambda functions and parallel loop indices in `gather_nd.cc`
(`GatherNDBase::PrepareForCompute`, `GatherND::GatherNumber`, and
`GatherND::GatherString`) to use `ptrdiff_t` instead of `int64_t`, and
replaced index arithmetic with explicit casts to maintain correctness.
[[1]](diffhunk://#diff-a456934cd8ef2c51197e04af32ecbef5b531dae83f7f8c2aca46802b7a5e7b7bL96-R100)
[[2]](diffhunk://#diff-a456934cd8ef2c51197e04af32ecbef5b531dae83f7f8c2aca46802b7a5e7b7bL121-R121)
[[3]](diffhunk://#diff-a456934cd8ef2c51197e04af32ecbef5b531dae83f7f8c2aca46802b7a5e7b7bL192-R216)
* Refactored `scatter_nd.cc` (`ScatterNDDispatchTarget`) to use
`ptrdiff_t` for loop indices and index arithmetic in all reduction
cases, ensuring consistent type usage in parallel execution.
* Modified `gather_grad.cc` (`GatherGrad::ComputeImpl`) to use
`ptrdiff_t` for parallel loop indices, aligning with the changes in
other tensor kernels.




### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Another same issue was fixed in
#27444
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants