Skip to content

Fix integer overflow in GridSample GsReflect causing heap OOB read#27975

Open
GopalakrishnanN wants to merge 3 commits intomicrosoft:mainfrom
GopalakrishnanN:dev/FixIntegerOverflow
Open

Fix integer overflow in GridSample GsReflect causing heap OOB read#27975
GopalakrishnanN wants to merge 3 commits intomicrosoft:mainfrom
GopalakrishnanN:dev/FixIntegerOverflow

Conversation

@GopalakrishnanN
Copy link
Copy Markdown
Contributor

@GopalakrishnanN GopalakrishnanN commented Apr 3, 2026

Description

The reflection branch of GridSample's PixelAtGrid / PixelAtGrid3D could produce out-of-range indices when grid coordinates were extreme. Two minimal changes:

  1. In GsReflect, widen int n = static_cast<int>(dx / range) to int64_t n = static_cast<int64_t>(dx / range) in both branches. For large (but finite) inputs dx / range can exceed INT_MAX, making the narrow int cast undefined behavior.
  2. In PixelAtGrid / PixelAtGrid3D, clamp the int64_t index returned by GsReflect back into [0, W-1] / [0, H-1] / [0, D-1] before indexing the image buffer. This keeps the final load in bounds as a defense-in-depth measure for any remaining edge case (coordinates ≥ 2^63, NaN, Inf, etc.).

Motivation and Context

Reflection padding indexes the image with the result of GsReflect. The internal narrow int cast on a large floating-point value is undefined behavior and could yield a wild index, leading to an out-of-bounds read on the input image buffer.

Tests

Adds one regression test, test_grid_sample_20_4D_nearest_reflection_extreme_coords, that runs GridSample with mode=nearest and padding_mode=reflection on a 2x2 constant-valued image with grid coordinates of magnitude 1e+10 (well above INT_MAX). It is parameterized over float and MLFloat16 and asserts the kernel returns the constant value, which can only happen if every sampled index is in bounds.

All existing *GridSample* tests continue to pass (102/102 locally).

Files changed

  • onnxruntime/core/providers/cpu/tensor/grid_sample.cc
  • onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Addresses a security vulnerability in the CPU GridSample operator (reflection padding) where extreme/invalid grid coordinates could lead to integer overflow/undefined behavior and subsequent heap out-of-bounds reads.

Changes:

  • Updates GsReflect to handle non-finite inputs and uses fmod + clamping as a defense-in-depth measure.
  • Adds extra index clamping in PixelAtGrid / PixelAtGrid3D for reflection padding before buffer access.
  • Adds regression tests covering extreme, NaN, and Inf grid coordinates for reflection padding.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
onnxruntime/core/providers/cpu/tensor/grid_sample.cc Adds non-finite guards, modifies reflection math, and clamps indices before image access.
onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc Adds custom regression tests for extreme/NaN/Inf grid coordinates under reflection padding.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc
Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc Outdated
Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc Outdated
Comment thread onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc Outdated
Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc Outdated
Comment thread onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc Outdated
@GopalakrishnanN GopalakrishnanN force-pushed the dev/FixIntegerOverflow branch 2 times, most recently from b7368e6 to 1a6f651 Compare April 17, 2026 02:05
@GopalakrishnanN GopalakrishnanN requested a review from Copilot April 17, 2026 03:47
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc
Comment thread .github/workflows/react_native.yml Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc
GopalakrishnanN pushed a commit to GopalakrishnanN/onnxruntime that referenced this pull request Apr 17, 2026
Return 0 early in PixelAtGrid and PixelAtGrid3D when any spatial dimension is <= 0. This avoids std::clamp's lo <= hi precondition being violated (e.g., std::clamp(c, 0, W-1) when W=0) and prevents an out-of-bounds read from the empty image buffer for Border and Reflection padding modes. Addresses review feedback on PR microsoft#27975.
@GopalakrishnanN GopalakrishnanN requested a review from Copilot April 17, 2026 04:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread onnxruntime/core/providers/cuda/tensor/grid_sample_impl.cu Outdated
Comment thread .github/workflows/react_native.yml Outdated
GopalakrishnanN pushed a commit to GopalakrishnanN/onnxruntime that referenced this pull request Apr 23, 2026
Previously, any finite coordinate whose magnitude exceeded a safe float->int conversion range was short-circuited to zero output in every padding mode. That is incorrect for Border and Reflection, whose spec semantics require clamping or reflecting the coordinate into the valid image range. The same change applies to the CUDA 2D and 3D kernels. Only non-finite coordinates (NaN/Inf) are now forced to zero across all modes; large finite coordinates are reduced in floating point according to the active padding mode before any integer conversion, keeping the later int casts well defined while respecting padding_mode. For Zeros padding, a coordinate whose magnitude overflows a safe integer conversion is well outside the image and still produces zero. Regression tests now cover all three padding modes with extreme finite, NaN, and Inf coordinates. Addresses review feedback on PR microsoft#27975.
@GopalakrishnanN GopalakrishnanN force-pushed the dev/FixIntegerOverflow branch from e9b0d37 to a8959a0 Compare April 23, 2026 23:12
int n = static_cast<int>(dx / range);
// Use int64_t rather than int: for extreme (but finite) inputs, dx / range can exceed
// INT_MAX, making a float->int cast undefined behavior.
int64_t n = static_cast<int64_t>(dx / range);
Copy link
Copy Markdown
Member

@yuslepukhin yuslepukhin Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tatic_cast<int64_t>(dx / range);

Here and in all other places with static_cast<>()

static_cast<int64_t>(dx / range) is still UB for extreme inputs

For float, values up to ~3.4e38 are representable. dx / range can produce a value > 9.2e18 (INT64_MAX). The C++ standard says static_cast<int64_t> on a floating-point value outside the representable range is undefined behavior (not just implementation-defined).

For double (which is also a registered type — line 34), the range extends to ~1.8e308, making this trivially exploitable.

The std::clamp in PixelAtGrid catches the fallout in practice, but the UB in GsReflect itself means the compiler is free to do anything. A defensive fix would use std::fmod for the reflection calculation entirely in floating point, avoiding any float-to-integer cast on large values.

#Resolved

@yuslepukhin
Copy link
Copy Markdown
Member

yuslepukhin commented Apr 24, 2026

                int64_t y0 = static_cast<int64_t>(std::floor(y)) - 1;

Here and also on lines 519-537

The issue report mentions NaN grid coordinates as an attack vector. The PR does not guard against NaN or Inf before the static_cast<int64_t> calls in the Compute function. These casts occur in:

Nearest mode (line 420): static_cast<int64_t>(y), static_cast<int64_t>(x) — if x/y is NaN or Inf, this is UB.
Linear mode (line 423): static_cast<int64_t>(std::floor(x)) — std::floor(NaN) returns NaN, and static_cast<int64_t>(NaN) is UB.
Cubic mode (line 436): Same pattern.
3D paths (lines 519-537): Same pattern.
For zeros padding mode, the subsequent bounds check in PixelAtGrid (c >= 0 && c < W) will fail for wild int64 values, returning 0 — so no OOB. For border mode, std::clamp normalizes wild values. For reflection, the new std::clamp normalizes them. So the defense-in-depth clamp makes this safe in practice on most compilers, but it's still technically UB #Resolved


Refers to: onnxruntime/core/providers/cpu/tensor/grid_sample.cc:437 in 5e59984. [](commit_id = 5e59984, deletion_comment = False)

@yuslepukhin
Copy link
Copy Markdown
Member

yuslepukhin commented Apr 24, 2026

const int64_t y1 = static_cast<int64_t>(std::floor(y));

Line 206: const int64_t x1 = static_cast<int64_t>(std::floor(x)) — This fast path is only used for mode==Linear && padding_mode==Zeros && !align_corners, so OOB is prevented by the mask check. But the static_cast on NaN/Inf is still UB. The mask logic relies on comparisons like x1 >= 0 && x1 < W_in which will fail for wild values, causing mask = 0 and producing zero output, which is correct behavior. But UB is UB. #Resolved


Refers to: onnxruntime/core/providers/cpu/tensor/grid_sample.cc:206 in 5e59984. [](commit_id = 5e59984, deletion_comment = False)

// Extreme grid coordinates can overflow that cast, so clamp the resulting indices
// back into the image range before indexing.
c = std::clamp<int64_t>(c, 0, W - 1);
r = std::clamp<int64_t>(r, 0, H - 1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::clamp requires lo <= hi. When W == 0, the call becomes std::clamp(c, 0, -1) which violates this precondition — the behavior is undefined per the C++ standard.

While the Compute function returns early if Y.Shape().Size() == 0, this check examines N * C * H_out * W_out. If H_in == 0 or W_in == 0 but H_out > 0 and W_out > 0 (possible since grid determines output spatial dims), the code will proceed to call PixelAtGrid with H == 0 or W == 0, hitting this UB.

This applies to both PixelAtGrid and PixelAtGrid3D, in the Border branch (pre-existing) and the new Reflection branch.

Also at other locations for clamp including lines 124-125.

int n = static_cast<int>(dx / range);
// Use int64_t rather than int: for extreme (but finite) inputs, dx / range can exceed
// INT_MAX, making a float->int cast undefined behavior.
int64_t n = static_cast<int64_t>(dx / range);
Copy link
Copy Markdown
Member

@yuslepukhin yuslepukhin Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int64_t n = static_cast<int64_t>(dx / range);

When align_corners == true and a spatial dimension is 1: x_max = 0, x_min = 0, so range = 0. GsReflect computes dx / range = division by zero. For floating point this produces +Inf, and the subsequent static_cast<int64_t>(Inf) is UB. The std::clamp in PixelAtGrid catches this, but the way clamps are used is not /wo problems (see other comments). #Resolved

@yuslepukhin
Copy link
Copy Markdown
Member

yuslepukhin commented Apr 24, 2026

#include

The file uses std::clamp (introduced in C++17) but doesn't include <algorithm>. This works today because it's transitively included, but is non-portable and could break on different standard library implementations. #Resolved


Refers to: onnxruntime/core/providers/cpu/tensor/grid_sample.cc:4 in 5e59984. [](commit_id = 5e59984, deletion_comment = False)

@yuslepukhin
Copy link
Copy Markdown
Member

Test Coverage Assessment
For the fix specifically: The added test (test_grid_sample_20_4D_nearest_reflection_extreme_coords) is reasonable — it uses coordinates of magnitude 1e+10 (exceeding INT_MAX) with a constant-valued input image so the expected output is deterministic regardless of which reflected pixel is hit. It covers both float and MLFloat16.

Gaps in test coverage:

No NaN/Inf test: The issue report explicitly calls out NaN grid coordinates as an attack vector, yet there is no test with NaN or Inf grid values. This is an important gap since those are distinct code paths (NaN fails all comparisons in GsReflect, Inf has different behavior from large finite values).

No test for bilinear/cubic + reflection + extreme coords: The test only covers nearest mode. The bilinear and cubic paths have their own static_cast<int64_t>(std::floor(x)) calls before reaching PixelAtGrid, and those are also vulnerable to NaN/Inf/extreme values.

No test for 3D (5-D input) + reflection + extreme coords: The PixelAtGrid3D received the same clamp fix but has no test.

No negative test for border padding with extreme coords: While border mode was reportedly safe (via std::clamp), a regression test would ensure it stays safe.

No test with dimension size 1 + align_corners=true + reflection (triggers range==0 division-by-zero in GsReflect).

GopalakrishnanN pushed a commit to GopalakrishnanN/onnxruntime that referenced this pull request Apr 30, 2026
… CPU

Addresses yuslepukhin's review comments on PR microsoft#27975:

- GsReflect: add explicit guard for non-finite input or non-positive range
  (returns x_min) before the existing reduction logic so the int64 cast
  inside GsReflect is never invoked on NaN/Inf or with range == 0.
- Introduce IsSafeForInt64Conversion<T>() helper and sanitize unsafe
  coordinates to the in-range border value (x_min/y_min/z_min) before
  the float->int64 conversions in:
    * GridSample<T>::Compute 2D loop (Nearest/Linear/Cubic),
    * GridSample<T>::Compute 3D loop,
    * PrecomputeBilinearSamplePlan2D fast path (substitutes -0.5 so the
      mask logic continues to reject the out-of-range neighbors).
  This eliminates undefined behavior from static_cast<int64_t> on values
  outside [INT64_MIN, INT64_MAX] or NaN/Inf.
- Add <algorithm> and <cmath> includes.
- Expand custom tests to cover NaN, +/-Inf, extreme finite coords across
  bilinear/cubic + reflection, 3D nearest + reflection, bilinear + border,
  and dim=1 with align_corners=1 (range == 0 in GsReflect).
…inates

Extreme grid coordinates could produce out-of-range indices in the reflection branch of PixelAtGrid. Two minimal changes:

1. In GsReflect, widen the int n = static_cast<int>(dx / range) computation to int64_t. For large (but finite) inputs, dx / range can exceed INT_MAX, making the narrow int cast undefined behavior.

2. In PixelAtGrid / PixelAtGrid3D, clamp the int64_t index returned by GsReflect back into [0, W-1] / [0, H-1] / [0, D-1] before indexing the image buffer, guaranteeing the final load stays in bounds.

Adds a regression test covering reflection padding with large-magnitude grid coordinates.
… CPU

Addresses yuslepukhin's review comments on PR microsoft#27975:

- GsReflect: add explicit guard for non-finite input or non-positive range
  (returns x_min) before the existing reduction logic so the int64 cast
  inside GsReflect is never invoked on NaN/Inf or with range == 0.
- Introduce IsSafeForInt64Conversion<T>() helper and sanitize unsafe
  coordinates to the in-range border value (x_min/y_min/z_min) before
  the float->int64 conversions in:
    * GridSample<T>::Compute 2D loop (Nearest/Linear/Cubic),
    * GridSample<T>::Compute 3D loop,
    * PrecomputeBilinearSamplePlan2D fast path (substitutes -0.5 so the
      mask logic continues to reject the out-of-range neighbors).
  This eliminates undefined behavior from static_cast<int64_t> on values
  outside [INT64_MIN, INT64_MAX] or NaN/Inf.
- Add <algorithm> and <cmath> includes.
- Expand custom tests to cover NaN, +/-Inf, extreme finite coords across
  bilinear/cubic + reflection, 3D nearest + reflection, bilinear + border,
  and dim=1 with align_corners=1 (range == 0 in GsReflect).
@GopalakrishnanN GopalakrishnanN force-pushed the dev/FixIntegerOverflow branch from d96e90e to 5e59984 Compare April 30, 2026 21:08
@yuslepukhin
Copy link
Copy Markdown
Member

Need copyright.
Can we rename this file into .cc?
Is it included into multuple files?


Refers to: onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc:3 in 5e59984. [](commit_id = 5e59984, deletion_comment = False)

Promote the custom GridSample tests from a header-style .inc included
into grid_sample_test.cc to a standalone translation unit:
- Add MIT copyright header, includes, and namespace wrapping.
- Move GetExecutionProviders/RunTests helpers into an anonymous
  namespace inside the new file (separate internal-linkage copies, so
  there is no ODR conflict with the versions in grid_sample_test.cc).
- Remove the #include from grid_sample_test.cc.
- Add the new file to the Emscripten exclusion list in
  cmake/onnxruntime_unittests.cmake alongside grid_sample_test.cc.
@GopalakrishnanN GopalakrishnanN force-pushed the dev/FixIntegerOverflow branch from 72b4a55 to bcca1e8 Compare April 30, 2026 22:15
yuslepukhin pushed a commit that referenced this pull request May 2, 2026
…ords (#28302)

## Description

Hardens the CPU `GridSample` operator against undefined behavior caused
by `static_cast<int64_t>` of NaN, ±Inf, or out-of-range floating-point
coordinates. Tracks IcM 31000000575970.

## Motivation and Context

`GridSample` previously fed denormalized grid coordinates directly into
integer casts (`std::floor`, `std::nearbyint`) inside its 2D and 3D
Compute paths, the bilinear fast path, and `GsReflect`. With
pathological grid values (NaN, ±Inf, magnitudes beyond `INT64_MAX`), the
casts invoke C++ undefined behavior. On some toolchains/sanitizers this
corrupts indices, leading to out-of-bounds reads (a security concern)
and on most platforms it just silently returns garbage indices that
`PixelAtGrid` later clamps.

## Changes

### `onnxruntime/core/providers/cpu/tensor/grid_sample.cc`

- **`GsReflect`**: explicit guard that returns `x_min` when the input is
non-finite or `range <= 0` (covers `align_corners=true` with a 1×1 image
where reflection range collapses to zero).
- **New helper `IsSafeForInt64Conversion<T>(v)`**: rejects NaN/Inf and
any magnitude > 2⁶².
- **2D Compute loop**: sanitizes unsafe `x`/`y` to `x_min`/`y_min`
before all `static_cast<int64_t>` calls (Nearest, Linear, Cubic).
- **3D Compute loop**: same sanitization for `x`/`y`/`z`.
- **`PrecomputeBilinearSamplePlan2D`** (bilinear fast path): substitutes
`-0.5` for unsafe coords; the existing mask logic correctly rejects the
resulting out-of-range neighbors.
- Adds `<algorithm>` and `<cmath>` includes.

### `onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc`

- New standalone TU (renamed from a previously included `.inc`) with MIT
copyright header.
- Adds 7 regression tests across `<float, MLFloat16>`:
  - NaN coordinates (nearest + reflection)
  - ±Inf coordinates (bilinear + reflection)
  - Extreme finite coordinates (bilinear + reflection)
- Cubic + reflection with extreme coordinates (float-only by ONNX type
constraint)
  - 3D / 5-D nearest + reflection with extreme coordinates
  - Bilinear + border with extreme coordinates
  - 1×1 image + `align_corners=1` (zero reflection range)

### `cmake/onnxruntime_unittests.cmake`

- Adds the new `.cc` test to the Emscripten exclusion list alongside the
existing `grid_sample_test.cc`.

## Validation

- Local Windows Release build: `cmake --build . --target
onnxruntime_provider_test`.
- All `*GridSample*` tests pass: 116 (96 generated + 8 contrib op + 12
custom across float/MLFloat16).
- Other unrelated test failures observed in the full provider test run
are pre-existing and unrelated to GridSample.

## Related

Supersedes the work in #27975 (left open). Filed via origin remote per
workflow guidance.

---------

Co-authored-by: Gopalakrishnan Nallasamy <gopalakrishnan.nallasamy@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants