Fix integer overflow in GridSample GsReflect causing heap OOB read by GopalakrishnanN · Pull Request #27975 · microsoft/onnxruntime

GopalakrishnanN · 2026-04-03T23:45:35Z

Description

The reflection branch of GridSample's PixelAtGrid / PixelAtGrid3D could produce out-of-range indices when grid coordinates were extreme. Two minimal changes:

In GsReflect, widen int n = static_cast<int>(dx / range) to int64_t n = static_cast<int64_t>(dx / range) in both branches. For large (but finite) inputs dx / range can exceed INT_MAX, making the narrow int cast undefined behavior.
In PixelAtGrid / PixelAtGrid3D, clamp the int64_t index returned by GsReflect back into [0, W-1] / [0, H-1] / [0, D-1] before indexing the image buffer. This keeps the final load in bounds as a defense-in-depth measure for any remaining edge case (coordinates ≥ 2^63, NaN, Inf, etc.).

Motivation and Context

Reflection padding indexes the image with the result of GsReflect. The internal narrow int cast on a large floating-point value is undefined behavior and could yield a wild index, leading to an out-of-bounds read on the input image buffer.

Tests

Adds one regression test, test_grid_sample_20_4D_nearest_reflection_extreme_coords, that runs GridSample with mode=nearest and padding_mode=reflection on a 2x2 constant-valued image with grid coordinates of magnitude 1e+10 (well above INT_MAX). It is parameterized over float and MLFloat16 and asserts the kernel returns the constant value, which can only happen if every sampled index is in bounds.

All existing *GridSample* tests continue to pass (102/102 locally).

Files changed

onnxruntime/core/providers/cpu/tensor/grid_sample.cc
onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc

Copilot

Pull request overview

Addresses a security vulnerability in the CPU GridSample operator (reflection padding) where extreme/invalid grid coordinates could lead to integer overflow/undefined behavior and subsequent heap out-of-bounds reads.

Changes:

Updates GsReflect to handle non-finite inputs and uses fmod + clamping as a defense-in-depth measure.
Adds extra index clamping in PixelAtGrid / PixelAtGrid3D for reflection padding before buffer access.
Adds regression tests covering extreme, NaN, and Inf grid coordinates for reflection padding.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`onnxruntime/core/providers/cpu/tensor/grid_sample.cc`	Adds non-finite guards, modifies reflection math, and clamps indices before image access.
`onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc`	Adds custom regression tests for extreme/NaN/Inf grid coordinates under reflection padding.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Return 0 early in PixelAtGrid and PixelAtGrid3D when any spatial dimension is <= 0. This avoids std::clamp's lo <= hi precondition being violated (e.g., std::clamp(c, 0, W-1) when W=0) and prevents an out-of-bounds read from the empty image buffer for Border and Reflection padding modes. Addresses review feedback on PR microsoft#27975.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Previously, any finite coordinate whose magnitude exceeded a safe float->int conversion range was short-circuited to zero output in every padding mode. That is incorrect for Border and Reflection, whose spec semantics require clamping or reflecting the coordinate into the valid image range. The same change applies to the CUDA 2D and 3D kernels. Only non-finite coordinates (NaN/Inf) are now forced to zero across all modes; large finite coordinates are reduced in floating point according to the active padding mode before any integer conversion, keeping the later int casts well defined while respecting padding_mode. For Zeros padding, a coordinate whose magnitude overflows a safe integer conversion is well outside the image and still produces zero. Regression tests now cover all three padding modes with extreme finite, NaN, and Inf coordinates. Addresses review feedback on PR microsoft#27975.

yuslepukhin · 2026-04-24T02:23:09Z

-    int n = static_cast<int>(dx / range);
+    // Use int64_t rather than int: for extreme (but finite) inputs, dx / range can exceed
+    // INT_MAX, making a float->int cast undefined behavior.
+    int64_t n = static_cast<int64_t>(dx / range);


tatic_cast<int64_t>(dx / range);

Here and in all other places with static_cast<>()

static_cast<int64_t>(dx / range) is still UB for extreme inputs

For float, values up to ~3.4e38 are representable. dx / range can produce a value > 9.2e18 (INT64_MAX). The C++ standard says static_cast<int64_t> on a floating-point value outside the representable range is undefined behavior (not just implementation-defined).

For double (which is also a registered type — line 34), the range extends to ~1.8e308, making this trivially exploitable.

The std::clamp in PixelAtGrid catches the fallout in practice, but the UB in GsReflect itself means the compiler is free to do anything. A defensive fix would use std::fmod for the reflection calculation entirely in floating point, avoiding any float-to-integer cast on large values.

#Resolved

yuslepukhin · 2026-04-24T02:27:47Z

                int64_t y0 = static_cast<int64_t>(std::floor(y)) - 1;

Here and also on lines 519-537

The issue report mentions NaN grid coordinates as an attack vector. The PR does not guard against NaN or Inf before the static_cast<int64_t> calls in the Compute function. These casts occur in:

Nearest mode (line 420): static_cast<int64_t>(y), static_cast<int64_t>(x) — if x/y is NaN or Inf, this is UB.
Linear mode (line 423): static_cast<int64_t>(std::floor(x)) — std::floor(NaN) returns NaN, and static_cast<int64_t>(NaN) is UB.
Cubic mode (line 436): Same pattern.
3D paths (lines 519-537): Same pattern.
For zeros padding mode, the subsequent bounds check in PixelAtGrid (c >= 0 && c < W) will fail for wild int64 values, returning 0 — so no OOB. For border mode, std::clamp normalizes wild values. For reflection, the new std::clamp normalizes them. So the defense-in-depth clamp makes this safe in practice on most compilers, but it's still technically UB #Resolved

Refers to: onnxruntime/core/providers/cpu/tensor/grid_sample.cc:437 in 5e59984. [](commit_id = 5e59984, deletion_comment = False)

yuslepukhin · 2026-04-24T02:29:30Z

const int64_t y1 = static_cast<int64_t>(std::floor(y));

Line 206: const int64_t x1 = static_cast<int64_t>(std::floor(x)) — This fast path is only used for mode==Linear && padding_mode==Zeros && !align_corners, so OOB is prevented by the mask check. But the static_cast on NaN/Inf is still UB. The mask logic relies on comparisons like x1 >= 0 && x1 < W_in which will fail for wild values, causing mask = 0 and producing zero output, which is correct behavior. But UB is UB. #Resolved

Refers to: onnxruntime/core/providers/cpu/tensor/grid_sample.cc:206 in 5e59984. [](commit_id = 5e59984, deletion_comment = False)

yuslepukhin · 2026-04-24T02:31:28Z

+    // Extreme grid coordinates can overflow that cast, so clamp the resulting indices
+    // back into the image range before indexing.
+    c = std::clamp<int64_t>(c, 0, W - 1);
+    r = std::clamp<int64_t>(r, 0, H - 1);


std::clamp requires lo <= hi. When W == 0, the call becomes std::clamp(c, 0, -1) which violates this precondition — the behavior is undefined per the C++ standard.

While the Compute function returns early if Y.Shape().Size() == 0, this check examines N * C * H_out * W_out. If H_in == 0 or W_in == 0 but H_out > 0 and W_out > 0 (possible since grid determines output spatial dims), the code will proceed to call PixelAtGrid with H == 0 or W == 0, hitting this UB.

This applies to both PixelAtGrid and PixelAtGrid3D, in the Border branch (pre-existing) and the new Reflection branch.

Also at other locations for clamp including lines 124-125.

yuslepukhin · 2026-04-24T02:33:07Z

-    int n = static_cast<int>(dx / range);
+    // Use int64_t rather than int: for extreme (but finite) inputs, dx / range can exceed
+    // INT_MAX, making a float->int cast undefined behavior.
+    int64_t n = static_cast<int64_t>(dx / range);


int64_t n = static_cast<int64_t>(dx / range);

When align_corners == true and a spatial dimension is 1: x_max = 0, x_min = 0, so range = 0. GsReflect computes dx / range = division by zero. For floating point this produces +Inf, and the subsequent static_cast<int64_t>(Inf) is UB. The std::clamp in PixelAtGrid catches this, but the way clamps are used is not /wo problems (see other comments). #Resolved

yuslepukhin · 2026-04-24T02:33:34Z

#include

The file uses std::clamp (introduced in C++17) but doesn't include <algorithm>. This works today because it's transitively included, but is non-portable and could break on different standard library implementations. #Resolved

Refers to: onnxruntime/core/providers/cpu/tensor/grid_sample.cc:4 in 5e59984. [](commit_id = 5e59984, deletion_comment = False)

yuslepukhin · 2026-04-24T02:34:43Z

Test Coverage Assessment
For the fix specifically: The added test (test_grid_sample_20_4D_nearest_reflection_extreme_coords) is reasonable — it uses coordinates of magnitude 1e+10 (exceeding INT_MAX) with a constant-valued input image so the expected output is deterministic regardless of which reflected pixel is hit. It covers both float and MLFloat16.

Gaps in test coverage:

No NaN/Inf test: The issue report explicitly calls out NaN grid coordinates as an attack vector, yet there is no test with NaN or Inf grid values. This is an important gap since those are distinct code paths (NaN fails all comparisons in GsReflect, Inf has different behavior from large finite values).

No test for bilinear/cubic + reflection + extreme coords: The test only covers nearest mode. The bilinear and cubic paths have their own static_cast<int64_t>(std::floor(x)) calls before reaching PixelAtGrid, and those are also vulnerable to NaN/Inf/extreme values.

No test for 3D (5-D input) + reflection + extreme coords: The PixelAtGrid3D received the same clamp fix but has no test.

No negative test for border padding with extreme coords: While border mode was reportedly safe (via std::clamp), a regression test would ensure it stays safe.

No test with dimension size 1 + align_corners=true + reflection (triggers range==0 division-by-zero in GsReflect).

… CPU Addresses yuslepukhin's review comments on PR microsoft#27975: - GsReflect: add explicit guard for non-finite input or non-positive range (returns x_min) before the existing reduction logic so the int64 cast inside GsReflect is never invoked on NaN/Inf or with range == 0. - Introduce IsSafeForInt64Conversion<T>() helper and sanitize unsafe coordinates to the in-range border value (x_min/y_min/z_min) before the float->int64 conversions in: * GridSample<T>::Compute 2D loop (Nearest/Linear/Cubic), * GridSample<T>::Compute 3D loop, * PrecomputeBilinearSamplePlan2D fast path (substitutes -0.5 so the mask logic continues to reject the out-of-range neighbors). This eliminates undefined behavior from static_cast<int64_t> on values outside [INT64_MIN, INT64_MAX] or NaN/Inf. - Add <algorithm> and <cmath> includes. - Expand custom tests to cover NaN, +/-Inf, extreme finite coords across bilinear/cubic + reflection, 3D nearest + reflection, bilinear + border, and dim=1 with align_corners=1 (range == 0 in GsReflect).

…inates Extreme grid coordinates could produce out-of-range indices in the reflection branch of PixelAtGrid. Two minimal changes: 1. In GsReflect, widen the int n = static_cast<int>(dx / range) computation to int64_t. For large (but finite) inputs, dx / range can exceed INT_MAX, making the narrow int cast undefined behavior. 2. In PixelAtGrid / PixelAtGrid3D, clamp the int64_t index returned by GsReflect back into [0, W-1] / [0, H-1] / [0, D-1] before indexing the image buffer, guaranteeing the final load stays in bounds. Adds a regression test covering reflection padding with large-magnitude grid coordinates.

… CPU Addresses yuslepukhin's review comments on PR microsoft#27975: - GsReflect: add explicit guard for non-finite input or non-positive range (returns x_min) before the existing reduction logic so the int64 cast inside GsReflect is never invoked on NaN/Inf or with range == 0. - Introduce IsSafeForInt64Conversion<T>() helper and sanitize unsafe coordinates to the in-range border value (x_min/y_min/z_min) before the float->int64 conversions in: * GridSample<T>::Compute 2D loop (Nearest/Linear/Cubic), * GridSample<T>::Compute 3D loop, * PrecomputeBilinearSamplePlan2D fast path (substitutes -0.5 so the mask logic continues to reject the out-of-range neighbors). This eliminates undefined behavior from static_cast<int64_t> on values outside [INT64_MIN, INT64_MAX] or NaN/Inf. - Add <algorithm> and <cmath> includes. - Expand custom tests to cover NaN, +/-Inf, extreme finite coords across bilinear/cubic + reflection, 3D nearest + reflection, bilinear + border, and dim=1 with align_corners=1 (range == 0 in GsReflect).

yuslepukhin · 2026-04-30T21:13:40Z

Need copyright.
Can we rename this file into .cc?
Is it included into multuple files?

Refers to: onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc:3 in 5e59984. [](commit_id = 5e59984, deletion_comment = False)

Promote the custom GridSample tests from a header-style .inc included into grid_sample_test.cc to a standalone translation unit: - Add MIT copyright header, includes, and namespace wrapping. - Move GetExecutionProviders/RunTests helpers into an anonymous namespace inside the new file (separate internal-linkage copies, so there is no ODR conflict with the versions in grid_sample_test.cc). - Remove the #include from grid_sample_test.cc. - Add the new file to the Emscripten exclusion list in cmake/onnxruntime_unittests.cmake alongside grid_sample_test.cc.

…ords (#28302) ## Description Hardens the CPU `GridSample` operator against undefined behavior caused by `static_cast<int64_t>` of NaN, ±Inf, or out-of-range floating-point coordinates. Tracks IcM 31000000575970. ## Motivation and Context `GridSample` previously fed denormalized grid coordinates directly into integer casts (`std::floor`, `std::nearbyint`) inside its 2D and 3D Compute paths, the bilinear fast path, and `GsReflect`. With pathological grid values (NaN, ±Inf, magnitudes beyond `INT64_MAX`), the casts invoke C++ undefined behavior. On some toolchains/sanitizers this corrupts indices, leading to out-of-bounds reads (a security concern) and on most platforms it just silently returns garbage indices that `PixelAtGrid` later clamps. ## Changes ### `onnxruntime/core/providers/cpu/tensor/grid_sample.cc` - **`GsReflect`**: explicit guard that returns `x_min` when the input is non-finite or `range <= 0` (covers `align_corners=true` with a 1×1 image where reflection range collapses to zero). - **New helper `IsSafeForInt64Conversion<T>(v)`**: rejects NaN/Inf and any magnitude > 2⁶². - **2D Compute loop**: sanitizes unsafe `x`/`y` to `x_min`/`y_min` before all `static_cast<int64_t>` calls (Nearest, Linear, Cubic). - **3D Compute loop**: same sanitization for `x`/`y`/`z`. - **`PrecomputeBilinearSamplePlan2D`** (bilinear fast path): substitutes `-0.5` for unsafe coords; the existing mask logic correctly rejects the resulting out-of-range neighbors. - Adds `<algorithm>` and `<cmath>` includes. ### `onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc` - New standalone TU (renamed from a previously included `.inc`) with MIT copyright header. - Adds 7 regression tests across `<float, MLFloat16>`: - NaN coordinates (nearest + reflection) - ±Inf coordinates (bilinear + reflection) - Extreme finite coordinates (bilinear + reflection) - Cubic + reflection with extreme coordinates (float-only by ONNX type constraint) - 3D / 5-D nearest + reflection with extreme coordinates - Bilinear + border with extreme coordinates - 1×1 image + `align_corners=1` (zero reflection range) ### `cmake/onnxruntime_unittests.cmake` - Adds the new `.cc` test to the Emscripten exclusion list alongside the existing `grid_sample_test.cc`. ## Validation - Local Windows Release build: `cmake --build . --target onnxruntime_provider_test`. - All `*GridSample*` tests pass: 116 (96 generated + 8 contrib op + 12 custom across float/MLFloat16). - Other unrelated test failures observed in the full provider test run are pre-existing and unrelated to GridSample. ## Related Supersedes the work in #27975 (left open). Filed via origin remote per workflow guidance. --------- Co-authored-by: Gopalakrishnan Nallasamy <gopalakrishnan.nallasamy@microsoft.com>

GopalakrishnanN requested a review from Copilot April 9, 2026 00:29

Copilot started reviewing on behalf of GopalakrishnanN April 9, 2026 00:30 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

GopalakrishnanN force-pushed the dev/FixIntegerOverflow branch from c8ea7e6 to 02907fb Compare April 16, 2026 20:44

GopalakrishnanN requested a review from Copilot April 16, 2026 21:09

Copilot started reviewing on behalf of GopalakrishnanN April 16, 2026 21:10 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc Outdated

Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc Outdated

Comment thread onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc Outdated

GopalakrishnanN force-pushed the dev/FixIntegerOverflow branch 2 times, most recently from b7368e6 to 1a6f651 Compare April 17, 2026 02:05

GopalakrishnanN requested a review from Copilot April 17, 2026 03:47

Copilot started reviewing on behalf of GopalakrishnanN April 17, 2026 03:48 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc

Comment thread .github/workflows/react_native.yml Outdated

GopalakrishnanN requested a review from Copilot April 17, 2026 04:07

Copilot started reviewing on behalf of GopalakrishnanN April 17, 2026 04:08 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

Comment thread onnxruntime/core/providers/cpu/tensor/grid_sample.cc

GopalakrishnanN requested a review from Copilot April 17, 2026 04:46

Copilot started reviewing on behalf of GopalakrishnanN April 17, 2026 04:46 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

Comment thread onnxruntime/core/providers/cuda/tensor/grid_sample_impl.cu Outdated

Comment thread .github/workflows/react_native.yml Outdated

GopalakrishnanN force-pushed the dev/FixIntegerOverflow branch from e9b0d37 to a8959a0 Compare April 23, 2026 23:12

yuslepukhin reviewed Apr 24, 2026

View reviewed changes

GopalakrishnanN force-pushed the dev/FixIntegerOverflow branch from d96e90e to 5e59984 Compare April 30, 2026 21:08

GopalakrishnanN force-pushed the dev/FixIntegerOverflow branch from 72b4a55 to bcca1e8 Compare April 30, 2026 22:15

GopalakrishnanN mentioned this pull request Apr 30, 2026

GridSample: harden float->int64 casts against NaN/Inf/out-of-range coords #28302

Merged

Conversation

GopalakrishnanN commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Tests

Files changed

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

yuslepukhin Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yuslepukhin commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yuslepukhin commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yuslepukhin Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

yuslepukhin Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yuslepukhin commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yuslepukhin commented Apr 24, 2026

Uh oh!

yuslepukhin commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

GopalakrishnanN commented Apr 3, 2026 •

edited

Loading

yuslepukhin Apr 24, 2026 •

edited

Loading

yuslepukhin commented Apr 24, 2026 •

edited

Loading

yuslepukhin commented Apr 24, 2026 •

edited

Loading

yuslepukhin Apr 24, 2026 •

edited

Loading

yuslepukhin commented Apr 24, 2026 •

edited

Loading