Fix integer overflow in GridSample GsReflect causing heap OOB read#27975
Fix integer overflow in GridSample GsReflect causing heap OOB read#27975GopalakrishnanN wants to merge 3 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Addresses a security vulnerability in the CPU GridSample operator (reflection padding) where extreme/invalid grid coordinates could lead to integer overflow/undefined behavior and subsequent heap out-of-bounds reads.
Changes:
- Updates
GsReflectto handle non-finite inputs and usesfmod+ clamping as a defense-in-depth measure. - Adds extra index clamping in
PixelAtGrid/PixelAtGrid3Dfor reflection padding before buffer access. - Adds regression tests covering extreme, NaN, and Inf grid coordinates for reflection padding.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
onnxruntime/core/providers/cpu/tensor/grid_sample.cc |
Adds non-finite guards, modifies reflection math, and clamps indices before image access. |
onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc |
Adds custom regression tests for extreme/NaN/Inf grid coordinates under reflection padding. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c8ea7e6 to
02907fb
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
b7368e6 to
1a6f651
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Return 0 early in PixelAtGrid and PixelAtGrid3D when any spatial dimension is <= 0. This avoids std::clamp's lo <= hi precondition being violated (e.g., std::clamp(c, 0, W-1) when W=0) and prevents an out-of-bounds read from the empty image buffer for Border and Reflection padding modes. Addresses review feedback on PR microsoft#27975.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Previously, any finite coordinate whose magnitude exceeded a safe float->int conversion range was short-circuited to zero output in every padding mode. That is incorrect for Border and Reflection, whose spec semantics require clamping or reflecting the coordinate into the valid image range. The same change applies to the CUDA 2D and 3D kernels. Only non-finite coordinates (NaN/Inf) are now forced to zero across all modes; large finite coordinates are reduced in floating point according to the active padding mode before any integer conversion, keeping the later int casts well defined while respecting padding_mode. For Zeros padding, a coordinate whose magnitude overflows a safe integer conversion is well outside the image and still produces zero. Regression tests now cover all three padding modes with extreme finite, NaN, and Inf coordinates. Addresses review feedback on PR microsoft#27975.
e9b0d37 to
a8959a0
Compare
| int n = static_cast<int>(dx / range); | ||
| // Use int64_t rather than int: for extreme (but finite) inputs, dx / range can exceed | ||
| // INT_MAX, making a float->int cast undefined behavior. | ||
| int64_t n = static_cast<int64_t>(dx / range); |
There was a problem hiding this comment.
Here and in all other places with static_cast<>()
static_cast<int64_t>(dx / range) is still UB for extreme inputs
For float, values up to ~3.4e38 are representable. dx / range can produce a value > 9.2e18 (INT64_MAX). The C++ standard says static_cast<int64_t> on a floating-point value outside the representable range is undefined behavior (not just implementation-defined).
For double (which is also a registered type — line 34), the range extends to ~1.8e308, making this trivially exploitable.
The std::clamp in PixelAtGrid catches the fallout in practice, but the UB in GsReflect itself means the compiler is free to do anything. A defensive fix would use std::fmod for the reflection calculation entirely in floating point, avoiding any float-to-integer cast on large values.
#Resolved
Here and also on lines 519-537 The issue report mentions NaN grid coordinates as an attack vector. The PR does not guard against NaN or Inf before the static_cast<int64_t> calls in the Compute function. These casts occur in: Nearest mode (line 420): static_cast<int64_t>(y), static_cast<int64_t>(x) — if x/y is NaN or Inf, this is UB. Refers to: onnxruntime/core/providers/cpu/tensor/grid_sample.cc:437 in 5e59984. [](commit_id = 5e59984, deletion_comment = False) |
Line 206: const int64_t x1 = static_cast<int64_t>(std::floor(x)) — This fast path is only used for mode==Linear && padding_mode==Zeros && !align_corners, so OOB is prevented by the mask check. But the static_cast on NaN/Inf is still UB. The mask logic relies on comparisons like x1 >= 0 && x1 < W_in which will fail for wild values, causing mask = 0 and producing zero output, which is correct behavior. But UB is UB. #Resolved Refers to: onnxruntime/core/providers/cpu/tensor/grid_sample.cc:206 in 5e59984. [](commit_id = 5e59984, deletion_comment = False) |
| // Extreme grid coordinates can overflow that cast, so clamp the resulting indices | ||
| // back into the image range before indexing. | ||
| c = std::clamp<int64_t>(c, 0, W - 1); | ||
| r = std::clamp<int64_t>(r, 0, H - 1); |
There was a problem hiding this comment.
std::clamp requires lo <= hi. When W == 0, the call becomes std::clamp(c, 0, -1) which violates this precondition — the behavior is undefined per the C++ standard.
While the Compute function returns early if Y.Shape().Size() == 0, this check examines N * C * H_out * W_out. If H_in == 0 or W_in == 0 but H_out > 0 and W_out > 0 (possible since grid determines output spatial dims), the code will proceed to call PixelAtGrid with H == 0 or W == 0, hitting this UB.
This applies to both PixelAtGrid and PixelAtGrid3D, in the Border branch (pre-existing) and the new Reflection branch.
Also at other locations for clamp including lines 124-125.
| int n = static_cast<int>(dx / range); | ||
| // Use int64_t rather than int: for extreme (but finite) inputs, dx / range can exceed | ||
| // INT_MAX, making a float->int cast undefined behavior. | ||
| int64_t n = static_cast<int64_t>(dx / range); |
There was a problem hiding this comment.
When align_corners == true and a spatial dimension is 1: x_max = 0, x_min = 0, so range = 0. GsReflect computes dx / range = division by zero. For floating point this produces +Inf, and the subsequent static_cast<int64_t>(Inf) is UB. The std::clamp in PixelAtGrid catches this, but the way clamps are used is not /wo problems (see other comments). #Resolved
The file uses std::clamp (introduced in C++17) but doesn't include Refers to: onnxruntime/core/providers/cpu/tensor/grid_sample.cc:4 in 5e59984. [](commit_id = 5e59984, deletion_comment = False) |
|
Test Coverage Assessment Gaps in test coverage: No NaN/Inf test: The issue report explicitly calls out NaN grid coordinates as an attack vector, yet there is no test with NaN or Inf grid values. This is an important gap since those are distinct code paths (NaN fails all comparisons in GsReflect, Inf has different behavior from large finite values). No test for bilinear/cubic + reflection + extreme coords: The test only covers nearest mode. The bilinear and cubic paths have their own static_cast<int64_t>(std::floor(x)) calls before reaching PixelAtGrid, and those are also vulnerable to NaN/Inf/extreme values. No test for 3D (5-D input) + reflection + extreme coords: The PixelAtGrid3D received the same clamp fix but has no test. No negative test for border padding with extreme coords: While border mode was reportedly safe (via std::clamp), a regression test would ensure it stays safe. No test with dimension size 1 + align_corners=true + reflection (triggers range==0 division-by-zero in GsReflect). |
… CPU Addresses yuslepukhin's review comments on PR microsoft#27975: - GsReflect: add explicit guard for non-finite input or non-positive range (returns x_min) before the existing reduction logic so the int64 cast inside GsReflect is never invoked on NaN/Inf or with range == 0. - Introduce IsSafeForInt64Conversion<T>() helper and sanitize unsafe coordinates to the in-range border value (x_min/y_min/z_min) before the float->int64 conversions in: * GridSample<T>::Compute 2D loop (Nearest/Linear/Cubic), * GridSample<T>::Compute 3D loop, * PrecomputeBilinearSamplePlan2D fast path (substitutes -0.5 so the mask logic continues to reject the out-of-range neighbors). This eliminates undefined behavior from static_cast<int64_t> on values outside [INT64_MIN, INT64_MAX] or NaN/Inf. - Add <algorithm> and <cmath> includes. - Expand custom tests to cover NaN, +/-Inf, extreme finite coords across bilinear/cubic + reflection, 3D nearest + reflection, bilinear + border, and dim=1 with align_corners=1 (range == 0 in GsReflect).
…inates Extreme grid coordinates could produce out-of-range indices in the reflection branch of PixelAtGrid. Two minimal changes: 1. In GsReflect, widen the int n = static_cast<int>(dx / range) computation to int64_t. For large (but finite) inputs, dx / range can exceed INT_MAX, making the narrow int cast undefined behavior. 2. In PixelAtGrid / PixelAtGrid3D, clamp the int64_t index returned by GsReflect back into [0, W-1] / [0, H-1] / [0, D-1] before indexing the image buffer, guaranteeing the final load stays in bounds. Adds a regression test covering reflection padding with large-magnitude grid coordinates.
… CPU Addresses yuslepukhin's review comments on PR microsoft#27975: - GsReflect: add explicit guard for non-finite input or non-positive range (returns x_min) before the existing reduction logic so the int64 cast inside GsReflect is never invoked on NaN/Inf or with range == 0. - Introduce IsSafeForInt64Conversion<T>() helper and sanitize unsafe coordinates to the in-range border value (x_min/y_min/z_min) before the float->int64 conversions in: * GridSample<T>::Compute 2D loop (Nearest/Linear/Cubic), * GridSample<T>::Compute 3D loop, * PrecomputeBilinearSamplePlan2D fast path (substitutes -0.5 so the mask logic continues to reject the out-of-range neighbors). This eliminates undefined behavior from static_cast<int64_t> on values outside [INT64_MIN, INT64_MAX] or NaN/Inf. - Add <algorithm> and <cmath> includes. - Expand custom tests to cover NaN, +/-Inf, extreme finite coords across bilinear/cubic + reflection, 3D nearest + reflection, bilinear + border, and dim=1 with align_corners=1 (range == 0 in GsReflect).
d96e90e to
5e59984
Compare
Promote the custom GridSample tests from a header-style .inc included into grid_sample_test.cc to a standalone translation unit: - Add MIT copyright header, includes, and namespace wrapping. - Move GetExecutionProviders/RunTests helpers into an anonymous namespace inside the new file (separate internal-linkage copies, so there is no ODR conflict with the versions in grid_sample_test.cc). - Remove the #include from grid_sample_test.cc. - Add the new file to the Emscripten exclusion list in cmake/onnxruntime_unittests.cmake alongside grid_sample_test.cc.
72b4a55 to
bcca1e8
Compare
…ords (#28302) ## Description Hardens the CPU `GridSample` operator against undefined behavior caused by `static_cast<int64_t>` of NaN, ±Inf, or out-of-range floating-point coordinates. Tracks IcM 31000000575970. ## Motivation and Context `GridSample` previously fed denormalized grid coordinates directly into integer casts (`std::floor`, `std::nearbyint`) inside its 2D and 3D Compute paths, the bilinear fast path, and `GsReflect`. With pathological grid values (NaN, ±Inf, magnitudes beyond `INT64_MAX`), the casts invoke C++ undefined behavior. On some toolchains/sanitizers this corrupts indices, leading to out-of-bounds reads (a security concern) and on most platforms it just silently returns garbage indices that `PixelAtGrid` later clamps. ## Changes ### `onnxruntime/core/providers/cpu/tensor/grid_sample.cc` - **`GsReflect`**: explicit guard that returns `x_min` when the input is non-finite or `range <= 0` (covers `align_corners=true` with a 1×1 image where reflection range collapses to zero). - **New helper `IsSafeForInt64Conversion<T>(v)`**: rejects NaN/Inf and any magnitude > 2⁶². - **2D Compute loop**: sanitizes unsafe `x`/`y` to `x_min`/`y_min` before all `static_cast<int64_t>` calls (Nearest, Linear, Cubic). - **3D Compute loop**: same sanitization for `x`/`y`/`z`. - **`PrecomputeBilinearSamplePlan2D`** (bilinear fast path): substitutes `-0.5` for unsafe coords; the existing mask logic correctly rejects the resulting out-of-range neighbors. - Adds `<algorithm>` and `<cmath>` includes. ### `onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc` - New standalone TU (renamed from a previously included `.inc`) with MIT copyright header. - Adds 7 regression tests across `<float, MLFloat16>`: - NaN coordinates (nearest + reflection) - ±Inf coordinates (bilinear + reflection) - Extreme finite coordinates (bilinear + reflection) - Cubic + reflection with extreme coordinates (float-only by ONNX type constraint) - 3D / 5-D nearest + reflection with extreme coordinates - Bilinear + border with extreme coordinates - 1×1 image + `align_corners=1` (zero reflection range) ### `cmake/onnxruntime_unittests.cmake` - Adds the new `.cc` test to the Emscripten exclusion list alongside the existing `grid_sample_test.cc`. ## Validation - Local Windows Release build: `cmake --build . --target onnxruntime_provider_test`. - All `*GridSample*` tests pass: 116 (96 generated + 8 contrib op + 12 custom across float/MLFloat16). - Other unrelated test failures observed in the full provider test run are pre-existing and unrelated to GridSample. ## Related Supersedes the work in #27975 (left open). Filed via origin remote per workflow guidance. --------- Co-authored-by: Gopalakrishnan Nallasamy <gopalakrishnan.nallasamy@microsoft.com>
Description
The reflection branch of
GridSample'sPixelAtGrid/PixelAtGrid3Dcould produce out-of-range indices when grid coordinates were extreme. Two minimal changes:GsReflect, widenint n = static_cast<int>(dx / range)toint64_t n = static_cast<int64_t>(dx / range)in both branches. For large (but finite) inputsdx / rangecan exceedINT_MAX, making the narrowintcast undefined behavior.PixelAtGrid/PixelAtGrid3D, clamp theint64_tindex returned byGsReflectback into[0, W-1]/[0, H-1]/[0, D-1]before indexing the image buffer. This keeps the final load in bounds as a defense-in-depth measure for any remaining edge case (coordinates ≥ 2^63,NaN,Inf, etc.).Motivation and Context
Reflection padding indexes the image with the result of
GsReflect. The internal narrowintcast on a large floating-point value is undefined behavior and could yield a wild index, leading to an out-of-bounds read on the input image buffer.Tests
Adds one regression test,
test_grid_sample_20_4D_nearest_reflection_extreme_coords, that runsGridSamplewithmode=nearestandpadding_mode=reflectionon a 2x2 constant-valued image with grid coordinates of magnitude 1e+10 (well aboveINT_MAX). It is parameterized overfloatandMLFloat16and asserts the kernel returns the constant value, which can only happen if every sampled index is in bounds.All existing
*GridSample*tests continue to pass (102/102 locally).Files changed
onnxruntime/core/providers/cpu/tensor/grid_sample.cconnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.inc