GridSample: harden float->int64 casts against NaN/Inf/out-of-range coords#28302
Merged
yuslepukhin merged 6 commits into main on May 2, 2026
added 3 commits
April 30, 2026 14:07
…inates

Extreme grid coordinates could produce out-of-range indices in the reflection branch of PixelAtGrid. Two minimal changes:

1. In GsReflect, widen the int n = static_cast<int>(dx / range) computation to int64_t. For large (but finite) inputs, dx / range can exceed INT_MAX, making the narrow int cast undefined behavior.
2. In PixelAtGrid / PixelAtGrid3D, clamp the int64_t index returned by GsReflect back into [0, W-1] / [0, H-1] / [0, D-1] before indexing the image buffer, guaranteeing the final load stays in bounds.

Adds a regression test covering reflection padding with large-magnitude grid coordinates.
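The two changes in this commit can be illustrated with a minimal sketch. It is not the code from grid_sample.cc: `ReflectSketch` and `ReflectedIndexSketch` are hypothetical names, the borders assume align_corners=0 (x_min = -0.5, x_max = W - 0.5), and the input is assumed finite with dx / range small enough to fit in int64_t.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical, simplified reflection in the spirit of GsReflect.
// Assumes: x is finite, x_max > x_min, and dx / range fits in int64_t.
inline float ReflectSketch(float x, float x_min, float x_max) {
  const float range = x_max - x_min;
  if (x < x_min) {
    const float dx = x_min - x;
    // Widened from int: dx / range can exceed INT_MAX for large finite x.
    const int64_t n = static_cast<int64_t>(dx / range);
    const float r = dx - static_cast<float>(n) * range;
    return (n % 2 == 0) ? x_min + r : x_max - r;
  }
  if (x > x_max) {
    const float dx = x - x_max;
    const int64_t n = static_cast<int64_t>(dx / range);
    const float r = dx - static_cast<float>(n) * range;
    return (n % 2 == 0) ? x_max - r : x_min + r;
  }
  return x;
}

// Hypothetical: reflect, round to nearest, then clamp into [0, size - 1]
// so the subsequent buffer load cannot go out of bounds.
inline int64_t ReflectedIndexSketch(float x, int64_t size) {
  const float x_min = -0.5f;                            // assumed align_corners=0 border
  const float x_max = static_cast<float>(size) - 0.5f;  // assumed align_corners=0 border
  const int64_t idx =
      static_cast<int64_t>(std::nearbyint(ReflectSketch(x, x_min, x_max)));
  return std::max<int64_t>(0, std::min<int64_t>(idx, size - 1));
}
```

The clamp matters even when the reflection itself is well behaved: a reflected value that lands exactly on x_max can round up to W, one past the last valid index.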
… CPU

Addresses yuslepukhin's review comments on PR #27975:

- GsReflect: add explicit guard for non-finite input or non-positive range (returns x_min) before the existing reduction logic so the int64 cast inside GsReflect is never invoked on NaN/Inf or with range == 0.
- Introduce IsSafeForInt64Conversion<T>() helper and sanitize unsafe coordinates to the in-range border value (x_min/y_min/z_min) before the float->int64 conversions in:
  * GridSample<T>::Compute 2D loop (Nearest/Linear/Cubic),
  * GridSample<T>::Compute 3D loop,
  * PrecomputeBilinearSamplePlan2D fast path (substitutes -0.5 so the mask logic continues to reject the out-of-range neighbors).
  This eliminates undefined behavior from static_cast<int64_t> on values outside [INT64_MIN, INT64_MAX] or NaN/Inf.
- Add <algorithm> and <cmath> includes.
- Expand custom tests to cover NaN, +/-Inf, extreme finite coords across bilinear/cubic + reflection, 3D nearest + reflection, bilinear + border, and dim=1 with align_corners=1 (range == 0 in GsReflect).
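A rough sketch of the "check, then sanitize, then cast" pattern this commit describes. The names `IsSafeForInt64ConversionSketch` and `NearestIndexSketch` are hypothetical stand-ins, not the PR's code; the 2^62 threshold is taken from the PR description, and the exact comparison used in grid_sample.cc may differ.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the PR's IsSafeForInt64Conversion<T>().
// Rejects NaN, +/-Inf, and magnitudes above 2^62 (threshold per the PR description).
template <typename T>
bool IsSafeForInt64ConversionSketch(T v) {
  constexpr double kLimit = 4611686018427387904.0;  // 2^62, exactly representable as a double
  const double d = static_cast<double>(v);
  return std::isfinite(d) && std::fabs(d) <= kLimit;
}

// Hypothetical: replace an unsafe coordinate with the border value before
// rounding, so the float->int64 cast only ever sees a safe value.
template <typename T>
int64_t NearestIndexSketch(T coord, T border_min) {
  const T safe = IsSafeForInt64ConversionSketch(coord) ? coord : border_min;
  return static_cast<int64_t>(std::nearbyint(static_cast<double>(safe)));
}
```

For example, `NearestIndexSketch(NAN, 0.0f)` yields 0 instead of invoking undefined behavior, while an ordinary coordinate such as 2.6f still rounds to 3.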
Promote the custom GridSample tests from a header-style .inc included into grid_sample_test.cc to a standalone translation unit:

- Add MIT copyright header, includes, and namespace wrapping.
- Move GetExecutionProviders/RunTests helpers into an anonymous namespace inside the new file (separate internal-linkage copies, so there is no ODR conflict with the versions in grid_sample_test.cc).
- Remove the #include from grid_sample_test.cc.
- Add the new file to the Emscripten exclusion list in cmake/onnxruntime_unittests.cmake alongside grid_sample_test.cc.
The pathological-coordinate regression tests (NaN/Inf/extreme grid values) target CPU-side hardening. Running them against non-CPU EPs (CUDA/CoreML/iOS/WebGPU) introduces unrelated backend behavior and broad provider-test failure cascades. Restrict those tests to the CPU EP via a new RunCpuOnly helper. The cubic test is also converted to a non-typed TEST over float only, since ONNX GridSample cubic mode constrains T1 to float (the MLFloat16 instantiation was a no-op). The two boundary-condition tests (linear_zeros_mixed_bounds_*) continue to run on all available EPs as cross-EP correctness checks.
Gaps in test coverage:
CUDA is not fixed. The CUDA implementation in grid_sample_impl.cu has the exact same vulnerability (this is a contrib op under onnxruntime::contrib::cuda).

Additionally: WebGPU/JS implementations are not vulnerable; WGSL integer conversions have defined saturation semantics. File a follow-up for CUDA.
yuslepukhin
reviewed
May 1, 2026
Add 7 new CPU-only regression tests for the float->int64 cast hardening:

- linear+zeros+extreme/NaN/Inf: confirms IsSafeForInt64Conversion does not break the bilinear zeros-padding fast path.
- cubic+NaN/Inf: covers NaN/+Inf/-Inf paths through the cubic kernel (float-only).
- 5D linear+extreme: 3D trilinear path with extreme finite coordinates.
- 5D nearest+NaN/Inf: 3D path with non-finite values along different axes.
- align_corners=1 non-degenerate (3x3) image with extreme/NaN coords: verifies sanitization to x_min=0 maps to pixel(0,0).
- negative-only extreme on non-constant image: verifies sanitization is direction-agnostic and rounds to pixel(0,0).
- mixed normal+pathological grid in one call on a non-constant image: verifies normal points sample correctly and only unsafe coords are sanitized.
Work item to cover the same vulnerability in CUDA - https://aiinfra.visualstudio.com/ONNX%20Runtime/_workitems/edit/93864
Description
Hardens the CPU `GridSample` operator against undefined behavior caused by `static_cast<int64_t>` of NaN, ±Inf, or out-of-range floating-point coordinates. Tracks IcM 31000000575970.

Motivation and Context

`GridSample` previously fed denormalized grid coordinates directly into integer casts (`std::floor`, `std::nearbyint`) inside its 2D and 3D Compute paths, the bilinear fast path, and `GsReflect`. With pathological grid values (NaN, ±Inf, magnitudes beyond `INT64_MAX`), the casts invoke C++ undefined behavior. On some toolchains/sanitizers this corrupts indices, leading to out-of-bounds reads (a security concern), and on most platforms it just silently returns garbage indices that `PixelAtGrid` later clamps.

Changes

`onnxruntime/core/providers/cpu/tensor/grid_sample.cc`

- `GsReflect`: explicit guard that returns `x_min` when the input is non-finite or `range <= 0` (covers `align_corners=true` with a 1×1 image where reflection range collapses to zero).
- `IsSafeForInt64Conversion<T>(v)`: rejects NaN/Inf and any magnitude > 2⁶².
- 2D Compute loop: sanitizes unsafe `x`/`y` to `x_min`/`y_min` before all `static_cast<int64_t>` calls (Nearest, Linear, Cubic).
- 3D Compute loop: the same sanitization for `x`/`y`/`z`.
- `PrecomputeBilinearSamplePlan2D` (bilinear fast path): substitutes `-0.5` for unsafe coords; the existing mask logic correctly rejects the resulting out-of-range neighbors.
- Adds `<algorithm>` and `<cmath>` includes.

`onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc`

- Standalone translation unit (previously a header-style `.inc`) with MIT copyright header.
- Tests over `<float, MLFloat16>`:
  - `align_corners=1` (zero reflection range)

`cmake/onnxruntime_unittests.cmake`

- Adds the new `.cc` test to the Emscripten exclusion list alongside the existing `grid_sample_test.cc`.

Validation

- `cmake --build . --target onnxruntime_provider_test`
- `*GridSample*` tests pass: 116 (96 generated + 8 contrib op + 12 custom across float/MLFloat16).

Related

Supersedes the work in #27975 (left open). Filed via origin remote per workflow guidance.
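To make the zero-range case from the Changes list concrete: with `align_corners=true` and a 1×1 image, the reflection borders are `x_min = x_max = 0`, so `range` is 0 and any `dx / range` reduction would divide by zero. The sketch below shows a guard of the kind described. `GuardedReflectSketch` is a hypothetical name, and the reduction after the guard uses `std::fmod`, which is a substitution made here for illustration and not necessarily how `GsReflect` performs it.

```cpp
#include <cmath>

// Hypothetical guarded reflection. Returns x_min for non-finite input or a
// non-positive range, mirroring the guard described for GsReflect.
// Assumption: the fmod-based folding below is illustrative only.
template <typename T>
T GuardedReflectSketch(T x, T x_min, T x_max) {
  const T range = x_max - x_min;
  if (!std::isfinite(x) || !(range > T(0))) {
    return x_min;  // NaN, +/-Inf, or collapsed range (e.g. 1x1 image, align_corners=1)
  }
  const T period = T(2) * range;
  T t = std::fmod(x - x_min, period);  // t is in (-period, period)
  if (t < T(0)) t += period;           // fold into [0, period)
  if (t > range) t = period - t;       // mirror the second half back into [0, range]
  return x_min + t;
}
```

With these definitions, `GuardedReflectSketch(5.0f, 0.0f, 0.0f)` returns 0 without dividing by zero, and `GuardedReflectSketch(4.5f, 0.0f, 3.0f)` reflects to 1.5.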