
fix: out of bounds access for resize operation #27419

Merged
tianleiwu merged 6 commits into microsoft:main from lukas-folle-snkeos:main
Feb 25, 2026

Conversation

@lukas-folle-snkeos
Contributor

@lukas-folle-snkeos lukas-folle-snkeos commented Feb 23, 2026

Description

This PR fixes:

  • An out-of-bounds write in CUDA Resize for LINEAR mode when running trilinear paths (3D/5D)
  • A race condition for the reduction kernel

Root cause

  1. The temporary dims-mapping buffer for LINEAR mode was sized using only H+W, while the trilinear coordinate mapping kernel writes D+H+W entries.
  2. A shared-memory race in the block-level reduction loop inside reduction_functions.cu: the condition allowed threads outside the active lower half to update shared memory in the same stride phase, creating overlapping read/write hazards.
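To illustrate the second issue, here is a minimal host-side sketch of a block-level tree reduction (function and variable names are hypothetical, not the actual ONNX Runtime code). The race arises when the guard lets threads with `tid >= stride` write `shared[tid]` in the same stride phase in which a lower-half thread reads that slot; restricting writers to `tid < stride` keeps reads (`shared[tid + stride]`) and writes (`shared[tid]`) disjoint within each phase.

```cpp
#include <cstddef>
#include <vector>

// Host-side simulation of one thread block reducing a shared-memory buffer.
// Assumes the buffer length is a power of two. The inner loop over `tid`
// stands in for the block's threads; on the GPU a __syncthreads() barrier
// separates the stride phases.
float tree_reduce(std::vector<float> shared) {
  for (std::size_t stride = shared.size() / 2; stride > 0; stride /= 2) {
    for (std::size_t tid = 0; tid < shared.size(); ++tid) {
      if (tid < stride) {  // only the active lower half may write
        shared[tid] += shared[tid + stride];
      }
    }
    // __syncthreads() would go here in the CUDA kernel
  }
  return shared[0];
}
```

With the correct `tid < stride` guard, each slot is either read or written in a given phase, never both.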

My colleague @korbinian-mechlem-snkeos noticed this warning from compute-sanitizer:

========= Invalid __global__ write of size 4 bytes
========= at void onnxruntime::cuda::_ResizeTrilinearCoordinateMapping<float, onnxruntime::cuda::TransformCoordinate_HALF_PIXEL>(long long, long long, long long, long long, long long, long long, float, float, float, float, float, float, float, float, float, unsigned long long, bool, const T2 &, onnxruntime::cuda::LinearMappingInfo *)+0x400
========= by thread (17,0,0) in block (2,0,0)
========= Address 0xb28fff7cc is out of bounds
========= and is 205 bytes after the nearest allocation at 0xb28fff400 of size 768 bytes
========= Saved host backtrace up to driver entry point at kernel launch time

AND

========= Warning: Race reported between Read access at void onnxruntime::cuda::detail::reduce_matrix_columns_kernel<float, float, float, onnxruntime::cuda::Identity, onnxruntime::cuda::Identity, (bool)0>(int, int, const T1 *, T2 *, T3 *, int *)+0xe80
========= and Write access at void onnxruntime::cuda::detail::reduce_matrix_columns_kernel<float, float, float, onnxruntime::cuda::Identity, onnxruntime::cuda::Identity, (bool)0>(int, int, const T1 *, T2 *, T3 *, int *)+0xea0 [337920 hazards]

Motivation and Context

Update LINEAR buffer size calculation to:

  • use H+W for bilinear (2D/4D)
  • use D+H+W for trilinear (3D/5D)

Prevents invalid global writes and intermittent CUDA memory errors in trilinear resize workloads.
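The corrected sizing can be sketched as follows (a hedged illustration with hypothetical names; the actual fix lives in resize_impl.cu). The LINEAR path allocates one mapping entry per output coordinate along each resized axis, so the bilinear path needs H + W entries while the trilinear path also maps depth and needs D + H + W; sizing the trilinear path with only H + W leaves the D entries unallocated, which is the out-of-bounds write reported above.

```cpp
#include <cstdint>

// Number of per-axis coordinate-mapping entries the LINEAR temp buffer
// must hold (each entry corresponds to a LinearMappingInfo slot).
// Names and signature are illustrative, not the upstream API.
int64_t LinearMappingBufferEntries(bool is_trilinear,
                                   int64_t output_depth,
                                   int64_t output_height,
                                   int64_t output_width) {
  return is_trilinear ? output_depth + output_height + output_width  // 3D/5D
                      : output_height + output_width;                // 2D/4D
}
```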

@johannes-rehm-snkeos

@lukas-folle-snkeos
Contributor Author

@microsoft-github-policy-service agree

@lukas-folle-snkeos lukas-folle-snkeos marked this pull request as ready for review February 23, 2026 10:10
@xadupre
Member

xadupre commented Feb 23, 2026

A unit test would be good.

@lukas-folle-snkeos
Contributor Author

@xadupre our internal testing showed that the onnxruntime memory context corruption did indeed disappear with those two fixes in place.

Contributor

Copilot AI left a comment


Pull request overview

Fixes two CUDA correctness issues identified via compute-sanitizer: an out-of-bounds write in the Resize LINEAR trilinear coordinate mapping (3D/5D paths) and a shared-memory race in the block-level reduction loop.

Changes:

  • Adjust CUDA Resize LINEAR temp buffer sizing to account for trilinear mapping requiring D+H+W entries.
  • Fix shared-memory reduction loop condition to avoid overlapping reads/writes within a reduction stride.
  • Add regression tests covering repeated column-reduction execution and a CUDA-only 5D trilinear Resize case.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| onnxruntime/test/providers/cuda/test_cases/reduction_functions_test.cc | Adds a repeated-run test to help catch intermittent reduction race issues. |
| onnxruntime/test/providers/cpu/tensor/resize_op_test.cc | Adds a CUDA-only trilinear Resize regression test exercising the 5D linear path. |
| onnxruntime/core/providers/cuda/tensor/resize_impl.cu | Fixes LINEAR mapping buffer size calculation for trilinear (3D/5D) to prevent OOB writes. |
| onnxruntime/core/providers/cuda/reduction/reduction_functions.cu | Fixes block-level reduction condition to prevent shared-memory race hazards. |


@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu tianleiwu merged commit 79e0676 into microsoft:main Feb 25, 2026
91 of 92 checks passed
tianleiwu pushed a commit that referenced this pull request Feb 26, 2026
tianleiwu added a commit that referenced this pull request Feb 27, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|--------------|
| decd177 | #27090 | Fix GatherND division by zero when batch dimensions mismatch |
| 55f8234 | #27360 | Fix QMoE CPU Operator |
| df9146f | #27403 | [MLAS] Adding DynamicQGemm function pointers and ukernel interface |
| 0f93853 | #27318 | [js/web] Use embedded WASM module in Blob URL workers when wasmBinary is provided |
| b2a6e69 | #27364 | QMoE CPU Performance Update (Up to 4x on 4-bit) |
| f501e1d | #27413 | Fix refcount bug in map input conversion that caused shutdown segfault |
| b32b205 | #27421 | Fix error where bytes is not assigned for dynamic qgemm pack b size |
| 426b006 | #27397 | Fix DllImportResolver |
| 0982844 | #27412 | MatmulNBits prepacking scales fix |
| 9afb0d2 | #27430 | Fix validation for external data paths for models loaded from bytes |
| 71d2cd0 | #27401 | Enable Python 3.14 CI and Upgrade Dependencies |
| 79e0676 | #27419 | fix: out of bounds access for resize operation |
| 82eb99c | #27459 | Fix SkipLayerNorm fusion incorrectly applied when gamma/beta are not 1D |
| 355278a | #27444 | Fix GatherCopyData Integer Truncation Leading to Heap Out-of-Bounds Read/Write |
| cf96123 | #27411 | [web] fix usage of wasmBinary together with a blob URL for .mjs |
| 1131a86 | #27399 | [web] remove the unhelpful "Unknown CPU vendor" warning. |
| ffbbc4f | #27316 | Build Windows ARM64X binaries as part of packaging pipeline |

---------

Signed-off-by: Jonathan Clohessy <Jonathan.Clohessy@arm.com>
Co-authored-by: patryk-kaiser-ARM <patryk.kaiser@arm.com>
Co-authored-by: don <70039285+0-don@users.noreply.github.com>
Co-authored-by: Jonathan Clohessy <jonathan.clohessy@arm.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Lukas Folle <126877803+lukas-folle-snkeos@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Chaya <cha182350@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Erik <erscor@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>