ORT 1.24.3 release cherry pick round 4 by tianleiwu · Pull Request #27558 · microsoft/onnxruntime

tianleiwu · 2026-03-05T02:38:05Z

This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
| --- | --- | --- |
| `d5387d8` | #27192 | Avoid repetitive creation of fp4/fp8 native-custom-op domains for NvTensorRtRtx EP |
| `0b9906a` | #27454 | Suppress spurious Array Out of Bounds warnings produced by GCC 14.2 compiler on Linux builds |
| `4a80b0b` | #27471 | Fix double-free in TRT EP custom op domain Release functions |
| `c7c939f` | #27499 | Fix -Warray-bounds build error in MLAS on clang 17+ |
| `f99dcca` | #27514 | [Build] Fix pybind11 vcpkg configuration |
| `ef04b10` | #27518 | [CXX Lora] Prevent heap OOB from maliciously crafted Lora Adapters. |
| `0b2b6d0` | #27288 | [NvTensorRTRTX EP]: Add missing override specifiers to suppress warnings |
| `c1d8f5c` | #27522 | Add "library_path" metadata entry to OrtEpDevice instances for plugin and provider bridge EPs |
| `fdead1c` | #27537 | Account for ORT_NO_EXCEPTIONS builds in Lora test |
| `3d1365e` | #27521 | increase kMaxValueLength to 8192 |
| `df8f4a7` | #27535 | Add OrtEnv.DisableDllImportResolver to prevent fatal error on resolver conflict |
| `bdd672a` | #27356 | Add/Update telemetry events |
| `2da1a30` | #27543 | Fix RoiAlign heap out-of-bounds read via unchecked batch_indices |
| `5c3f544` | #27466 | DQ→MatMulNBits fusion transformer for NvTensorRtRtx ep |

…ensorRtRtx EP (#27192)

### Description

- Avoid repetitive creation of FP4/FP8 native custom ops in the create method for custom-op domains (plugin-based custom-op handling is left as it was before native custom ops were added in [PR-26555](#26555)).
- Avoid deleting the custom-op domains at destructor time. They are created with static scope, so skipping the delete avoids a potential double-delete.

### Motivation and Context

- Repeatedly checking for and creating the custom-op domains is redundant, so this cleans it up.
- Explicitly deleting static objects in a destructor can lead to a double-delete, so this avoids it.

…ompiler on Linux builds (#27454)

### Description

Suppress spurious Array Out of Bounds warnings produced by the GCC 14.2 compiler on Linux builds.

### Motivation and Context

The Linux build fails when compiled with GCC 14.2 because of spurious Array Out of Bounds warnings (warnings are treated as errors).

@tianleiwu

### Description

Apply the same double-free fix from NvTensorRtRtx EP ([PR #27192](#27192)) to the TRT EP. `CreateTensorRTCustomOpDomainList()` owns domains/ops via static `unique_ptr`s, but `ReleaseTensorRTCustomOpDomain()` was manually `delete`-ing the same objects through raw pointers, which caused a double-free at program exit.

- `ReleaseTensorRTCustomOpDomain()` → no-op (static `unique_ptr`s own the lifetime)
- `ReleaseTensorRTCustomOpDomainList()` → `clear()` the reference vector only
- Added ownership comments to static members matching NvTensorRtRtx EP style

### Motivation and Context

PR #27192 review ([thread](#27192 (comment))) identified that the TRT EP has the identical bug pattern that was fixed in NvTensorRtRtx EP. The TRT EP code was the original source this pattern was borrowed from. @tianleiwu noted a follow-up PR was needed.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
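The ownership model described above can be sketched in isolation. This is a minimal illustration, not the TRT EP code: `CustomOpDomain`, `CreateDomainList`, `ReleaseDomain`, `ReleaseDomainList`, and the domain name are hypothetical stand-ins chosen for the example.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a custom-op domain; not the real ORT type.
struct CustomOpDomain {
  std::string name;
};

// Fills `out` with non-owning pointers. The objects themselves are owned by a
// function-local static, so they are created once and destroyed at program exit.
void CreateDomainList(std::vector<CustomOpDomain*>& out) {
  static std::vector<std::unique_ptr<CustomOpDomain>> owned;  // sole owner
  if (owned.empty()) {
    owned.push_back(std::make_unique<CustomOpDomain>(CustomOpDomain{"example.domain"}));
  }
  out.clear();
  for (const auto& domain : owned) {
    out.push_back(domain.get());
  }
}

// Intentionally a no-op: deleting here would free memory that the static
// unique_ptr frees again at exit (the double-free described above).
void ReleaseDomain(CustomOpDomain* /*domain*/) {}

// Drops only the borrowed references; nothing is deleted.
void ReleaseDomainList(std::vector<CustomOpDomain*>& list) {
  list.clear();
}
```

In this sketch, calling `CreateDomainList` repeatedly hands out the same pointers, and releasing a list never invalidates pointers held by another caller.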

### Description

The `-Warray-bounds` suppression pragma in `sqnbitgemm_kernel_avx2_int8_blklen32.h` was gated on `defined(HAS_ARRAY_BOUNDS)`, which is set in `onnxruntime_config.h`. MLAS never includes that header, so the guard was dead code and the pragma never fired.

Changed the guard to `#ifdef __clang__`:

```cpp
// Before: HAS_ARRAY_BOUNDS never defined in MLAS TU
#if defined(__clang__) && defined(HAS_ARRAY_BOUNDS)

// After
#ifdef __clang__
```

Note: `__has_warning("-Warray-bounds")` was considered, but the C preprocessor must still parse the whole `#if` expression rather than short-circuiting at `&&`, so GCC fails to parse it even behind `defined(__clang__)`.

### Motivation and Context

Build fails on Intel Mac with Apple Clang 17.0.0 (`-Werror,-Warray-bounds`). Clang raises a false-positive array-bounds warning on `acc[4..7]` inside an `if constexpr (NCols4 == 8)` branch that is dead when `NCols4 == 4`.

<details>
<summary>Original prompt</summary>

> ----
>
> *This section details on the original issue you should resolve*
>
> <issue_title>[Build] error: array index 4 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]</issue_title>
> <issue_description>### Describe the issue
>
> Unable to build from main branch (0768f42 as of time writing this issue) on Intel Mac
>
> ```
> /usr/bin/c++ --version
> Apple clang version 17.0.0 (clang-1700.0.13.5)
> Target: x86_64-apple-darwin24.5.0
> Thread model: posix
> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
> ```
>
> ### Urgency
>
> _No response_
>
> ### Target platform
>
> MacOS
>
> ### Build script
>
> ./build.sh --config RelWithDebInfo --build_shared_lib --parallel --cmake_extra_defines CMAKE_OSX_ARCHITECTURES=x86_64
>
> ### Error / output
>
> [ 18%] Building CXX object CMakeFiles/onnxruntime_mlas.dir/onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2.cpp.o
> In file included from /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2.cpp:26:
> /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1677:49: error: array index 4 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]
> 1677 | __m128 acc_1 = FoldAccumulators(acc[4], acc[5], acc[6], acc[7]);
> | ^ ~
> /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1531:13: note: array 'acc' declared here
> 1531 | __m256 acc[NCols4];
> | ^
> /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1677:57: error: array index 5 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]
> 1677 | __m128 acc_1 = FoldAccumulators(acc[4], acc[5], acc[6], acc[7]);
> | ^ ~
> /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1531:13: note: array 'acc' declared here
> 1531 | __m256 acc[NCols4];
> | ^
> /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1677:65: error: array index 6 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]
> 1677 | __m128 acc_1 = FoldAccumulators(acc[4], acc[5], acc[6], acc[7]);
> | ^ ~
> /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1531:13: note: array 'acc' declared here
> 1531 | __m256 acc[NCols4];
> | ^
> /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1677:73: error: array index 7 is past the end of the array (that has type '__m256[4]') [-Werror,-Warray-bounds]
> 1677 | __m128 acc_1 = FoldAccumulators(acc[4], acc[5], acc[6], acc[7]);
> | ^ ~
> /onnxruntime/onnxruntime/core/mlas/lib/sqnbitgemm_kernel_avx2_int8_blklen32.h:1531:13: note: array 'acc' declared here
> 1531 | __m256 acc[NCols4];
> | ^
> 4 errors generated.
>
> ### Visual Studio Version
>
> _No response_
>
> ### GCC / Compiler Version
>
> Apple clang version 17.0.0 (clang-1700.0.13.5)</issue_description>
>
> <agent_instructions>Please investigate the build error. If code need fix, create a pull requests. Otherwise, suggest ways to avoid the build errors.</agent_instructions>
>
> ## Comments on the Issue (you are @copilot in this section)
>
> <comments>
> </comments>

</details>

- Fixes #27497

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
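For readers unfamiliar with the pattern, here is a minimal sketch of a Clang-only diagnostic suppression wrapped around an `if constexpr` branch. It is not the MLAS kernel: `FoldAll` is a hypothetical function, plain `float` stands in for `__m256`, and the use of `#pragma clang diagnostic` is an assumption about how such a suppression is typically written. The sketch shows where the guard sits and is not claimed to reproduce the original warning. It requires C++17.

```cpp
// Hypothetical reduction over NCols4 accumulators (NCols4 is 4 or 8).
template <int NCols4>
float FoldAll(const float* values) {
  float acc[NCols4];
  for (int i = 0; i < NCols4; ++i) {
    acc[i] = values[i];
  }

  float sum = acc[0] + acc[1] + acc[2] + acc[3];

// Guard on the compiler only, so the suppression does not depend on a
// configuration macro that this translation unit may never see.
#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Warray-bounds"
#endif
  if constexpr (NCols4 == 8) {
    // Only reachable when the array really has 8 elements.
    sum += acc[4] + acc[5] + acc[6] + acc[7];
  }
#ifdef __clang__
#pragma clang diagnostic pop
#endif

  return sum;
}
```

Because the pragmas sit behind `#ifdef __clang__`, other compilers never see them, and the `push`/`pop` pair keeps the suppression from leaking into the rest of the file.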

Building with `--use_vcpkg` but without `--use_vcpkg_ms_internal_asset_cache` fails with an error like:

```
C:\code\onnxruntime\cmake\./vcpkg-ports\pybind11: info: installing overlay port from here
Downloading https://github.com/pybind/pybind11/archive/v3.0.2.tar.gz -> pybind-pybind11-v3.0.2.tar.gz
pybind-pybind11-v3.0.2.tar.gz.33772.part: error: download from https://github.com/pybind/pybind11/archive/v3.0.2.tar.gz had an unexpected hash
note: Expected: 786b1bf534ac67a8d5669f8babf67bb13e48b3a3da1b6344e43ae10a84b80bbc8fea5f12a65fd18739c341fefef5622c5dc096db964dff33cc62ea4259b2e2c1
note: Actual  : 19bee2c76320e25202ee078b5680ff8a7acfb33494dec29dad984ab04de8bcb01340d9fec37c8cc5ac9015dfc367e60312dcd8506e66ce8f0af4c49db562ddef
CMake Error at scripts/cmake/vcpkg_download_distfile.cmake:136 (message):
  Download failed, halting portfile.
```

The root cause is that I uploaded a zip file to the cache server. Without `--use_vcpkg_ms_internal_asset_cache`, vcpkg tries to download the tar.gz file from GitHub, and its SHA differs from that of the zip file. This PR configures the portfile to download the zip file to avoid the issue.

…27518)

### Description

Detect and test mismatch between raw data size and declared data type and shape of the lora adapter parameter.

### Motivation and Context

Disallow maliciously crafted lora adapters leading to heap OOB.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
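A size-versus-shape consistency check of the kind described above might look like the following. This is a sketch under stated assumptions, not the code from the PR: `RawSizeMatchesShape` is a hypothetical helper, and the element size is passed in directly rather than derived from an ONNX data type.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returns true only when `raw_size` equals element_size * product(shape).
// Negative dimensions and any size_t overflow are treated as a mismatch.
bool RawSizeMatchesShape(const std::vector<std::int64_t>& shape,
                         std::size_t element_size,
                         std::size_t raw_size) {
  constexpr std::size_t kMax = std::numeric_limits<std::size_t>::max();
  if (element_size == 0) {
    return false;
  }

  std::size_t count = 1;
  for (std::int64_t dim : shape) {
    if (dim < 0) {
      return false;  // symbolic or corrupted dimension
    }
    const auto d = static_cast<std::uint64_t>(dim);
    if (d > kMax) {
      return false;  // dimension does not fit in size_t (32-bit targets)
    }
    if (d != 0 && count > kMax / d) {
      return false;  // element count would overflow
    }
    count *= static_cast<std::size_t>(d);
  }

  if (count > kMax / element_size) {
    return false;  // byte size would overflow
  }
  return count * element_size == raw_size;
}
```

Checking for overflow before each multiplication matters here: a crafted shape whose product wraps around could otherwise appear to match a small buffer.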

…ngs (#27288)

### Description

Compilation with Clang toolchains on Linux currently fails due to this warning (among others), since ONNX Runtime compiles with `-Werror` by default. This PR addresses `-Winconsistent-missing-override` in the TRT NV EP.

```
/home/stephan/projects/onnxruntime-winai/onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider.h:309:7: error: 'GetDeviceId' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
309 | int GetDeviceId() const { return device_id_; }
| ^
/home/stephan/projects/onnxruntime-winai/include/onnxruntime/core/framework/execution_provider.h:183:15: note: overridden virtual function is here
183 | virtual int GetDeviceId() const { return default_device_.Id(); }
| ^
In file included from /home/stephan/projects/onnxruntime-winai/onnxruntime/core/providers/nv_tensorrt_rtx/nv_provider_factory.cc:18:
/home/stephan/projects/onnxruntime-winai/onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider.h:310:10: error: 'Sync' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
310 | Status Sync() const;
| ^
/home/stephan/projects/onnxruntime-winai/include/onnxruntime/core/framework/execution_provider.h:231:26: note: overridden virtual function is here
231 | virtual common::Status Sync() const { return Status::OK(); }
| ^
/home/stephan/projects/onnxruntime-winai/onnxruntime/core/providers/nv_tensorrt_rtx/nv_provider_factory.cc:63:39: error: 'CreateProvider' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
63 | std::unique_ptr<IExecutionProvider> CreateProvider(const OrtSessionOptions& session_options,
| ^
/home/stephan/projects/onnxruntime-winai/include/onnxruntime/core/providers/providers.h:29:47: note: overridden virtual function is here
29 | virtual std::unique_ptr<IExecutionProvider> CreateProvider(const OrtSessionOptions& session_options,
| ^
/home/stephan/projects/onnxruntime-winai/onnxruntime/core/providers/nv_tensorrt_rtx/nv_provider_factory.cc:112:46: error: 'CreateExecutionProviderFactory' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
112 | std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory(const void* param) {
| ^
/home/stephan/projects/onnxruntime-winai/onnxruntime/core/providers/shared_library/provider_host_api.h:19:54: note: overridden virtual function is here
19 | virtual std::shared_ptr<IExecutionProviderFactory> CreateExecutionProviderFactory(const void* /*provider_options*/) { return nullptr;
/home/stephan/projects/onnxruntime-winai/onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider.h:309:7: error: 'GetDeviceId' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
309 | int GetDeviceId() const { return device_id_; }
| ^
/home/stephan/projects/onnxruntime-winai/include/onnxruntime/core/framework/execution_provider.h:183:15: note: overridden virtual function is here
183 | virtual int GetDeviceId() const { return default_device_.Id(); }
```

### Motivation and Context

Fixing clang warnings enables builds with clang on Linux since `-Werror` enforces warning-free builds.
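The shape of the fix can be illustrated with a reduced example. The class names below are hypothetical stand-ins for the execution provider hierarchy; only the `GetDeviceId` signature is taken from the diagnostics above.

```cpp
// Hypothetical base class standing in for the provider interface.
struct ProviderBase {
  virtual ~ProviderBase() = default;
  virtual int GetDeviceId() const { return 0; }
};

// Hypothetical derived provider.
class RtxProvider : public ProviderBase {
 public:
  explicit RtxProvider(int device_id) : device_id_(device_id) {}

  // `override` asks the compiler to verify that this really overrides a
  // virtual function, and satisfies -Winconsistent-missing-override.
  int GetDeviceId() const override { return device_id_; }

 private:
  int device_id_;
};
```

Beyond silencing the warning, `override` turns a signature drift between base and derived class into a compile error instead of a silently unrelated function.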

… and provider bridge EPs (#27522)

### Description

Set "library_path" metadata entry in OrtEpDevice instances for plugin and provider bridge EPs.

### Motivation and Context

Make available everywhere. Required by GenAI to load custom ops library. #27496

### Description

Non-required builds fail because the Lora tests use `ASSERT_THROW` while RTTI is disabled.

Follow-up to #27518

### Motivation and Context

This change is needed because, for a 32B model like Qwen2.5-coder-32B on the TRTRTX EP, GenAI passes a long config string: https://github.com/microsoft/onnxruntime-genai/blob/3c47932e9d7afa0d44db0b3918e479bbdd4c5353/src/models/model.cpp#L516

Example:

```
AddConfigEntry: ep.nvtensorrtrtxexecutionprovider.nv_profile_min_shapes (length=4364) = input_ids:1x1,attention_mask:1x1,past_key_values.0.key:1x8x0x128,past_key_values.0.value:1x8x0x128,past_key_values.1.key:1x8x0x128,past_key_values.1.value:1x8x0x128,past_key_values.2.key:1x8x0x128,past_key_values.2.value:1x8x0x128,past_key_values.3.key:1x8x0x128,past_key_values.3.value:1x8x0x128,past_key_values.4.key:1x8x0x128,past_key_values.4.value:1x8x0x128,past_key_values.5.key:1x8x0x128,past_key_values.5.value:1x8x0x128,past_key_values.6.key:1x8x0x128,past_key_values.6.value:1x8x0x128,past_key_values.7.key:1x8x0x128,past_key_values.7.value:1x8x0x128,past_key_values.8.key:1x8x0x128,past_key_values.8.value:1x8x0x128,past_key_values.9.key:1x8x0x128,past_key_values.9.value:1x8x0x128,past_key_values.10.key:1x8x0x128,past_key_values.10.value:1x8x0x128,past_key_values.11.key:1x8x0x128,past_key_values.11.value:1x8x0x128,past_key_values.12.key:1x8x0x128,past_key_values.12.value:1x8x0x128,past_key_values.13.key:1x8x0x128,past_key_values.13.value:1x8x0x128,past_key_values.14.key:1x8x0x128,past_key_values.14.value:1x8x0x128,past_key_values.15.key:1x8x0x128,past_key_values.15.value:1x8x0x128,past_key_values.16.key:1x8x0x128,past_key_values.16.value:1x8x0x128,past_key_values.17.key:1x8x0x128,past_key_values.17.value:1x8x0x128,past_key_values.18.key:1x8x0x128,past_key_values.18.value:1x8x0x128,past_key_values.19.key:1x8x0x128,past_key_values.19.value:1x8x0x128,past_key_values.20.key:1x8x0x128,past_key_values.20.value:1x8x0x128,past_key_values.21.key:1x8x0x128,past_key_values.21.value:1x8x0x128,past_key_values.22.key:1x8x0x128,past_key_values.22.value:1x8x0x128,past_key_values.23.key:1x8x0x128,past_key_values.23.value:1x8x0x128,past_key_values.24.key:1x8x0x128,past_key_values.24.value:1x8x0x128,past_key_values.25.key:1x8x0x128,past_key_values.25.value:1x8x0x128,past_key_values.26.key:1x8x0x128,past_key_values.26.value:1x8x0x128,past_key_values.27.key:1x8x0x128,past_key_values.27.value:1x8x0x128,past_key_values.28.key:1x8x0x128,past_key_values.28.value:1x8x0x128,past_key_values.29.key:1x8x0x128,past_key_values.29.value:1x8x0x128,past_key_values.30.key:1x8x0x128,past_key_values.30.value:1x8x0x128,past_key_values.31.key:1x8x0x128,past_key_values.31.value:1x8x0x128,past_key_values.32.key:1x8x0x128,past_key_values.32.value:1x8x0x128,past_key_values.33.key:1x8x0x128,past_key_values.33.value:1x8x0x128,past_key_values.34.key:1x8x0x128,past_key_values.34.value:1x8x0x128,past_key_values.35.key:1x8x0x128,past_key_values.35.value:1x8x0x128,past_key_values.36.key:1x8x0x128,past_key_values.36.value:1x8x0x128,past_key_values.37.key:1x8x0x128,past_key_values.37.value:1x8x0x128,past_key_values.38.key:1x8x0x128,past_key_values.38.value:1x8x0x128,past_key_values.39.key:1x8x0x128,past_key_values.39.value:1x8x0x128,past_key_values.40.key:1x8x0x128,past_key_values.40.value:1x8x0x128,past_key_values.41.key:1x8x0x128,past_key_values.41.value:1x8x0x128,past_key_values.42.key:1x8x0x128,past_key_values.42.value:1x8x0x128,past_key_values.43.key:1x8x0x128,past_key_values.43.value:1x8x0x128,past_key_values.44.key:1x8x0x128,past_key_values.44.value:1x8x0x128,past_key_values.45.key:1x8x0x128,past_key_values.45.value:1x8x0x128,past_key_values.46.key:1x8x0x128,past_key_values.46.value:1x8x0x128,past_key_values.47.key:1x8x0x128,past_key_values.47.value:1x8x0x128,past_key_values.48.key:1x8x0x128,past_key_values.48.value:1x8x0x128,past_key_values.49.key:1x8x0x128,past_key_values.49.value:1x8x0x128,past_key_values.50.key:1x8x0x128,past_key_values.50.value:1x8x0x128,past_key_values.51.key:1x8x0x128,past_key_values.51.value:1x8x0x128,past_key_values.52.key:1x8x0x128,past_key_values.52.value:1x8x0x128,past_key_values.53.key:1x8x0x128,past_key_values.53.value:1x8x0x128,past_key_values.54.key:1x8x0x128,past_key_values.54.value:1x8x0x128,past_key_values.55.key:1x8x0x128,past_key_values.55.value:1x8x0x128,past_key_values.56.key:1x8x0x128,past_key_values.56.value:1x8x0x128,past_key_values.57.key:1x8x0x128,past_key_values.57.value:1x8x0x128,past_key_values.58.key:1x8x0x128,past_key_values.58.value:1x8x0x128,past_key_values.59.key:1x8x0x128,past_key_values.59.value:1x8x0x128,past_key_values.60.key:1x8x0x128,past_key_values.60.value:1x8x0x128,past_key_values.61.key:1x8x0x128,past_key_values.61.value:1x8x0x128,past_key_values.62.key:1x8x0x128,past_key_values.62.value:1x8x0x128,past_key_values.63.key:1x8x0x128,past_key_values.63.value:1x8x0x128
Traceback (most recent call last):
  File "Convert to NVIDIA TRT for RTX_32B\test_config.py", line 2, in <module>
    model = og.Model("Convert to NVIDIA TRT for RTX_32B\\model")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Config value is longer than maximum length: 4096
```

---------

Co-authored-by: hualxie <hualxie@microsoft.com>
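The effect of raising the limit can be sketched as a simple length check. `kMaxValueLength` and its new value of 8192 come from the commit title; the function `IsConfigValueAccepted` and the exact comparison (reject only when strictly longer, as the error text suggests) are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// New limit named in the commit title; the real constant lives inside ORT.
constexpr std::size_t kMaxValueLength = 8192;

// Hypothetical validator: accepts values up to and including the limit.
bool IsConfigValueAccepted(const std::string& value) {
  return value.size() <= kMaxValueLength;
}
```

Under this sketch, the 4364-character `nv_profile_min_shapes` value from the log above would have been rejected with the old 4096 limit and is accepted with the new one.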

…r conflict (#27535)

## Description

`NativeLibrary.SetDllImportResolver()` can only be called once per assembly. If a host application registers its own `DllImportResolver` for the ONNX Runtime assembly before any ORT API is used, ORT's internal call to `SetDllImportResolver` in the `NativeMethods` static constructor throws an `InvalidOperationException`, which surfaces as a fatal `TypeInitializationException` and makes ONNX Runtime completely unusable.

This PR adds two complementary safeguards:

1. **`try/catch(InvalidOperationException)`** around the `SetDllImportResolver` call in `NativeMethods..cctor`, so that if a resolver is already registered, ORT logs a diagnostic trace and continues normally.
2. **`OrtEnv.DisableDllImportResolver`**: a public static `bool` property that allows callers to explicitly opt out of ORT's resolver registration before any ORT type is accessed. This is useful when the host application needs full control over native library resolution.

### Usage

```csharp
// Option 1: Opt out before any ORT usage
OrtEnv.DisableDllImportResolver = true;
NativeLibrary.SetDllImportResolver(typeof(OrtEnv).Assembly, MyCustomResolver);
var env = OrtEnv.Instance();

// Option 2: Do nothing. If a resolver is already registered,
// ORT catches the conflict and continues using the existing resolver.
```

## Motivation and Context

When a library client has already called `SetDllImportResolver()` for the ORT assembly (e.g., to handle platform-specific library loading), ORT's attempt to register its own resolver causes a fatal, unrecoverable error. This change makes ORT resilient to this scenario and gives clients explicit control.

## Changes

### `csharp/src/Microsoft.ML.OnnxRuntime/OrtEnv.shared.cs`

- Added `public static bool DisableDllImportResolver` property with XML documentation and usage example.

### `csharp/src/Microsoft.ML.OnnxRuntime/NativeMethods.shared.cs`

- Wrapped `NativeLibrary.SetDllImportResolver` in `if (!OrtEnv.DisableDllImportResolver)` guard.
- Added `try/catch(InvalidOperationException)` with a `Trace.WriteLine` diagnostic message.

### `csharp/test/Microsoft.ML.OnnxRuntime.Tests.Common/OrtEnvTests.cs`

- Added `OrtEnvExternalDllImportResolverTest` test class with two tests using `AssemblyLoadContext` for process-level isolation of static constructor behavior:
  - **`TestExternalResolverRegisteredFirst`**: Registers an external resolver FIRST, then initializes ORT. Verifies the `try/catch` prevents a fatal error and ORT remains fully functional (`GetVersionString()` succeeds).
  - **`TestDisableDllImportResolverWorks`**: Sets `DisableDllImportResolver = true`, initializes ORT, then registers an external resolver. Verifies no `InvalidOperationException` is thrown, proving ORT correctly skipped its own registration.

### Description

- ModelLoadStart/End - InferenceSession::LoadWithLoader, InferenceSession::LoadOrtModelWithLoader
- SessionCreationEnd - InferenceSession::Initialize
- RegisterEpLibraryWithLibPath, RegisterEpLibraryStart/End - Environment::RegisterExecutionProviderLibrary

Update: RuntimePerf event is triggered more frequently with exponential backoff. It is also now triggered from ~InferenceSession() to log data in the tail.

### Motivation and Context

To better measure health

---------

Co-authored-by: Darshak Bhatti <dabhatti@micorsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

# Fix RoiAlign heap out-of-bounds read via unchecked batch_indices

## Description

Add value-range validation for `batch_indices` in the RoiAlign operator to prevent out-of-bounds heap reads from maliciously crafted ONNX models.

`CheckROIAlignValidInput()` previously validated tensor shapes but never checked that the **values** in `batch_indices` fall within `[0, batch_size)`. An attacker could supply `batch_indices` containing values exceeding the batch dimension of the input tensor `X`, causing the kernel to read arbitrary heap memory at:

- **CPU:** `roialign.cc:212`, where `roi_batch_ind` is used as an unchecked index into `bottom_data`
- **CUDA:** `roialign_impl.cu:109`, where `batch_indices_ptr[n]` is used as an unchecked index into `bottom_data` on GPU

## Impact

- **Vulnerability type:** Heap out-of-bounds read
- **Impact:** Arbitrary heap memory read, potential information disclosure, program crash
- **Trigger:** Construct `batch_indices` with values ≥ `batch_size` or < 0
- **Affected providers:** CPU and CUDA (both call `CheckROIAlignValidInput()`)

## Changes

### `onnxruntime/core/providers/cpu/object_detection/roialign.cc`

- Added per-element bounds check in `CheckROIAlignValidInput()`: each `batch_indices[i]` must satisfy `0 <= value < X.shape[0]`
- Returns `INVALID_ARGUMENT` with a descriptive error message on violation
- Guarded by `batch_indices_ptr->Location().device.Type() == OrtDevice::CPU` so it only runs when the tensor data is host-accessible (CPU EP and CropAndResize). For the CUDA EP, `batch_indices` lives in GPU memory and cannot be safely dereferenced on the host.

### `onnxruntime/test/providers/cpu/object_detection/roialign_test.cc`

- Added `BatchIndicesOutOfRange` test: `batch_indices={1}` with `batch_size=1` (exercises `>= batch_size` path)
- Added `BatchIndicesNegative` test: `batch_indices={-1}` (exercises `< 0` path)

## Known Limitation

The CUDA execution path is **not** protected by this bounds check because `batch_indices` is a GPU tensor and cannot be read on the host. Adding a device-side bounds check would require passing `batch_size` into the CUDA kernel; this is tracked as a follow-up.

Note: Using `.InputMemoryType(OrtMemTypeCPUInput, 2)` was considered but rejected because it would force a GPU→CPU transfer of `batch_indices`, breaking CUDA graph capture for models like Masked R-CNN where `batch_indices` is produced by upstream GPU ops.

## Validation

- Full `RoiAlignTest.*` suite passes (12/12 tests) on CPU build
- Full `RoiAlignTest.*` suite passes (12/12 tests) on CUDA build
- No regressions in existing positive or negative tests
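The per-element range check described above can be sketched independently of ORT. `CheckResult` and `ValidateBatchIndices` are hypothetical names; the real code returns an ORT status with `INVALID_ARGUMENT` and reads the indices from a tensor rather than a `std::vector`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result type standing in for an ORT status.
struct CheckResult {
  bool ok;
  std::string message;
};

// Verifies that every batch index lies in [0, batch_size).
CheckResult ValidateBatchIndices(const std::vector<std::int64_t>& batch_indices,
                                 std::int64_t batch_size) {
  for (std::size_t i = 0; i < batch_indices.size(); ++i) {
    const std::int64_t value = batch_indices[i];
    if (value < 0 || value >= batch_size) {
      return {false, "batch_indices[" + std::to_string(i) + "] = " +
                         std::to_string(value) + " is outside [0, " +
                         std::to_string(batch_size) + ")"};
    }
  }
  return {true, ""};
}
```

The two failing cases mirror the tests listed above: an index equal to `batch_size` and a negative index are both rejected before any data is read.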

## Summary

Generalize the WebNN-specific DequantizeLinear → MatMulNBits graph fusion transformer so it can be reused by other execution providers (e.g. NvTensorRTRTX), and add defensive shape/size validation to prevent crashes on malformed tensors.

### Fusion patterns

**Pattern 1:** `DequantizeLinear → Reshape → Transpose → [Cast] → MatMul/Gemm` → **MatMulNBits**

**Pattern 2:** `DequantizeLinear (axis=0) → MatMul/Gemm` → **MatMulNBits**

---------

Co-authored-by: praneshgo <227579474+praneshgo@users.noreply.github.com>

vishalpandya1990 and others added 14 commits March 4, 2026 16:04

Account for ORT_NO_EXCEPTIONS builds in Lora test (#27537)

dfa3fbc


edgchen1 approved these changes Mar 5, 2026


tianleiwu enabled auto-merge (squash) March 5, 2026 04:27

kunal-vaishnavi approved these changes Mar 5, 2026


tianleiwu merged commit 3a728b7 into rel-1.24.3 Mar 5, 2026
94 of 128 checks passed

tianleiwu deleted the tlwu/rel-1.24.3_cherrypick_round4 branch March 5, 2026 04:48
