
ORT 1.23.5 Cherry Picks#27716

Merged
adrastogi merged 18 commits into rel-1.23.5 from adrastogi/rel-1.23.5-cherrypicks
Mar 31, 2026
Conversation

@adrastogi
Contributor

@adrastogi adrastogi commented Mar 17, 2026

This cherry-picks the following commits for the release:

BoarQing and others added 9 commits March 19, 2026 14:00
### Description
Add tensor proto type for bool


### Motivation and Context
A new model was found that uses the bool data type.

Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
### Description

Use the updated subgraph when cloning the model.

### Motivation and Context

When a node in the model contains a subgraph attribute, ONNX Runtime
optimizes the subgraph. If the updated subgraph is not used in the cloned
model, it can cause errors during graph resolution.
#26487)

### Description
Enable the use of ort::logger safely across DLL boundaries for the
VitisAI execution provider.

### Motivation and Context
Resolves issue PUID 1194096. This enables
compile_onnx_model_vitisai_ep_v4 to use ort::logger properly.
### Description
Vitis AI EP graph_fuse memory optimization.
This PR removes unused function body handling code in the VitisAI graph
fusion implementation.

### Changes
- Removed unused function body handling in graph fusion
(`onnxruntime/core/providers/vitisai/imp/graph.cc`)

### Context
The function body handling code in the graph fusion logic was not being
used and can be safely removed to simplify the implementation.
## [VitisAI] Add External EP Loader

### Description

This PR introduces a dynamic external execution provider loading
mechanism for the VitisAI execution provider, enabling runtime loading
of alternative execution providers through a plugin-style architecture.

### Key Changes

#### 1. **New External EP Library Infrastructure** (`global_api.cc`)
- Added `ExternalEpLibaray` class to dynamically load external execution
provider libraries at runtime
- Implemented complete library lifecycle management (loading, unloading,
symbol resolution)
- Added global registry (`g_external_ep_libaries`) with caching to avoid
redundant library loading
- Created `CreateExecutionProviderFromAnotherEp()` function to
instantiate execution providers from external libraries

**Implementation Details:**
- **Simplified symbol resolution**: Only resolves the essential
`GetProvider` symbol (required)
- **Removed optional symbols**: No longer attempts to resolve
`CreateEpFactories` or `RyzenAI_SetSessionOptions`
- Lazy initialization pattern with `Ensure()` method
- Safe cleanup with `Clear()` method and proper error handling
- Platform-agnostic library loading using `LIBRARY_PREFIX` and
`LIBRARY_EXTENSION` macros

#### 2. **API Extension** (`global_api.h`)
- Declared new public function: `CreateExecutionProviderFromAnotherEp()`
- Added required includes:
- `core/framework/execution_provider.h` for `IExecutionProvider`
interface
  - `<memory>` for smart pointer support

#### 3. **Factory Integration** (`vitisai_provider_factory.cc`)
- Integrated external EP loading into the VitisAI provider factory
workflow
- Added provider option check for `external_ep_libray` key
- **Logic Flow**:
  1. Check if `external_ep_libray` option is specified
  2. If yes, load and return the external execution provider
  3. If no, create and return standard VitisAI execution provider

Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
…val and validation (#26699)

### Description

Adds support for compiled model compatibility information retrieval and
validation in the VitisAI EP. This enables runtime validation of
compiled models against the execution environment to prevent failures
and provide clear compatibility feedback.

**Key Changes:**
- Implemented `GetCompiledModelCompatibilityInfo` to collect and
serialize compatibility metadata during model compilation
- Added `ValidateCompiledModelCompatibilityInfo` to validate
compatibility at runtime against the current environment

### Motivation and Context
Compiled models may fail at runtime due to missing backend plugins,
version mismatches, or hardware platform differences.
ONNX Runtime added two APIs to support a compiled-model compatibility
validation system. Ref PRs:
    #25841
    #25749

This PR implements a compatibility validation system for Vitis AI EP
that:

- Detects incompatibilities before model loading to prevent runtime
failures
- Enables cross-version compatibility checking between different EP
versions
- Provides clear feedback through specific compatibility status codes
- Maintains backward compatibility with legacy EPs
…() (#27295)

Remove unnecessary s_kernel_registry_vitisaiep.reset() call in
deinitialize_vitisai_ep() function. The kernel registry will be
repopulated on next initialization, making this reset redundant.
New telemetry events:
- ModelLoadStart/End: InferenceSession::LoadWithLoader, InferenceSession::LoadOrtModelWithLoader
- SessionCreationEnd: InferenceSession::Initialize
- RegisterEpLibraryWithLibPath, RegisterEpLibraryStart/End: Environment::RegisterExecutionProviderLibrary

Update: RuntimePerf event is triggered more
frequently with exponential backoff.
It is also now triggered from ~InferenceSession() to log data in the
tail.

These changes help better measure runtime health.

---------

Co-authored-by: Darshak Bhatti <dabhatti@micorsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Add a Windows VERSIONINFO resource (.rc file) for the Vitis AI provider
DLL, following the same pattern used for CUDA, TensorRT, and QNN EPs
(added in #24606). This embeds the ORT version into the DLL's PE header
so it shows up in file properties.

### Motivation and Context
Need version in onnxruntime_providers_vitisai.dll to track changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adrastogi adrastogi force-pushed the adrastogi/rel-1.23.5-cherrypicks branch from b9dd9a8 to 57d6b59 Compare March 19, 2026 22:26
guschmue
guschmue previously approved these changes Mar 20, 2026
dabhattimsft
dabhattimsft previously approved these changes Mar 20, 2026
Contributor

@dabhattimsft dabhattimsft left a comment


:shipit:

### Description

Add a pre-check for zero values in the divisor tensor for integral types
in `Div<T>`. Returns an error `Status` instead of hitting undefined
behavior (SIGFPE / structured exception).

- **`element_wise_ops.h`**: When the divisor is a constant initializer,
`TryGetConstantInput` validates for zeros once at kernel creation time
in the constructor, avoiding per-`Compute` overhead. A
`divisor_is_validated_constant_` flag tracks whether the one-time check
was performed.
- **`element_wise_ops.cc`**: `if constexpr (std::is_integral<T>::value)`
guard scans non-constant divisors before calling `UntypedBroadcastTwo`,
skipping the check when the constant was already validated. Compiled
away for float/double/half — zero cost for non-integer paths.
- **`element_wise_ops_test.cc`**: Added `Div_int8_by_zero`,
`Div_int32_by_zero`, `Div_int64_by_zero_scalar` tests covering tensor
and scalar divisor cases, plus `Div_int32_by_zero_constant_initializer`
to exercise the `TryGetConstantInput` constructor path with
`is_initializer = true`.

### Motivation and Context

Integer division by zero is UB in C++ and causes a hardware exception
that crashes the process. Float types produce inf/NaN naturally, but
int8/int16/int32/int64/uint* types do not. This was reported via
Chromium (https://issues.chromium.org/issues/491835014) with a trivial
repro: `tensor<int8> / scalar(0)`.

<details>

<summary>Original prompt</summary>

> <issue_title>int8 / 0 exception not caught for cpu ep</issue_title>
> <issue_description>See https://issues.chromium.org/issues/491835014.
> 
> Repro:
> a=tensor<int8>
> b=tensor<int8>, i.e. a scalar that is 0
> model that does a/b
> 
> Stack trace:
> ```
> onnxruntime.dll!Eigen::internal::scalar_quotient_op<signed char,signed char>::operator()(const char &) Line 437  C++
> [Inline Frame] onnxruntime.dll!Eigen::internal::binary_evaluator<...>::coeff(__int64) Line 910  C++
> ...
> [Inline Frame] onnxruntime.dll!Eigen::internal::Assignment<...>::run(Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1>,0,Eigen::Stride<0,0>> &) Line 855  C++
> [Inline Frame] onnxruntime.dll!Eigen::internal::call_assignment_no_alias(...) Line 797  C++
> [Inline Frame] onnxruntime.dll!Eigen::internal::call_assignment(...) Line 768  C++
> [Inline Frame] onnxruntime.dll!Eigen::internal::call_assignment(...) Line 750  C++
> [Inline Frame] onnxruntime.dll!Eigen::MatrixBase<...>::operator=(...) Line 59  C++
> [Inline Frame] onnxruntime.dll!onnxruntime::Div<signed char>::Compute::__l2::<lambda_998187df037dec36fd0905b4142c682e>::operator()(onnxruntime::BroadcastHelper &) Line 685  C++
> onnxruntime.dll!<lambda_998187df037dec36fd0905b4142c682e>::<lambda_invoker_cdecl>(onnxruntime::BroadcastHelper & per_iter_bh) Line 686  C++
> [External Code]
> [Inline Frame] onnxruntime.dll!std::_Func_class<void,__int64,__int64>::operator()(__int64, __int64) Line 926  C++
> onnxruntime.dll!onnxruntime::concurrency::ThreadPool::ParallelFor(__int64 n, const onnxruntime::TensorOpCost & c, const std::function<void __cdecl(__int64,__int64)> & f) Line 628  C++
> onnxruntime.dll!onnxruntime::concurrency::ThreadPool::TryParallelFor(...) Line 705  C++
> onnxruntime.dll!onnxruntime::ParallelizeSingleSpan<onnxruntime::BroadcastHelper>(onnxruntime::BroadcastHelper & helper, const onnxruntime::ProcessBroadcastSpanFuncs & functors) Line 955  C++
> onnxruntime.dll!onnxruntime::BroadcastLooper<onnxruntime::BroadcastHelper>(onnxruntime::BroadcastHelper & helper, const onnxruntime::ProcessBroadcastSpanFuncs & functors) Line 1006  C++
> onnxruntime.dll!onnxruntime::UntypedBroadcastTwo(onnxruntime::OpKernelContext & context, const onnxruntime::ProcessBroadcastSpanFuncs & funcs, double unit_cost, void * user_data) Line 2305  C++
> onnxruntime.dll!onnxruntime::Div<signed char>::Compute(onnxruntime::OpKernelContext * context) Line 695  C++
> ```
> </issue_description>

</details>




- Fixes #27686


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: skottmckay <979079+skottmckay@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
@adrastogi adrastogi dismissed stale reviews from dabhattimsft and guschmue via 1652d92 March 20, 2026 20:37
dabhattimsft
dabhattimsft previously approved these changes Mar 24, 2026
adrastogi and others added 3 commits March 29, 2026 20:08
### Description
DmlOperatorQuantization21 was missing the tensor reshaping logic that
the older DmlOperatorElementwiseQLinear already had.

Scalar scale tensors get padded to 4D, but a 5D input stays 5D. DML
rejects the dimension mismatch with E_INVALIDARG, and the resulting
exception unwind triggers a sized-delete bug in WRL's MakeAllocator
which AddressSanitizer detects. The fix is to port the same logic from
the DmlOperatorElementwiseQLinear into this path, so that the dimensions
match.

### Motivation and Context
This is required to ensure the DML EP correctly handles this scenario.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### Description
This change tries to address a problem in the DML EP where AlignToPow2
rounded up tensorByteSize to a 4-byte boundary before the data was read
from the source buffer. This caused CreateCpuResource, CreateResource,
WriteToFile, and the inputRawData vector construction to read 1–3 bytes
past the end of the original tensor data.

CreateResource and CreateCpuResource already independently align the
D3D12 resource descriptor size, so they work correctly with the original
(unaligned) byte count. The fix is to move the alignment to the location
where it's needed.

### Motivation and Context
This is required because it addresses a crash / incorrect behavior in
the DML EP.
Use Tensor::CalculateTensorStorageSize instead of DataType()->Size() * shape.Size()
to correctly compute buffer size for sub-byte types (e.g., Int4).

Partial backport of #27547.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
adrastogi and others added 5 commits March 29, 2026 22:36
- Skip setup-python on arm64 (azurelinux 3.0 compat)
- Runner pool mms -> latest (CUDA, TensorRT)
- CUDA SDK 12.2 -> 12.8 (CUDA, TensorRT)
- TensorRT 10.9.0.34 -> 10.14.1.48
- Add --nvcc_threads 1 to prevent OOM
- setup-build-tools v0.0.9 -> v0.0.12
- vcpkg 2025.06.13 -> 2025.08.27
- Add job_name/job_identifier inputs for reusable workflows
- GitHub Actions version bumps (checkout v6, setup-node v6,
  setup-java v5, cache v5, setup-dotnet v5, upload-artifact v6,
  download-artifact v7, setup-python v6)
- locate-vcvarsall vcpkg version update
- macOS timeout increase 90 -> 120 min

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The v0.0.12 actions inject ccache usage and have other changes
that are incompatible with the Docker images and build scripts
on this release branch. Keep v0.0.9 for run-build-script-in-docker,
build-docker-image, and setup-build-tools.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Linux CPU Minimal Build: Upgrade setup-build-tools from v0.0.9 to
   v0.0.12 (SHA 8bad63a3) with ccache support. The v0.0.12 build actions
   (build-and-prep-ort-files, build-minimal-ort-and-run-tests,
   run-build-script-in-docker) all hardcode ccache invocations internally,
   so ccache MUST be installed by setup-build-tools. Added actions/cache
   steps for ccache and vcpkg directories across all jobs.

2. Web CI Pipeline (WASM): Same setup-build-tools upgrade plus added
   --use_cache to common_build_args. Without ccache, WASM builds compile
   from scratch each time, causing 30+ min timeouts.

3. CUDA Builds: Cherry-pick test/build fixes from main commit 311b4a6
   (PR #26267) needed for CUDA 12.8 compatibility:
   - Disable YOLO v3/v4 and MobilenetV1 model tests (cuDNN frontend
     cannot find engine plan with cuDNN 9.8)
   - Switch from --relocatable-device-code=true to
     --static-global-template-stub=false (faster builds)
   - Fix typeid(T).name() build error in gather_block_quantized test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The hash had an extra 'b' character causing SHA512 verification
failure when downloading ccache.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Restore build-docker-image action for Job 5 (Build Extended Minimal)
  which was accidentally replaced with run-build-script-in-docker
- Migrate QNN CI from decommissioned mms pool to latest

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adrastogi adrastogi requested a review from guschmue March 31, 2026 00:12
@adrastogi
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@adrastogi
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@adrastogi adrastogi merged commit 840c8d7 into rel-1.23.5 Mar 31, 2026
58 of 82 checks passed
@adrastogi adrastogi deleted the adrastogi/rel-1.23.5-cherrypicks branch March 31, 2026 20:01