
ORT 1.23.5 Cherry Picks#27716

Merged
adrastogi merged 18 commits into rel-1.23.5 from adrastogi/rel-1.23.5-cherrypicks
Mar 31, 2026
Conversation

@adrastogi
Contributor

@adrastogi adrastogi commented Mar 17, 2026

This cherry-picks the following commits for the release:

BoarQing and others added 9 commits March 19, 2026 14:00
### Description
Add tensor proto type for bool


### Motivation and Context
A new model was found that uses the bool data type.

Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
### Description

Use the updated subgraph when cloning the model.

### Motivation and Context

When a node in the model contains a subgraph attribute, ONNX Runtime
optimizes the subgraph. If the updated subgraph is not used in the cloned
model, it can cause errors during graph resolution.
#26487)

### Description
Enable the use of ort::logger safely across DLL boundaries for the
VitisAI execution provider.

### Motivation and Context
Resolves issue PUID 1194096. This enables
compile_onnx_model_vitisai_ep_v4 to use ort::logger properly.
### Description
Vitis AI EP graph_fuse memory optimization.
This PR removes unused function body handling code in the VitisAI graph
fusion implementation.

### Changes
- Removed unused function body handling in graph fusion
(`onnxruntime/core/providers/vitisai/imp/graph.cc`)

### Context
The function body handling code in the graph fusion logic was not being
used and can be safely removed to simplify the implementation.
## [VitisAI] Add External EP Loader

### Description

This PR introduces a dynamic external execution provider loading
mechanism for the VitisAI execution provider, enabling runtime loading
of alternative execution providers through a plugin-style architecture.

### Key Changes

#### 1. **New External EP Library Infrastructure** (`global_api.cc`)
- Added `ExternalEpLibaray` class to dynamically load external execution
provider libraries at runtime
- Implemented complete library lifecycle management (loading, unloading,
symbol resolution)
- Added global registry (`g_external_ep_libaries`) with caching to avoid
redundant library loading
- Created `CreateExecutionProviderFromAnotherEp()` function to
instantiate execution providers from external libraries

**Implementation Details:**
- **Simplified symbol resolution**: Only resolves the essential
`GetProvider` symbol (required)
- **Removed optional symbols**: No longer attempts to resolve
`CreateEpFactories` or `RyzenAI_SetSessionOptions`
- Lazy initialization pattern with `Ensure()` method
- Safe cleanup with `Clear()` method and proper error handling
- Platform-agnostic library loading using `LIBRARY_PREFIX` and
`LIBRARY_EXTENSION` macros

#### 2. **API Extension** (`global_api.h`)
- Declared new public function: `CreateExecutionProviderFromAnotherEp()`
- Added required includes:
- `core/framework/execution_provider.h` for `IExecutionProvider`
interface
  - `<memory>` for smart pointer support

#### 3. **Factory Integration** (`vitisai_provider_factory.cc`)
- Integrated external EP loading into the VitisAI provider factory
workflow
- Added provider option check for `external_ep_libray` key
- **Logic Flow**:
  1. Check if `external_ep_libray` option is specified
  2. If yes, load and return the external execution provider
  3. If no, create and return standard VitisAI execution provider

Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
…val and validation (#26699)

### Description

Adds support for compiled model compatibility information retrieval and
validation in the VitisAI EP. This enables runtime validation of
compiled models against the execution environment to prevent failures
and provide clear compatibility feedback.

**Key Changes:**
- Implemented `GetCompiledModelCompatibilityInfo` to collect and
serialize compatibility metadata during model compilation
- Added `ValidateCompiledModelCompatibilityInfo` to validate
compatibility at runtime against the current environment

### Motivation and Context
Compiled models may fail at runtime due to missing backend plugins,
version mismatches, or hardware platform differences.
ONNX Runtime added two APIs to support a compiled-model compatibility
validation system. Ref PRs:
    #25841
    #25749

This PR implements a compatibility validation system for Vitis AI EP
that:

- Detects incompatibilities before model loading to prevent runtime
failures
- Enables cross-version compatibility checking between different EP
versions
- Provides clear feedback through specific compatibility status codes
- Maintains backward compatibility with legacy EPs
…() (#27295)

Remove unnecessary s_kernel_registry_vitisaiep.reset() call in
deinitialize_vitisai_ep() function. The kernel registry will be
repopulated on next initialization, making this reset redundant.
New telemetry events:
- ModelLoadStart/End: InferenceSession::LoadWithLoader, InferenceSession::LoadOrtModelWithLoader
- SessionCreationEnd: InferenceSession::Initialize
- RegisterEpLibraryWithLibPath, RegisterEpLibraryStart/End: Environment::RegisterExecutionProviderLibrary

Update: RuntimePerf event is triggered more
frequently with exponential backoff.
It is also now triggered from ~InferenceSession() to log data in the
tail.

These changes help better measure runtime health.

---------

Co-authored-by: Darshak Bhatti <dabhatti@micorsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Add a Windows VERSIONINFO resource (.rc file) for the Vitis AI provider
DLL, following the same pattern used for CUDA, TensorRT, and QNN EPs
(added in #24606). This embeds the ORT version into the DLL's PE header
so it shows up in file properties.

### Motivation and Context
Need version in onnxruntime_providers_vitisai.dll to track changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adrastogi adrastogi force-pushed the adrastogi/rel-1.23.5-cherrypicks branch from b9dd9a8 to 57d6b59 Compare March 19, 2026 22:26
guschmue
guschmue previously approved these changes Mar 20, 2026
dabhattimsft
dabhattimsft previously approved these changes Mar 20, 2026
Contributor

@dabhattimsft dabhattimsft left a comment


:shipit:

### Description

Add a pre-check for zero values in the divisor tensor for integral types
in `Div<T>`. Returns an error `Status` instead of hitting undefined
behavior (SIGFPE / structured exception).

- **`element_wise_ops.h`**: When the divisor is a constant initializer,
`TryGetConstantInput` validates for zeros once at kernel creation time
in the constructor, avoiding per-`Compute` overhead. A
`divisor_is_validated_constant_` flag tracks whether the one-time check
was performed.
- **`element_wise_ops.cc`**: `if constexpr (std::is_integral<T>::value)`
guard scans non-constant divisors before calling `UntypedBroadcastTwo`,
skipping the check when the constant was already validated. Compiled
away for float/double/half — zero cost for non-integer paths.
- **`element_wise_ops_test.cc`**: Added `Div_int8_by_zero`,
`Div_int32_by_zero`, `Div_int64_by_zero_scalar` tests covering tensor
and scalar divisor cases, plus `Div_int32_by_zero_constant_initializer`
to exercise the `TryGetConstantInput` constructor path with
`is_initializer = true`.

### Motivation and Context

Integer division by zero is UB in C++ and causes a hardware exception
that crashes the process. Float types produce inf/NaN naturally, but
int8/int16/int32/int64/uint* types do not. This was reported via
Chromium (https://issues.chromium.org/issues/491835014) with a trivial
repro: `tensor<int8> / scalar(0)`.

<details>

<summary>Original prompt</summary>

> <issue_title>int8 / 0 exception not caught for cpu ep</issue_title>
> <issue_description>See https://issues.chromium.org/issues/491835014.
> 
> Repro:
> a=tensor<int8>
> b=tensor<int8>, i.e. a scalar that is 0
> model that does a/b
> 
> Stack trace:
> ```
> onnxruntime.dll!Eigen::internal::scalar_quotient_op<signed char,signed char>::operator()(const char &) Line 437  C++
> [Inline Frame] onnxruntime.dll!Eigen::internal::binary_evaluator<...>::coeff(__int64) Line 910  C++
> ...
> [Inline Frame] onnxruntime.dll!Eigen::internal::Assignment<...>::run(Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1>,0,Eigen::Stride<0,0>> &) Line 855  C++
> [Inline Frame] onnxruntime.dll!Eigen::internal::call_assignment_no_alias(...) Line 797  C++
> [Inline Frame] onnxruntime.dll!Eigen::internal::call_assignment(...) Line 768  C++
> [Inline Frame] onnxruntime.dll!Eigen::internal::call_assignment(...) Line 750  C++
> [Inline Frame] onnxruntime.dll!Eigen::MatrixBase<...>::operator=(...) Line 59  C++
> [Inline Frame] onnxruntime.dll!onnxruntime::Div<signed char>::Compute::__l2::<lambda_998187df037dec36fd0905b4142c682e>::operator()(onnxruntime::BroadcastHelper &) Line 685  C++
> onnxruntime.dll!<lambda_998187df037dec36fd0905b4142c682e>::<lambda_invoker_cdecl>(onnxruntime::BroadcastHelper & per_iter_bh) Line 686  C++
> [External Code]
> [Inline Frame] onnxruntime.dll!std::_Func_class<void,__int64,__int64>::operator()(__int64, __int64) Line 926  C++
> onnxruntime.dll!onnxruntime::concurrency::ThreadPool::ParallelFor(__int64 n, const onnxruntime::TensorOpCost & c, const std::function<void __cdecl(__int64,__int64)> & f) Line 628  C++
> onnxruntime.dll!onnxruntime::concurrency::ThreadPool::TryParallelFor(...) Line 705  C++
> onnxruntime.dll!onnxruntime::ParallelizeSingleSpan<onnxruntime::BroadcastHelper>(onnxruntime::BroadcastHelper & helper, const onnxruntime::ProcessBroadcastSpanFuncs & functors) Line 955  C++
> onnxruntime.dll!onnxruntime::BroadcastLooper<onnxruntime::BroadcastHelper>(onnxruntime::BroadcastHelper & helper, const onnxruntime::ProcessBroadcastSpanFuncs & functors) Line 1006  C++
> onnxruntime.dll!onnxruntime::UntypedBroadcastTwo(onnxruntime::OpKernelContext & context, const onnxruntime::ProcessBroadcastSpanFuncs & funcs, double unit_cost, void * user_data) Line 2305  C++
> onnxruntime.dll!onnxruntime::Div<signed char>::Compute(onnxruntime::OpKernelContext * context) Line 695  C++
> ```
> </issue_description>

</details>




- Fixes #27686


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Co-authored-by: skottmckay <979079+skottmckay@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
@adrastogi adrastogi dismissed stale reviews from dabhattimsft and guschmue via 1652d92 March 20, 2026 20:37
dabhattimsft
dabhattimsft previously approved these changes Mar 24, 2026
adrastogi and others added 3 commits March 29, 2026 20:08
### Description
DmlOperatorQuantization21 was missing the tensor reshaping logic that
the older DmlOperatorElementwiseQLinear already had.

Scalar scale tensors get padded to 4D, but a 5D input stays 5D. DML
rejects the dimension mismatch with E_INVALIDARG, and the resulting
exception unwind triggers a sized-delete bug in WRL's MakeAllocator
which AddressSanitizer detects. The fix is to port the same logic from
the DmlOperatorElementwiseQLinear into this path, so that the dimensions
match.

### Motivation and Context
This is required to ensure the DML EP correctly handles this scenario.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### Description
This change tries to address a problem in the DML EP where AlignToPow2
rounded up tensorByteSize to a 4-byte boundary before the data was read
from the source buffer. This caused CreateCpuResource, CreateResource,
WriteToFile, and the inputRawData vector construction to read 1–3 bytes
past the end of the original tensor data.

CreateResource and CreateCpuResource already independently align the
D3D12 resource descriptor size, so they work correctly with the original
(unaligned) byte count. The fix is to move the alignment to the location
where it's needed.

### Motivation and Context
This is required because it addresses a crash / incorrect behavior in
the DML EP.
Use Tensor::CalculateTensorStorageSize instead of DataType()->Size() * shape.Size()
to correctly compute buffer size for sub-byte types (e.g., Int4).

Partial backport of #27547.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
adrastogi and others added 5 commits March 29, 2026 22:36
- Skip setup-python on arm64 (azurelinux 3.0 compat)
- Runner pool mms -> latest (CUDA, TensorRT)
- CUDA SDK 12.2 -> 12.8 (CUDA, TensorRT)
- TensorRT 10.9.0.34 -> 10.14.1.48
- Add --nvcc_threads 1 to prevent OOM
- setup-build-tools v0.0.9 -> v0.0.12
- vcpkg 2025.06.13 -> 2025.08.27
- Add job_name/job_identifier inputs for reusable workflows
- GitHub Actions version bumps (checkout v6, setup-node v6,
  setup-java v5, cache v5, setup-dotnet v5, upload-artifact v6,
  download-artifact v7, setup-python v6)
- locate-vcvarsall vcpkg version update
- macOS timeout increase 90 -> 120 min

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The v0.0.12 actions inject ccache usage and have other changes
that are incompatible with the Docker images and build scripts
on this release branch. Keep v0.0.9 for run-build-script-in-docker,
build-docker-image, and setup-build-tools.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Linux CPU Minimal Build: Upgrade setup-build-tools from v0.0.9 to
   v0.0.12 (SHA 8bad63a3) with ccache support. The v0.0.12 build actions
   (build-and-prep-ort-files, build-minimal-ort-and-run-tests,
   run-build-script-in-docker) all hardcode ccache invocations internally,
   so ccache MUST be installed by setup-build-tools. Added actions/cache
   steps for ccache and vcpkg directories across all jobs.

2. Web CI Pipeline (WASM): Same setup-build-tools upgrade plus added
   --use_cache to common_build_args. Without ccache, WASM builds compile
   from scratch each time, causing 30+ min timeouts.

3. CUDA Builds: Cherry-pick test/build fixes from main commit 311b4a6
   (PR #26267) needed for CUDA 12.8 compatibility:
   - Disable YOLO v3/v4 and MobilenetV1 model tests (cuDNN frontend
     cannot find engine plan with cuDNN 9.8)
   - Switch from --relocatable-device-code=true to
     --static-global-template-stub=false (faster builds)
   - Fix typeid(T).name() build error in gather_block_quantized test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The hash had an extra 'b' character causing SHA512 verification
failure when downloading ccache.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Restore build-docker-image action for Job 5 (Build Extended Minimal)
  which was accidentally replaced with run-build-script-in-docker
- Migrate QNN CI from decommissioned mms pool to latest

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adrastogi adrastogi requested a review from guschmue March 31, 2026 00:12
@adrastogi
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@adrastogi
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@adrastogi adrastogi merged commit 840c8d7 into rel-1.23.5 Mar 31, 2026
58 of 82 checks passed
@adrastogi adrastogi deleted the adrastogi/rel-1.23.5-cherrypicks branch March 31, 2026 20:01