Merged
### Description Add tensor proto type for bool. ### Motivation and Context Found a new model with a bool data type. Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
### Description Use the updated subgraph when cloning the model. ### Motivation and Context When a node in the model contains a subgraph attribute, onnxruntime optimizes the subgraph. If the updated subgraph is not used in the cloned model, it may cause errors during graph resolution.
#26487) ### Description Enable safe use of ort::logger across DLL boundaries for the VitisAI execution provider. ### Motivation and Context Resolves issue PUID 1194096. This enables compile_onnx_model_vitisai_ep_v4 to use ort::logger properly.
### Description Vitis AI EP graph_fuse memory optimization. This PR removes unused function body handling code in the VitisAI graph fusion implementation. ### Changes - Removed unused function body handling in graph fusion (`onnxruntime/core/providers/vitisai/imp/graph.cc`) ### Context The function body handling code in the graph fusion logic was unused and has been removed to simplify the implementation.
## [VitisAI] Add External EP Loader ### Description This PR introduces a dynamic external execution provider loading mechanism for the VitisAI execution provider, enabling runtime loading of alternative execution providers through a plugin-style architecture. ### Key Changes #### 1. **New External EP Library Infrastructure** (`global_api.cc`) - Added `ExternalEpLibaray` class to dynamically load external execution provider libraries at runtime - Implemented complete library lifecycle management (loading, unloading, symbol resolution) - Added global registry (`g_external_ep_libaries`) with caching to avoid redundant library loading - Created `CreateExecutionProviderFromAnotherEp()` function to instantiate execution providers from external libraries **Implementation Details:** - **Simplified symbol resolution**: Only resolves the essential `GetProvider` symbol (required) - **Removed optional symbols**: No longer attempts to resolve `CreateEpFactories` or `RyzenAI_SetSessionOptions` - Lazy initialization pattern with `Ensure()` method - Safe cleanup with `Clear()` method and proper error handling - Platform-agnostic library loading using `LIBRARY_PREFIX` and `LIBRARY_EXTENSION` macros #### 2. **API Extension** (`global_api.h`) - Declared new public function: `CreateExecutionProviderFromAnotherEp()` - Added required includes: - `core/framework/execution_provider.h` for `IExecutionProvider` interface - `<memory>` for smart pointer support #### 3. **Factory Integration** (`vitisai_provider_factory.cc`) - Integrated external EP loading into the VitisAI provider factory workflow - Added provider option check for `external_ep_libray` key - **Logic Flow**: 1. Check if `external_ep_libray` option is specified 2. If yes, load and return the external execution provider 3. If no, create and return standard VitisAI execution provider Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
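The caching and lazy-initialization pattern described above can be sketched roughly as follows. This is a hypothetical simplification, not the actual `global_api.cc` code: the struct, function names, and the `GetProviderFn` stand-in are invented for illustration, and the real implementation resolves the `GetProvider` symbol via platform library-loading APIs.

```cpp
#include <map>
#include <memory>
#include <string>

// Stand-in for the function pointer resolved from the external library.
using GetProviderFn = int (*)();

struct ExternalEpLibrary {
  std::string path;
  GetProviderFn get_provider = nullptr;
  bool loaded = false;

  // Lazy initialization: load the library and resolve symbols on first use.
  void Ensure() {
    if (loaded) return;
    // Real code would call dlopen/LoadLibrary here (using platform prefix and
    // extension macros) and resolve the required "GetProvider" symbol,
    // failing if it is absent.
    loaded = true;
  }

  // Safe cleanup: drop resolved symbols and mark the library unloaded.
  void Clear() {
    get_provider = nullptr;
    loaded = false;
  }
};

// Global registry keyed by path so each library is loaded at most once.
std::map<std::string, std::shared_ptr<ExternalEpLibrary>>& Registry() {
  static std::map<std::string, std::shared_ptr<ExternalEpLibrary>> r;
  return r;
}

std::shared_ptr<ExternalEpLibrary> GetOrLoad(const std::string& path) {
  auto& reg = Registry();
  auto it = reg.find(path);
  if (it != reg.end()) return it->second;  // cached: no redundant load
  auto lib = std::make_shared<ExternalEpLibrary>();
  lib->path = path;
  lib->Ensure();
  reg[path] = lib;
  return lib;
}
```

Requesting the same path twice returns the cached entry, which is what avoids redundant library loading.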
…val and validation (#26699) ### Description Adds support for compiled model compatibility information retrieval and validation in the VitisAI EP. This enables runtime validation of compiled models against the execution environment to prevent failures and provide clear compatibility feedback. **Key Changes:** - Implemented `GetCompiledModelCompatibilityInfo` to collect and serialize compatibility metadata during model compilation - Added `ValidateCompiledModelCompatibilityInfo` to validate compatibility at runtime against the current environment ### Motivation and Context Compiled models may fail at runtime due to missing backend plugins, version mismatches, or hardware platform differences. ONNX Runtime added two APIs to support the compiled model compatibility validation system. Ref PRs: #25841 #25749 This PR implements a compatibility validation system for the Vitis AI EP that: - Detects incompatibilities before model loading to prevent runtime failures - Enables cross-version compatibility checking between different EP versions - Provides clear feedback through specific compatibility status codes - Maintains backward compatibility with legacy EPs
…() (#27295) Remove unnecessary s_kernel_registry_vitisaiep.reset() call in deinitialize_vitisai_ep() function. The kernel registry will be repopulated on next initialization, making this reset redundant.
ModelLoadStart/End - InferenceSession::LoadWithLoader, InferenceSession::LoadOrtModelWithLoader
SessionCreationEnd - InferenceSession::Initialize
RegisterEpLibraryWithLibPath, RegisterEpLibraryStart/End - Environment::RegisterExecutionProviderLibrary

Update: the RuntimePerf event is now triggered more frequently, with exponential backoff. It is also now triggered from ~InferenceSession() to log data at the tail, to better measure health. --------- Co-authored-by: Darshak Bhatti <dabhatti@micorsoft.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
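One way to trigger an event "more frequently with exponential backoff" is to emit on the 1st, 2nd, 4th, 8th, ... occurrence, so the logging rate decays as the session runs longer. The sketch below is a hypothetical illustration of that scheme, not the actual telemetry code; the `BackoffLogger` name and its fields are invented.

```cpp
#include <cstdint>

// Hypothetical sketch: decide whether to emit a perf event with exponential
// backoff. Emissions happen on occurrences 1, 2, 4, 8, 16, ..., doubling the
// gap each time an event fires.
struct BackoffLogger {
  uint64_t count = 0;      // occurrences seen so far
  uint64_t next_emit = 1;  // occurrence number that triggers the next event

  // Returns true when this occurrence should be logged.
  bool ShouldEmit() {
    ++count;
    if (count < next_emit) return false;
    next_emit *= 2;  // double the gap before the next emission
    return true;
  }
};
```

A destructor-time emission (as described for ~InferenceSession) would bypass this check so the tail data is always captured.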
### Description Add a Windows VERSIONINFO resource (.rc file) for the Vitis AI provider DLL, following the same pattern used for CUDA, TensorRT, and QNN EPs (added in #24606). This embeds the ORT version into the DLL's PE header so it shows up in file properties. ### Motivation and Context Need version in onnxruntime_providers_vitisai.dll to track changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
b9dd9a8 to 57d6b59
guschmue previously approved these changes (Mar 20, 2026)
### Description Add a pre-check for zero values in the divisor tensor for integral types in `Div<T>`. Returns an error `Status` instead of hitting undefined behavior (SIGFPE / structured exception). - **`element_wise_ops.h`**: When the divisor is a constant initializer, `TryGetConstantInput` validates for zeros once at kernel creation time in the constructor, avoiding per-`Compute` overhead. A `divisor_is_validated_constant_` flag tracks whether the one-time check was performed. - **`element_wise_ops.cc`**: `if constexpr (std::is_integral<T>::value)` guard scans non-constant divisors before calling `UntypedBroadcastTwo`, skipping the check when the constant was already validated. Compiled away for float/double/half — zero cost for non-integer paths. - **`element_wise_ops_test.cc`**: Added `Div_int8_by_zero`, `Div_int32_by_zero`, `Div_int64_by_zero_scalar` tests covering tensor and scalar divisor cases, plus `Div_int32_by_zero_constant_initializer` to exercise the `TryGetConstantInput` constructor path with `is_initializer = true`. ### Motivation and Context Integer division by zero is UB in C++ and causes a hardware exception that crashes the process. Float types produce inf/NaN naturally, but int8/int16/int32/int64/uint* types do not. This was reported via Chromium (https://issues.chromium.org/issues/491835014) with a trivial repro: `tensor<int8> / scalar(0)`. <!-- START COPILOT ORIGINAL PROMPT --> <details> <summary>Original prompt</summary> > > ---- > > *This section details on the original issue you should resolve* > > <issue_title>int8 / 0 exception not caught for cpu ep</issue_title> > <issue_description>See https://issues.chromium.org/issues/491835014. 
> > Repro: > a=tensor<int8> > b=tensor<int8>, ie a scalar that is 0 > model that does a/b > > Stack trace: > ``` > onnxruntime.dll!Eigen::internal::scalar_quotient_op<signed char,signed char>::operator()(const char &) Line 437 C++ > [Inline Frame] onnxruntime.dll!Eigen::internal::binary_evaluator<Eigen::CwiseBinaryOp<Eigen::internal::scalar_quotient_op<signed char,signed char>,Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<signed char>,Eigen::Array<signed char,-1,1,0,-1,1> const> const ,Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1> const ,0,Eigen::Stride<0,0>>> const>,Eigen::internal::IndexBased,Eigen::internal::IndexBased,signed char,signed char>::coeff(__int64) Line 910 C++ > ... > [Inline Frame] onnxruntime.dll!Eigen::internal::Assignment<Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1>,0,Eigen::Stride<0,0>>,Eigen::CwiseBinaryOp<Eigen::internal::scalar_quotient_op<signed char,signed char>,Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<signed char>,Eigen::Array<signed char,-1,1,0,-1,1> const> const ,Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1> const ,0,Eigen::Stride<0,0>>> const>,Eigen::internal::assign_op<signed char,signed char>,Eigen::internal::Dense2Dense,void>::run(Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1>,0,Eigen::Stride<0,0>> &) Line 855 C++ > [Inline Frame] onnxruntime.dll!Eigen::internal::call_assignment_no_alias(Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1>,0,Eigen::Stride<0,0>> &) Line 797 C++ > [Inline Frame] onnxruntime.dll!Eigen::internal::call_assignment(Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1>,0,Eigen::Stride<0,0>> &) Line 768 C++ > [Inline Frame] onnxruntime.dll!Eigen::internal::call_assignment(Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1>,0,Eigen::Stride<0,0>> &) Line 750 C++ > [Inline Frame] onnxruntime.dll!Eigen::MatrixBase<Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1>,0,Eigen::Stride<0,0>>>::operator=(const 
Eigen::DenseBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_quotient_op<signed char,signed char>,Eigen::CwiseNullaryOp<Eigen::internal::scalar_constant_op<signed char>,Eigen::Array<signed char,-1,1,0,-1,1> const> const ,Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<signed char,-1,1,0,-1,1> const ,0,Eigen::Stride<0,0>>> const>> &) Line 59 C++ > [Inline Frame] onnxruntime.dll!onnxruntime::Div<signed char>::Compute::__l2::<lambda_998187df037dec36fd0905b4142c682e>::operator()(onnxruntime::BroadcastHelper &) Line 685 C++ > onnxruntime.dll!<lambda_998187df037dec36fd0905b4142c682e>::<lambda_invoker_cdecl>(onnxruntime::BroadcastHelper & per_iter_bh) Line 686 C++ > [External Code] > [Inline Frame] onnxruntime.dll!std::_Func_class<void,__int64,__int64>::operator()(__int64 <_Args_0>, __int64 <_Args_1>) Line 926 C++ > onnxruntime.dll!onnxruntime::concurrency::ThreadPool::ParallelFor(__int64 n, const onnxruntime::TensorOpCost & c, const std::function<void __cdecl(__int64,__int64)> & f) Line 628 C++ > onnxruntime.dll!onnxruntime::concurrency::ThreadPool::TryParallelFor(onnxruntime::concurrency::ThreadPool * tp, __int64 total, const onnxruntime::TensorOpCost & cost_per_unit, const std::function<void __cdecl(__int64,__int64)> & fn) Line 705 C++ > onnxruntime.dll!onnxruntime::ParallelizeSingleSpan<onnxruntime::BroadcastHelper>(onnxruntime::BroadcastHelper & helper, const onnxruntime::ProcessBroadcastSpanFuncs & functors) Line 955 C++ > onnxruntime.dll!onnxruntime::BroadcastLooper<onnxruntime::BroadcastHelper>(onnxruntime::BroadcastHelper & helper, const onnxruntime::ProcessBroadcastSpanFuncs & functors) Line 1006 C++ > onnxruntime.dll!onnxruntime::UntypedBroadcastTwo(onnxruntime::OpKernelContext & context, const onnxruntime::ProcessBroadcastSpanFuncs & funcs, double unit_cost, void * user_data) Line 2305 C++ > onnxruntime.dll!onnxruntime::Div<signed char>::Compute(onnxruntime::OpKernelContext * context) Line 695 C++ > > ``` > </issue_description> > > ## Comments on the Issue (you 
are @copilot in this section) > > <comments> > </comments> > </details> Fixes #27686 --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com> Co-authored-by: skottmckay <979079+skottmckay@users.noreply.github.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
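The guard described in this PR can be sketched as below. This is a simplified, hypothetical version of the check, not the actual `element_wise_ops.cc` code: the `Status` enum and `CheckDivisorForZeros` name are invented, and the real kernel operates on ORT tensors rather than `std::vector`. The key idea is the `if constexpr` branch, which makes the scan compile away entirely for floating-point types.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for onnxruntime::common::Status.
enum class Status { OK, InvalidArgument };

// Scan the divisor for zeros before dividing, but only for integral element
// types. Integer division by zero is UB in C++ and raises a hardware fault;
// float/double/half naturally produce inf/NaN, so the check is compiled away
// for them at zero cost.
template <typename T>
Status CheckDivisorForZeros(const std::vector<T>& divisor) {
  if constexpr (std::is_integral<T>::value) {
    for (const T& v : divisor) {
      if (v == static_cast<T>(0)) {
        return Status::InvalidArgument;  // report an error instead of SIGFPE
      }
    }
  }
  return Status::OK;
}
```

In the actual PR, this scan runs once at kernel construction when the divisor is a constant initializer, and per-`Compute` only for non-constant integral divisors.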
dabhattimsft previously approved these changes (Mar 24, 2026)
### Description DmlOperatorQuantization21 was missing the tensor reshaping logic that the older DmlOperatorElementwiseQLinear already had: scalar scale tensors get padded to 4D, but a 5D input stays 5D. DML rejects the dimension mismatch with E_INVALIDARG, and the resulting exception unwind triggers a sized-delete bug in WRL's MakeAllocator, which AddressSanitizer detects. The fix is to port the same logic from DmlOperatorElementwiseQLinear into this path so that the dimensions match. ### Motivation and Context This is required to ensure the DML EP correctly handles this scenario. --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
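The rank-alignment idea behind this fix can be sketched as follows. This is a hypothetical illustration, not the DML EP code: the function name is invented, and the real operator works with DML tensor descriptors rather than plain vectors. The point is that a scalar or low-rank scale tensor is padded with leading size-1 dimensions until its rank matches the input's rank, so broadcasting is well-defined and DML sees no rank mismatch.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Left-pad a dimension list with 1s until it reaches the target rank.
// Padding with leading 1s preserves the element count and broadcast
// semantics (a scalar {} becomes {1,1,1,1,1} against a 5D input).
std::vector<int64_t> PadDimsToRank(std::vector<int64_t> dims, size_t rank) {
  while (dims.size() < rank) {
    dims.insert(dims.begin(), 1);  // prepend a size-1 dimension
  }
  return dims;
}
```

With this applied to the scale/zero-point tensors, a 5D input and its scalar scale end up with the same rank, which is the condition the original code violated.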
### Description This change addresses a problem in the DML EP where AlignToPow2 rounded tensorByteSize up to a 4-byte boundary before the data was read from the source buffer. This caused CreateCpuResource, CreateResource, WriteToFile, and the inputRawData vector construction to read 1–3 bytes past the end of the original tensor data. CreateResource and CreateCpuResource already independently align the D3D12 resource descriptor size, so they work correctly with the original (unaligned) byte count. The fix moves the alignment to the location where it is needed. ### Motivation and Context This is required because it addresses a crash and incorrect behavior in the DML EP.
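The power-of-two rounding in question behaves as sketched below (a generic illustration of the arithmetic, assuming the actual AlignToPow2 helper does the usual mask-based round-up). The bug was not in the rounding itself but in where it was applied: rounding a size up by 1–3 bytes is fine for sizing a D3D12 resource, but wrong when the rounded value is used as a read length against the source buffer.

```cpp
#include <cstdint>

// Round `size` up to the next multiple of `alignment`, where `alignment`
// must be a power of two. (size + alignment - 1) overshoots into the next
// aligned block, and the mask clears the low bits back to the boundary.
constexpr uint64_t AlignToPow2(uint64_t size, uint64_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}
```

For example, a 5-byte tensor aligned to 4 becomes 8 bytes; reading 8 bytes from a 5-byte source buffer is exactly the 1–3 byte overread described above.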
Use Tensor::CalculateTensorStorageSize instead of DataType()->Size() * shape.Size() to correctly compute buffer size for sub-byte types (e.g., Int4). Partial backport of #27547. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
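The failure mode with sub-byte types can be shown with a small sketch (a hypothetical helper, not the actual `Tensor::CalculateTensorStorageSize` implementation): an Int4 element occupies 4 bits, so storage must be computed from total bits rounded up to whole bytes. A per-element byte size, as in `DataType()->Size() * shape.Size()`, cannot represent half a byte and miscounts.

```cpp
#include <cstdint>

// Compute storage bytes from total bits, rounding up to a whole byte.
// For Int4 (4 bits/element), 5 elements -> 20 bits -> 3 bytes, which a
// bytes-per-element multiplication cannot produce.
uint64_t StorageBytesForSubByte(uint64_t num_elements, uint64_t bits_per_element) {
  return (num_elements * bits_per_element + 7) / 8;  // ceil(bits / 8)
}
```
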
- Skip setup-python on arm64 (azurelinux 3.0 compat)
- Runner pool mms -> latest (CUDA, TensorRT)
- CUDA SDK 12.2 -> 12.8 (CUDA, TensorRT)
- TensorRT 10.9.0.34 -> 10.14.1.48
- Add --nvcc_threads 1 to prevent OOM
- setup-build-tools v0.0.9 -> v0.0.12
- vcpkg 2025.06.13 -> 2025.08.27
- Add job_name/job_identifier inputs for reusable workflows
- GitHub Actions version bumps (checkout v6, setup-node v6, setup-java v5, cache v5, setup-dotnet v5, upload-artifact v6, download-artifact v7, setup-python v6)
- locate-vcvarsall vcpkg version update
- macOS timeout increase 90 -> 120 min

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The v0.0.12 actions inject ccache usage and have other changes that are incompatible with the Docker images and build scripts on this release branch. Keep v0.0.9 for run-build-script-in-docker, build-docker-image, and setup-build-tools. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Linux CPU Minimal Build: Upgrade setup-build-tools from v0.0.9 to v0.0.12 (SHA 8bad63a3) with ccache support. The v0.0.12 build actions (build-and-prep-ort-files, build-minimal-ort-and-run-tests, run-build-script-in-docker) all hardcode ccache invocations internally, so ccache MUST be installed by setup-build-tools. Added actions/cache steps for ccache and vcpkg directories across all jobs.
2. Web CI Pipeline (WASM): Same setup-build-tools upgrade plus added --use_cache to common_build_args. Without ccache, WASM builds compile from scratch each time, causing 30+ min timeouts.
3. CUDA Builds: Cherry-pick test/build fixes from main commit 311b4a6 (PR #26267) needed for CUDA 12.8 compatibility:
   - Disable YOLO v3/v4 and MobilenetV1 model tests (cuDNN frontend cannot find an engine plan with cuDNN 9.8)
   - Switch from --relocatable-device-code=true to --static-global-template-stub=false (faster builds)
   - Fix typeid(T).name() build error in the gather_block_quantized test

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The hash had an extra 'b' character causing SHA512 verification failure when downloading ccache. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Restore build-docker-image action for Job 5 (Build Extended Minimal) which was accidentally replaced with run-build-script-in-docker - Migrate QNN CI from decommissioned mms pool to latest Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
guschmue approved these changes (Mar 31, 2026)
Contributor (Author): /azp run

Azure Pipelines successfully started running 6 pipeline(s).
Kevin-Taha approved these changes (Mar 31, 2026)
This PR cherry-picks the commits described above for the release.