# ORT 1.24.0 release cherry pick round 3 (#27168)

Merged: tianleiwu merged 7 commits into rel-1.24.0 on Jan 27, 2026.

## Conversation
### Description

Load the tensorrt_plugin library from the EP library location instead of the runtime library location.

### Motivation and Context

The plugin library was being searched for in the core ONNX Runtime library path rather than the EP library path. These paths are separate in the WinML workflow.
### Description

The current infrastructure for validating compatibility of a precompiled model performs the check after session initialization, which turns out to be quite costly. The check should ideally happen beforehand, to short-circuit those expensive operations.

### Motivation and Context

This change makes it more tractable for applications to rely on the existing session machinery to check the compatibility of any of their models.

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
…24931)

## Problem

QNN error messages were being logged at VERBOSE level instead of ERROR level, making them invisible unless verbose logging was enabled. Users would only see unhelpful generic error messages like:

```
Failed to finalize QNN graph. Error code: 1002 at location qnn_model.cc:167 FinalizeGraphs
```

But the actual detailed error messages from QNN were hidden in verbose logs:

```
tcm_migration.cc:2088:ERROR:Operator named q::*InputSlicePad (0x1654900000002) not sufficiently tiled to fit in TCM. Requires 12441600 bytes
graph_prepare.cc:2808:ERROR:Graph prepare TCM Migration action failed
graph_prepare.cc:2868:ERROR:Graph prepare failed during optimization with err: 17, Fatal Optimize
```

## Root Cause

The `QnnLogging` callback function in `qnn_backend_manager.cc` was ignoring the `level` parameter from QNN and hardcoding all messages as `kVERBOSE` severity:

```cpp
void QnnLogging(const char* format,
                QnnLog_Level_t level,
                uint64_t timestamp,
                va_list argument_parameter) {
  ORT_UNUSED_PARAMETER(level);  // ❌ Ignoring the actual log level
  // ...
  const auto severity = ::onnxruntime::logging::Severity::kVERBOSE;  // ❌ Hardcoded as VERBOSE
```

## Solution

Modified the `QnnLogging` function to properly map QNN log levels to the appropriate ORT severity levels:

- `QNN_LOG_LEVEL_ERROR` → `logging::Severity::kERROR` ✅ **Key fix**
- `QNN_LOG_LEVEL_WARN` → `logging::Severity::kWARNING`
- `QNN_LOG_LEVEL_INFO` → `logging::Severity::kINFO`
- `QNN_LOG_LEVEL_VERBOSE`/`QNN_LOG_LEVEL_DEBUG` → `logging::Severity::kVERBOSE`

## Changes Made

1. **Modified the `QnnLogging` function**: removed the hardcoded `kVERBOSE` and added proper level mapping.
2. **Added a `MapQNNLogLevelToOrtSeverity` function**: for potential future reuse.
3. **Minimal and surgical changes**: only 37 lines added, 2 removed.

## Impact

QNN error messages will now appear as ERROR-level logs in normal logging output, making debugging much easier for users without requiring verbose logging to be enabled. Fixes #24876.
---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vraspar <51386888+vraspar@users.noreply.github.com>
Co-authored-by: yuslepukhin <11303988+yuslepukhin@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
* Extend compile_ep_context to also support plugin EPs.
* Add a compile_only option to skip execution; this can be used when compiling for virtual devices.

compile_ep_context (physical device):

<img width="1259" height="510" alt="image" src="https://github.com/user-attachments/assets/14650c17-0c8a-4002-a7ce-e8e4c815a516" />

compile_ep_context + compile_only (virtual device):

<img width="1262" height="173" alt="image" src="https://github.com/user-attachments/assets/2f0844cc-5e83-4b2d-bf0a-0d815d9bad29" />
### Description

Added support for the engine validation check for EP Context models.

### Motivation and Context

We wanted to implement support for the GetModelCompatibilityForEpDevices() API and thus provide an end-user-facing API for the engine validation check for EP Context models. Added this support and the necessary function implementation.
### Description

Models with corresponding Olive recipes are deprecated.

### Motivation and Context

Olive and olive-recipes are the entry point for model optimization; we want onnxruntime to be only for runtime. So, we are deprecating examples that are already present in olive-recipes.
### Description

Enables file mapping of the weights as well as the overall context binary. This feature is currently only enabled for ARM64 Windows devices.

### Motivation and Context

Currently, when reading the context binary, ORT allocates a large buffer on the heap. Assuming the same model is used, each ORT session allocates its own buffer for the context binary, which is incredibly wasteful for large models. Instead, Windows file mapping can be leveraged to map the context binary; every time a context needs to be created from it, a pointer into the mapping can be retrieved and used instead of a pre-allocated buffer, making the QNN EP more memory-efficient. With multiple ORT sessions, the context binary is loaded only once for all sessions, improving memory efficiency and overall initialization performance. This is very useful for LLM workloads going forward.

---------

Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
adrianlizarraga (Contributor) approved these changes on Jan 27, 2026, leaving a comment:

> Looks good. I checked all except PR 27156.
edgchen1 approved these changes on Jan 27, 2026.