ORT 1.24.0 release cherry pick round 3#27168

Merged
tianleiwu merged 7 commits into rel-1.24.0 from tlwu/rel-1.24.0_cherry_pick_round3
Jan 27, 2026
Conversation

@tianleiwu
Contributor

| Commit | Commit Title | Author |
|---|---|---|
| 11dde2d9e | [NV TensorRT RTX EP] Fix external tensorrt_plugins load path (#26814) | keshavv27 |
| 080d96818 | Move model compatibility checks ahead of session initialization (#27037) | adrastogi |
| ec4f6bfa1 | [QNN EP] Fix error messages being logged as VERBOSE instead of ERROR (#24931) | Copilot |
| 0432e7125 | perftest: support plugin eps for compile_ep_context (#27121) | Jaskaran Singh Nagi |
| 727db0d3d | Engine compatibility validity API implementation (#26774) | umangb-09 |
| 27013522f | Deprecate transformers model examples (#27156) | Jambay Kinley |
| f83d4d06e | [QNN-EP] Implement file mapped weights feature (#26952) | quic-calvnguy |

keshavv27 and others added 7 commits January 26, 2026 21:46
### Description
Load the tensorrt_plugin library from the EP library location instead of the runtime library location.


### Motivation and Context
The plugin library was being searched for in the core ONNX Runtime library path rather than the EP library path. These paths are separate in the WinML workflow.
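The fix boils down to resolving the plugin path relative to the EP library's own location. A minimal sketch of that idea, assuming C++17 `std::filesystem` (the function name `ResolvePluginPath` is illustrative, not the actual PR code):

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Given the full path of the loaded EP shared library, build the path to
// load the plugin from: the EP's own directory, not the directory that the
// core ONNX Runtime library was loaded from.
fs::path ResolvePluginPath(const fs::path& ep_library_path,
                           const std::string& plugin_name) {
  return ep_library_path.parent_path() / plugin_name;
}
```

For example, with the EP loaded from a WinML package directory, `ResolvePluginPath("C:/pkg/eps/trt_rtx_ep.dll", "tensorrt_plugins.dll")` yields a path inside `C:/pkg/eps/` rather than wherever onnxruntime.dll lives.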
### Description
The current infrastructure for validating the compatibility of a precompiled model runs the check after session initialization, which turns out to be quite costly. Ideally the check should happen beforehand, to short-circuit those expensive operations.


### Motivation and Context
This change makes it more tractable for applications to rely on the existing session machinery to check the compatibility of their models.

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
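The pattern this change enables can be sketched as a cheap compatibility gate in front of the expensive initialization. All names below (`CompatStatus`, `CheckCompatibility`, `ExpensiveSessionInit`, `CreateSession`) are hypothetical stand-ins, not the actual ORT internals:

```cpp
#include <cassert>
#include <string>

enum class CompatStatus { kCompatible, kIncompatible };

// Stand-in for the precompiled-model compatibility check: compare the
// token baked into the model against the token of the current device/EP.
CompatStatus CheckCompatibility(const std::string& model_token,
                                const std::string& device_token) {
  return model_token == device_token ? CompatStatus::kCompatible
                                     : CompatStatus::kIncompatible;
}

bool ExpensiveSessionInit() { return true; }  // placeholder for real init work

// Run the cheap check first and short-circuit: an incompatible precompiled
// model never pays the cost of session initialization.
bool CreateSession(const std::string& model_token,
                   const std::string& device_token) {
  if (CheckCompatibility(model_token, device_token) ==
      CompatStatus::kIncompatible) {
    return false;
  }
  return ExpensiveSessionInit();
}
```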
## Problem

QNN error messages were being logged at VERBOSE level instead of ERROR
level, making them invisible unless verbose logging was enabled. Users
would only see unhelpful generic error messages like:

```
Failed to finalize QNN graph. Error code: 1002 at location qnn_model.cc:167 FinalizeGraphs
```

But the actual detailed error messages from QNN were hidden in verbose
logs:

```
tcm_migration.cc:2088:ERROR:Operator named q::*InputSlicePad (0x1654900000002) not sufficiently tiled to fit in TCM. Requires 12441600 bytes
graph_prepare.cc:2808:ERROR:Graph prepare TCM Migration action failed
graph_prepare.cc:2868:ERROR:Graph prepare failed during optimization with err: 17, Fatal Optimize
```

## Root Cause

The `QnnLogging` callback function in `qnn_backend_manager.cc` was
ignoring the `level` parameter from QNN and hardcoding all messages as
`kVERBOSE` severity:

```cpp
void QnnLogging(const char* format, QnnLog_Level_t level, uint64_t timestamp, va_list argument_parameter) {
  ORT_UNUSED_PARAMETER(level);  // ❌ Ignoring the actual log level
  // ...
  const auto severity = ::onnxruntime::logging::Severity::kVERBOSE;  // ❌ Hardcoded as VERBOSE
```

## Solution

Modified the `QnnLogging` function to properly map QNN log levels to
appropriate ORT severity levels:

- `QNN_LOG_LEVEL_ERROR` → `logging::Severity::kERROR` ✅ **Key fix**
- `QNN_LOG_LEVEL_WARN` → `logging::Severity::kWARNING`
- `QNN_LOG_LEVEL_INFO` → `logging::Severity::kINFO`
- `QNN_LOG_LEVEL_VERBOSE/DEBUG` → `logging::Severity::kVERBOSE`
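The mapping above can be sketched as a simple switch. The enums below are stand-ins for `QnnLog_Level_t` (from the QNN SDK) and `onnxruntime::logging::Severity`; only the mapping logic mirrors the fix:

```cpp
#include <cassert>

// Illustrative stand-ins for QnnLog_Level_t and logging::Severity.
enum class QnnLogLevel { kError, kWarn, kInfo, kVerbose, kDebug };
enum class OrtSeverity { kVERBOSE, kINFO, kWARNING, kERROR };

// Mirrors the intent of the PR's MapQNNLogLevelToOrtSeverity: QNN errors
// surface as ERROR instead of being demoted to VERBOSE.
OrtSeverity MapQnnLogLevelToOrtSeverity(QnnLogLevel level) {
  switch (level) {
    case QnnLogLevel::kError:
      return OrtSeverity::kERROR;
    case QnnLogLevel::kWarn:
      return OrtSeverity::kWARNING;
    case QnnLogLevel::kInfo:
      return OrtSeverity::kINFO;
    case QnnLogLevel::kVerbose:
    case QnnLogLevel::kDebug:
    default:
      return OrtSeverity::kVERBOSE;
  }
}
```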

## Changes Made

1. **Modified `QnnLogging` function**: Removed hardcoded `kVERBOSE` and
added proper level mapping
2. **Added `MapQNNLogLevelToOrtSeverity` function**: For potential
future reuse
3. **Minimal and surgical changes**: Only 37 lines added, 2 removed

## Impact

QNN error messages will now appear as ERROR-level logs in normal logging
output, making debugging much easier for users without requiring verbose
logging to be enabled.

Fixes #24876.


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vraspar <51386888+vraspar@users.noreply.github.com>
Co-authored-by: yuslepukhin <11303988+yuslepukhin@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
* Extends compile_ep_context to also support plugin EPs
* Adds a compile_only option to skip execution; this can be used when compiling for virtual devices

compile_ep_context (physical device)
<img width="1259" height="510" alt="image"
src="https://github.com/user-attachments/assets/14650c17-0c8a-4002-a7ce-e8e4c815a516"
/>

compile_ep_context + compile_only (virtual device)
<img width="1262" height="173" alt="image"
src="https://github.com/user-attachments/assets/2f0844cc-5e83-4b2d-bf0a-0d815d9bad29"
/>
### Description
Added support for the engine validation check for EP context models.

### Motivation and Context
We wanted to implement support for the GetModelCompatibilityForEpDevices() API so that end users have an API for the engine validation check on EP context models. This change adds that support and the necessary function implementation.
### Description
Models with corresponding Olive recipes are deprecated.


### Motivation and Context
Olive and olive-recipes are the entry point for model optimization. We want onnxruntime to focus only on the runtime, so we are deprecating examples that are already present in olive-recipes.
### Description
Enables file mapping of the weights as well as the overall context bin. This feature is currently only enabled for Windows ARM64 devices.

### Motivation and Context
Currently, when reading the context bin, ORT allocates a large buffer on the heap. For the same model, each ORT session allocates its own buffer for the context bin, which is very wasteful for large models. Instead, Windows file mapping can be leveraged to map the context bin; whenever a context needs to be created from it, a pointer into the mapped bin is retrieved and used instead of a pre-allocated buffer, making QNN EP more memory-efficient. With multiple ORT sessions, the context bin is loaded only once for all sessions, improving both memory efficiency and overall initialization performance. This is especially useful for LLMs going forward.
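The PR uses Windows file mapping on ARM64; the POSIX `mmap` sketch below illustrates the same idea under that substitution: map the context bin once and hand out pointers, so the OS shares the backing pages across all mappings of the same file instead of each session copying the bin onto the heap. This is illustrative only, not the ORT implementation:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstdio>
#include <cstring>

// Map a file read-only and return a pointer to its bytes. Because the
// mapping is file-backed, N sessions mapping the same context bin share
// one copy of the data in physical memory.
const void* MapContextBin(const char* path, size_t* out_size) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return nullptr;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) {
    close(fd);
    return nullptr;
  }
  void* p = mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ,
                 MAP_SHARED, fd, 0);
  close(fd);  // the mapping remains valid after the fd is closed
  if (p == MAP_FAILED) return nullptr;
  *out_size = static_cast<size_t>(st.st_size);
  return p;
}
```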

---------

Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
@tianleiwu tianleiwu requested review from adrianlizarraga, chilo-ms, jambayk, skottmckay and yuslepukhin and removed request for chilo-ms January 27, 2026 05:49

@adrianlizarraga adrianlizarraga left a comment


Looks good. I checked all except PR 27156.

@tianleiwu tianleiwu merged commit 50d4c84 into rel-1.24.0 Jan 27, 2026
75 of 78 checks passed
@tianleiwu tianleiwu deleted the tlwu/rel-1.24.0_cherry_pick_round3 branch January 27, 2026 18:28