ORT 1.24.0 release cherry pick round 3#27168

Merged
tianleiwu merged 7 commits into rel-1.24.0 from tlwu/rel-1.24.0_cherry_pick_round3
Jan 27, 2026
Conversation

@tianleiwu
Contributor

| Commit | Commit Title | Author |
|---|---|---|
| 11dde2d9e | [NV TensorRT RTX EP] Fix external tensorrt_plugins load path (#26814) | keshavv27 |
| 080d96818 | Move model compatibility checks ahead of session initialization (#27037) | adrastogi |
| ec4f6bfa1 | [QNN EP] Fix error messages being logged as VERBOSE instead of ERROR (#24931) | Copilot |
| 0432e7125 | perftest: support plugin eps for compile_ep_context (#27121) | Jaskaran Singh Nagi |
| 727db0d3d | Engine compatibility validity API implementation (#26774) | umangb-09 |
| 27013522f | Deprecate transformers model examples (#27156) | Jambay Kinley |
| f83d4d06e | [QNN-EP] Implement file mapped weights feature (#26952) | quic-calvnguy |

keshavv27 and others added 7 commits January 26, 2026 21:46
### Description
Load the tensorrt_plugin library from the EP library location instead of the runtime library location.


### Motivation and Context
The plugin library was being searched for in the core ONNX Runtime library path rather than the EP library path. These paths are separate in the WinML workflow.
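The fix boils down to resolving the plugin path relative to the EP library's own location. A minimal sketch of that idea, assuming C++17 `std::filesystem` (the function name `ResolvePluginPath` is illustrative, not the actual PR code):

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Given the full path of the loaded EP shared library, build the path to
// load the plugin from: the EP's own directory, not the directory that the
// core ONNX Runtime library was loaded from.
fs::path ResolvePluginPath(const fs::path& ep_library_path,
                           const std::string& plugin_name) {
  return ep_library_path.parent_path() / plugin_name;
}
```

For example, with the EP loaded from a WinML package directory, `ResolvePluginPath("C:/pkg/eps/trt_rtx_ep.dll", "tensorrt_plugins.dll")` yields a path inside `C:/pkg/eps/` rather than wherever onnxruntime.dll lives.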
### Description
The current infrastructure for validating the compatibility of a precompiled model runs the check after session initialization, which turns out to be quite costly. Ideally the check should happen beforehand, to short-circuit those expensive operations.


### Motivation and Context
This change makes it more tractable for applications to rely on the existing session machinery to check the compatibility of their models.

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
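The pattern this change enables can be sketched as a cheap compatibility gate in front of the expensive initialization. All names below (`CompatStatus`, `CheckCompatibility`, `ExpensiveSessionInit`, `CreateSession`) are hypothetical stand-ins, not the actual ORT internals:

```cpp
#include <cassert>
#include <string>

enum class CompatStatus { kCompatible, kIncompatible };

// Stand-in for the precompiled-model compatibility check: compare the
// token baked into the model against the token of the current device/EP.
CompatStatus CheckCompatibility(const std::string& model_token,
                                const std::string& device_token) {
  return model_token == device_token ? CompatStatus::kCompatible
                                     : CompatStatus::kIncompatible;
}

bool ExpensiveSessionInit() { return true; }  // placeholder for real init work

// Run the cheap check first and short-circuit: an incompatible precompiled
// model never pays the cost of session initialization.
bool CreateSession(const std::string& model_token,
                   const std::string& device_token) {
  if (CheckCompatibility(model_token, device_token) ==
      CompatStatus::kIncompatible) {
    return false;
  }
  return ExpensiveSessionInit();
}
```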
## Problem

QNN error messages were being logged at VERBOSE level instead of ERROR
level, making them invisible unless verbose logging was enabled. Users
would only see unhelpful generic error messages like:

```
Failed to finalize QNN graph. Error code: 1002 at location qnn_model.cc:167 FinalizeGraphs
```

But the actual detailed error messages from QNN were hidden in verbose
logs:

```
tcm_migration.cc:2088:ERROR:Operator named q::*InputSlicePad (0x1654900000002) not sufficiently tiled to fit in TCM. Requires 12441600 bytes
graph_prepare.cc:2808:ERROR:Graph prepare TCM Migration action failed
graph_prepare.cc:2868:ERROR:Graph prepare failed during optimization with err: 17, Fatal Optimize
```

## Root Cause

The `QnnLogging` callback function in `qnn_backend_manager.cc` was
ignoring the `level` parameter from QNN and hardcoding all messages as
`kVERBOSE` severity:

```cpp
void QnnLogging(const char* format, QnnLog_Level_t level, uint64_t timestamp, va_list argument_parameter) {
  ORT_UNUSED_PARAMETER(level);  // ❌ Ignoring the actual log level
  // ...
  const auto severity = ::onnxruntime::logging::Severity::kVERBOSE;  // ❌ Hardcoded as VERBOSE
```

## Solution

Modified the `QnnLogging` function to properly map QNN log levels to
appropriate ORT severity levels:

- `QNN_LOG_LEVEL_ERROR` → `logging::Severity::kERROR` ✅ **Key fix**
- `QNN_LOG_LEVEL_WARN` → `logging::Severity::kWARNING`
- `QNN_LOG_LEVEL_INFO` → `logging::Severity::kINFO`
- `QNN_LOG_LEVEL_VERBOSE/DEBUG` → `logging::Severity::kVERBOSE`
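The mapping above can be sketched as a simple switch. The enums below are stand-ins for `QnnLog_Level_t` (from the QNN SDK) and `onnxruntime::logging::Severity`; only the mapping logic mirrors the fix:

```cpp
#include <cassert>

// Illustrative stand-ins for QnnLog_Level_t and logging::Severity.
enum class QnnLogLevel { kError, kWarn, kInfo, kVerbose, kDebug };
enum class OrtSeverity { kVERBOSE, kINFO, kWARNING, kERROR };

// Mirrors the intent of the PR's MapQNNLogLevelToOrtSeverity: QNN errors
// surface as ERROR instead of being demoted to VERBOSE.
OrtSeverity MapQnnLogLevelToOrtSeverity(QnnLogLevel level) {
  switch (level) {
    case QnnLogLevel::kError:
      return OrtSeverity::kERROR;
    case QnnLogLevel::kWarn:
      return OrtSeverity::kWARNING;
    case QnnLogLevel::kInfo:
      return OrtSeverity::kINFO;
    case QnnLogLevel::kVerbose:
    case QnnLogLevel::kDebug:
    default:
      return OrtSeverity::kVERBOSE;
  }
}
```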

## Changes Made

1. **Modified `QnnLogging` function**: Removed hardcoded `kVERBOSE` and
added proper level mapping
2. **Added `MapQNNLogLevelToOrtSeverity` function**: For potential
future reuse
3. **Minimal and surgical changes**: Only 37 lines added, 2 removed

## Impact

QNN error messages will now appear as ERROR-level logs in normal logging
output, making debugging much easier for users without requiring verbose
logging to be enabled.

Fixes #24876.


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vraspar <51386888+vraspar@users.noreply.github.com>
Co-authored-by: yuslepukhin <11303988+yuslepukhin@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
* Extends compile_ep_context to also support plugin EPs
* Adds a compile_only option to skip execution; this can be used when compiling for virtual devices

compile_ep_context (physical device)
<img width="1259" height="510" alt="image"
src="https://github.com/user-attachments/assets/14650c17-0c8a-4002-a7ce-e8e4c815a516"
/>

compile_ep_context + compile_only (virtual device)
<img width="1262" height="173" alt="image"
src="https://github.com/user-attachments/assets/2f0844cc-5e83-4b2d-bf0a-0d815d9bad29"
/>
### Description
Added support for the engine validation check for EP context models.

### Motivation and Context
We wanted to implement support for the GetModelCompatibilityForEpDevices() API so that end users have an API for the engine validation check on EP context models. This change adds that support and the necessary function implementation.
### Description
Models with corresponding Olive recipes are deprecated.


### Motivation and Context
Olive and olive-recipes are the entry point for model optimization. We want onnxruntime to focus only on the runtime, so we are deprecating examples that are already present in olive-recipes.
### Description
Enables file mapping of the weights as well as the overall context bin. This feature is currently only enabled for Windows ARM64 devices.

### Motivation and Context
Currently, when reading the context bin, ORT allocates a large buffer on the heap. For the same model, each ORT session allocates its own buffer for the context bin, which is very wasteful for large models. Instead, Windows file mapping can be leveraged to map the context bin; whenever a context needs to be created from it, a pointer into the mapped bin is retrieved and used instead of a pre-allocated buffer, making QNN EP more memory-efficient. With multiple ORT sessions, the context bin is loaded only once for all sessions, improving both memory efficiency and overall initialization performance. This is especially useful for LLMs going forward.
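The PR uses Windows file mapping on ARM64; the POSIX `mmap` sketch below illustrates the same idea under that substitution: map the context bin once and hand out pointers, so the OS shares the backing pages across all mappings of the same file instead of each session copying the bin onto the heap. This is illustrative only, not the ORT implementation:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstdio>
#include <cstring>

// Map a file read-only and return a pointer to its bytes. Because the
// mapping is file-backed, N sessions mapping the same context bin share
// one copy of the data in physical memory.
const void* MapContextBin(const char* path, size_t* out_size) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return nullptr;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0) {
    close(fd);
    return nullptr;
  }
  void* p = mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ,
                 MAP_SHARED, fd, 0);
  close(fd);  // the mapping remains valid after the fd is closed
  if (p == MAP_FAILED) return nullptr;
  *out_size = static_cast<size_t>(st.st_size);
  return p;
}
```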

---------

Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
@tianleiwu tianleiwu requested review from adrianlizarraga, chilo-ms, jambayk, skottmckay and yuslepukhin and removed request for chilo-ms January 27, 2026 05:49

@adrianlizarraga adrianlizarraga left a comment


Looks good. I checked all except PR 27156.

@tianleiwu tianleiwu merged commit 50d4c84 into rel-1.24.0 Jan 27, 2026
75 of 78 checks passed
@tianleiwu tianleiwu deleted the tlwu/rel-1.24.0_cherry_pick_round3 branch January 27, 2026 18:28