
[QNN EP] Enable offline x64 compilation with memhandle IO type #27479

Merged
derdeljan-msft merged 5 commits into main from derdeljan/qnn_ep_memhandle_offline_compile
Mar 7, 2026

Conversation

@derdeljan-msft
Contributor

Description

Enable offline compilation for QNN EP with the MEMHANDLE IO type. It was previously enabled only on ARM because the QNN EP was loading the rpcmem library (available only with ARM drivers), which is not actually used for compilation; it is required only at inference time to allocate shared memory.

Ensured that the MEMHANDLE IO type is set correctly regardless of how QnnTensorWrapper is created (either through a factory function or by constructing it directly). This guarantees the mem type is configured correctly regardless of the op builder implementation.

@derdeljan-msft derdeljan-msft self-assigned this Feb 27, 2026
@yuslepukhin
Member

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

Member

@yuslepukhin yuslepukhin left a comment


Need unit test.

Contributor

Copilot AI left a comment


Pull request overview

This PR enables offline x64 compilation for QNN EP with MEMHANDLE IO type by making two key changes: (1) deferring rpcmem library loading until inference time (since it's only needed for shared memory allocation, not compilation), and (2) centralizing the MEMHANDLE mem type assignment in AddTensorWrapper so that it's applied consistently regardless of how QnnTensorWrapper is created.

Changes:

  • Skip rpcmem library loading during context generation (when context_cache_enabled_ is true), enabling offline compilation on x64 where rpcmem is unavailable.
  • Move MEMHANDLE mem type assignment from MakeTensorWrapper to AddTensorWrapper, ensuring all tensor wrappers (whether created via factory methods or directly by op builders) get the correct mem type based on whether they are graph I/O tensors.
  • Update QnnTensorWrapper::Init to preserve the source tensor's mem type instead of unconditionally resetting to RAW.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| onnxruntime/core/providers/qnn/qnn_execution_provider.cc | Guard rpcmem library loading with `!context_cache_enabled_` to allow offline compilation without the library |
| onnxruntime/core/providers/qnn/builder/qnn_model_wrapper.cc | Remove mem type logic from `MakeTensorWrapper` and centralize it in `AddTensorWrapper` |
| onnxruntime/core/providers/qnn/builder/qnn_def.h | Preserve source tensor's mem type in `QnnTensorWrapper::Init` instead of resetting to RAW |


Contributor

@github-actions github-actions bot left a comment


You can commit the suggested changes from lintrunner.

derdeljan-msft and others added 2 commits March 6, 2026 22:35
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@yuslepukhin
Member

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@derdeljan-msft derdeljan-msft enabled auto-merge (squash) March 6, 2026 22:56
@derdeljan-msft derdeljan-msft merged commit d626b56 into main Mar 7, 2026
91 checks passed
@derdeljan-msft derdeljan-msft deleted the derdeljan/qnn_ep_memhandle_offline_compile branch March 7, 2026 07:53
derdeljan-msft added a commit that referenced this pull request Mar 9, 2026
(cherry picked from commit d626b56)
derdeljan-msft added a commit that referenced this pull request Mar 12, 2026
(cherry picked from commit d626b56)
tianleiwu pushed a commit that referenced this pull request Mar 16, 2026
tianleiwu added a commit that referenced this pull request Mar 16, 2026
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|--------------|
| eb23be8 | #27354 | Update python_requires |
| d626b56 | #27479 | [QNN EP] Enable offline x64 compilation with memhandle IO type |
| 60ce0e6 | #27607 | Use `_tpause` instead of `__builtin_ia32_tpause` |
| 69feb84 | #27591 | Add PCI bus fallback for Linux GPU device discovery in containerized environments |
| de92668 | #27650 | Revert "[QNN EP] Fix error messages being logged as VERBOSE instead o… |
| 0f66526 | #27644 | [Plugin EP] Check for nullptr before dereferencing |
| 929f73e | #27666 | Plugin EP: Fix bug that incorrectly assigned duplicate MetDef IDs to fused nodes in different GraphViews |

---------

Co-authored-by: XXXXRT666 <157766680+XXXXRT666@users.noreply.github.com>
Co-authored-by: derdeljan-msft <derdeljan@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Shogo Yamazaki <f9ifphmiz7i8akhowc8l5t1x9qp0lfu4@mocknen.net>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: baijumeswani <12852605+baijumeswani@users.noreply.github.com>
Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
