# [QNN EP] Enable offline x64 compilation with memhandle IO type #27479

derdeljan-msft merged 5 commits into main
Conversation
(cherry picked from commit 4970a4c)
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 4 pipeline(s).
Pull request overview
This PR enables offline x64 compilation for QNN EP with MEMHANDLE IO type by making two key changes: (1) deferring rpcmem library loading until inference time (since it's only needed for shared memory allocation, not compilation), and (2) centralizing the MEMHANDLE mem type assignment in AddTensorWrapper so that it's applied consistently regardless of how QnnTensorWrapper is created.
Changes:
- Skip rpcmem library loading during context generation (when `context_cache_enabled_` is true), enabling offline compilation on x64 where rpcmem is unavailable.
- Move MEMHANDLE mem type assignment from `MakeTensorWrapper` to `AddTensorWrapper`, ensuring all tensor wrappers (whether created via factory methods or directly by op builders) get the correct mem type based on whether they are graph I/O tensors.
- Update `QnnTensorWrapper::Init` to preserve the source tensor's mem type instead of unconditionally resetting it to RAW.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `onnxruntime/core/providers/qnn/qnn_execution_provider.cc` | Guard rpcmem library loading with `!context_cache_enabled_` to allow offline compilation without the library |
| `onnxruntime/core/providers/qnn/builder/qnn_model_wrapper.cc` | Remove mem type logic from `MakeTensorWrapper` and centralize it in `AddTensorWrapper` |
| `onnxruntime/core/providers/qnn/builder/qnn_def.h` | Preserve source tensor's mem type in `QnnTensorWrapper::Init` instead of resetting to RAW |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows OpenVINO CI Pipeline, Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 4 pipeline(s).
### Description

Enable offline compilation for QNN EP with MEMHANDLE IO type. It was previously enabled only on ARM because the QNN EP was loading the rpcmem library (only available with ARM drivers), which is not actually used for compilation (it is required only at inference time to allocate shared memory).

Ensured that the MEMHANDLE IO type is correctly set regardless of how `QnnTensorWrapper` is created (either through a factory function or by constructing it directly). This guarantees that the mem type is correctly configured regardless of the op builder implementation.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

(cherry picked from commit d626b56)
This cherry-picks the following commits for the release:

| Commit ID | PR Number | Commit Title |
|-----------|-----------|--------------|
| eb23be8 | #27354 | Update python_requires |
| d626b56 | #27479 | [QNN EP] Enable offline x64 compilation with memhandle IO type |
| 60ce0e6 | #27607 | Use `_tpause` instead of `__builtin_ia32_tpause` |
| 69feb84 | #27591 | Add PCI bus fallback for Linux GPU device discovery in containerized environments |
| de92668 | #27650 | Revert "[QNN EP] Fix error messages being logged as VERBOSE instead o… |
| 0f66526 | #27644 | [Plugin EP] Check for nullptr before dereferencing |
| 929f73e | #27666 | Plugin EP: Fix bug that incorrectly assigned duplicate MetDef IDs to fused nodes in different GraphViews |

---------

Co-authored-by: XXXXRT666 <157766680+XXXXRT666@users.noreply.github.com>
Co-authored-by: derdeljan-msft <derdeljan@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Shogo Yamazaki <f9ifphmiz7i8akhowc8l5t1x9qp0lfu4@mocknen.net>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: baijumeswani <12852605+baijumeswani@users.noreply.github.com>
Co-authored-by: edgchen1 <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
Co-authored-by: Artur Wojcik <artur.wojcik@amd.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>