@quic-calvnguy
Contributor

Description
Enables file mapping of weights as well as of the overall context bin. This feature is currently enabled only on ARM64 Windows devices.

Motivation and Context
Currently, when reading the context bin, ORT allocates a large buffer on the heap. Even when the same model is used, each ORT session allocates its own buffer for the context bin, which is very wasteful for large models. Instead, Windows file mapping can be used to map the context bin. Whenever a context needs to be created from the context bin, the pointer to the mapped context bin is retrieved and used in place of a pre-allocated buffer, which makes the QNN EP more memory-efficient. With multiple ORT sessions, the context bin is loaded only once for all sessions, which improves both memory efficiency and overall initialization performance. This is especially useful for LLMs going forward.
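For readers unfamiliar with file mapping, the idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MappedFile` is a hypothetical name, the Windows branch uses the standard Win32 calls (`CreateFileA`, `CreateFileMappingA`, `MapViewOfFile`), and the POSIX branch is included only so the sketch can be tried on other platforms.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else  // POSIX branch exists only so the sketch can be tried off Windows.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Hypothetical RAII wrapper around a read-only file mapping (not the PR's type).
class MappedFile {
 public:
  explicit MappedFile(const std::string& path) {
#ifdef _WIN32
    file_ = CreateFileA(path.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr,
                        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file_ == INVALID_HANDLE_VALUE) throw std::runtime_error("cannot open " + path);
    LARGE_INTEGER file_size{};
    if (GetFileSizeEx(file_, &file_size)) {
      size_ = static_cast<std::size_t>(file_size.QuadPart);
      // A maximum size of 0/0 asks for a mapping that covers the whole file.
      mapping_ = CreateFileMappingA(file_, nullptr, PAGE_READONLY, 0, 0, nullptr);
    }
    if (mapping_ != nullptr) data_ = MapViewOfFile(mapping_, FILE_MAP_READ, 0, 0, 0);
    if (data_ == nullptr) {
      if (mapping_ != nullptr) CloseHandle(mapping_);
      CloseHandle(file_);
      throw std::runtime_error("cannot map " + path);
    }
#else
    const int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    struct stat st {};
    void* view = MAP_FAILED;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
      size_ = static_cast<std::size_t>(st.st_size);
      view = mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (view == MAP_FAILED) throw std::runtime_error("cannot map " + path);
    data_ = view;
#endif
  }

  ~MappedFile() {
#ifdef _WIN32
    UnmapViewOfFile(data_);
    CloseHandle(mapping_);
    CloseHandle(file_);
#else
    munmap(data_, size_);
#endif
  }

  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  // Pointer that can be handed to a consumer instead of a heap buffer.
  const void* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
#ifdef _WIN32
  HANDLE file_ = INVALID_HANDLE_VALUE;
  HANDLE mapping_ = nullptr;
#endif
  void* data_ = nullptr;
  std::size_t size_ = 0;
};
```

Because the pages are backed by the file rather than by a private heap allocation, the operating system can share them between views of the same file, which is where the memory saving across sessions would come from.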

@yuslepukhin
Member

yuslepukhin commented Jan 13, 2026

The observation is not entirely true. ORT memory-maps external weights. You can request a weight as an ORT Value from the EP; if the weight is external, it will be memory-mapped.

See Graph::LoadExternalInitializerAsOrtValue. One can add this to Graph in provider_wrapped_types.h and expose it to DLL-based EPs. Then you get the mapping for free.

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@yuslepukhin
Member

I suggest not implementing QNN-specific mapping and instead reusing the code in ORT.

@yuslepukhin
Member

Discussed offline: the EP maps initializers from the binary context, not from the external weights files.

quic_calvnguy added 3 commits January 14, 2026 13:56
 - Create file mapping callback interface class
   - Android expected to have support in the future
 - Implement Windows callbacks in WindowsFileMapper
 - New option disable_file_mapped_weights
   - Feature is enabled by default with retry logic
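The commit notes above could translate into something like the following sketch. Everything here is an assumption for illustration: `FileMapper`, `ContextBinary`, and `LoadContextBinary` are hypothetical names, and "retry logic" is read as "fall back to the heap-buffer path if mapping fails", which the thread does not spell out. Only the option name `disable_file_mapped_weights` comes from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical callback interface; a Windows implementation would wrap the
// Win32 file-mapping calls, and other platforms could be added later.
class FileMapper {
 public:
  virtual ~FileMapper() = default;
  // Returns a pointer to the mapped bytes, or nullptr if mapping fails.
  virtual const std::uint8_t* Map(const std::string& path, std::size_t& size) = 0;
  // Releases a view returned by Map. Not called in this sketch: lifetime
  // handling is left to whoever owns the mapper.
  virtual void Unmap(const std::uint8_t* data) = 0;
};

// Context binary bytes, backed either by a mapping or by a heap copy.
struct ContextBinary {
  const std::uint8_t* mapped = nullptr;
  std::size_t mapped_size = 0;
  std::vector<std::uint8_t> heap_copy;  // used only on the fallback path

  bool is_mapped() const { return mapped != nullptr; }
  const std::uint8_t* data() const { return is_mapped() ? mapped : heap_copy.data(); }
  std::size_t size() const { return is_mapped() ? mapped_size : heap_copy.size(); }
};

// Tries file mapping first unless it is disabled or unavailable, then falls
// back to reading the whole file into a heap buffer.
ContextBinary LoadContextBinary(const std::string& path, FileMapper* mapper,
                                bool disable_file_mapped_weights) {
  ContextBinary result;
  if (mapper != nullptr && !disable_file_mapped_weights) {
    std::size_t size = 0;
    const std::uint8_t* view = mapper->Map(path, size);
    if (view != nullptr) {
      result.mapped = view;
      result.mapped_size = size;
      return result;
    }
  }
  std::ifstream in(path, std::ios::binary);
  result.heap_copy.assign(std::istreambuf_iterator<char>(in),
                          std::istreambuf_iterator<char>());
  return result;
}
```

Keeping the mapper behind an interface means a platform without an implementation can pass `nullptr` and still load the context binary through the heap path.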
@quic-calvnguy quic-calvnguy force-pushed the dev/calvnguy/file_mapped_weights branch from f55dc78 to 2e451ae Compare January 14, 2026 21:57
@yuslepukhin
Member

Please avoid force pushes.

Member

@yuslepukhin yuslepukhin left a comment

🕐

@edgchen1
Contributor

/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@yuslepukhin
Member

Please comment on all Copilot review issues before resolving them.

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

quic_calvnguy added 2 commits January 22, 2026 19:04
Add QnnHtpSharedAllocator to HTP check during unit testing
to cover cases where RPCMEM is not available
@edgchen1
Contributor

/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@edgchen1
Contributor

/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@tianleiwu tianleiwu requested a review from yuslepukhin January 23, 2026 06:28
@tianleiwu
Contributor

/azp run Win_TRT_Minimal_CUDA_Test_CI, Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@edgchen1
Contributor

/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

QnnBackendManager::SetupBackend if file mapping is not available
@edgchen1
Contributor

/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

Make file mapping callbacks more thread safe
Do not destruct file_mapper_ until session destruction
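One way to picture the two commit notes above (thread-safe callbacks, and keeping the mapper alive until session destruction) is a mutex-guarded cache whose entries live as long as any session still holds them. This is a hedged sketch under assumed names: `Mapping`, `MappingCache`, and `Acquire` are hypothetical and are not taken from the PR, and the loader stands in for whatever actually creates the file mapping.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a mapped view of a context binary (hypothetical).
struct Mapping {
  std::vector<unsigned char> bytes;
};

// Hands out one shared mapping per path. Each session keeps the returned
// shared_ptr for its whole lifetime, so the mapping is released only after
// the last session holding it is destroyed.
class MappingCache {
 public:
  using Loader = std::function<std::shared_ptr<Mapping>(const std::string&)>;

  explicit MappingCache(Loader loader) : loader_(std::move(loader)) {}

  std::shared_ptr<Mapping> Acquire(const std::string& path) {
    // The lock serializes lookups and loads, so concurrent sessions cannot
    // create two mappings for the same file.
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(path);
    if (it != cache_.end()) {
      if (std::shared_ptr<Mapping> alive = it->second.lock()) return alive;
    }
    std::shared_ptr<Mapping> fresh = loader_(path);
    cache_[path] = fresh;  // weak reference: the cache does not extend lifetime
    return fresh;
  }

 private:
  std::mutex mutex_;
  Loader loader_;
  std::map<std::string, std::weak_ptr<Mapping>> cache_;
};
```

Storing `std::weak_ptr` in the cache is a design choice of this sketch: it lets the mapping be torn down once no session uses it, while a later session transparently triggers a fresh load.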

Labels

ep:QNN (issues related to QNN execution provider)
release:1.24.0

5 participants