[QNN-EP] Implement file mapped weights feature #26952
The observation is not entirely true. ORT memory-maps external weights. The EP can request a weight as an ORT Value, and if the weight is external, it will be memory-mapped. See
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
onnxruntime/core/providers/qnn/builder/qnn_windows_file_mapper.cc
onnxruntime/core/providers/qnn/builder/qnn_file_mapping_callback_interface.h
I suggest not implementing a QNN-specific mapping, but reusing the existing code in ORT.
Discussed offline: the EP maps initializers from the binary context, not from the external weights files.
- Create file mapping callback interface class
- Android expected to have support in the future
- Implement Windows callbacks in WindowsFileMapper
- New option disable_file_mapped_weights
- Feature is enabled by default with retry logic
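One way to read "enabled by default with retry logic" is that the loader first tries the file-mapped path and, if that fails or the option disables it, falls back to a heap copy. The sketch below illustrates only that reading. `FileMapper`, `ContextBytes`, and `LoadContextBinary` are hypothetical names, not the interface or functions from this PR, and the exact retry behavior in the PR may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical mapper interface (NOT the PR's callback interface).
class FileMapper {
 public:
  virtual ~FileMapper() = default;
  // Returns a pointer to the mapped bytes and sets `size`, or nullptr on failure.
  virtual const uint8_t* Map(const std::string& path, size_t& size) = 0;
};

// Hypothetical result type. Move-only so `data` never points into a stale copy.
struct ContextBytes {
  ContextBytes() = default;
  ContextBytes(ContextBytes&&) = default;
  ContextBytes(const ContextBytes&) = delete;

  const uint8_t* data = nullptr;
  size_t size = 0;
  bool file_mapped = false;
  std::vector<uint8_t> heap_copy;  // owns the bytes only on the fallback path
};

// Assumed behavior: try the mapping first unless disabled, then retry with a
// plain heap read if the mapper is missing or reports failure.
ContextBytes LoadContextBinary(const std::string& path, FileMapper* mapper,
                               bool disable_file_mapped_weights) {
  ContextBytes result;
  if (!disable_file_mapped_weights && mapper != nullptr) {
    result.data = mapper->Map(path, result.size);
    if (result.data != nullptr) {
      result.file_mapped = true;
      return result;
    }
  }
  // Fallback: read the whole file into a heap buffer owned by the result.
  std::ifstream in(path, std::ios::binary);
  result.heap_copy.assign(std::istreambuf_iterator<char>(in),
                          std::istreambuf_iterator<char>());
  result.data = result.heap_copy.data();
  result.size = result.heap_copy.size();
  return result;
}
```

With this shape, callers see the same `data`/`size` pair on both paths, and `file_mapped` records which path was taken.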
f55dc78 to 2e451ae
Please avoid force pushes.
yuslepukhin left a comment
🕐
/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
Please comment on all Copilot review issues before resolving them.
Azure Pipelines successfully started running 2 pipeline(s).
Add QnnHtpSharedAllocator to HTP check during unit testing to cover cases where RPCMEM is not available
/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline
Azure Pipelines successfully started running 2 pipeline(s).
onnxruntime/core/providers/qnn/builder/qnn_windows_file_mapper.h
/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline |
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
/azp run Win_TRT_Minimal_CUDA_Test_CI, Windows GPU Doc Gen CI Pipeline |
|
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline |
|
Azure Pipelines successfully started running 2 pipeline(s). |
QnnBackendManager::SetupBackend if file mapping is not available
onnxruntime/core/providers/qnn/builder/qnn_file_mapping_interface.h
/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline |
|
Azure Pipelines successfully started running 2 pipeline(s). |
Make file mapping callbacks more thread safe
Do not destruct file_mapper_ until session destruction
Description
Enables file mapping of weights as well as of the overall context binary. This feature is currently enabled only on ARM64 Windows devices.
Motivation and Context
Currently, when reading the context binary, ORT allocates a large buffer on the heap. Each ORT session allocates its own buffer for the context binary, even when every session uses the same model, which is wasteful for large models. Instead, Windows file mapping can be used to map the context binary. Whenever a context needs to be created from the context binary, the pointer to the mapped data is retrieved and used in place of a pre-allocated buffer, making QNN EP more memory-efficient. With multiple ORT sessions, the context binary is loaded only once for all sessions, which improves memory efficiency and overall initialization performance. This is especially useful for LLMs going forward.