-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[QNN-EP] Implement file mapped weights feature #26952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[QNN-EP] Implement file mapped weights feature #26952
Conversation
|
The observation is not entirely true. ORT memory map external weights. You have an ability to request a weight as an ORT Value from the EP. If the weight is external it will be memory mapped. See |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
onnxruntime/core/providers/qnn/builder/qnn_windows_file_mapper.cc
Outdated
Show resolved
Hide resolved
onnxruntime/core/providers/qnn/builder/qnn_windows_file_mapper.cc
Outdated
Show resolved
Hide resolved
onnxruntime/core/providers/qnn/builder/qnn_windows_file_mapper.cc
Outdated
Show resolved
Hide resolved
onnxruntime/core/providers/qnn/builder/qnn_file_mapping_callback_interface.h
Outdated
Show resolved
Hide resolved
|
I suggest not to implement QNN specific mapping, but re-use code in ORT. |
|
Discussed offline, the EP maps initializers form the binary context, not from the external weights files. |
- Create file mapping callback interface class - Android expected to have support in the future - Implement Windows callbacks in WindowsFileMapper - New option disable_file_mapped_weights - Feature is enabled by default with retry logic
f55dc78 to
2e451ae
Compare
| #endif | ||
| } | ||
|
|
||
| static const std::string DISABLE_FILE_MAPPED_WEIGHTS = "disable_file_mapped_weights"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if ("1" == disable_file_mapped_weights_pos->second) { | ||
| enable_file_mapped_weights_ = false; | ||
| } | ||
| LOGS_DEFAULT(VERBOSE) << "User specified disable_file_mapped_weights: " << enable_file_mapped_weights_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ORT_RETURN_IF(!cache_file || !cache_file.good(), "Failed to retrieve context binary from: ", context_bin_filepath); | ||
|
|
||
| cache_file.seekg(0, cache_file.end); | ||
| size_t buffer_size = static_cast<size_t>(cache_file.tellg()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ORT_RETURN_IF_ERROR(ReadContextBinIfValid(context_bin_filepath, buffer_info, false)); | ||
|
|
||
| size_t buffer_size = buffer_info.size; | ||
| ORT_RETURN_IF(buffer_size == 0, "Context bin has a size of 0 bytes: ", context_bin_filepath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| context_params_ptr_list.clear(); | ||
| context_callbacks_list.clear(); | ||
| context_paramsv2_list.clear(); | ||
| context_params_list.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? If it is, then the code is not exception safe.
| // At time of destruction. Usage of logger_ will not be available and will result in a seg fault | ||
| WindowsFileMapper::~WindowsFileMapper() { | ||
| std::lock_guard<std::mutex> lock(map_mutex_); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need tis lock? Are we expecting multiple threads destroying the same object? Then we have bigger problems.
| OPEN_EXISTING, | ||
| FILE_ATTRIBUTE_NORMAL, | ||
| NULL); | ||
| ORT_RETURN_IF(file_handle == INVALID_HANDLE_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| std::lock_guard<std::mutex> lock(map_mutex_); | ||
| auto status = Status::OK(); | ||
|
|
||
| auto bin_map_it = std::find_if(context_bin_to_mapping_handle_map_.begin(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Please, avoid force pushes. |
|
|
||
| HANDLE file_mapping_handle = bin_map_it->second; | ||
| auto mapping_it = std::find_if(mapping_handle_to_info_map_.begin(), | ||
| mapping_handle_to_info_map_.end(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hash tables exist to query elements in constant time, not O(n)
| LOGS(*logger_, INFO) << "Creating mapping pointer for " << bin_filepath; | ||
|
|
||
| std::lock_guard<std::mutex> lock(map_mutex_); | ||
| auto it = std::find_if(context_bin_to_mapping_handle_map_.begin(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| ORT_RETURN_IF(mapview_ptr == nullptr, "Failed to create mapping pointer for ", bin_filepath); | ||
|
|
||
| if (!context_bin_map_view_pointers_.insert(mapview_ptr).second) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yuslepukhin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
|
/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline |
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
Please, comment on all Copilot review issues before resolving them. |
onnxruntime/core/providers/qnn/builder/qnn_file_mapping_callback_interface.h
Show resolved
Hide resolved
| bool file_mapped_weights_enabled_ = false; | ||
|
|
||
| #ifdef QNN_FILE_MAPPED_WEIGHTS_ENABLED | ||
| std::shared_ptr<FileMappingCallbackInterface> file_mapper_ = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is shared ownership needed?
| OPEN_EXISTING, | ||
| FILE_ATTRIBUTE_NORMAL, | ||
| NULL); | ||
| ORT_RETURN_IF(file_handle == INVALID_HANDLE_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general: it would be good to output more information about errors from Windows APIs, e.g., with ::GetLastError()
Description
Enables the file mapping of weights as well as the overall context bin. This feature is currently only enabled for ARM64 WIN devices
Motivation and Context
Currently, when reading the context bin, ORT allocates a large buffer on the heap. Assuming the same model is used, each ORT session will allocate a buffer for the context bin. This is incredibly wasteful when large models are used. Instead, WIN file mapping can be leveraged to map the context bin, then every time a context needs to be created with the context bin, the pointer to the context bin can be retrieved and used instead of some pre-allocated buffer, thus making QNN EP more memory-efficient. In the case of multiple ORT sessions, the context bin will only be loaded once for all sessions, increasing memory efficiency and overall initialization performance. This is very useful regarding the use of LLMs going forward.