@quic-calvnguy
Contributor

Description
Enables file mapping of the weights as well as the overall context bin. This feature is currently enabled only on ARM64 Windows devices.

Motivation and Context
Currently, when reading the context bin, ORT allocates a large buffer on the heap. Each ORT session allocates its own buffer for the context bin, even when every session uses the same model, which is very wasteful for large models. Instead, Windows file mapping can be used to map the context bin; each time a context needs to be created from the context bin, the pointer to the mapped context bin is retrieved and used in place of a pre-allocated buffer, making the QNN EP more memory-efficient. With multiple ORT sessions, the context bin is loaded only once for all sessions, which improves memory efficiency and overall initialization performance. This is especially useful for LLMs going forward.
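
The "load once, share across sessions" behaviour described above can be sketched portably. This is only an illustration under stated assumptions, not the PR's implementation: `SharedBlobCache`, `Blob`, and the injected loader are hypothetical names, and a heap buffer stands in for the view that the Windows file-mapping APIs would return.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mapped view of the context binary.
using Blob = std::vector<char>;

// Hands out one shared, read-only blob per file path. Sessions that ask for
// the same path while it is still in use receive the same instance.
class SharedBlobCache {
 public:
  using Loader = std::function<Blob(const std::string&)>;

  explicit SharedBlobCache(Loader loader) : loader_(std::move(loader)) {
    assert(loader_ != nullptr);
  }

  std::shared_ptr<const Blob> Acquire(const std::string& path) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(path);
    if (it != cache_.end()) {
      if (std::shared_ptr<const Blob> live = it->second.lock()) {
        return live;  // still referenced by another session: reuse it
      }
    }
    // First request, or the previous instance was already released.
    std::shared_ptr<const Blob> fresh = std::make_shared<Blob>(loader_(path));
    cache_[path] = fresh;
    return fresh;
  }

 private:
  Loader loader_;
  std::mutex mutex_;
  std::unordered_map<std::string, std::weak_ptr<const Blob>> cache_;
};
```

Two callers that acquire the same path while either still holds the result trigger a single load; once every holder releases it, the next `Acquire` loads again.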

@yuslepukhin
Member

yuslepukhin commented Jan 13, 2026

The observation is not entirely true. ORT memory-maps external weights. The EP can request a weight as an OrtValue, and if the weight is external, it will be memory-mapped.

See Graph::LoadExternalInitializerAsOrtValue. One can add it to Graph in provider_wrapped_types.h and expose it to DLL-based EPs. Then you get the mapping for free.

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@yuslepukhin
Member

I suggest not implementing QNN-specific mapping and instead reusing the code in ORT.

@yuslepukhin
Member

Discussed offline: the EP maps initializers from the binary context, not from the external weights files.

quic_calvnguy added 3 commits January 14, 2026 13:56
 - Create file mapping callback interface class
   - Android expected to have support in the future
 - Implement Windows callbacks in WindowsFileMapper
 - New option disable_file_mapped_weights
   - Feature is enabled by default with retry logic
@quic-calvnguy quic-calvnguy force-pushed the dev/calvnguy/file_mapped_weights branch from f55dc78 to 2e451ae Compare January 14, 2026 21:57
#endif
}

static const std::string DISABLE_FILE_MAPPED_WEIGHTS = "disable_file_mapped_weights";
Member

static const std::string DISABLE_FILE_MAPPED_WEIGHTS

Is there a separate place where all constants are collected together?

if ("1" == disable_file_mapped_weights_pos->second) {
enable_file_mapped_weights_ = false;
}
LOGS_DEFAULT(VERBOSE) << "User specified disable_file_mapped_weights: " << enable_file_mapped_weights_;
Member

LOGS_DEFAULT(VERBOSE)

I believe we pass a logger to every EP so that we can see which session emits which message. Please check whether it is available.

ORT_RETURN_IF(!cache_file || !cache_file.good(), "Failed to retrieve context binary from: ", context_bin_filepath);

cache_file.seekg(0, cache_file.end);
size_t buffer_size = static_cast<size_t>(cache_file.tellg());
Member

static_cast<size_t>

Would this require a safe cast? It may fail on 32-bit systems if that matters.
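
A checked conversion along the lines the comment asks about could look like the sketch below. `GetStreamSize` is a hypothetical helper, not ORT code; it rejects both the `-1` that `tellg()` returns on failure and any offset that does not fit in `size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ios>
#include <istream>
#include <limits>
#include <sstream>

// Writes the stream's total size to `out` and returns true, or returns false
// if the size cannot be determined or does not fit in size_t.
bool GetStreamSize(std::istream& in, std::size_t& out) {
  in.seekg(0, std::ios::end);
  const std::streamoff end = in.tellg();  // tellg() yields -1 on failure
  in.seekg(0, std::ios::beg);
  if (!in || end < 0) {
    return false;
  }
  // On 32-bit targets streamoff is usually wider than size_t.
  if (static_cast<std::uintmax_t>(end) >
      std::numeric_limits<std::size_t>::max()) {
    return false;
  }
  out = static_cast<std::size_t>(end);
  return true;
}

// Usage: an in-memory stream stands in for the context binary file.
void Example() {
  std::istringstream fake_file("hello");
  std::size_t size = 0;
  const bool ok = GetStreamSize(fake_file, size);
  assert(ok && size == 5);
  (void)ok;
}
```

Without the `end < 0` check, a failed `tellg()` would be cast to a very large `size_t` and used as the buffer size.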

ORT_RETURN_IF_ERROR(ReadContextBinIfValid(context_bin_filepath, buffer_info, false));

size_t buffer_size = buffer_info.size;
ORT_RETURN_IF(buffer_size == 0, "Context bin has a size of 0 bytes: ", context_bin_filepath);
Member

ORT_RETURN_IF(buffer

The size has already been validated inside the function and the error code was checked.

context_params_ptr_list.clear();
context_callbacks_list.clear();
context_paramsv2_list.clear();
context_params_list.clear();
Member

@yuslepukhin yuslepukhin Jan 15, 2026

Is this necessary? If it is, then the code is not exception safe.
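
To illustrate the exception-safety concern: if the lists must be empty after the call, a scope guard performs the cleanup on every exit path, including a throw. The sketch below is an assumption-laden illustration; `ScopeExit` and `ContextLoader` are hypothetical, and `int` stands in for the real element type.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Runs a callable when the enclosing scope exits, normally or by exception.
template <typename F>
class ScopeExit {
 public:
  explicit ScopeExit(F f) : f_(std::move(f)) {}
  ~ScopeExit() { f_(); }
  ScopeExit(const ScopeExit&) = delete;
  ScopeExit& operator=(const ScopeExit&) = delete;

 private:
  F f_;
};

// Hypothetical owner of one of the lists.
struct ContextLoader {
  std::vector<int> context_params_list;

  void Load(bool fail) {
    assert(context_params_list.empty());  // invariant kept by the guard below
    auto clear_list = [this]() noexcept { context_params_list.clear(); };
    ScopeExit<decltype(clear_list)> guard(clear_list);

    context_params_list.push_back(42);
    if (fail) {
      throw std::runtime_error("simulated failure");
    }
    // ... use the list; the guard clears it when Load returns or throws.
  }
};
```

If the lists are needed only for the duration of the call, declaring them as locals removes the need for any explicit `clear()`.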

// At time of destruction. Usage of logger_ will not be available and will result in a seg fault
WindowsFileMapper::~WindowsFileMapper() {
std::lock_guard<std::mutex> lock(map_mutex_);

Member

Why do we need this lock? Are we expecting multiple threads to destroy the same object? Then we have bigger problems.

OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
ORT_RETURN_IF(file_handle == INVALID_HANDLE_VALUE,
Member

@yuslepukhin yuslepukhin Jan 15, 2026

file_handle

This handle is leaking on line 72. Check the wil library, which has Windows-oriented smart pointers.
Also, you can close the file handle as soon as the mapping handle is obtained; there is no need to cache it.
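
The ownership pattern being suggested can be shown with a portable sketch. On Windows, WIL's `wil::unique_hfile` and `wil::unique_handle` would play the role of the smart pointer; here `FakeHandle`, `OpenFake`, `CloseFake`, and `CreateMapping` are hypothetical stand-ins so that the example runs anywhere.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for an OS handle API.
struct FakeHandle {
  int id;
};
static int g_open_handles = 0;

FakeHandle* OpenFake(int id) {
  ++g_open_handles;
  return new FakeHandle{id};
}

void CloseFake(FakeHandle* h) {
  assert(h != nullptr);
  --g_open_handles;
  delete h;
}

struct FakeCloser {
  void operator()(FakeHandle* h) const noexcept { CloseFake(h); }
};
using UniqueFake = std::unique_ptr<FakeHandle, FakeCloser>;

// Opens a "file", derives a "mapping" from it, and returns only the mapping.
// The file handle is released on every path, including the early return.
UniqueFake CreateMapping(bool fail_mapping) {
  UniqueFake file(OpenFake(1));
  if (fail_mapping) {
    return nullptr;  // `file` is closed automatically here
  }
  UniqueFake mapping(OpenFake(2));
  return mapping;  // `file` is closed here too; only the mapping survives
}
```

Because the file handle lives in a scoped owner, neither the error path nor the success path can leak it, and nothing needs to cache it after the mapping exists.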

std::lock_guard<std::mutex> lock(map_mutex_);
auto status = Status::OK();

auto bin_map_it = std::find_if(context_bin_to_mapping_handle_map_.begin(),
Member

context_bin_to_mapping_handle_map_

Here and below. If this is a map, would it not make sense to avoid the sequential scan and use the map's find() method?

@yuslepukhin
Member

Please, avoid force pushes.


HANDLE file_mapping_handle = bin_map_it->second;
auto mapping_it = std::find_if(mapping_handle_to_info_map_.begin(),
mapping_handle_to_info_map_.end(),
Member

Hash tables exist to query elements in constant time, not O(n).
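
As a sketch of the keyed lookups being requested, the two maps from the snippet can be chained with `find()`. `MappingRegistry`, `MappingInfo`, and the `int` handle type are hypothetical simplifications of the PR's members.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

using Handle = int;  // hypothetical stand-in for a Windows HANDLE

struct MappingInfo {
  std::size_t size;
};

class MappingRegistry {
 public:
  void Add(const std::string& path, Handle handle, std::size_t size) {
    const bool inserted = path_to_handle_.emplace(path, handle).second;
    assert(inserted);  // each path is expected to be registered once
    (void)inserted;
    handle_to_info_[handle] = MappingInfo{size};
  }

  // Two average-constant-time lookups instead of two linear scans.
  const MappingInfo* Find(const std::string& path) const {
    auto handle_it = path_to_handle_.find(path);
    if (handle_it == path_to_handle_.end()) {
      return nullptr;
    }
    auto info_it = handle_to_info_.find(handle_it->second);
    if (info_it == handle_to_info_.end()) {
      return nullptr;
    }
    return &info_it->second;
  }

 private:
  std::unordered_map<std::string, Handle> path_to_handle_;
  std::unordered_map<Handle, MappingInfo> handle_to_info_;
};
```

`std::find_if` over the same containers visits entries one by one, which discards the benefit of storing them in a hash table keyed by the value being searched for.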

LOGS(*logger_, INFO) << "Creating mapping pointer for " << bin_filepath;

std::lock_guard<std::mutex> lock(map_mutex_);
auto it = std::find_if(context_bin_to_mapping_handle_map_.begin(),
Member

std::find_if

Ditto


ORT_RETURN_IF(mapview_ptr == nullptr, "Failed to create mapping pointer for ", bin_filepath);

if (!context_bin_map_view_pointers_.insert(mapview_ptr).second) {
Member

(!context_bin_map_view_pointers_.insert(mapview_ptr).second)

How is this even possible? You just created a new mapping that is guaranteed to have a unique pointer returned.

Member

@yuslepukhin yuslepukhin left a comment

🕐

@edgchen1
Contributor

/azp run Linux QNN CI Pipeline,Windows ARM64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@yuslepukhin
Member

Please, comment on all Copilot review issues before resolving them.

bool file_mapped_weights_enabled_ = false;

#ifdef QNN_FILE_MAPPED_WEIGHTS_ENABLED
std::shared_ptr<FileMappingCallbackInterface> file_mapper_ = nullptr;
Contributor

Why is shared ownership needed?

OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
ORT_RETURN_IF(file_handle == INVALID_HANDLE_VALUE,
Contributor

General: it would be good to output more information about errors from Windows APIs, e.g., with ::GetLastError().
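
One way to build such a message is sketched below. `DescribeOsError` is a hypothetical helper; on Windows the code would be the value returned by `::GetLastError()` immediately after the failing call, and `std::system_category()` is used here to turn it into text portably.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Builds "<api> failed with error <code>: <system message>".
std::string DescribeOsError(const char* api_name, unsigned long error_code) {
  assert(api_name != nullptr);
  return std::string(api_name) + " failed with error " +
         std::to_string(error_code) + ": " +
         std::system_category().message(static_cast<int>(error_code));
}
```

The resulting string could be appended to the existing error text, for example `DescribeOsError("CreateFileW", code)`, so that the log shows why the call failed rather than only that it failed. The wording of the system message depends on the platform.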


3 participants