Cleanup: Consolidate OpKernel::UseSharePrePackedBuffers_V2 and OpKernel::UseSharePrePackedBuffers#27924
Merged
adrianlizarraga merged 3 commits intomainfrom Apr 3, 2026
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR consolidates the OpKernel shared pre-packed weight consumption API by removing the _V2 variant and standardizing on a single UseSharedPrePackedBuffers virtual method that includes buffer sizes, aligning the base class and all overrides with how the framework already invokes the hook.
Changes:
- Replace
UseSharedPrePackedBuffers_V2with a singleUseSharedPrePackedBuffersvirtual method using the (buffers, sizes, input_idx, used) signature. - Update framework call site and update kernel overrides across CPU/ACL/contrib ops and plugin EP bridge to match the new signature.
- Adjust unit test kernel override accordingly.
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| include/onnxruntime/core/framework/op_kernel.h | Consolidates the virtual method to a single 4-parameter signature and updates inline docs/comments. |
| onnxruntime/core/framework/session_state.cc | Renames framework call from _V2 to the consolidated UseSharedPrePackedBuffers. |
| onnxruntime/core/session/plugin_ep/ep_kernel_registration.cc | Updates Plugin EP kernel override to the consolidated method name/signature. |
| onnxruntime/test/framework/session_state_test.cc | Updates test kernel override signature to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.h | Updates LSTM kernel override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc | Updates LSTM kernel override definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.h | Updates GRU kernel override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.cc | Updates GRU kernel override definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/quantization/qlinearconv.cc | Updates QLinearConv kernel override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/quantization/matmul_integer_base.h | Updates MatMulIntegerBase inline override to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/nn/conv_transpose.h | Updates ConvTranspose override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/nn/conv_transpose.cc | Updates ConvTranspose override definitions to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/math/matmul.h | Updates MatMul override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/math/matmul.cc | Updates MatMul override definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/math/gemm.h | Updates Gemm override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/math/gemm.cc | Updates Gemm override definitions to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/fp16/fp16_conv.cc | Updates FusedConvFp16 override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/acl/nn/conv.h | Updates ACL Conv override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/acl/nn/conv.cc | Updates ACL Conv override definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/acl/math/matmul.h | Updates ACL MatMul override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/acl/math/matmul.cc | Updates ACL MatMul override definition to include prepacked_buffer_sizes. |
| onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc | Updates MatMulNBits override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_lstm.cc | Updates DynamicQuantizeLSTM override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/contrib_ops/cpu/quantization/attention_quant.cc | Updates QAttention override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/contrib_ops/cpu/moe/moe_quantization_cpu.h | Renames QMoECPU override from _V2 to consolidated UseSharedPrePackedBuffers. |
| onnxruntime/contrib_ops/cpu/moe/moe_quantization_cpu.cc | Renames QMoECPU override/explicit instantiations to consolidated UseSharedPrePackedBuffers. |
| onnxruntime/contrib_ops/cpu/bert/attention.cc | Updates contrib Attention override declaration/definition to include prepacked_buffer_sizes. |
Comments suppressed due to low confidence (1)
onnxruntime/core/session/plugin_ep/ep_kernel_registration.cc:149
- In PluginEpOpKernel::UseSharedPrePackedBuffers,
buffer_unique_ptrsandbuffer_sizesare assumed to have the same length, but this isn't validated before passingbuffer_sizes.data()to the pluginSetSharedPrePackedWeightcallback. A buggy/malicious plugin could provide mismatched lengths (or a span not matchingbuffer_unique_ptrs.size()), leading to out-of-bounds reads in the plugin. Add an explicit check (and return a failure Status) ifbuffer_sizes.size() != buffer_unique_ptrs.size()before buildingbuffer_data_ptrs/calling into the plugin.
Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& buffer_unique_ptrs,
gsl::span<const size_t> buffer_sizes,
int input_idx, /*out*/ bool& used_shared_buffers) override {
assert(kernel_impl_ != nullptr); // Should be ensured by PluginEpOpKernel::Create().
if (kernel_impl_->ort_version_supported < 24 || kernel_impl_->SetSharedPrePackedWeight == nullptr) {
// OrtKernelImpl does not define an implementation. The session state, which calls this function,
// generates an error if necessary (i.e., kernel indicated it wanted to share weights but did not define this).
used_shared_buffers = false;
return Status::OK();
}
std::vector<const void*> buffer_data_ptrs;
buffer_data_ptrs.reserve(buffer_unique_ptrs.size());
std::transform(buffer_unique_ptrs.begin(), buffer_unique_ptrs.end(), std::back_inserter(buffer_data_ptrs),
[](const BufferUniquePtr& buff) -> const void* { return buff.get(); });
ORT_RETURN_IF_ERROR(ToStatusAndRelease(
kernel_impl_->SetSharedPrePackedWeight(kernel_impl_, buffer_data_ptrs.data(), buffer_sizes.data(),
buffer_data_ptrs.size(), input_idx)));
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
edgchen1
approved these changes
Apr 1, 2026
Contributor
edgchen1
left a comment
There was a problem hiding this comment.
thanks for cleaning this up!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Consolidate
OpKernel::UseSharedPrePackedBuffersandOpKernel::UseSharedPrePackedBuffers_V2into a single virtual method, resolving the TODO inop_kernel.h.Background
The
OpKernelclass previously had two virtual methods for consuming shared pre-packed weight buffers:UseSharedPrePackedBuffers(V1) — 3 params:prepacked_buffers,input_idx,used_shared_buffersUseSharedPrePackedBuffers_V2— 4 params: addedprepacked_buffer_sizes(agsl::span<const size_t>)V2 was introduced to pass buffer sizes alongside the buffers. Its default implementation forwarded to V1 for backward compatibility. The framework (
session_state.cc) only ever called V2.Changes
Merged both methods into a single
UseSharedPrePackedBuffersusing the V2 signature:Updated 27 files across the codebase:
op_kernel.hsession_state.cc_V2callep_kernel_registration.ccmoe_quantization_cpu.h/.ccgemm,matmul,conv_transpose,fp16_conv,qlinearconv,matmul_integer_base,deep_cpu_lstm,deep_cpu_gruprepacked_buffer_sizesparamacl/conv,acl/matmulmatmul_nbits,dynamic_quantize_lstm,attention_quant,bert/attentionsession_state_test.ccNotes
prepacked_buffer_sizesparameter as unnamed/unused (/*prepacked_buffer_sizes*/) — no logic changes in those kernels.SetSharedPrePackedWeightinonnxruntime_ep_c_api.h) already passes buffer sizes, so no C API changes were needed.UseSharedPrePackedBuffersImplin LSTM/GRU) are not virtual overrides and were not modified.Motivation and Context
Addresses the TODO at
include/onnxruntime/core/framework/op_kernel.h:139: