Cleanup: Consolidate OpKernel::UseSharedPrePackedBuffers_V2 and OpKernel::UseSharedPrePackedBuffers #27924

Merged
adrianlizarraga merged 3 commits into main from adrianl/Cleanup_PluginEp_PrePackV2 on Apr 3, 2026


@adrianlizarraga

Description

Consolidate OpKernel::UseSharedPrePackedBuffers and OpKernel::UseSharedPrePackedBuffers_V2 into a single virtual method, resolving the TODO in op_kernel.h.

Background

The OpKernel class previously had two virtual methods for consuming shared pre-packed weight buffers:

  • UseSharedPrePackedBuffers (V1) — 3 params: prepacked_buffers, input_idx, used_shared_buffers
  • UseSharedPrePackedBuffers_V2 — 4 params: added prepacked_buffer_sizes (a gsl::span<const size_t>)

V2 was introduced to pass buffer sizes alongside the buffers. Its default implementation forwarded to V1 for backward compatibility. The framework (session_state.cc) only ever called V2.
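
The forwarding arrangement described above can be sketched roughly as follows. This is a simplified illustration, not the actual ONNX Runtime source: `Status`, `BufferUniquePtr`, `OldOpKernel`, `LegacyKernel`, and `CallThroughV2` are stand-ins invented for the example, and `const std::vector<std::size_t>&` takes the place of `gsl::span<const size_t>` so the snippet compiles without extra dependencies. The V1 default body shown here is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in types (hypothetical simplifications of the ONNX Runtime ones).
struct Status {
  bool ok;
  static Status OK() { return Status{true}; }
};
using BufferUniquePtr = std::unique_ptr<unsigned char[]>;

// Sketch of the pre-consolidation base class with two virtual hooks.
class OldOpKernel {
 public:
  virtual ~OldOpKernel() = default;

  // V1: three parameters, no size information.
  virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& /*prepacked_buffers*/,
                                           int /*input_idx*/,
                                           /*out*/ bool& used_shared_buffers) {
    used_shared_buffers = false;  // assumed default: kernel does not consume shared buffers
    return Status::OK();
  }

  // V2: adds buffer sizes; the default forwards to V1 and drops the sizes.
  virtual Status UseSharedPrePackedBuffers_V2(std::vector<BufferUniquePtr>& prepacked_buffers,
                                              const std::vector<std::size_t>& /*prepacked_buffer_sizes*/,
                                              int input_idx,
                                              /*out*/ bool& used_shared_buffers) {
    return UseSharedPrePackedBuffers(prepacked_buffers, input_idx, used_shared_buffers);
  }
};

// A kernel that only overrides V1 still works when the framework calls V2.
class LegacyKernel : public OldOpKernel {
 public:
  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                   int input_idx,
                                   /*out*/ bool& used_shared_buffers) override {
    used_shared_buffers = (input_idx == 1) && !prepacked_buffers.empty();
    return Status::OK();
  }
};

// Mimics the framework side, which only ever called the V2 entry point.
bool CallThroughV2(OldOpKernel& kernel, int input_idx) {
  std::vector<BufferUniquePtr> buffers;
  buffers.push_back(std::make_unique<unsigned char[]>(16));
  std::vector<std::size_t> sizes{16};
  bool used = false;
  Status s = kernel.UseSharedPrePackedBuffers_V2(buffers, sizes, input_idx, used);
  return s.ok && used;
}
```

Because every call already went through V2, removing V1 only requires each override to accept the extra sizes parameter.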

Changes

Merged both methods into a single UseSharedPrePackedBuffers using the V2 signature:

virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                         gsl::span<const size_t> prepacked_buffer_sizes,
                                         int input_idx,
                                         /*out*/ bool& used_shared_buffers);

Updated 27 files across the codebase:

| Category | Files | Change |
| --- | --- | --- |
| Base class | op_kernel.h | Removed V1 + V2; single 4-param method |
| Framework | session_state.cc | Renamed _V2 call |
| Plugin EP bridge | ep_kernel_registration.cc | Renamed override |
| QMoECPU | moe_quantization_cpu.h/.cc | Renamed V2 override + template instantiations |
| CPU provider (8 kernels) | gemm, matmul, conv_transpose, fp16_conv, qlinearconv, matmul_integer_base, deep_cpu_lstm, deep_cpu_gru | Added prepacked_buffer_sizes param |
| ACL provider (2 kernels) | acl/conv, acl/matmul | Added param |
| Contrib ops (4 kernels) | matmul_nbits, dynamic_quantize_lstm, attention_quant, bert/attention | Added param |
| Tests | session_state_test.cc | Updated test kernel override |

Notes

  • Existing V1 overrides add the new prepacked_buffer_sizes parameter as unnamed/unused (/*prepacked_buffer_sizes*/) — no logic changes in those kernels.
  • The C API (SetSharedPrePackedWeight in onnxruntime_ep_c_api.h) already passes buffer sizes, so no C API changes were needed.
  • Private helper functions (e.g., UseSharedPrePackedBuffersImpl in LSTM/GRU) are not virtual overrides and were not modified.
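
As a rough illustration of the first note, a migrated kernel override might look like the sketch below. The types are simplified stand-ins rather than the real ONNX Runtime definitions: `Status`, `BufferUniquePtr`, `MigratedKernel`, `packed_b_`, and `HasPackedB` are hypothetical names, and `const std::vector<std::size_t>&` substitutes for `gsl::span<const size_t>`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in types (hypothetical simplifications of the ONNX Runtime ones).
struct Status {
  bool ok;
  static Status OK() { return Status{true}; }
};
using BufferUniquePtr = std::unique_ptr<unsigned char[]>;

// Sketch of the consolidated base class: one virtual hook with four parameters.
class OpKernel {
 public:
  virtual ~OpKernel() = default;
  virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& /*prepacked_buffers*/,
                                           const std::vector<std::size_t>& /*prepacked_buffer_sizes*/,
                                           int /*input_idx*/,
                                           /*out*/ bool& used_shared_buffers) {
    used_shared_buffers = false;
    return Status::OK();
  }
};

// A former V1 override: the sizes parameter is added but left unnamed and unused,
// so the body is unchanged from the three-parameter version.
class MigratedKernel : public OpKernel {
 public:
  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                   const std::vector<std::size_t>& /*prepacked_buffer_sizes*/,
                                   int input_idx,
                                   /*out*/ bool& used_shared_buffers) override {
    used_shared_buffers = false;
    if (input_idx == 1 && !prepacked_buffers.empty()) {
      packed_b_ = std::move(prepacked_buffers[0]);  // take ownership of the shared buffer
      used_shared_buffers = true;
    }
    return Status::OK();
  }

  bool HasPackedB() const { return packed_b_ != nullptr; }

 private:
  BufferUniquePtr packed_b_;
};
```

Leaving the parameter unnamed keeps the diff mechanical and avoids unused-parameter warnings in kernels that do not need the sizes.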

Motivation and Context

Addresses the TODO at include/onnxruntime/core/framework/op_kernel.h:139:

TODO: Consolidate UseSharedPrePackedBuffers and UseSharedPrePackedBuffers_V2 into a single function, which will require updating kernel-based provider-bridge EPs (cpu, cuda, webgpu).


Copilot AI left a comment

Pull request overview

This PR consolidates the OpKernel shared pre-packed weight consumption API by removing the _V2 variant and standardizing on a single UseSharedPrePackedBuffers virtual method that includes buffer sizes, aligning the base class and all overrides with how the framework already invokes the hook.

Changes:

  • Replace UseSharedPrePackedBuffers_V2 with a single UseSharedPrePackedBuffers virtual method using the (buffers, sizes, input_idx, used) signature.
  • Update framework call site and update kernel overrides across CPU/ACL/contrib ops and plugin EP bridge to match the new signature.
  • Adjust unit test kernel override accordingly.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| include/onnxruntime/core/framework/op_kernel.h | Consolidates the virtual method to a single 4-parameter signature and updates inline docs/comments. |
| onnxruntime/core/framework/session_state.cc | Renames framework call from _V2 to the consolidated UseSharedPrePackedBuffers. |
| onnxruntime/core/session/plugin_ep/ep_kernel_registration.cc | Updates Plugin EP kernel override to the consolidated method name/signature. |
| onnxruntime/test/framework/session_state_test.cc | Updates test kernel override signature to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.h | Updates LSTM kernel override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc | Updates LSTM kernel override definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.h | Updates GRU kernel override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.cc | Updates GRU kernel override definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/quantization/qlinearconv.cc | Updates QLinearConv kernel override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/quantization/matmul_integer_base.h | Updates MatMulIntegerBase inline override to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/nn/conv_transpose.h | Updates ConvTranspose override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/nn/conv_transpose.cc | Updates ConvTranspose override definitions to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/math/matmul.h | Updates MatMul override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/math/matmul.cc | Updates MatMul override definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/math/gemm.h | Updates Gemm override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/math/gemm.cc | Updates Gemm override definitions to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/cpu/fp16/fp16_conv.cc | Updates FusedConvFp16 override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/acl/nn/conv.h | Updates ACL Conv override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/acl/nn/conv.cc | Updates ACL Conv override definition to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/acl/math/matmul.h | Updates ACL MatMul override declaration to include prepacked_buffer_sizes. |
| onnxruntime/core/providers/acl/math/matmul.cc | Updates ACL MatMul override definition to include prepacked_buffer_sizes. |
| onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc | Updates MatMulNBits override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_lstm.cc | Updates DynamicQuantizeLSTM override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/contrib_ops/cpu/quantization/attention_quant.cc | Updates QAttention override declaration/definition to include prepacked_buffer_sizes. |
| onnxruntime/contrib_ops/cpu/moe/moe_quantization_cpu.h | Renames QMoECPU override from _V2 to consolidated UseSharedPrePackedBuffers. |
| onnxruntime/contrib_ops/cpu/moe/moe_quantization_cpu.cc | Renames QMoECPU override/explicit instantiations to consolidated UseSharedPrePackedBuffers. |
| onnxruntime/contrib_ops/cpu/bert/attention.cc | Updates contrib Attention override declaration/definition to include prepacked_buffer_sizes. |
Comments suppressed due to low confidence (1)

onnxruntime/core/session/plugin_ep/ep_kernel_registration.cc:149

  • In PluginEpOpKernel::UseSharedPrePackedBuffers, buffer_unique_ptrs and buffer_sizes are assumed to have the same length, but this isn't validated before passing buffer_sizes.data() to the plugin SetSharedPrePackedWeight callback. A buggy/malicious plugin could provide mismatched lengths (or a span not matching buffer_unique_ptrs.size()), leading to out-of-bounds reads in the plugin. Add an explicit check (and return a failure Status) if buffer_sizes.size() != buffer_unique_ptrs.size() before building buffer_data_ptrs/calling into the plugin.
  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& buffer_unique_ptrs,
                                   gsl::span<const size_t> buffer_sizes,
                                   int input_idx, /*out*/ bool& used_shared_buffers) override {
    assert(kernel_impl_ != nullptr);  // Should be ensured by PluginEpOpKernel::Create().

    if (kernel_impl_->ort_version_supported < 24 || kernel_impl_->SetSharedPrePackedWeight == nullptr) {
      // OrtKernelImpl does not define an implementation. The session state, which calls this function,
      // generates an error if necessary (i.e., kernel indicated it wanted to share weights but did not define this).
      used_shared_buffers = false;
      return Status::OK();
    }

    std::vector<const void*> buffer_data_ptrs;

    buffer_data_ptrs.reserve(buffer_unique_ptrs.size());
    std::transform(buffer_unique_ptrs.begin(), buffer_unique_ptrs.end(), std::back_inserter(buffer_data_ptrs),
                   [](const BufferUniquePtr& buff) -> const void* { return buff.get(); });

    ORT_RETURN_IF_ERROR(ToStatusAndRelease(
        kernel_impl_->SetSharedPrePackedWeight(kernel_impl_, buffer_data_ptrs.data(), buffer_sizes.data(),
                                               buffer_data_ptrs.size(), input_idx)));
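
The length check suggested in the suppressed comment could be sketched along these lines. This is an illustration only, not code from the PR: `Status`, `BufferUniquePtr`, and `CollectBufferPointers` are hypothetical stand-ins, `const std::vector<std::size_t>&` replaces `gsl::span<const size_t>`, and the real code would return an ONNX Runtime `Status` through its own error macros.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in types (hypothetical simplifications of the ONNX Runtime ones).
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return Status{true, ""}; }
  static Status Fail(std::string msg) { return Status{false, std::move(msg)}; }
};
using BufferUniquePtr = std::unique_ptr<unsigned char[]>;

// Validates that buffers and sizes have the same length before building the
// raw pointer array that would be handed to a plugin callback.
Status CollectBufferPointers(const std::vector<BufferUniquePtr>& buffer_unique_ptrs,
                             const std::vector<std::size_t>& buffer_sizes,
                             std::vector<const void*>& buffer_data_ptrs) {
  if (buffer_sizes.size() != buffer_unique_ptrs.size()) {
    return Status::Fail("Expected " + std::to_string(buffer_unique_ptrs.size()) +
                        " prepacked buffer sizes but got " + std::to_string(buffer_sizes.size()));
  }

  buffer_data_ptrs.clear();
  buffer_data_ptrs.reserve(buffer_unique_ptrs.size());
  for (const BufferUniquePtr& buff : buffer_unique_ptrs) {
    buffer_data_ptrs.push_back(buff.get());
  }
  return Status::OK();
}
```

With a check like this in place, the callback would only ever receive pointer and size arrays of equal length.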

@adrianlizarraga adrianlizarraga marked this pull request as ready for review April 1, 2026 05:04
@adrianlizarraga adrianlizarraga requested a review from edgchen1 April 1, 2026 17:36

@edgchen1 edgchen1 left a comment

thanks for cleaning this up!

@adrianlizarraga adrianlizarraga merged commit a6592fc into main Apr 3, 2026
106 of 111 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/Cleanup_PluginEp_PrePackV2 branch April 3, 2026 17:52