Cleanup: Consolidate `OpKernel::UseSharedPrePackedBuffers_V2` and `OpKernel::UseSharedPrePackedBuffers` by adrianlizarraga · Pull Request #27924 · microsoft/onnxruntime

adrianlizarraga · 2026-04-01T04:46:36Z

Description

Consolidate OpKernel::UseSharedPrePackedBuffers and OpKernel::UseSharedPrePackedBuffers_V2 into a single virtual method, resolving the TODO in op_kernel.h.

Background

The OpKernel class previously had two virtual methods for consuming shared pre-packed weight buffers:

UseSharedPrePackedBuffers (V1) — 3 params: prepacked_buffers, input_idx, used_shared_buffers
UseSharedPrePackedBuffers_V2 — 4 params: added prepacked_buffer_sizes (a gsl::span<const size_t>)

V2 was introduced to pass buffer sizes alongside the buffers. Its default implementation forwarded to V1 for backward compatibility. The framework (session_state.cc) only ever called V2.
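For context, the pre-consolidation arrangement can be modeled roughly as follows. This is a simplified, self-contained sketch, not the actual op_kernel.h. `Status`, `BufferUniquePtr`, and `SizeSpan` (standing in for `gsl::span<const size_t>`) are hypothetical stand-ins, the class and function names are illustrative, and the V1 default body is an assumption. It shows why a kernel that overrode only V1 kept working even though the framework called only V2.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for ONNX Runtime types (not the real definitions).
struct Status {
  bool ok = true;
  static Status OK() { return Status{}; }
};

inline void DeleteBytes(void* p) { delete[] static_cast<char*>(p); }
using BufferUniquePtr = std::unique_ptr<void, void (*)(void*)>;

// Minimal stand-in for gsl::span<const size_t>; implicitly built from a vector.
struct SizeSpan {
  SizeSpan(const std::vector<std::size_t>& v) : ptr_(v.data()), count_(v.size()) {}
  const std::size_t* data() const { return ptr_; }
  std::size_t size() const { return count_; }

 private:
  const std::size_t* ptr_;
  std::size_t count_;
};

// Simplified model of the old base class with two virtual hooks.
class OpKernelBeforeCleanup {
 public:
  virtual ~OpKernelBeforeCleanup() = default;

  // V1: no buffer sizes. Assumed default: report that nothing was shared.
  virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& /*prepacked_buffers*/,
                                           int /*input_idx*/, /*out*/ bool& used_shared_buffers) {
    used_shared_buffers = false;
    return Status::OK();
  }

  // V2: adds buffer sizes; the default forwards to V1 for backward compatibility.
  virtual Status UseSharedPrePackedBuffers_V2(std::vector<BufferUniquePtr>& prepacked_buffers,
                                              SizeSpan /*prepacked_buffer_sizes*/, int input_idx,
                                              /*out*/ bool& used_shared_buffers) {
    return UseSharedPrePackedBuffers(prepacked_buffers, input_idx, used_shared_buffers);
  }
};

// A kernel written against V1 only; it never sees the sizes.
class V1OnlyKernel : public OpKernelBeforeCleanup {
 public:
  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers, int input_idx,
                                   /*out*/ bool& used_shared_buffers) override {
    used_shared_buffers = (input_idx == 1 && !prepacked_buffers.empty());
    return Status::OK();
  }
};

// The framework side only ever called V2, which dispatches to V1 via the default.
Status CallFromSessionState(OpKernelBeforeCleanup& kernel, std::vector<BufferUniquePtr>& buffers,
                            SizeSpan sizes, int input_idx, bool& used) {
  return kernel.UseSharedPrePackedBuffers_V2(buffers, sizes, input_idx, used);
}
```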

Changes

Merged both methods into a single UseSharedPrePackedBuffers using the V2 signature:

```cpp
virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                         gsl::span<const size_t> prepacked_buffer_sizes,
                                         int input_idx,
                                         /*out*/ bool& used_shared_buffers);
```
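A similarly simplified model of the consolidated hook and its single call site is sketched below. The stand-in types, `OpKernelAfterCleanup`, `SizeAwareKernel`, and `CallFromSessionState` are hypothetical, and the default body (report that no shared buffers were used) is assumed rather than copied from op_kernel.h. `SizeAwareKernel` reads `prepacked_buffer_sizes`, which is the kind of kernel that previously had to override `_V2` to see the sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for ONNX Runtime types (not the real definitions).
struct Status {
  bool ok = true;
  static Status OK() { return Status{}; }
};

inline void DeleteBytes(void* p) { delete[] static_cast<char*>(p); }
using BufferUniquePtr = std::unique_ptr<void, void (*)(void*)>;

// Minimal stand-in for gsl::span<const size_t>; implicitly built from a vector.
struct SizeSpan {
  SizeSpan(const std::vector<std::size_t>& v) : ptr_(v.data()), count_(v.size()) {}
  const std::size_t* data() const { return ptr_; }
  std::size_t size() const { return count_; }

 private:
  const std::size_t* ptr_;
  std::size_t count_;
};

// Simplified model of the base class after the cleanup: one hook, four parameters.
class OpKernelAfterCleanup {
 public:
  virtual ~OpKernelAfterCleanup() = default;

  // Assumed default: the kernel does not consume shared buffers.
  virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& /*prepacked_buffers*/,
                                           SizeSpan /*prepacked_buffer_sizes*/, int /*input_idx*/,
                                           /*out*/ bool& used_shared_buffers) {
    used_shared_buffers = false;
    return Status::OK();
  }
};

// Hypothetical kernel that uses the sizes (here it just totals them) and keeps the first buffer.
class SizeAwareKernel : public OpKernelAfterCleanup {
 public:
  std::size_t total_bytes = 0;

  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                   SizeSpan prepacked_buffer_sizes, int input_idx,
                                   /*out*/ bool& used_shared_buffers) override {
    used_shared_buffers = false;
    if (input_idx != 1 || prepacked_buffers.empty()) return Status::OK();

    total_bytes = 0;
    for (std::size_t i = 0; i < prepacked_buffer_sizes.size(); ++i) {
      total_bytes += prepacked_buffer_sizes.data()[i];
    }
    packed_ = std::move(prepacked_buffers[0]);  // keep the first shared buffer
    used_shared_buffers = true;
    return Status::OK();
  }

 private:
  BufferUniquePtr packed_{nullptr, DeleteBytes};
};

// The framework now has exactly one method to call.
Status CallFromSessionState(OpKernelAfterCleanup& kernel, std::vector<BufferUniquePtr>& buffers,
                            SizeSpan sizes, int input_idx, bool& used) {
  return kernel.UseSharedPrePackedBuffers(buffers, sizes, input_idx, used);
}
```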

Updated 27 files across the codebase:

| Category | Files | Change |
|---|---|---|
| Base class | `op_kernel.h` | Removed V1 + V2; single 4-param method |
| Framework | `session_state.cc` | Renamed `_V2` call |
| Plugin EP bridge | `ep_kernel_registration.cc` | Renamed override |
| QMoECPU | `moe_quantization_cpu.h/.cc` | Renamed V2 override + template instantiations |
| CPU provider (8 kernels) | `gemm`, `matmul`, `conv_transpose`, `fp16_conv`, `qlinearconv`, `matmul_integer_base`, `deep_cpu_lstm`, `deep_cpu_gru` | Added `prepacked_buffer_sizes` param |
| ACL provider (2 kernels) | `acl/conv`, `acl/matmul` | Added param |
| Contrib ops (4 kernels) | `matmul_nbits`, `dynamic_quantize_lstm`, `attention_quant`, `bert/attention` | Added param |
| Tests | `session_state_test.cc` | Updated test kernel override |

Notes

Existing V1 overrides now take the new prepacked_buffer_sizes parameter but leave it unnamed (/*prepacked_buffer_sizes*/) because they don't use it; their logic is unchanged.
The C API (SetSharedPrePackedWeight in onnxruntime_ep_c_api.h) already passes buffer sizes, so no C API changes were needed.
Private helper functions (e.g., UseSharedPrePackedBuffersImpl in LSTM/GRU) are not virtual overrides and were not modified.
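The override migration described in these notes can be sketched as follows, again with hypothetical stand-in types. `OpKernelModel`, `RnnLikeKernel`, and its private `UseSharedPrePackedBuffersImpl` only mimic the LSTM/GRU shape and are not the real kernels: the former V1 override gains the sizes parameter as a commented-out name, while the private, non-virtual helper it delegates to keeps its signature because nothing overrides it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for ONNX Runtime types (not the real definitions).
struct Status {
  bool ok = true;
  static Status OK() { return Status{}; }
};

inline void DeleteBytes(void* p) { delete[] static_cast<char*>(p); }
using BufferUniquePtr = std::unique_ptr<void, void (*)(void*)>;

// Minimal stand-in for gsl::span<const size_t>; implicitly built from a vector.
struct SizeSpan {
  SizeSpan(const std::vector<std::size_t>& v) : ptr_(v.data()), count_(v.size()) {}
  const std::size_t* data() const { return ptr_; }
  std::size_t size() const { return count_; }

 private:
  const std::size_t* ptr_;
  std::size_t count_;
};

// Simplified consolidated base class (assumed default body).
class OpKernelModel {
 public:
  virtual ~OpKernelModel() = default;
  virtual Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& /*prepacked_buffers*/,
                                           SizeSpan /*prepacked_buffer_sizes*/, int /*input_idx*/,
                                           /*out*/ bool& used_shared_buffers) {
    used_shared_buffers = false;
    return Status::OK();
  }
};

class RnnLikeKernel : public OpKernelModel {
 public:
  // Former V1 override: the only change is the new, unnamed sizes parameter.
  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& prepacked_buffers,
                                   SizeSpan /*prepacked_buffer_sizes*/, int input_idx,
                                   /*out*/ bool& used_shared_buffers) override {
    used_shared_buffers = false;
    if (input_idx == 1 && !prepacked_buffers.empty()) {
      UseSharedPrePackedBuffersImpl(prepacked_buffers, packed_w_);
      used_shared_buffers = true;
    }
    return Status::OK();
  }

  bool HasPackedWeights() const { return packed_w_ != nullptr; }

 private:
  // Private, non-virtual helper: not an override, so its signature is untouched.
  void UseSharedPrePackedBuffersImpl(std::vector<BufferUniquePtr>& prepacked_buffers,
                                     BufferUniquePtr& packed_weights) {
    packed_weights = std::move(prepacked_buffers[0]);
  }

  BufferUniquePtr packed_w_{nullptr, DeleteBytes};
};
```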

Motivation and Context

Addresses the TODO at include/onnxruntime/core/framework/op_kernel.h:139:

TODO: Consolidate UseSharedPrePackedBuffers and UseSharedPrePackedBuffers_V2 into a single function, which will require updating kernel-based provider-bridge EPs (cpu, cuda, webgpu).


Copilot

Pull request overview

This PR consolidates the OpKernel shared pre-packed weight consumption API by removing the _V2 variant and standardizing on a single UseSharedPrePackedBuffers virtual method that includes buffer sizes, aligning the base class and all overrides with how the framework already invokes the hook.

Changes:

Replace UseSharedPrePackedBuffers_V2 with a single UseSharedPrePackedBuffers virtual method using the (buffers, sizes, input_idx, used) signature.
Update framework call site and update kernel overrides across CPU/ACL/contrib ops and plugin EP bridge to match the new signature.
Adjust unit test kernel override accordingly.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated no comments.


| File | Description |
|---|---|
| include/onnxruntime/core/framework/op_kernel.h | Consolidates the virtual method to a single 4-parameter signature and updates inline docs/comments. |
| onnxruntime/core/framework/session_state.cc | Renames framework call from `_V2` to the consolidated `UseSharedPrePackedBuffers`. |
| onnxruntime/core/session/plugin_ep/ep_kernel_registration.cc | Updates Plugin EP kernel override to the consolidated method name/signature. |
| onnxruntime/test/framework/session_state_test.cc | Updates test kernel override signature to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.h | Updates LSTM kernel override declaration to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc | Updates LSTM kernel override definition to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.h | Updates GRU kernel override declaration to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.cc | Updates GRU kernel override definition to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/quantization/qlinearconv.cc | Updates QLinearConv kernel override declaration/definition to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/quantization/matmul_integer_base.h | Updates MatMulIntegerBase inline override to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/nn/conv_transpose.h | Updates ConvTranspose override declaration to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/nn/conv_transpose.cc | Updates ConvTranspose override definitions to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/math/matmul.h | Updates MatMul override declaration to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/math/matmul.cc | Updates MatMul override definition to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/math/gemm.h | Updates Gemm override declaration to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/math/gemm.cc | Updates Gemm override definitions to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/cpu/fp16/fp16_conv.cc | Updates FusedConvFp16 override declaration/definition to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/acl/nn/conv.h | Updates ACL Conv override declaration to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/acl/nn/conv.cc | Updates ACL Conv override definition to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/acl/math/matmul.h | Updates ACL MatMul override declaration to include `prepacked_buffer_sizes`. |
| onnxruntime/core/providers/acl/math/matmul.cc | Updates ACL MatMul override definition to include `prepacked_buffer_sizes`. |
| onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc | Updates MatMulNBits override declaration/definition to include `prepacked_buffer_sizes`. |
| onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_lstm.cc | Updates DynamicQuantizeLSTM override declaration/definition to include `prepacked_buffer_sizes`. |
| onnxruntime/contrib_ops/cpu/quantization/attention_quant.cc | Updates QAttention override declaration/definition to include `prepacked_buffer_sizes`. |
| onnxruntime/contrib_ops/cpu/moe/moe_quantization_cpu.h | Renames QMoECPU override from `_V2` to consolidated `UseSharedPrePackedBuffers`. |
| onnxruntime/contrib_ops/cpu/moe/moe_quantization_cpu.cc | Renames QMoECPU override/explicit instantiations to consolidated `UseSharedPrePackedBuffers`. |
| onnxruntime/contrib_ops/cpu/bert/attention.cc | Updates contrib Attention override declaration/definition to include `prepacked_buffer_sizes`. |

Comments suppressed due to low confidence (1)

onnxruntime/core/session/plugin_ep/ep_kernel_registration.cc:149

In PluginEpOpKernel::UseSharedPrePackedBuffers, buffer_unique_ptrs and buffer_sizes are assumed to have the same length, but this isn't validated before passing buffer_sizes.data() to the plugin SetSharedPrePackedWeight callback. A buggy/malicious plugin could provide mismatched lengths (or a span not matching buffer_unique_ptrs.size()), leading to out-of-bounds reads in the plugin. Add an explicit check (and return a failure Status) if buffer_sizes.size() != buffer_unique_ptrs.size() before building buffer_data_ptrs/calling into the plugin.

```cpp
  Status UseSharedPrePackedBuffers(std::vector<BufferUniquePtr>& buffer_unique_ptrs,
                                   gsl::span<const size_t> buffer_sizes,
                                   int input_idx, /*out*/ bool& used_shared_buffers) override {
    assert(kernel_impl_ != nullptr);  // Should be ensured by PluginEpOpKernel::Create().

    if (kernel_impl_->ort_version_supported < 24 || kernel_impl_->SetSharedPrePackedWeight == nullptr) {
      // OrtKernelImpl does not define an implementation. The session state, which calls this function,
      // generates an error if necessary (i.e., kernel indicated it wanted to share weights but did not define this).
      used_shared_buffers = false;
      return Status::OK();
    }

    std::vector<const void*> buffer_data_ptrs;

    buffer_data_ptrs.reserve(buffer_unique_ptrs.size());
    std::transform(buffer_unique_ptrs.begin(), buffer_unique_ptrs.end(), std::back_inserter(buffer_data_ptrs),
                   [](const BufferUniquePtr& buff) -> const void* { return buff.get(); });

    ORT_RETURN_IF_ERROR(ToStatusAndRelease(
        kernel_impl_->SetSharedPrePackedWeight(kernel_impl_, buffer_data_ptrs.data(), buffer_sizes.data(),
                                               buffer_data_ptrs.size(), input_idx)));
```
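A minimal sketch of the guard the review suggests, written as a standalone function so it compiles on its own. `ForwardSharedBuffersToPlugin`, `SetSharedWeightFn`, and the stand-in `Status`/`BufferUniquePtr`/`SizeSpan` types are hypothetical; the real method calls `kernel_impl_->SetSharedPrePackedWeight` and converts its result with `ToStatusAndRelease`, neither of which is modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for ONNX Runtime types (not the real definitions).
struct Status {
  bool ok = true;
  std::string message;
  static Status OK() { return Status{}; }
  static Status Error(std::string msg) {
    Status s;
    s.ok = false;
    s.message = std::move(msg);
    return s;
  }
};

inline void DeleteBytes(void* p) { delete[] static_cast<char*>(p); }
using BufferUniquePtr = std::unique_ptr<void, void (*)(void*)>;

// Minimal stand-in for gsl::span<const size_t>; implicitly built from a vector.
struct SizeSpan {
  SizeSpan(const std::vector<std::size_t>& v) : ptr_(v.data()), count_(v.size()) {}
  const std::size_t* data() const { return ptr_; }
  std::size_t size() const { return count_; }

 private:
  const std::size_t* ptr_;
  std::size_t count_;
};

// Hypothetical plugin callback, loosely following the argument order in the excerpt:
// (buffer data pointers, buffer sizes, number of buffers, input index). Returns true on success.
using SetSharedWeightFn = bool (*)(const void* const* buffer_data, const std::size_t* buffer_sizes,
                                   std::size_t num_buffers, int input_idx);

Status ForwardSharedBuffersToPlugin(std::vector<BufferUniquePtr>& buffer_unique_ptrs,
                                    SizeSpan buffer_sizes, int input_idx,
                                    SetSharedWeightFn set_shared_weight,
                                    /*out*/ bool& used_shared_buffers) {
  used_shared_buffers = false;

  // Suggested guard: the callee reads buffer_sizes.data() as an array with one entry per
  // buffer, so reject mismatched lengths before handing anything to the plugin.
  if (buffer_sizes.size() != buffer_unique_ptrs.size()) {
    return Status::Error("pre-packed buffer count (" + std::to_string(buffer_unique_ptrs.size()) +
                         ") does not match buffer size count (" + std::to_string(buffer_sizes.size()) + ")");
  }

  // Same pointer-gathering step as in the excerpt.
  std::vector<const void*> buffer_data_ptrs;
  buffer_data_ptrs.reserve(buffer_unique_ptrs.size());
  std::transform(buffer_unique_ptrs.begin(), buffer_unique_ptrs.end(), std::back_inserter(buffer_data_ptrs),
                 [](const BufferUniquePtr& buff) -> const void* { return buff.get(); });

  if (!set_shared_weight(buffer_data_ptrs.data(), buffer_sizes.data(), buffer_data_ptrs.size(), input_idx)) {
    return Status::Error("plugin did not accept the shared pre-packed buffers");
  }

  used_shared_buffers = true;
  return Status::OK();
}
```

Failing early with a `Status` keeps the error on the framework side of the boundary, so the plugin never receives a size array shorter than the buffer count it is told about.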


edgchen1

thanks for cleaning this up!

adrianlizarraga added 2 commits March 31, 2026 21:33

Consolidate UseSharedPrePackedBuffers and UseSharedPrePackedBuffers_V2 for OpKernels (a58da0a)

Restore original comment (3d6a926)

adrianlizarraga requested a review from Copilot April 1, 2026 04:47

Copilot started reviewing on behalf of adrianlizarraga April 1, 2026 04:48

Copilot AI reviewed Apr 1, 2026

adrianlizarraga marked this pull request as ready for review April 1, 2026 05:04

adrianlizarraga requested a review from edgchen1 April 1, 2026 17:36

edgchen1 approved these changes Apr 1, 2026

Merge branch 'main' into adrianl/Cleanup_PluginEp_PrePackV2 (5fb16e1)

adrianlizarraga merged commit a6592fc into main Apr 3, 2026
106 of 111 checks passed

adrianlizarraga deleted the adrianl/Cleanup_PluginEp_PrePackV2 branch April 3, 2026 17:52

adrianlizarraga merged 3 commits into main from adrianl/Cleanup_PluginEp_PrePackV2


3 participants
