
Tensor Backing Buffer Mismatch Detected in Buffer Reuse #23739

Open
linzj opened this issue Feb 18, 2025 · 2 comments
Labels
ep:WebGPU (ort-web webgpu provider), platform:mobile, platform:web

Comments

@linzj

linzj commented Feb 18, 2025

Describe the issue

A tensor backing buffer mismatch has been detected during buffer reuse: the reuse target tensor has shape (1,320,640,3), while the output tensor that attempts to reuse its buffer has shape (1,720,1280,3). The issue occurs when using a model with a dynamic input tensor in NCHW format, where the input is (1,320,640,3) and the corresponding dimension parameters are empty.
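For reference, the input's dynamic dimensions and their (empty) symbolic names can be inspected through the ONNX Runtime C++ API. The sketch below is illustrative only; "model.onnx" is a placeholder path, not the actual model:

#include <onnxruntime_cxx_api.h>
#include <cstdio>
#include <vector>

// Minimal sketch: print the first input's dimensions and symbolic names.
// Dynamic dimensions come back as -1, and an empty dim_param prints as ''.
int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "inspect");
  Ort::SessionOptions so;
  Ort::Session session(env, "model.onnx", so);  // placeholder path

  Ort::TypeInfo type_info = session.GetInputTypeInfo(0);
  auto tensor_info = type_info.GetTensorTypeAndShapeInfo();

  std::vector<int64_t> dims = tensor_info.GetShape();
  std::vector<const char*> names(dims.size());
  tensor_info.GetSymbolicDimensions(names.data(), names.size());

  for (size_t i = 0; i < dims.size(); ++i)
    std::printf("dim %zu: value=%lld param='%s'\n", i,
                static_cast<long long>(dims[i]), names[i]);
  return 0;
}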

Expected Behavior:

Tensor buffer reuse should only occur when the dimensions of the source and target tensors actually match. For dynamic dimensions with empty parameters, the allocation planner should perform a more cautious check to avoid mismatches.

Current Workaround Patch:

The issue can be worked around temporarily with the following patch, which disables the dimension parameter check so that the allocation planner rejects reuse whenever the dimensions are not known to be equal:

diff --git a/onnxruntime/core/framework/allocation_planner.cc b/onnxruntime/core/framework/allocation_planner.cc
index ecd3960107..c9001ed4be 100644
--- a/onnxruntime/core/framework/allocation_planner.cc
+++ b/onnxruntime/core/framework/allocation_planner.cc
@@ -481,11 +481,13 @@ class PlannerImpl {
       if (utils::HasDimValue(val1) && utils::HasDimValue(val2) &&
           (val1.dim_value() == val2.dim_value()))
         continue;  // same known dimension
+#if 0
       if (utils::HasDimParam(val1) && utils::HasDimParam(val2)) {
         const auto& val1_param = val1.dim_param();
         if (val1_param == val2.dim_param() && !val1_param.empty())
           continue;  // same unknown dimension
       }
+#endif
       return false;
     }
     return true;

Impact:

This issue may lead to unintended behavior or even instability since tensor buffers are being reused inappropriately when their shapes do not match. A permanent fix should enforce proper validation of tensor dimensions during buffer reuse to ensure robustness in dynamic input scenarios.

Additional Information:

  • Model Input Format: Dynamic NCHW tensor, e.g., (1,320,640,3)
  • Occurrence: When dimension parameters are empty, causing the dimension comparison to be bypassed.
  • Workaround: Patch provided above temporarily disables the faulty check.

Any assistance or further investigation into this issue is appreciated.

Please let me know if additional information is needed.

Thank you.

To reproduce

  1. Use a model that accepts a dynamic input tensor in NCHW format.
  2. Provide an input tensor with shape (1,320,640,3).
  3. Observe that the allocation planner incorrectly reuses a tensor buffer intended for a tensor of shape (1,720,1280,3); see the C++ sketch below.
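The following is a minimal sketch of these steps using the ONNX Runtime C++ API. It assumes the native WebGPU execution provider is registered by name, as in recent builds; the model path and the input/output names are placeholders:

#include <onnxruntime_cxx_api.h>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "repro");
  Ort::SessionOptions so;
  // Assumption: the native WebGPU EP is appended by name, as in recent ORT builds.
  so.AppendExecutionProvider("WebGPU");

  // "model.onnx", "input", and "output" are placeholders for the dynamic-shape model.
  Ort::Session session(env, "model.onnx", so);

  // Feed the dynamic input with shape (1, 320, 640, 3).
  std::vector<int64_t> shape{1, 320, 640, 3};
  std::vector<float> data(1 * 320 * 640 * 3, 0.0f);
  Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem, data.data(), data.size(), shape.data(), shape.size());

  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};
  // With the unpatched planner, this run can hit the backing buffer mismatch described above.
  auto outputs = session.Run(Ort::RunOptions{nullptr}, input_names, &input, 1,
                             output_names, 1);
  (void)outputs;
  return 0;
}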

Urgency

No response

Platform

Android

OS Version

Android 15

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

afd3e81

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Other / Unknown

Execution Provider Library Version

webgpu

@github-actions github-actions bot added the ep:WebGPU, platform:mobile, and platform:web labels on Feb 18, 2025
@yuslepukhin
Member

This is likely an issue with the model. The same unknown dimension is supposed to have the same value at inference time at all locations. However, this is not the case apparently.
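For context, this refers to the planner's shape comparison; a simplified sketch of that check is shown below (adapted from the SameShape logic in allocation_planner.cc, with the dimension name "H" used only as an example):

#include <onnx/onnx_pb.h>  // ONNX protobuf types (already a build dependency of ORT)

// Simplified view of how the planner compares one dimension pair.
static bool SameDim(const ONNX_NAMESPACE::TensorShapeProto_Dimension& d1,
                    const ONNX_NAMESPACE::TensorShapeProto_Dimension& d2) {
  if (d1.has_dim_value() && d2.has_dim_value())
    return d1.dim_value() == d2.dim_value();     // same known size
  if (d1.has_dim_param() && d2.has_dim_param())
    return d1.dim_param() == d2.dim_param() &&   // same symbolic name, e.g. "H"
           !d1.dim_param().empty();              // empty names never match
  return false;
}
// Two tensors annotated with the same non-empty dim_param (e.g. "H") pass this
// check at planning time, so their buffers may be shared. The model must then
// resolve "H" to the same concrete size everywhere at inference time
// (e.g. 320 everywhere, not 320 in one place and 720 in another).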

@linzj
Author

linzj commented Feb 18, 2025

Thank you for your response.

Could you please clarify what you mean by "the same unknown dimension is supposed to have the same value at inference time at all locations"? Specifically, what does "value" refer to in this context? Are you referring to the tensor's dimension size or some other parameter? I would like to understand this issue better in order to facilitate a more effective fix and debugging process.

Thank you very much for your assistance!
