[Inference Error] The onnx inference result is inconsistent with the numpy inference result #23202

Open
songqiuyu opened this issue Dec 26, 2024 · 2 comments
Labels
quantization (issues related to quantization)

Comments

@songqiuyu

Describe the issue

I am implementing inference for an ONNX model in my own C code, but in some layers the C result differs from the ONNX Runtime result by 1. For example, C gives 40 where ONNX Runtime gives 41.

In the reproduction below, why does the NumPy reference give -87 while ONNX Runtime gives -88?
In quantized model inference an off-by-one error is serious: accumulated over many layers, the error can reach 4-5 (in 8-bit integers).
Thank you :>

The test code is below.

To reproduce

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np
import onnxruntime as ort

A = 'A'
B = 'B'
C = 'C'


A_scale = 0.008010663092136383
A_zero_point = 7
B_scale = 0.00622713053599
B_zero_point = -128
C_scale = 0.006873490754514933
C_zero_point = -128


input_A = helper.make_tensor_value_info(A, TensorProto.INT8, [1, 1, 1, 1])
input_B = helper.make_tensor_value_info(B, TensorProto.INT8, [1, 1, 1, 1])


output = helper.make_tensor_value_info(C, TensorProto.INT8, [1, 1, 1, 1])


initializer_A_scale = numpy_helper.from_array(np.array(A_scale, dtype=np.float32), name='A_scale')
initializer_A_zero_point = numpy_helper.from_array(np.array(A_zero_point, dtype=np.int8), name='A_zero_point')

initializer_B_scale = numpy_helper.from_array(np.array(B_scale, dtype=np.float32), name='B_scale')
initializer_B_zero_point = numpy_helper.from_array(np.array(B_zero_point, dtype=np.int8), name='B_zero_point')

initializer_C_scale = numpy_helper.from_array(np.array(C_scale, dtype=np.float32), name='C_scale')
initializer_C_zero_point = numpy_helper.from_array(np.array(C_zero_point, dtype=np.int8), name='C_zero_point')



qlinear_add_node = helper.make_node(
    'QLinearAdd',
    inputs=[A, 'A_scale', 'A_zero_point', B, 'B_scale', 'B_zero_point', 'C_scale', 'C_zero_point'],
    outputs=[C],
    name='QLinearAdd',
    domain='com.microsoft',
)
opset_version_ai_onnx = 13
opset_version_com_microsoft = 1

graph = helper.make_graph(
    nodes=[qlinear_add_node],
    name='QLinearAdd_Graph',
    inputs=[input_A, input_B],
    outputs=[output],
    initializer=[
        initializer_A_scale,
        initializer_A_zero_point,
        initializer_B_scale,
        initializer_B_zero_point,
        initializer_C_scale,
        initializer_C_zero_point
    ]
)


model = helper.make_model(
    graph,
    producer_name='onnx-qlinearadd-fixed-params',
    opset_imports=[
        helper.make_opsetid(domain='ai.onnx', version=opset_version_ai_onnx),
        helper.make_opsetid(domain='com.microsoft', version=opset_version_com_microsoft),
    ],
)
onnx.save(model, 'qlinearadd_fixed_params_model.onnx')
print("ONNX MODEL save 'qlinearadd_fixed_params_model.onnx'")


A_int8 = np.array([-8], dtype=np.int8)
B_int8 = np.array([-64], dtype=np.int8)

# Dequantize in float64; widen to int32 first so the subtraction cannot overflow int8.
A_real = A_scale * (A_int8.astype(np.int32) - A_zero_point)
B_real = B_scale * (B_int8.astype(np.int32) - B_zero_point)

C_real = A_real + B_real

# Unrounded requantized value, zero point included.
print(C_real / C_scale + C_zero_point)

# NumPy reference: round first, then add the zero point, then narrow to int8.
C_rounded = np.round(C_real / C_scale) + C_zero_point
C_int8 = C_rounded.astype(np.int8)
print(C_int8)
session = ort.InferenceSession('qlinearadd_fixed_params_model.onnx')


output_name = session.get_outputs()[0].name

A_data = np.array([-8], dtype=np.int8).reshape([1, 1, 1, 1])
B_data = np.array([-64], dtype=np.int8).reshape([1, 1, 1, 1])


input_dict = {
    'A': A_data,
    'B': B_data
}


outputs = session.run([output_name], input_dict)


C_output = outputs[0]
print("output C:", C_output)

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime==1.19.2 python

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@songqiuyu
Author

the program's result is:

ONNX MODEL save 'qlinearadd_fixed_params_model.onnx' 
[-87.49999529]
[-87]
output C: [[[[-88]]]]

@snnn added the quantization label on Dec 30, 2024
@songqiuyu
Author

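I found the cause. In C++, float 40.500004 - int 128 evaluates to -87.500000, not the exact -87.499996, because the float result is rounded to the nearest representable value.
round(-87.500000) = -88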
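
The effect described above can be reproduced in NumPy with float32 values. This is a minimal sketch, not ONNX Runtime's actual kernel: it assumes the float32 intermediate is 40.5 + 2**-18 (the float32 value that prints as 40.500004) and that the zero point is added in float32 before rounding. The variable names are illustrative only.

```python
import numpy as np

# Assumed float32 intermediate: the closest float32 above 40.5 (prints as 40.500004).
scaled = np.float32(40.5) + np.float32(2.0 ** -18)
zero_point = np.float32(-128)

# Order 1: round first, then add the zero point (as in the NumPy reference script).
round_then_add = int(np.rint(scaled)) + int(zero_point)   # 41 - 128 = -87

# Order 2: add the zero point in float32, then round.
# Between 64 and 128 the float32 spacing is 2**-17, so the 2**-18 excess
# cannot be kept and the sum lands exactly on -87.5.
shifted = scaled + zero_point
add_then_round = int(np.rint(shifted))                    # rint(-87.5) = -88

print(scaled, shifted)
print(round_then_add, add_then_round)
```

Under these assumptions, a C implementation that should match the -88 result would add the zero point in single precision before rounding, while one that rounds first gets -87.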
