
Multi-input model converted with --optimizePrefer 2 and then offline-quantized with MNNPythonOfflineQuant gives abnormal inference output on the ARM CPU backend, not matching x86 CPU #2895

Closed
LXWLZU opened this issue Jun 3, 2024 · 2 comments

LXWLZU commented Jun 3, 2024

Problem description:

Converting a multi-input model with the --optimizePrefer 2 option ("Convert MatMul Convolution use shared const B inputs") improves inference speed but enlarges the model, so I want to quantize it to reduce the model size and lower memory usage.
Conversion command:
./MNNConvert -f ONNX --modelFile small.onnx --MNNModel small_opt2.mnn --bizCode biz --optimizePrefer 2 --forTraining
Log:
The device support i8sdot:0, support fp16:0, support i8mm: 0
Start to Convert Other Model Format To MNN Model..., target version: 2.9
[14:33:55] /home/lixw/MNN/tools/converter/source/onnx/onnxConverter.cpp:46: ONNX Model ir version: 4
[14:33:55] /home/lixw/MNN/tools/converter/source/onnx/onnxConverter.cpp:47: ONNX Model opset version: 9
Start to Optimize the MNN Net...
[14:33:55] /home/lixw/MNN/tools/converter/source/optimizer/PostConverter.cpp:225: convert model for training, reserve BatchNorm and Dropout
Convert MatMul Convolution use shared const B inputs, may increase the model size
[14:33:55] /home/lixw/MNN/tools/converter/source/optimizer/PostConverter.cpp:225: convert model for training, reserve BatchNorm and Dropout
inputTensors : [ cahce_c0, cahce_spec, feat_spec, cahce_erb, feat_erb, h0, h2, h1, spec, ]
outputTensors: [ cahce_c0o, cahce_erbo, cahce_speco, df_coefs, ho0, ho1, ho2, speco, ]
Converted Success!
I then quantized the multi-input model produced by this conversion step. The quantized model's inference results on the x86 CPU and ARM CPU backends do not match, and some of the ARM output values are NaN.
Does the ARM CPU backend have a problem handling convolutions that use shared const B inputs, or their dequantization? (The x86 CPU output appears normal.)
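
For reference, a minimal sketch of how the two runs can be made comparable, using the pymnn session API. small_opt2_quant.mnn is a hypothetical name for the quantized model, and the seeded random inputs are placeholders for real calibration data:

```python
import MNN
import numpy as np

interpreter = MNN.Interpreter("small_opt2_quant.mnn")  # hypothetical file name
session = interpreter.createSession()

# Feed every input with the same seeded random data on both machines,
# so the two runs are directly comparable. Inputs are sorted by name to
# keep the random stream deterministic across platforms.
rng = np.random.default_rng(0)
for name, inp in sorted(interpreter.getSessionInputAll(session).items()):
    shape = inp.getShape()
    data = rng.standard_normal(shape).astype(np.float32)
    inp.copyFrom(MNN.Tensor(shape, MNN.Halide_Type_Float, data,
                            MNN.Tensor_DimensionType_Caffe))

interpreter.runSession(session)

# Dump every output so the x86 and ARM runs can be diffed offline.
for name, out in interpreter.getSessionOutputAll(session).items():
    shape = out.getShape()
    host = MNN.Tensor(shape, MNN.Halide_Type_Float,
                      np.zeros(shape, dtype=np.float32),
                      MNN.Tensor_DimensionType_Caffe)
    out.copyToHostTensor(host)
    np.save(name + ".npy", np.array(host.getData()).reshape(shape))
```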

Testing each step in isolation:
(1) Converting with --optimizePrefer 2 alone (MatMul converted to Convolution with shared const B inputs), without quantization: x86 CPU and ARM CPU outputs basically match.
(2) Quantizing without the --optimizePrefer 2 option: x86 CPU and ARM CPU outputs also basically match.
So the ARM CPU inference error appears only when the two are combined, i.e. --optimizePrefer 2 plus quantization; a comparison sketch follows below. Is this a bug?
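
With the .npy dumps from both machines gathered into (hypothetical) x86/ and arm/ directories, a few lines quantify the mismatch and the NaNs per output (output names taken from the converter log above):

```python
import numpy as np

OUTPUTS = ["cahce_c0o", "cahce_erbo", "cahce_speco", "df_coefs",
           "ho0", "ho1", "ho2", "speco"]

for name in OUTPUTS:
    x86 = np.load(f"x86/{name}.npy")
    arm = np.load(f"arm/{name}.npy")
    print(f"{name}: max_abs_diff={np.abs(x86 - arm).max():.6g}, "
          f"x86_nan={np.isnan(x86).any()}, arm_nan={np.isnan(arm).any()}")
```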

I have uploaded a simplified model and a record that reproduces the issue:
google: https://drive.google.com/file/d/1Z1Fy9ClfhhZcrRJXsrPbRkwmK22wtnAE/view?usp=sharing
baidu link: https://pan.baidu.com/s/1X4h32GqvkY1NgkiZuPxZ5g?pwd=umxf
extraction code: umxf

Platform:

x86 CPU and ARMv8 CPU

GitHub version:

2.9.0

Build configuration:

ARM CPU MNN 2.9.0 build settings:
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)
set(CMAKE_TRY_COMPILE_TARGET_TYPE "STATIC_LIBRARY")
set(CMAKE_BUILD_TYPE Release CACHE STRING "build release" FORCE)
set(CMAKE_INSTALL_PREFIX package CACHE STRING "install path" FORCE)
set(MNN_ARM82 OFF CACHE STRING "build arm82" FORCE)
set(MNN_FORBID_MULTI_THREAD OFF CACHE STRING "build single thread" FORCE)
set(MNN_USE_THREAD_POOL ON CACHE STRING "build thread pool" FORCE)
set(MNN_SUPPORT_BF16 OFF CACHE STRING "build bf16" FORCE)
set(MNN_BUILD_SHARED_LIBS OFF CACHE STRING "build static" FORCE)
set(MNN_SUPPORT_TFLITE_QUAN OFF CACHE STRING "" FORCE)
set(MNN_SEP_BUILD OFF CACHE STRING "build sep" FORCE)
set(MNN_USE_SSE OFF CACHE STRING "use sse" FORCE)
set(MNN_USE_LOGCAT OFF CACHE STRING "use logcat" FORCE)

x86 CPU MNN 2.9.0 build:
cmake .. -DMNN_BUILD_SHARED_LIBS=OFF -DMNN_BUILD_TOOLS=OFF && make -j4

Build logs:

ARM CPU MNN 2.9.0 build log (build succeeded):
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Linux
-- Processor: aarch64
-- Version: 2.9.0
-- Metal: OFF
-- OpenCL: OFF
-- OpenGL: OFF
-- Vulkan: OFF
-- ARM82: OFF
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: OFF
-- OpenMP: OFF
-- BF16: OFF
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /sdcard/lixiangwei/MNN/build_static
-- CUDA PROFILE: OFF
-- Enabling AArch64 Assemblies
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /sdcard/lixiangwei/MNN/build_static

x86 CPU MNN 2.9.0 build log (build succeeded):
-- Use Threadpool, forbid openmp
-- >>>>>>>>>>>>>
-- MNN BUILD INFO:
-- System: Linux
-- Processor: x86_64
-- Version: 2.9.0
-- Metal: OFF
-- OpenCL: OFF
-- OpenGL: OFF
-- Vulkan: OFF
-- ARM82: OFF
-- oneDNN: OFF
-- TensorRT: OFF
-- CoreML: OFF
-- NNAPI: OFF
-- CUDA: OFF
-- OpenMP: OFF
-- BF16: OFF
-- ThreadPool: ON
-- Hidden: TRUE
-- Build Path: /home/lixw/MNN/build_static
-- CUDA PROFILE: OFF
-- WIN_USE_ASM:
-- x86_64: Open SSE
-- MNN_AVX512:OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /home/lixw/MNN/build_static


LXWLZU commented Jun 4, 2024

To add: by printing the inputs and outputs of intermediate operators, I found that on x86 CPU the output of the op onnx::Split_317__matmul_converted is also NaN; the NaNs just do not propagate to the model's final outputs. This suggests that the combination of (Convert MatMul Convolution use shared const B inputs) + quantization is itself problematic, or that the operator concerned also misbehaves on the x86 CPU backend.
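
To locate the first op that turns NaN, a sketch along these lines can be used. This assumes the pymnn build exposes runSessionWithCallBack (mirroring the C++ Interpreter::runSessionWithCallBack API) with these callback signatures, and that the inspected outputs are float tensors; small_opt2_quant.mnn is again a hypothetical file name:

```python
import MNN
import numpy as np

interpreter = MNN.Interpreter("small_opt2_quant.mnn")  # hypothetical file name
session = interpreter.createSession()
# ... feed the inputs as in the earlier sketch ...

def begin(tensors, op_name):
    return True  # run every op

def end(tensors, op_name):
    # tensors are the outputs of the op that just ran.
    for t in tensors:
        shape = t.getShape()
        host = MNN.Tensor(shape, MNN.Halide_Type_Float,
                          np.zeros(shape, dtype=np.float32),
                          MNN.Tensor_DimensionType_Caffe)
        t.copyToHostTensor(host)
        if np.isnan(np.array(host.getData())).any():
            print("first NaN appears at op:", op_name)
            return False  # abort the session at the offending op
    return True

interpreter.runSessionWithCallBack(session, begin, end)
```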


github-actions bot commented Aug 3, 2024

Marking as stale. No activity in 60 days.

github-actions bot added the stale label Aug 3, 2024
github-actions bot closed this as completed Aug 6, 2024