
Picodet INT8 is slower than FP32 when inference with MKLDNN #44075

Closed

yeliang2258 opened this issue Jul 4, 2022 · 6 comments
Comments

@yeliang2258
Contributor

yeliang2258 commented Jul 4, 2022

Describe the Bug

Picodet INT8 is slower than FP32 when inference with MKLDNN.

CPU: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
thread num: 8
FP32: 3.09 s
INT8: 3.13 s

My model and script:
picobug.tar.gz
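
For reference, a minimal sketch of how such a CPU benchmark is typically set up with the Paddle Inference Python API. The model/params file names, input shape, and iteration count below are assumptions for illustration, not taken from picobug.tar.gz, and enable_mkldnn_int8 is only available in recent Paddle releases:

import time
import numpy as np
import paddle.inference as paddle_infer

# Hypothetical file names; replace with the files shipped in picobug.tar.gz.
config = paddle_infer.Config("inference.pdmodel", "inference.pdiparams")
config.disable_gpu()
config.set_cpu_math_library_num_threads(8)  # thread num from the report
config.enable_mkldnn()
config.enable_mkldnn_int8()  # comment out for the FP32 baseline

predictor = paddle_infer.create_predictor(config)
name = predictor.get_input_names()[0]
handle = predictor.get_input_handle(name)
# Picodet-s 416 takes a 416x416 image; the exact shape (and a possible second
# scale_factor input) depends on how the model was exported.
handle.copy_from_cpu(np.random.rand(1, 3, 416, 416).astype("float32"))

start = time.time()
for _ in range(100):
    predictor.run()
print("avg latency: %.4f s" % ((time.time() - start) / 100))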

Additional Supplementary Information

No response

@paddle-bot-old

paddle-bot-old bot commented Jul 4, 2022

Hi! We've received your issue and will arrange for technicians to answer your questions as soon as possible; please be patient. Please double-check that you have provided a clear problem description, reproduction code, environment & version, and error messages. You may also check the API documentation, the FAQ, past GitHub issues, and the AI community to find an answer. Have a nice day!

@wozna
Contributor

wozna commented Sep 2, 2022

Hi @yeliang2258, I am working on improving performance for the INT8 model. But as I mentioned before, this is a very difficult case: the convolutions have very small filters, so avx512_vnni int8 will not give us much speed-up. The performance is worse because of the INT8 conversion overhead.
We still have a few ideas to implement, such as (see the sketch after this list for one way to experiment with per-op quantization):

  • fuse conv + depthwise_conv, which will speed up both fp32 and int8,
  • check why elementwise_mul is not quantized in one place,
  • fuse elementwise_add + scale.
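
A hedged sketch of how the quantized-op list can be narrowed to measure each op type's contribution separately. It assumes the enable_mkldnn_int8 overload that takes a set of op types (available in recent Paddle releases) and uses hypothetical file names:

import paddle.inference as paddle_infer

# Hypothetical file names for illustration only.
config = paddle_infer.Config("inference.pdmodel", "inference.pdiparams")
config.disable_gpu()
config.enable_mkldnn()
# Start by quantizing only the convolutions, then add op types one by one
# (e.g. "elementwise_mul") to see which ones help or hurt latency and accuracy.
config.enable_mkldnn_int8({"conv2d", "depthwise_conv2d"})
predictor = paddle_infer.create_predictor(config)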

@yeliang2258
Contributor Author

yeliang2258 commented Sep 9, 2022

Hi @wozna,
In recent tests we found that the accuracy of this model is almost 0. Even if the speed cannot be improved, the accuracy problem still needs to be solved.
The test script is here: https://github.com/PaddlePaddle/PaddleTest/tree/develop/inference/python_api_test/test_int8_model
First run:

sh prepare.sh

Then:

python test_ppyoloe_infer.py --model_path=models/picodet_s_416_coco_npu_quant --reader_config=configs/picodet_reader.yml --precision=int8

@wozna
Contributor

wozna commented Sep 12, 2022

@yeliang2258 this accuracy bug is related to the new quantization method with quantize_linear and dequantize_linear, isn't it?

@yeliang2258
Contributor Author

@wozna No, the accuracy of the quantized model in the old format is also not correct.

@wozna
Contributor

wozna commented Sep 21, 2022

This PR should fix this issue: #46378. The problem was that even when the output was uint8, we used the int8 data type, which caused a loss of accuracy.
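
For intuition, a minimal NumPy illustration (not Paddle or oneDNN code, and only one plausible reading of the data-type issue above): if a non-negative, post-ReLU tensor is quantized with a scale chosen for the uint8 range but then stored as int8, every value above 127 saturates, which shows up as a large error:

import numpy as np

x = np.random.rand(10000).astype("float32") * 6.0  # non-negative activations
scale = x.max() / 255.0                             # scale chosen for the uint8 range

q_uint8 = np.clip(np.round(x / scale), 0, 255).astype(np.uint8)
q_int8 = np.clip(np.round(x / scale), -128, 127).astype(np.int8)  # saturates above 127

print("mean abs error uint8:", np.abs(q_uint8 * scale - x).mean())
print("mean abs error int8: ", np.abs(q_int8 * scale - x).mean())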

paddle-bot bot added the status/close (closed) label and removed the status/new-issue (new) label on Oct 18, 2022