
Picodet INT8 is slower than FP32 when inference with MKLDNN #44075

Closed

yeliang2258 opened this issue Jul 4, 2022 · 6 comments
Comments

@yeliang2258
Contributor

yeliang2258 commented Jul 4, 2022

Describe the Bug

Picodet INT8 is slower than FP32 when inference with MKLDNN.

CPU: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
thread num: 8
FP32: 3.09 s
INT8: 3.13 s

My model and script:
picobug.tar.gz
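
For reference, a minimal sketch of how such a CPU benchmark is typically set up with the Paddle Inference Python API. The model/params file names, input shape, and iteration count below are assumptions for illustration, not taken from picobug.tar.gz, and enable_mkldnn_int8 is only available in recent Paddle releases:

import time
import numpy as np
import paddle.inference as paddle_infer

# Hypothetical file names; replace with the files shipped in picobug.tar.gz.
config = paddle_infer.Config("inference.pdmodel", "inference.pdiparams")
config.disable_gpu()
config.set_cpu_math_library_num_threads(8)  # thread num from the report
config.enable_mkldnn()
config.enable_mkldnn_int8()  # comment out for the FP32 baseline

predictor = paddle_infer.create_predictor(config)
name = predictor.get_input_names()[0]
handle = predictor.get_input_handle(name)
# Picodet-s 416 takes a 416x416 image; the exact shape (and a possible second
# scale_factor input) depends on how the model was exported.
handle.copy_from_cpu(np.random.rand(1, 3, 416, 416).astype("float32"))

start = time.time()
for _ in range(100):
    predictor.run()
print("avg latency: %.4f s" % ((time.time() - start) / 100))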

Additional Supplementary Information

No response

@paddle-bot-old

paddle-bot-old bot commented Jul 4, 2022

Hi! We've received your issue and will arrange for technicians to answer your questions as soon as possible; please be patient. Please double-check that you have provided a clear problem description, reproduction code, environment & version, and error messages. You may also check the API documentation, the FAQ, past GitHub issues, and the AI community to find an answer. Have a nice day!

@wozna
Contributor

wozna commented Sep 2, 2022

Hi @yeliang2258, I am working on improving performance for the INT8 model. But as I mentioned before, this is a very difficult case: the convolutions have very small filters, so avx512_vnni int8 will not give us much speed-up. The performance is worse because of the INT8 conversion overhead.
We still have a few ideas to implement, such as (see the sketch after this list for one way to experiment with per-op quantization):

  • fuse conv + depthwise_conv, which will speed up both fp32 and int8,
  • check why elementwise_mul is not quantized in one place,
  • fuse elementwise_add + scale.
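
A hedged sketch of how the quantized-op list can be narrowed to measure each op type's contribution separately. It assumes the enable_mkldnn_int8 overload that takes a set of op types (available in recent Paddle releases) and uses hypothetical file names:

import paddle.inference as paddle_infer

# Hypothetical file names for illustration only.
config = paddle_infer.Config("inference.pdmodel", "inference.pdiparams")
config.disable_gpu()
config.enable_mkldnn()
# Start by quantizing only the convolutions, then add op types one by one
# (e.g. "elementwise_mul") to see which ones help or hurt latency and accuracy.
config.enable_mkldnn_int8({"conv2d", "depthwise_conv2d"})
predictor = paddle_infer.create_predictor(config)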

@yeliang2258
Contributor Author

yeliang2258 commented Sep 9, 2022

Hi @wozna,
In recent tests we found that the accuracy of this model is almost 0. Even if the speed cannot be improved, the accuracy problem still needs to be solved.
The test script is here: https://github.com/PaddlePaddle/PaddleTest/tree/develop/inference/python_api_test/test_int8_model
First run:

sh prepare.sh

Then:

python test_ppyoloe_infer.py --model_path=models/picodet_s_416_coco_npu_quant --reader_config=configs/picodet_reader.yml --precision=int8

@wozna
Contributor

wozna commented Sep 12, 2022

@yeliang2258 this accuracy bug is related to the new quantization method with quantize_linear and dequantize_linear, isn't it?

@yeliang2258
Contributor Author

@wozna No, the accuracy of the quantized model in the old format is also not correct.

@wozna
Contributor

wozna commented Sep 21, 2022

This PR should fix this issue: #46378. The problem was that even when the output was uint8, we used the int8 data type, which caused a loss of accuracy.
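
For intuition, a minimal NumPy illustration (not Paddle or oneDNN code, and only one plausible reading of the data-type issue above): if a non-negative, post-ReLU tensor is quantized with a scale chosen for the uint8 range but then stored as int8, every value above 127 saturates, which shows up as a large error:

import numpy as np

x = np.random.rand(10000).astype("float32") * 6.0  # non-negative activations
scale = x.max() / 255.0                             # scale chosen for the uint8 range

q_uint8 = np.clip(np.round(x / scale), 0, 255).astype(np.uint8)
q_int8 = np.clip(np.round(x / scale), -128, 127).astype(np.int8)  # saturates above 127

print("mean abs error uint8:", np.abs(q_uint8 * scale - x).mean())
print("mean abs error int8: ", np.abs(q_int8 * scale - x).mean())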

paddle-bot bot added the status/close (closed) label and removed the status/new-issue (new) label on Oct 18, 2022