Why does Int8 quantization occupy more GPU memory than float16 with TensorRT quantization? #69
Comments
I did the quantization via tiny-tensorrt.
It's expected: the int8 quantization process requires FP32 inference to compute the scales.
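For context, here is a minimal sketch of how an INT8 entropy calibrator is typically wired up with the raw TensorRT C++ API, which tiny-tensorrt wraps internally. The class name and the elided batch-staging logic are illustrative assumptions, not tiny-tensorrt's actual code. During calibration, TensorRT runs the network in FP32 over these batches to collect activation statistics and derive per-tensor quantization scales, which is part of why the build needs extra memory.

```cpp
// Illustrative sketch of a TensorRT INT8 calibrator (not tiny-tensorrt's
// actual implementation). TensorRT runs FP32 inference over the batches
// this class supplies to collect activation statistics and compute the
// per-tensor quantization scales.
#include <NvInfer.h>
#include <cstddef>
#include <cstdint>

class SketchCalibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    int32_t getBatchSize() const noexcept override { return 1; }

    bool getBatch(void* bindings[], char const* names[], int32_t nbBindings) noexcept override
    {
        // A real calibrator stages the next calibration batch in GPU memory
        // and fills `bindings`; returning false signals that the calibration
        // data is exhausted. The staging logic is elided in this sketch.
        return false;
    }

    void const* readCalibrationCache(std::size_t& length) noexcept override
    {
        length = 0;
        return nullptr; // no cached scales: force a fresh calibration pass
    }

    void writeCalibrationCache(void const* cache, std::size_t length) noexcept override
    {
        // A real calibrator would persist the computed scales to disk here
        // so later builds can skip the calibration pass.
    }
};

// At build time the calibrator is attached to the builder config:
//   config->setFlag(nvinfer1::BuilderFlag::kINT8);
//   config->setInt8Calibrator(&calibrator);
```

The device buffers and statistics held during this FP32 calibration pass are one reason an INT8 build can peak above an FP16 build.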
I don't understand what you mean. I used int8 quantization with a calibration set, and the inference results are correct, but the GPU memory usage is larger than with float16.
@zerollzeng Thank you very much!
Hello, could you please provide the GPU usage and inference speed with int8 and FP16?
Thank you! Here are the numbers:
- original PyTorch (.pt) model: GPU usage 5099 MB, inference time 1.7 s
- tiny-tensorrt float16: GPU usage 3993 MB, inference time 0.4 s
- tiny-tensorrt int8: GPU usage 4509 MB, inference time 0.4 s

All results are correct.
How about building the engine first and then loading the engine? I think that can save some memory. Anyway, I'll try to improve this.
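For reference, a minimal sketch of that build-once / load-later flow with the TensorRT 8 C++ API; the file name is a placeholder for the plan produced by tinyexec's --save_engine flag. Deserializing a saved engine skips the builder and the calibration pass entirely at deploy time:

```cpp
// Sketch of loading a prebuilt TensorRT engine instead of rebuilding it.
// Deserializing a serialized plan avoids the builder's (and calibrator's)
// build-time memory peak.
#include <NvInfer.h>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <vector>

class StderrLogger : public nvinfer1::ILogger
{
    void log(Severity severity, char const* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
    }
};

int main()
{
    StderrLogger logger;

    // "unet.trt" is a placeholder for the engine saved via --save_engine.
    std::ifstream file("unet.trt", std::ios::binary | std::ios::ate);
    std::vector<char> blob(static_cast<std::size_t>(file.tellg()));
    file.seekg(0);
    file.read(blob.data(), static_cast<std::streamsize>(blob.size()));

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // ... bind input/output device buffers and enqueue inference here ...

    delete context; // in TensorRT 8, objects are destroyed with delete
    delete engine;
    delete runtime;
    return 0;
}
```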
The command I used:

./tinyexec --onnx /data/sdb/manager/RX0249_liming/coronary_model/onnx_model/unet.onnx --mode 2 --batch_size 1 --save_engine /data/sdb/manager/RX0249_liming/coronary_model/tiny_trt_model/float16_int8_calib/unet.trt --int8 --calibrate_data /data/sdb/manager/RX0249_liming/calib_data/tinyrt_data/

Thank you!
Please describe your problem in English if possible; it will be helpful to more people.