
Why does INT8 quantization occupy more GPU memory than float16 (TensorRT quantization)? #69

Open
nameli0722 opened this issue May 29, 2023 · 8 comments

@nameli0722

Please describe your problem in English if possible; it will be helpful to more people.
Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:
1.
2.

Screenshots
If applicable, add screenshots to help explain your problem.

System environment (please complete the following information):

  • Device:
  • OS:
  • Driver version:
  • CUDA version:
  • TensorRT version:
  • Others:

CMake output

Running output

@nameli0722
Author

I did the quantization with tiny-tensorrt.

@zerollzeng
Owner

It's expected: the INT8 quantization process requires FP32 inference to compute the scales.
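
For context, a minimal sketch of what that calibration step looks like, assuming the standard TensorRT 8.x C++ builder API rather than tiny-tensorrt's wrapper; `builder`, `network`, and `calibrator` are placeholders the caller supplies:

```cpp
// Minimal sketch, assuming the standard TensorRT 8.x C++ API (not the
// tiny-tensorrt wrapper).
#include <NvInfer.h>
#include <memory>

nvinfer1::IHostMemory* buildInt8Engine(nvinfer1::IBuilder& builder,
                                       nvinfer1::INetworkDefinition& network,
                                       nvinfer1::IInt8Calibrator* calibrator) {
    std::unique_ptr<nvinfer1::IBuilderConfig> config{builder.createBuilderConfig()};
    // Enabling INT8 makes the builder run the calibrator's batches through
    // the network at higher precision to collect activation ranges (the
    // "scales"); this extra calibration pass is where additional memory goes.
    config->setFlag(nvinfer1::BuilderFlag::kINT8);
    config->setInt8Calibrator(calibrator);
    // Allow FP16 fallback for layers that have no INT8 implementation.
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    return builder.buildSerializedNetwork(network, *config);
}
```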

@nameli0722
Author

I don't understand what you mean. I used INT8 quantization with a calibration set, and the inference results are correct, but the GPU memory usage is even larger than with float16.

@nameli0722
Author

@zerollzeng Thank you very much!

@QiangZhangCV

Hello, could you please provide the GPU usage and inference speed with INT8 and FP16?

@nameli0722
Author

> Hello, could you please provide the GPU usage and inference speed with INT8 and FP16?

thank you!

  • original .pt (PyTorch) model: GPU usage 5099 MB, inference time 1.7 s
  • tiny-tensorrt float16: GPU usage 3993 MB, inference time 0.4 s
  • tiny-tensorrt int8: GPU usage 4509 MB, inference time 0.4 s

All results are correct.
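
For anyone comparing numbers like these, here is a hedged sketch of one way to sample device memory from inside the process, using only the CUDA runtime API (not from this issue). Note that `cudaMemGetInfo` reports device-wide usage, so it also counts other processes, unlike nvidia-smi's per-process column:

```cpp
// Sketch: sample device-wide GPU memory with the CUDA runtime API.
// Caveat: cudaMemGetInfo reports the whole device, so the figure also
// includes memory held by other processes on the same GPU.
#include <cuda_runtime.h>
#include <cstdio>

void reportGpuMemoryMiB() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    std::printf("GPU memory in use: %.0f MiB\n",
                (totalBytes - freeBytes) / (1024.0 * 1024.0));
}
```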

@zerollzeng
Owner

How about building the engine first and then loading it? I think that can save some memory.

Anyway, I'll try to improve this.
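
A minimal sketch of that build-once/load-later flow, assuming the standard TensorRT 8.x C++ runtime API rather than tiny-tensorrt's interface:

```cpp
// Minimal sketch, assuming the TensorRT 8.x C++ API: deserialize a
// prebuilt engine file in the inference process, so the builder (and
// the INT8 calibration pass) never run, and never allocate memory,
// at deploy time.
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

nvinfer1::ICudaEngine* loadEngine(const std::string& path,
                                  nvinfer1::ILogger& logger) {
    // Read the serialized engine produced by an earlier build step.
    std::ifstream file(path, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());
    // Deserialize: only runtime structures are allocated, no builder.
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    return runtime->deserializeCudaEngine(blob.data(), blob.size());
}
```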

@nameli0722
Author

> How about building the engine first and then loading it? I think that can save some memory.
>
> Anyway, I'll try to improve this.

./tinyexec --onnx /data/sdb/manager/RX0249_liming/coronary_model/onnx_model/unet.onnx --mode 2 --batch_size 1 --save_engine /data/sdb/manager/RX0249_liming/coronary_model/tiny_trt_model/float16_int8_calib/unet.trt --int8 --calibrate_data /data/sdb/manager/RX0249_liming/calib_data/tinyrt_data/

thank you!
