
onnx model convert trt.int8 failure:fallback fp32 #3754

Open
kakascode opened this issue Mar 29, 2024 · 6 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@kakascode

kakascode commented Mar 29, 2024

Description

When I use TensorRT for INT8 quantization, the precision always falls back to FP32. Setting the trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS flag does not solve the issue. What should I do?

Environment

TensorRT Version: 8.6.16

NVIDIA GPU: A100

CUDA Version: 11.4

Operating System:

Python Version (if applicable): 3.7

PyTorch Version (if applicable): 1.12.1

@bernardrb

Can you provide what TRT logged during the build, and possibly the build script?

@kakascode
Author

Can you provide what TRT logged during the build, and possibly the build script?

Thanks, bro. The log was too long, so I truncated it; here is the last part:

[03/29/2024-17:37:32] [TRT] [V] Setting a default quantization params because quantization data is missing for {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]}
[03/29/2024-17:37:32] [TRT] [V] Tactic: 0x0000000000000000 Time: 59.9303
[03/29/2024-17:37:32] [TRT] [V] {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} (Myelin[0x80000023]) profiling completed in 165.507 seconds. Fastest Tactic: 0x0000000000000000 Time: 59.9303
[03/29/2024-17:37:32] [TRT] [V] >>>>>>>>>>>>>>> Chose Runner Type: Myelin Tactic: 0x0000000000000000
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs:
[03/29/2024-17:37:32] [TRT] [V] *************** Autotuning Reformat: Half(49,1) -> Float(49,1) ***************
[03/29/2024-17:37:32] [TRT] [V] --------------- Timing Runner: Optimizer Reformat( -> output) (Reformat[0x80000006])
[03/29/2024-17:37:32] [TRT] [V] Setting a default quantization params because quantization data is missing for
[03/29/2024-17:37:32] [TRT] [V] Tactic: 0x00000000000003e8 Time: 0.00406451
[03/29/2024-17:37:32] [TRT] [V] Setting a default quantization params because quantization data is missing for
[03/29/2024-17:37:32] [TRT] [V] Tactic: 0x00000000000003ea Time: 0.00652882
[03/29/2024-17:37:32] [TRT] [V] Setting a default quantization params because quantization data is missing for
[03/29/2024-17:37:33] [TRT] [V] Tactic: 0x0000000000000000 Time: 0.00409158
[03/29/2024-17:37:33] [TRT] [V] Optimizer Reformat( -> output) (Reformat[0x80000006]) profiling completed in 0.0299125 seconds. Fastest Tactic: 0x00000000000003e8 Time: 0.00406451
[03/29/2024-17:37:33] [TRT] [V] Adding reformat layer: Reformatted Output Tensor 0 to {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} (output) from Half(49,1) to Float(49,1)
[03/29/2024-17:37:33] [TRT] [V] Formats and tactics selection completed in 302.827 seconds.
[03/29/2024-17:37:33] [TRT] [V] After reformat layers: 3 layers
[03/29/2024-17:37:33] [TRT] [V] Total number of blocks in pre-optimized block assignment: 3
[03/29/2024-17:37:33] [TRT] [I] Detected 2 inputs and 1 output network tensors.
[03/29/2024-17:39:34] [TRT] [V] Setting a default quantization params because quantization data is missing for [ShapeHostToDeviceCopy 0]
[03/29/2024-17:39:34] [TRT] [V] Setting a default quantization params because quantization data is missing for {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]}
[03/29/2024-17:39:34] [TRT] [V] Layer: [ShapeHostToDeviceCopy 0] Host Persistent: 4 Device Persistent: 0 Scratch Memory: 0
[03/29/2024-17:39:34] [TRT] [V] Layer: {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} Host Persistent: 32 Device Persistent: 0 Scratch Memory: 1115742720
[03/29/2024-17:39:34] [TRT] [V] Skipped printing memory information for 1 layers with 0 memory size i.e. Host Persistent + Device Persistent + Scratch Memory == 0.
[03/29/2024-17:39:34] [TRT] [I] Total Host Persistent Memory: 48
[03/29/2024-17:39:34] [TRT] [I] Total Device Persistent Memory: 0
[03/29/2024-17:39:34] [TRT] [I] Total Scratch Memory: 1115742720
[03/29/2024-17:39:34] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2166 MiB, GPU 6049 MiB
[03/29/2024-17:39:34] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 3 steps to complete.
[03/29/2024-17:39:34] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.014939ms to assign 3 blocks to 3 nodes requiring 1115743744 bytes.
[03/29/2024-17:39:34] [TRT] [V] Total number of blocks in optimized block assignment: 3
[03/29/2024-17:39:34] [TRT] [I] Total Activation Memory: 1115743744
[03/29/2024-17:39:34] [TRT] [V] Total number of generated kernels selected for the engine: 0
[03/29/2024-17:39:34] [TRT] [V] Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
[03/29/2024-17:39:34] [TRT] [V] Disabling unused tactic source: JIT_CONVOLUTIONS
[03/29/2024-17:39:34] [TRT] [V] Engine generation completed in 425.016 seconds.
[03/29/2024-17:39:34] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[03/29/2024-17:39:34] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[03/29/2024-17:39:34] [TRT] [W] Check verbose logs for the list of affected weights.
[03/29/2024-17:39:34] [TRT] [W] - 256 weights are affected by this issue: Detected subnormal FP16 values.

One more thing: when I set the FP16 flag in addition to INT8, it falls back to FP16 instead.

@bernardrb

[03/29/2024-17:37:32] [TRT] [V] *************** Autotuning Reformat: Half(49,1) -> Float(49,1) ***************

[03/29/2024-17:37:33] [TRT] [V] Adding reformat layer: Reformatted Output Tensor 0 to {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} (output) from Half(49,1) to Float(49,1)

How many layers are affected? It could just be a necessary reformat layer that TensorRT adds at I/O. Refer to this for more info: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#reformat-free-network-tensors
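
If it helps, here is a rough sketch of how you could count which precision each layer actually ended up in (untested on your model; it assumes TensorRT >= 8.4, an engine built with `config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED`, and the engine path is just a placeholder):

```python
import json
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)

# "model.engine" is a placeholder for your serialized engine file
with open("model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Per-layer details are only available if the engine was built with
# config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
inspector = engine.create_engine_inspector()
info = json.loads(inspector.get_engine_information(trt.LayerInformationFormat.JSON))

# Tally how many layers run in each precision
# (exact JSON field names can differ between TensorRT versions; adjust if needed)
counts = {}
for layer in info.get("Layers", []):
    prec = layer.get("Precision", "unknown") if isinstance(layer, dict) else "unknown"
    counts[prec] = counts.get(prec, 0) + 1
print(counts)
```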

Please share the whole log and the .onnx file via Google Drive for further help.

I had the same issue with a Reformat layer in #2136.

@kakascode kakascode reopened this Mar 30, 2024
@kakascode
Author

[03/29/2024-17:37:32] [TRT] [V] *************** Autotuning Reformat: Half(49,1) -> Float(49,1) ***************

[03/29/2024-17:37:33] [TRT] [V] Adding reformat layer: Reformatted Output Tensor 0 to {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} (output) from Half(49,1) to Float(49,1)

How many layers are affected? It could just be a necessary reformat layer that TensorRT adds at I/O. Refer to this for more info: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#reformat-free-network-tensors

Please share the whole log and the .onnx file via Google Drive for further help.

I had the same issue with a Reformat layer in #2136.

Sorry for my late response. I will try your method, thanks for the help.

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 2, 2024
@lix19937

lix19937 commented Apr 2, 2024

When I use TensorRT for INT8 quantization, the precision always falls back to FP32. Setting the trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS flag does not solve the issue. What should I do?

What is your trtexec cmd?

@kakascode
Author

kakascode commented Apr 3, 2024

When I use TensorRT for INT8 quantization, the precision always falls back to FP32. Setting the trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS flag does not solve the issue. What should I do?

What is your trtexec cmd?

I didn't use the trtexec command; instead, I used my own script.

```python
import tensorrt as trt

# logger and calib (the INT8 calibrator) are defined elsewhere in the script
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser  = trt.OnnxParser(network, logger)
config  = builder.create_builder_config()
config.max_workspace_size = (1 << 30) * 8
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
config.int8_calibrator = calib
```

If I don't set `config.set_flag(trt.BuilderFlag.FP16)`, it falls back to FP32; otherwise it falls back to FP16.
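
For reference, one thing I have seen suggested is to pin individual layers to INT8 so that OBEY_PRECISION_CONSTRAINTS has explicit per-layer constraints to obey. A minimal sketch, continuing from the snippet above (I have not verified that this avoids the fallback on my model, and which layer types to skip is just an assumption):

```python
# After the ONNX model has been parsed into `network`:
# force INT8 on each layer, skipping layer types that cannot run in INT8
# (the skip list below is a guess; adjust it for your network)
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type in (trt.LayerType.SHAPE, trt.LayerType.CONSTANT):
        continue
    layer.precision = trt.int8
    layer.set_output_type(0, trt.int8)

config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
```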
