
onnx model convert trt.int8 failure:fallback fp32 #3754

Open
kakascode opened this issue Mar 29, 2024 · 6 comments
Labels
triaged Issue has been triaged by maintainers

Comments

@kakascode

kakascode commented Mar 29, 2024

Description

When I use TensorRT for INT8 quantization, the precision always falls back to FP32. Setting the trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS flag does not solve the issue. What should I do?

Environment

TensorRT Version: 8.6.16

NVIDIA GPU: A100

CUDA Version: 11.4

Operating System:

Python Version (if applicable): 3.7

PyTorch Version (if applicable): 1.12.1

@bernardrb

Can you provide what TRT logged during the build, and possibly the build script?

@kakascode
Author

Can you provide what TRT logged during the build, and possibly the build script?

Thanks, bro. The log was too long, so I truncated it; here is the last part:

[03/29/2024-17:37:32] [TRT] [V] Setting a default quantization params because quantization data is missing for {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]}
[03/29/2024-17:37:32] [TRT] [V] Tactic: 0x0000000000000000 Time: 59.9303
[03/29/2024-17:37:32] [TRT] [V] {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} (Myelin[0x80000023]) profiling completed in 165.507 seconds. Fastest Tactic: 0x0000000000000000 Time: 59.9303
[03/29/2024-17:37:32] [TRT] [V] >>>>>>>>>>>>>>> Chose Runner Type: Myelin Tactic: 0x0000000000000000
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs
[03/29/2024-17:37:32] [TRT] [V] =============== Computing reformatting costs:
[03/29/2024-17:37:32] [TRT] [V] *************** Autotuning Reformat: Half(49,1) -> Float(49,1) ***************
[03/29/2024-17:37:32] [TRT] [V] --------------- Timing Runner: Optimizer Reformat( -> output) (Reformat[0x80000006])
[03/29/2024-17:37:32] [TRT] [V] Setting a default quantization params because quantization data is missing for
[03/29/2024-17:37:32] [TRT] [V] Tactic: 0x00000000000003e8 Time: 0.00406451
[03/29/2024-17:37:32] [TRT] [V] Setting a default quantization params because quantization data is missing for
[03/29/2024-17:37:32] [TRT] [V] Tactic: 0x00000000000003ea Time: 0.00652882
[03/29/2024-17:37:32] [TRT] [V] Setting a default quantization params because quantization data is missing for
[03/29/2024-17:37:33] [TRT] [V] Tactic: 0x0000000000000000 Time: 0.00409158
[03/29/2024-17:37:33] [TRT] [V] Optimizer Reformat( -> output) (Reformat[0x80000006]) profiling completed in 0.0299125 seconds. Fastest Tactic: 0x00000000000003e8 Time: 0.00406451
[03/29/2024-17:37:33] [TRT] [V] Adding reformat layer: Reformatted Output Tensor 0 to {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} (output) from Half(49,1) to Float(49,1)
[03/29/2024-17:37:33] [TRT] [V] Formats and tactics selection completed in 302.827 seconds.
[03/29/2024-17:37:33] [TRT] [V] After reformat layers: 3 layers
[03/29/2024-17:37:33] [TRT] [V] Total number of blocks in pre-optimized block assignment: 3
[03/29/2024-17:37:33] [TRT] [I] Detected 2 inputs and 1 output network tensors.
[03/29/2024-17:39:34] [TRT] [V] Setting a default quantization params because quantization data is missing for [ShapeHostToDeviceCopy 0]
[03/29/2024-17:39:34] [TRT] [V] Setting a default quantization params because quantization data is missing for {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]}
[03/29/2024-17:39:34] [TRT] [V] Layer: [ShapeHostToDeviceCopy 0] Host Persistent: 4 Device Persistent: 0 Scratch Memory: 0
[03/29/2024-17:39:34] [TRT] [V] Layer: {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} Host Persistent: 32 Device Persistent: 0 Scratch Memory: 1115742720
[03/29/2024-17:39:34] [TRT] [V] Skipped printing memory information for 1 layers with 0 memory size i.e. Host Persistent + Device Persistent + Scratch Memory == 0.
[03/29/2024-17:39:34] [TRT] [I] Total Host Persistent Memory: 48
[03/29/2024-17:39:34] [TRT] [I] Total Device Persistent Memory: 0
[03/29/2024-17:39:34] [TRT] [I] Total Scratch Memory: 1115742720
[03/29/2024-17:39:34] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2166 MiB, GPU 6049 MiB
[03/29/2024-17:39:34] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 3 steps to complete.
[03/29/2024-17:39:34] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.014939ms to assign 3 blocks to 3 nodes requiring 1115743744 bytes.
[03/29/2024-17:39:34] [TRT] [V] Total number of blocks in optimized block assignment: 3
[03/29/2024-17:39:34] [TRT] [I] Total Activation Memory: 1115743744
[03/29/2024-17:39:34] [TRT] [V] Total number of generated kernels selected for the engine: 0
[03/29/2024-17:39:34] [TRT] [V] Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
[03/29/2024-17:39:34] [TRT] [V] Disabling unused tactic source: JIT_CONVOLUTIONS
[03/29/2024-17:39:34] [TRT] [V] Engine generation completed in 425.016 seconds.
[03/29/2024-17:39:34] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[03/29/2024-17:39:34] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[03/29/2024-17:39:34] [TRT] [W] Check verbose logs for the list of affected weights.
[03/29/2024-17:39:34] [TRT] [W] - 256 weights are affected by this issue: Detected subnormal FP16 values.

One more thing: when I set the FP16 flag in addition to INT8, it falls back to FP16 instead.

@bernardrb

[03/29/2024-17:37:32] [TRT] [V] *************** Autotuning Reformat: Half(49,1) -> Float(49,1) ***************

[03/29/2024-17:37:33] [TRT] [V] Adding reformat layer: Reformatted Output Tensor 0 to {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} (output) from Half(49,1) to Float(49,1)

How many layers are affected? It could just be a necessary reformat layer that TensorRT adds at I/O. Refer to this for more info: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#reformat-free-network-tensors
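
If it helps, here is a rough sketch of how you could count which precision each layer actually ended up in (untested on your model; it assumes TensorRT >= 8.4, an engine built with `config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED`, and the engine path is just a placeholder):

```python
import json
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)

# "model.engine" is a placeholder for your serialized engine file
with open("model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Per-layer details are only available if the engine was built with
# config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
inspector = engine.create_engine_inspector()
info = json.loads(inspector.get_engine_information(trt.LayerInformationFormat.JSON))

# Tally how many layers run in each precision
# (exact JSON field names can differ between TensorRT versions; adjust if needed)
counts = {}
for layer in info.get("Layers", []):
    prec = layer.get("Precision", "unknown") if isinstance(layer, dict) else "unknown"
    counts[prec] = counts.get(prec, 0) + 1
print(counts)
```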

Please share the whole log and the .onnx file via Google Drive for further help.

I had the same issue with a Reformat layer in #2136.

@kakascode kakascode reopened this Mar 30, 2024
@kakascode
Author

[03/29/2024-17:37:32] [TRT] [V] *************** Autotuning Reformat: Half(49,1) -> Float(49,1) ***************

[03/29/2024-17:37:33] [TRT] [V] Adding reformat layer: Reformatted Output Tensor 0 to {ForeignNode[onnx::Gather_401...(Unnamed Layer* 3201) [ElementWise]]} (output) from Half(49,1) to Float(49,1)

How many layers are affected? It could just be a necessary reformat layer that TensorRT adds at I/O. Refer to this for more info: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#reformat-free-network-tensors

Please share the whole log and the .onnx file via Google Drive for further help.

I had the same issue with a Reformat layer in #2136.

Sorry for my late response. I will try your method, thanks for the help.

@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Apr 2, 2024
@lix19937

lix19937 commented Apr 2, 2024

When I use TensorRT for INT8 quantization, the precision always falls back to FP32. Setting the trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS flag does not solve the issue. What should I do?

What is your trtexec cmd?

@kakascode
Author

kakascode commented Apr 3, 2024

When I use TensorRT for INT8 quantization, the precision always falls back to FP32. Setting the trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS flag does not solve the issue. What should I do?

What is your trtexec cmd?

I didn't use the trtexec command; instead, I used my own script.

```python
import tensorrt as trt

# logger and calib (the INT8 calibrator) are defined elsewhere in the script
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser  = trt.OnnxParser(network, logger)
config  = builder.create_builder_config()
config.max_workspace_size = (1 << 30) * 8
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
config.int8_calibrator = calib
```

If I don't set `config.set_flag(trt.BuilderFlag.FP16)`, it falls back to FP32; otherwise it falls back to FP16.
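
For reference, one thing I have seen suggested is to pin individual layers to INT8 so that OBEY_PRECISION_CONSTRAINTS has explicit per-layer constraints to obey. A minimal sketch, continuing from the snippet above (I have not verified that this avoids the fallback on my model, and which layer types to skip is just an assumption):

```python
# After the ONNX model has been parsed into `network`:
# force INT8 on each layer, skipping layer types that cannot run in INT8
# (the skip list below is a guess; adjust it for your network)
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type in (trt.LayerType.SHAPE, trt.LayerType.CONSTANT):
        continue
    layer.precision = trt.int8
    layer.set_output_type(0, trt.int8)

config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
```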
