When I increase the batch size, the per-image inference time on TensorRT does not improve. For example, if inference on a batch of 8 takes 20 ms, inference on a batch of 16 takes 40 ms. I am not sure why this is happening.
I have converted an EfficientNet backbone from TensorFlow to ONNX, and then to TensorRT. When converting the ONNX model to TensorRT and saving the engine, I specified the batch size as follows:
import engine as eng  # local helper module (engine.py) wrapping the TensorRT builder
import tensorrt as trt
from onnx import ModelProto

base_dir = "mods/effnet-l"
# base_dir = "mods/resnet152/"
onnx_path = base_dir + "/bbone.onnx"
engine_name = base_dir + "/bbone.plan"
batch_size = 8

# Parse the ONNX model (the proto is not used further here, but it can be
# used to read the input dimensions)
model = ModelProto()
with open(onnx_path, "rb") as f:
    model.ParseFromString(f.read())

# Fixed NHWC input shape baked into the engine, with batch size 8
shape = [batch_size, 256, 256, 3]
engine = eng.build_engine(onnx_path, shape=shape)
eng.save_engine(engine, engine_name)
I then run inference on this engine with the TensorRT Python API.
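A minimal sketch of such an inference path, assuming the TensorRT 7.x Python API, pycuda for device buffers, and a single input and a single output binding (illustrative, not the exact code from the issue):

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(plan_path):
    with open(plan_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def infer(engine, batch):
    # batch: float32 array matching the shape the engine was built with
    context = engine.create_execution_context()
    output = np.empty(tuple(context.get_binding_shape(1)), dtype=np.float32)

    d_input = cuda.mem_alloc(batch.nbytes)
    d_output = cuda.mem_alloc(output.nbytes)
    stream = cuda.Stream()

    cuda.memcpy_htod_async(d_input, np.ascontiguousarray(batch), stream)
    context.execute_async_v2([int(d_input), int(d_output)], stream.handle)
    cuda.memcpy_dtoh_async(output, d_output, stream)
    stream.synchronize()
    return output

engine = load_engine("mods/effnet-l/bbone.plan")
preds = infer(engine, np.random.rand(8, 256, 256, 3).astype(np.float32))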
Everything works properly. The problem is the speed: if I double the batch size, the inference time per batch also doubles, so batching does not reduce the total inference time at all.
When I increase batch size, the inference time on TensorRT does not change.
-> With ONNX, TensorRT uses explicit batch mode. If you want a dynamic batch size, the batch dimension in your ONNX model must be left unknown (dynamic), and you need to add an optimization profile for the inputs when building the engine. Before calling Forward(), you also need to select that profile and set the actual input shape for enqueue.
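A sketch of what that looks like with the TensorRT 7.x Python API. The function name build_dynamic_engine and the min/opt/max batch sizes are illustrative, and the ONNX model is assumed to have been exported with a dynamic batch dimension, e.g. (-1, 256, 256, 3):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_dynamic_engine(onnx_path, min_bs=1, opt_bs=8, max_bs=16):
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch network definition, required for ONNX models
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB, TensorRT 7.x style

    # The input must have a dynamic batch dimension in the ONNX graph;
    # the profile tells TensorRT the allowed range of batch sizes.
    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name
    profile.set_shape(input_name,
                      (min_bs, 256, 256, 3),   # min
                      (opt_bs, 256, 256, 3),   # opt (kernels are tuned for this)
                      (max_bs, 256, 256, 3))   # max
    config.add_optimization_profile(profile)

    return builder.build_engine(network, config)

# At inference time, select the profile and set the actual batch size
# on the execution context before enqueue/execute:
#   context = engine.create_execution_context()
#   context.active_optimization_profile = 0
#   context.set_binding_shape(0, (actual_batch, 256, 256, 3))
#   context.execute_async_v2(bindings, stream.handle)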