The example script `infer.py` benchmarks and validates inference on image classification models using TF-TRT in TensorFlow 2.0. You can enable the TF-TRT integration by passing the `--use_tftrt` flag to the script. This causes the script to apply TensorRT inference optimizations to the portions of the model's graph that are supported, falling back to native TensorFlow for layers and operations that are not. See the Accelerating Inference In TensorFlow With TensorRT User Guide for more information.
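For reference, the optimization that `--use_tftrt` enables can also be applied directly through the TF-TRT Python API. The snippet below is a minimal sketch of an offline conversion with placeholder paths; it is not the exact code path used by `infer.py`:

```python
# Minimal TF-TRT conversion sketch (TF 2.x); paths are placeholders.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/models/resnet_v1.5_50_saved_model",
    conversion_params=params,
)
converter.convert()  # supported subgraphs are replaced by TRTEngineOp nodes
converter.save("/models/resnet_v1.5_50_tftrt")  # unsupported ops remain native TF
```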
When the TF-TRT integration flag is used, you can control precision with the `--precision` option. float32 is the default (`--precision fp32`), with float16 (`--precision fp16`) or int8 (`--precision int8`) allowing further performance improvements.
int8 mode requires a calibration step (which is performed automatically), but you must also specify the directory in which the calibration dataset is stored with `--calib_data_dir /imagenet_validation_data`. You can use the same data for both calibration and validation.
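In TF 2.x, the calibration step amounts to feeding representative batches through the converter. The sketch below illustrates the mechanism; the input shape and batch count are assumptions for illustration, and `infer.py` builds its calibration batches from `--calib_data_dir` rather than from random data:

```python
# INT8 calibration sketch (TF 2.x); input shape and batch count are assumed.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.INT8,
    use_calibration=True,
)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/models/resnet_v1.5_50_saved_model",
    conversion_params=params,
)

def calibration_input_fn():
    # Yield a few representative batches; real code would decode images
    # from the calibration dataset instead of generating random tensors.
    for _ in range(8):
        yield (tf.random.uniform([128, 224, 224, 3]),)

converter.convert(calibration_input_fn=calibration_input_fn)
converter.save("/models/resnet_v1.5_50_tftrt_int8")
```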
The example script supports either using a dataset (TFRecord format for validation mode, JPEG format for benchmark mode) or using autogenerated synthetic data (with the `--use_synthetic_data` flag). If you use TFRecord files, the script assumes that the TFRecords are named according to the pattern: `validation-*-of-00128`.
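For reference, files matching that pattern can be read with a standard `tf.data` pipeline. This is a generic sketch of such a pipeline, not the exact one `infer.py` implements:

```python
# Generic tf.data sketch for TFRecords named validation-*-of-00128.
import tensorflow as tf

files = tf.data.Dataset.list_files(
    "/data/imagenet/train-val-tfrecord/validation-*-of-00128", shuffle=False)
dataset = (
    tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)
)
for batch in dataset.take(1):
    # Each element is a batch of serialized tf.Example protos;
    # parsing and preprocessing are model-specific.
    print(batch.shape)
```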
To download and process the ImageNet data, you can:

- Use the scripts provided in the `nvidia-examples/build_imagenet_data` directory in the NVIDIA TensorFlow Docker container workspace directory. Follow the README file in that directory for instructions on how to use these scripts.

or

- Use the scripts provided by TF Slim in the `tensorflow/models` repository at `research/slim`. Consult the README file under `research/slim` for instructions on how to use these scripts. Note that these scripts download both the training and validation sets, while this example only requires the validation set.

Also see Obtaining The ImageNet Data for more information.

If the above procedure fails in TF 2.x, build the dataset with TF 1.x (or a container that comes with TF 1.x), and then use that dataset in TF 2.x.
The main Python script is `infer.py`. Assuming that the ImageNet validation data are located under `/data/imagenet/train-val-tfrecord`, you can evaluate inference with the TF-TRT integration using the pre-trained ResNet v1.5 50 model as follows:

```bash
python infer.py \
    --data_dir /data/imagenet/train-val-tfrecord \
    --calib_data_dir /data/imagenet/train-val-tfrecord \
    --saved_model_dir /models/resnet_v1.5_50_saved_model/ \
    --model resnet_v1.5_50_tfv2 \
    --num_warmup_iterations 50 \
    --num_calib_batches 128 \
    --display_every 10 \
    --use_tftrt \
    --optimize_offline \
    --precision INT8 \
    --max_workspace_size $((2**32)) \
    --batch_size 128
```
Where:

- `--saved_model_dir`: Input model to optimize with TF-TRT.
- `--model`: Name of the model (only used to select the right preprocessing).
- `--data_dir`: Path to the ImageNet TFRecord validation files.
- `--use_tftrt`: Convert the graph to a TensorRT graph.
- `--precision`: Precision mode to use, in this case INT8.
- `--mode`: Which mode to use (`validation` or `benchmark`). In validation mode we run inference with both accuracy and performance measurements; in benchmark mode, performance only.

Run with `--help` to see all available options.
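The `run_inference.sh` wrapper scripts below run the same benchmark for each pre-trained model at every supported precision. They switch between FP32 and TF32 via the `--no_tf32` flag; for reference, the equivalent programmatic switch in TensorFlow is shown in this minimal sketch:

```python
# Disable TF32 so float32 matmuls and convolutions run in full FP32.
# (TF32 only takes effect on NVIDIA Ampere GPUs and newer.)
import tensorflow as tf

tf.config.experimental.enable_tensor_float_32_execution(False)
```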
```bash
# TensorFlow - FP32
./models/resnet_v1_50/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/resnet_v1_50/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/resnet_v1_50/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/resnet_v1_50/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/resnet_v1_50/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/resnet_v1_50/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/resnet_v1.5_50_tfv2/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/resnet_v1.5_50_tfv2/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/resnet_v1.5_50_tfv2/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/resnet_v1.5_50_tfv2/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/resnet_v1.5_50_tfv2/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/resnet_v1.5_50_tfv2/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/resnet_v2_50/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/resnet_v2_50/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/resnet_v2_50/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/resnet_v2_50/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/resnet_v2_50/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/resnet_v2_50/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/inception_v3/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/inception_v3/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/inception_v3/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/inception_v3/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/inception_v3/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/inception_v3/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/inception_v4/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/inception_v4/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/inception_v4/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/inception_v4/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/inception_v4/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/inception_v4/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/mobilenet_v1/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/mobilenet_v1/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/mobilenet_v1/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/mobilenet_v1/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/mobilenet_v1/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/mobilenet_v1/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/mobilenet_v2/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/mobilenet_v2/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/mobilenet_v2/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/mobilenet_v2/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/mobilenet_v2/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/mobilenet_v2/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/nasnet_large/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/nasnet_large/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/nasnet_large/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/nasnet_large/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/nasnet_large/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/nasnet_large/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/nasnet_mobile/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/nasnet_mobile/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/nasnet_mobile/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/nasnet_mobile/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/nasnet_mobile/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/nasnet_mobile/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/vgg_16/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/vgg_16/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/vgg_16/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/vgg_16/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/vgg_16/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/vgg_16/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```
```bash
# TensorFlow - FP32
./models/vgg_19/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TensorFlow - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/vgg_19/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models

# TF-TRT - FP32
./models/vgg_19/run_inference.sh \
    --use_xla --no_tf32 \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - TF32 (identical to FP32 on an NVIDIA Turing GPU or older)
./models/vgg_19/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP32"

# TF-TRT - FP16
./models/vgg_19/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="FP16"

# TF-TRT - INT8
./models/vgg_19/run_inference.sh \
    --use_xla \
    --data_dir=/data/imagenet/train-val-tfrecord --input_saved_model_dir=/models \
    --use_tftrt --precision="INT8"
```