MXNet C++ Package Inference Workflow Examples

Building C++ Inference examples

The examples in this folder demonstrate the inference workflow. Build the MXNet C++ package as explained in the README file; the resulting executables can then be copied from mxnet/build/cpp-package/example.

Examples demonstrating inference workflow

This directory contains the following examples. To run them, make sure the path to the MXNet shared library is added to the OS-specific environment variable: LD_LIBRARY_PATH on Linux and macOS, and PATH on Windows.

imagenet_inference.cpp

This example demonstrates the image classification workflow with pre-trained models using the MXNet C++ API. The script also supports inference with quantized CNN models generated by oneDNN (see this quantization flow). Using the C++ API reduces the latency of most models to some extent compared with the current Python implementation.
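At its core, the script loads the model symbol and parameters, binds an executor, and runs forward passes. The following is a minimal sketch of those steps with the MXNet C++ package; the file names and input shape are taken from the commands further below and used as placeholders, and the real imagenet_inference.cpp additionally handles dataset loading, RGB mean/std preprocessing, accuracy computation, and timing:

// Minimal sketch of the core inference steps using the MXNet C++ package.
// File names, batch size and input shape are placeholders; the real
// imagenet_inference.cpp adds dataset loading, mean/std preprocessing,
// accuracy computation and timing on top of this.
#include <iostream>
#include <map>
#include <string>
#include "mxnet-cpp/MxNetCpp.h"

using namespace mxnet::cpp;

int main() {
  Context ctx = Context::cpu();

  // Load the network definition (symbol) and the trained parameters.
  Symbol net = Symbol::Load("./model/resnet50_v1-symbol.json");
  std::map<std::string, NDArray> params =
      NDArray::LoadToMap("./model/resnet50_v1-0000.params");

  // Saved parameters are prefixed with "arg:" or "aux:"; split them accordingly.
  std::map<std::string, NDArray> args, aux;
  for (const auto& kv : params) {
    if (kv.first.rfind("arg:", 0) == 0)
      args[kv.first.substr(4)] = kv.second.Copy(ctx);
    else if (kv.first.rfind("aux:", 0) == 0)
      aux[kv.first.substr(4)] = kv.second.Copy(ctx);
  }

  // Placeholder input with shape (batch_size, channels, height, width).
  args["data"] = NDArray(Shape(1, 3, 224, 224), ctx, false);

  // Bind an executor and run one forward pass (inference only, no gradients).
  Executor* exec = net.SimpleBind(ctx, args, std::map<std::string, NDArray>(),
                                  std::map<std::string, OpReqType>(), aux);
  exec->Forward(false);
  NDArray::WaitAll();
  std::cout << "number of outputs: " << exec->outputs.size() << std::endl;

  delete exec;
  MXNotifyShutdown();
  return 0;
}

The executor's outputs hold the class probabilities, from which the full example derives the reported accuracy and throughput.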

Most of the CNN models have been tested on Linux systems, and 50,000 images are used to collect the accuracy numbers. Please refer to this README for more details about accuracy.

The following performance numbers were collected using the C++ inference API on an AWS EC2 C5.12xlarge instance, with the environment variables set as below:

export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
export OMP_NUM_THREADS=$(($(nproc) / 2))  # half the number of vCPUs, i.e. the physical core count
export MXNET_ENGINE_TYPE=NaiveEngine

Users are also recommended to use numactl or taskset to bind the running process to specific cores.

Model           | Dataset            | BS=1 FP32 / INT8 (imgs/sec) | BS=64 FP32 / INT8 (imgs/sec)
ResNet18-V1     | Validation Dataset | 369.00 / 778.82             | 799.7 / 2598.04
ResNet50-V1     | Validation Dataset | 160.72 / 405.84             | 349.73 / 1297.65
ResNet101-V1    | Validation Dataset | 89.56 / 197.55              | 193.25 / 740.47
Squeezenet 1.0  | Validation Dataset | 294.46 / 899.28             | 857.70 / 3065.13
MobileNet 1.0   | Validation Dataset | 554.94 / 676.59             | 1279.44 / 3393.43
MobileNetV2 1.0 | Validation Dataset | 303.40 / 776.40             | 994.25 / 4227.77
Inception V3    | Validation Dataset | 108.20 / 219.20             | 232.22 / 870.09
ResNet152-V2    | Validation Dataset | 52.28 / 64.62               | 107.03 / 134.04
Inception-BN    | Validation Dataset | 211.86 / 306.37             | 632.79 / 2115.28

The command-line options accepted by this script are shown below:

./imagenet_inference --help
Usage:
imagenet_inference  --symbol_file <model symbol file in json format>
                    --params_file <model params file>
                    --dataset <dataset used to benchmark>
                    --data_nthreads <number of threads for data decoding, default: 60>
                    --input_shape <shape of input image, e.g. "3 224 224">
                    --rgb_mean <mean value to be subtracted on R/G/B channel, e.g. "0 0 0">
                    --rgb_std <standard deviation on R/G/B channel, e.g. "1 1 1">
                    --batch_size <number of images per batch>
                    --num_skipped_batches <number of batches to skip before inference>
                    --num_inference_batches <number of batches used for inference>
                    --data_layer_type <default: "float32", choices: ["float32", "int8", "uint8"]>
                    --gpu <whether to run inference on GPU, default: false>
                    --enableTRT <whether to run inference with TensorRT, default: false>
                    --benchmark <whether to use dummy data to run inference, default: false>

Follow the steps below to run inference with more models.

  • Download the pre-trained FP32 models into the ./model directory.
  • Refer to this README to generate the corresponding quantized models and put them into the ./model directory as well.
  • Prepare the validation dataset and put it into the ./data directory.

The command lines below show how to run inference with the FP32/INT8 resnet50_v1 model. The C++ inference script provides almost the same command-line interface as this Python script, so users can easily move from Python to C++.


# FP32 inference
./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --params_file "./model/resnet50_v1-0000.params" --dataset "./data/val_256_q90.rec" --rgb_mean "123.68 116.779 103.939" --rgb_std "58.393 57.12 57.375" --batch_size 64 --num_skipped_batches 50 --num_inference_batches 500

# INT8 inference
./imagenet_inference --symbol_file "./model/resnet50_v1-quantized-5batches-naive-symbol.json" --params_file "./model/resnet50_v1-quantized-0000.params" --dataset "./data/val_256_q90.rec" --rgb_mean "123.68 116.779 103.939" --rgb_std "58.393 57.12 57.375" --batch_size 64 --num_skipped_batches 50 --num_inference_batches 500

# FP32 dummy data
./imagenet_inference --symbol_file "./model/resnet50_v1-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark

# INT8 dummy data
./imagenet_inference --symbol_file "./model/resnet50_v1-quantized-5batches-naive-symbol.json" --batch_size 64 --num_inference_batches 500 --benchmark

For a quick inference test, users can directly run unit_test_imagenet_inference.sh with the command below. This script automatically downloads the pre-trained Inception-BN and resnet50_v1_int8 models and the validation dataset required for inference.

./unit_test_imagenet_inference.sh

You should see output similar to the following:

>>> INFO: FP32 real data
imagenet_inference.cpp:282: Loading the model from ./model/Inception-BN-symbol.json
imagenet_inference.cpp:295: Loading the model parameters from ./model/Inception-BN-0126.params
imagenet_inference.cpp:443: INFO:Dataset for inference: ./data/val_256_q90.rec
imagenet_inference.cpp:444: INFO:label_name = softmax_label
imagenet_inference.cpp:445: INFO:rgb_mean: (123.68, 116.779, 103.939)
imagenet_inference.cpp:447: INFO:rgb_std: (1, 1, 1)
imagenet_inference.cpp:449: INFO:Image shape: (3, 224, 224)
imagenet_inference.cpp:451: INFO:Finished inference with: 500 images
imagenet_inference.cpp:453: INFO:Batch size = 1 for inference
imagenet_inference.cpp:454: INFO:Accuracy: 0.744
imagenet_inference.cpp:455: INFO:Throughput: xxxx images per second

>>> INFO: FP32 dummy data
imagenet_inference.cpp:282: Loading the model from ./model/Inception-BN-symbol.json
imagenet_inference.cpp:372: Running the forward pass on model to evaluate the performance..
imagenet_inference.cpp:387: benchmark completed!
imagenet_inference.cpp:388: batch size: 1 num batch: 500 throughput: xxxx imgs/s latency:xxxx ms

>>> INFO: INT8 dummy data
imagenet_inference.cpp:282: Loading the model from ./model/resnet50_v1_int8-symbol.json
imagenet_inference.cpp:372: Running the forward pass on model to evaluate the performance..
imagenet_inference.cpp:387: benchmark completed!
imagenet_inference.cpp:388: batch size: 1 num batch: 500 throughput: xxxx imgs/s latency:xxxx ms

To run this example with TensorRT, you can quickly try the following command to benchmark Inception-BN:

./imagenet_inference --symbol_file "./model/Inception-BN-symbol.json" --params_file "./model/Inception-BN-0126.params" --batch_size 16 --num_inference_batches 500 --benchmark --enableTRT

Sample output will look like this (the example was run on an AWS P3.2xlarge machine):

imagenet_inference.cpp:302: Loading the model from ./model/Inception-BN-symbol.json
build_subgraph.cc:686: start to execute partition graph.
imagenet_inference.cpp:317: Loading the model parameters from ./model/Inception-BN-0126.params
imagenet_inference.cpp:424: Running the forward pass on model to evaluate the performance..
imagenet_inference.cpp:439:  benchmark completed!
imagenet_inference.cpp:440:  batch size: 16 num batch: 500 throughput: 6284.78 imgs/s latency:0.159115 ms

sentiment_analysis_rnn.cpp

This example demonstrates how to load a pre-trained RNN model and use it to predict the sentiment expressed in a given movie review with the MXNet C++ API. The example can process variable-length inputs. It performs the following tasks:

  • Loads the pre-trained RNN model.
  • Loads the dictionary file containing the word-to-index mapping.
  • Splits the review into multiple lines separated by ".".
  • Predicts the sentiment score for each line and outputs the average score (a simplified sketch of this per-line flow follows this list).
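The per-line flow can be sketched as follows; ScoreLine is a hypothetical stand-in for running the RNN on a single line and is not a function from the example itself:

// Illustrative sketch of splitting a review on "." and averaging the
// per-line scores. ScoreLine is a hypothetical stand-in for the RNN forward
// pass implemented in sentiment_analysis_rnn.cpp.
#include <sstream>
#include <string>

float ScoreLine(const std::string& line) {
  return 0.5f;  // placeholder: the real example runs the LSTM model here
}

float ScoreReview(const std::string& review) {
  std::istringstream stream(review);
  std::string line;
  float sum = 0.0f;
  int count = 0;
  while (std::getline(stream, line, '.')) {  // split the review on "."
    if (line.empty()) continue;
    sum += ScoreLine(line);
    ++count;
  }
  return count > 0 ? sum / count : 0.0f;  // final score is the per-line average
}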

The example can process variable-length input by implementing the following technique:

  • The example creates executors for pre-determined input lengths, such as 5, 10, 15, 20, 25, etc., called buckets.
  • Each bucket is identified by a bucket key representing the input length required by the corresponding executor.
  • For each line in the review, the example counts the number of words and finds the closest bucket (executor).
  • If the bucket key does not match the number of words in the line, the example pads or trims the input line to match the required length (see the sketch after this list).
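A minimal sketch of this bucketing logic is shown below; the assumption of ascending bucket keys and a padding index of 0 is for illustration only, since the actual keys and padding token come from the example and its dictionary:

// Illustrative sketch of bucket selection and padding/trimming. The bucket
// keys and pad_index below are assumptions for illustration only.
#include <cstddef>
#include <vector>

// Pick the smallest bucket that can hold the line (keys assumed sorted in
// ascending order); if the line is longer than every bucket, fall back to
// the largest one and let the line be trimmed.
std::size_t ChooseBucket(const std::vector<std::size_t>& bucket_keys,
                         std::size_t num_words) {
  for (std::size_t key : bucket_keys)
    if (num_words <= key) return key;
  return bucket_keys.back();
}

// Pad with pad_index or trim the word indices so their length matches the
// bucket key, i.e. the input length expected by the corresponding executor.
std::vector<int> FitToBucket(std::vector<int> word_indices,
                             std::size_t bucket_key, int pad_index = 0) {
  word_indices.resize(bucket_key, pad_index);  // pads when shorter, trims when longer
  return word_indices;
}

With bucket keys 5, 10, 15, 20 and 25, for example, a 12-word line maps to the bucket with key 15 and is padded with three padding tokens before being fed to that bucket's executor.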

The example uses a pre-trained RNN model trained on the IMDB dataset. The RNN model was built by following the GluonNLP Sentiment Analysis Tutorial, which takes the 'standard_lstm_lm_200' model available in the Gluon Model Zoo and fine-tunes it for the IMDB dataset. The model consists of:

  • Embedding layer
  • 2 LSTM layers with a hidden dimension size of 200
  • Average pooling layer
  • Sigmoid output layer

The model was trained for 10 epochs to achieve 85% test accuracy. The visual representation of the model is here.

The model files can be found here.

The example's command-line parameters are shown below:

./sentiment_analysis_rnn --help
Usage:
sentiment_analysis_rnn
--input Input movie review. The review can be single-line or multi-line, e.g. "This movie is the best." OR "This movie is the best. The direction is awesome."
[--gpu]  Specify this option if the workflow needs to be run in a GPU context.
If the review is multi-line, the example predicts a sentiment score for each line and the final score is the average of the per-line scores.

The following command shows how to run the example with a movie review containing only one line.

./sentiment_analysis_rnn --input "This movie has the great story"

The above command will output the sentiment score as follows:

sentiment_analysis_rnn.cpp:346: Input Line : [This movie has the great story] Score : 0.999898
sentiment_analysis_rnn.cpp:449: The sentiment score between 0 and 1, (1 being positive)=0.999898

The following command shows how to invoke the example with a multi-line review.

./sentiment_analysis_rnn --input "This movie is the best. The direction is awesome."

The above command will output the sentiment score for each line in the review and the average score as follows:

Input Line : [This movie is the best] Score : 0.964498
Input Line : [ The direction is awesome] Score : 0.968855
The sentiment score between 0 and 1, (1 being positive)=0.966677

Alternatively, you can run the unit_test_sentiment_analysis_rnn.sh script.