GluonCV 0.6.0 Release

Highlights

GluonCV v0.6.0 adds more video classification models, pose estimation models suited to mobile inference, and quantized models for video classification and pose estimation, along with multiple usability and code improvements.

More video action recognition models

https://gluon-cv.mxnet.io/model_zoo/action_recognition.html

We now provide state-of-the-art video classification networks, such as I3D, I3D-Nonlocal and SlowFast, with a complete model zoo covering several widely adopted video datasets. A general video dataloader handles both frame format and raw video format. Users can do training, fine-tuning, prediction and feature extraction without writing complicated code; preparing a text file containing the video information is enough.
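As a rough illustration of the text file mentioned above, the sketch below parses a video list. It assumes each line holds a path, a frame count, and an integer label separated by whitespace. That layout, the `VideoEntry` type, and the `parse_video_list` helper are assumptions made for this example and are not part of GluonCV; the fine-tuning tutorial linked further down is the authoritative description of the format.

```python
from typing import List, NamedTuple


class VideoEntry(NamedTuple):
    path: str        # video file, or directory of extracted frames
    num_frames: int  # frame count as written in the list
    label: int       # integer class index


def parse_video_list(text: str) -> List[VideoEntry]:
    """Parse lines of the assumed form '<path> <num_frames> <label>'."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right so a path containing spaces stays intact.
        parts = line.rsplit(maxsplit=2)
        if len(parts) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(parts)}")
        path, num_frames, label = parts
        entries.append(VideoEntry(path, int(num_frames), int(label)))
    return entries


# Hypothetical file contents, used only to exercise the parser.
sample = "clips/v_0001.mp4 200 0\nclips/v_0002.mp4 180 3\n"
entries = parse_video_list(sample)
```

Validating the list up front like this catches malformed lines before a long training run starts.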

Below is the table of new models included in this release.

Name	Pretrained	Segments	Clip Length	Top-1	Hashtag
inceptionv1_kinetics400	ImageNet	7	1	69.1	6dcdafb1
inceptionv3_kinetics400	ImageNet	7	1	72.5	8a4a6946
resnet18_v1b_kinetics400	ImageNet	7	1	65.5	46d5a985
resnet34_v1b_kinetics400	ImageNet	7	1	69.1	8a8d0d8d
resnet50_v1b_kinetics400	ImageNet	7	1	69.9	cc757e5c
resnet101_v1b_kinetics400	ImageNet	7	1	71.3	5bb6098e
resnet152_v1b_kinetics400	ImageNet	7	1	71.5	9bc70c66
i3d_inceptionv1_kinetics400	ImageNet	1	32 (64/2)	71.8	81e0be10
i3d_inceptionv3_kinetics400	ImageNet	1	32 (64/2)	73.6	f14f8a99
i3d_resnet50_v1_kinetics400	ImageNet	1	32 (64/2)	74.0	568a722e
i3d_resnet101_v1_kinetics400	ImageNet	1	32 (64/2)	75.1	6b69f655
i3d_nl5_resnet50_v1_kinetics400	ImageNet	1	32 (64/2)	75.2	3c0e47ea
i3d_nl10_resnet50_v1_kinetics400	ImageNet	1	32 (64/2)	75.3	bfb58c41
i3d_nl5_resnet101_v1_kinetics400	ImageNet	1	32 (64/2)	76.0	fbfc1d30
i3d_nl10_resnet101_v1_kinetics400	ImageNet	1	32 (64/2)	76.1	59186c31
slowfast_4x16_resnet50_kinetics400	ImageNet	1	36 (64/1)	75.3	9d650f51
slowfast_8x8_resnet50_kinetics400	ImageNet	1	40 (64/1)	76.6	d6b25339
slowfast_8x8_resnet101_kinetics400	ImageNet	1	40 (64/1)	77.2	fbde1a7c
resnet50_v1b_ucf101	ImageNet	3	1	83.7	d728ecc7
i3d_resnet50_v1_ucf101	ImageNet	1	32 (64/2)	83.9	7afc7286
i3d_resnet50_v1_ucf101	Kinetics400	1	32 (64/2)	95.4	760d0981
resnet50_v1b_hmdb51	ImageNet	3	1	55.2	682591e2
i3d_resnet50_v1_hmdb51	ImageNet	1	32 (64/2)	48.5	0d0ad559
i3d_resnet50_v1_hmdb51	Kinetics400	1	32 (64/2)	70.9	2ec6bf01
resnet50_v1b_sthsthv2	ImageNet	8	1	35.5	80ee0c6b
i3d_resnet50_v1_sthsthv2	ImageNet	1	16 (32/2)	50.6	01961e4c
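The Segments and Clip Length columns describe how frames are drawn from a video. The sketch below shows one plausible reading: "7 segments, clip length 1" as one frame taken from each of 7 equal chunks, and "32 (64/2)" as 32 frames taken every 2nd frame across a 64-frame window. Both the interpretation and the helper names are assumptions made for illustration, not the GluonCV sampling code, and the SlowFast entries such as "36 (64/1)" are not covered.

```python
from typing import List


def segment_indices(num_frames: int, num_segments: int) -> List[int]:
    """Pick the middle frame of each of `num_segments` equal chunks."""
    if num_frames < num_segments:
        raise ValueError("video is shorter than the number of segments")
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


def clip_indices(start: int, clip_len: int, stride: int) -> List[int]:
    """Take `clip_len` frames, one every `stride` frames, from `start`."""
    return [start + i * stride for i in range(clip_len)]


# Sparse sampling: 7 segments, 1 frame each, from a 70-frame video.
sparse = segment_indices(70, 7)

# Dense sampling: 32 frames at stride 2, covering a 64-frame window.
dense = clip_indices(0, 32, 2)
```

Sparse sampling looks at the whole video with few frames, while dense sampling gives 3D networks a contiguous stretch of motion.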

We include tutorials for how to fine-tune a pre-trained model on your own dataset.
https://gluon-cv.mxnet.io/build/examples_action_recognition/finetune_custom.html

We include tutorials introducing Decord, a new efficient video reader.
https://gluon-cv.mxnet.io/build/examples_action_recognition/decord_loader.html

We include tutorials for how to extract features from a pre-trained model.
https://gluon-cv.mxnet.io/build/examples_action_recognition/feat_custom.html

We include tutorials for how to make predictions from a pre-trained model.
https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_custom.html

We include tutorials for how to perform distributed training on deep video models.
https://gluon-cv.mxnet.io/build/examples_distributed/distributed_slowfast.html

We include tutorials for how to prepare the HMDB51 and Something-something-v2 datasets.
https://gluon-cv.mxnet.io/build/examples_datasets/hmdb51.html
https://gluon-cv.mxnet.io/build/examples_datasets/somethingsomethingv2.html

We will provide Kinetics600 and Kinetics700 pre-trained models in the next release. Please stay tuned.

Mobile pose estimation models

https://gluon-cv.mxnet.io/model_zoo/pose.html#mobile-pose-models

Model	OKS AP	OKS AP (with flip)	Hashtag
mobile_pose_resnet18_v1b	66.2/89.2/74.3	67.9/90.3/75.7	dd6644eb
mobile_pose_resnet50_v1b	71.1/91.3/78.7	72.4/92.3/79.8	ec8809df
mobile_pose_mobilenet1.0	64.1/88.1/71.2	65.7/89.2/73.4	b399bac7
mobile_pose_mobilenetv2_1.0	63.7/88.1/71.0	65.0/89.2/72.3	4acdc130
mobile_pose_mobilenetv3_large	63.7/88.9/70.8	64.5/89.0/72.0	1ca004dc
mobile_pose_mobilenetv3_small	54.3/83.7/59.4	55.6/84.7/61.7	b1b148a9

By replacing the backbone network and using a pixel shuffle layer instead of deconvolution, we obtain models that are very fast. These models are suitable for edge device applications; tutorials on deployment will come soon.
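For readers unfamiliar with pixel shuffle, the sketch below implements the usual depth-to-space rearrangement in plain Python: a tensor of shape (C*r*r, H, W) becomes (C, H*r, W*r) by moving channel groups into spatial positions. Because this only moves values around and does no arithmetic, it is a cheap way to upsample. It illustrates the general operation under the common channel-ordering convention and is not the layer used in the mobile pose models.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # indexed as [channel][row][col]


def pixel_shuffle(x: Tensor3D, r: int) -> Tensor3D:
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r)."""
    in_channels = len(x)
    if in_channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_channels = in_channels // (r * r)
    height, width = len(x[0]), len(x[0][0])
    out = [[[0.0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x1 channels become a single 2x2 channel.
x = [[[1.0]], [[2.0]], [[3.0]], [[4.0]]]
y = pixel_shuffle(x, 2)
```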

More Int8 quantized models

https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html
The CPU performance below is benchmarked on an AWS EC2 C5.12xlarge instance with 24 physical cores.
Note that you will need a nightly build of MXNet to properly use these new features.

Model	Dataset	Batch Size	Speedup (INT8/FP32)	FP32 Accuracy	INT8 Accuracy
simple_pose_resnet18_v1b	COCO Keypoint	128	2.55	66.3	65.9
simple_pose_resnet50_v1b	COCO Keypoint	128	3.50	71.0	70.6
simple_pose_resnet50_v1d	COCO Keypoint	128	5.89	71.6	71.4
simple_pose_resnet101_v1b	COCO Keypoint	128	4.07	72.4	72.2
simple_pose_resnet101_v1d	COCO Keypoint	128	5.97	73.0	72.7
vgg16_ucf101	UCF101	64	4.46	81.86	81.41
inceptionv3_ucf101	UCF101	64	5.16	86.92	86.55
resnet18_v1b_kinetics400	Kinetics400	64	5.24	63.29	63.14
resnet50_v1b_kinetics400	Kinetics400	64	6.78	68.08	68.15
inceptionv3_kinetics400	Kinetics400	64	5.29	67.93	67.92

For pose-estimation models, the accuracy metric is OKS AP without flip. Quantized 2D video action recognition models are calibrated with num-segments=3 (7 for ResNet-based models).
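To give a feel for what calibration means here, the sketch below derives a symmetric INT8 scale from a handful of calibration values using the simple max-abs rule, then quantizes and dequantizes a few numbers. This is a deliberately naive illustration with made-up values; it is not the MXNet quantization flow, which the linked INT8 tutorial covers.

```python
from typing import List


def calibrate_scale(samples: List[float]) -> float:
    """Derive a symmetric INT8 scale from calibration values (max-abs rule)."""
    max_abs = max(abs(v) for v in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values: List[float], scale: float) -> List[int]:
    """Map floats to integers, clipping to the range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


# Hypothetical calibration activations; the largest magnitude sets the scale.
calibration = [-63.5, 0.0, 12.0, 40.0]
scale = calibrate_scale(calibration)

# Values outside the calibrated range (100.0 here) are clipped.
q = quantize([63.5, -10.0, 3.2, 100.0], scale)
restored = dequantize(q, scale)
```

The gap between the original and restored values is the quantization error, which is why the data used for calibration affects INT8 accuracy.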

Bug fixes and Improvements

Performance of PSPNet using ResNet101 as backbone on Cityscapes (semantic segmentation) is improved from mIoU 77.1% to 79.9%, higher than the number reported in the original paper.
We will deprecate Python2 support in the next release.
