GluonCV 0.6.0 Release

@bryanyzhu bryanyzhu released this 13 Jan 23:48
· 252 commits to master since this release
ce90b0e

Highlights

GluonCV v0.6.0 adds more video classification models, pose estimation models suited to mobile inference, and quantized models for video classification and pose estimation, along with multiple usability and code improvements.

More video action recognition models

https://gluon-cv.mxnet.io/model_zoo/action_recognition.html

We now provide state-of-the-art video classification networks, such as I3D, I3D-Nonlocal and SlowFast, with a complete model zoo covering several widely adopted video datasets. We also provide a general video dataloader that handles both frame format and raw video format. Users can do training, fine-tuning, prediction and feature extraction without writing complicated code; preparing a text file that contains the video information is enough.
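As a rough illustration of the text file mentioned above, the sketch below writes and re-reads a video list. The one-line-per-video layout `<path> <num_frames> <label>` and the file names are assumptions made for this example; check the fine-tuning tutorial for the layout the dataloader actually expects.

```python
import os
import tempfile


def write_video_list(entries, list_path):
    """Write one line per video: '<path> <num_frames> <label>' (assumed layout)."""
    with open(list_path, "w") as f:
        for path, num_frames, label in entries:
            f.write(f"{path} {num_frames} {label}\n")


def read_video_list(list_path):
    """Parse the list file back into (path, num_frames, label) tuples."""
    records = []
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split from the right so a path containing spaces still parses.
            path, num_frames, label = line.rsplit(" ", 2)
            records.append((path, int(num_frames), int(label)))
    return records


# Hypothetical videos: (path, number of frames, integer class label).
entries = [("videos/clip_0001.mp4", 200, 0), ("videos/clip_0002.mp4", 150, 3)]
list_path = os.path.join(tempfile.mkdtemp(), "train.txt")
write_video_list(entries, list_path)
records = read_video_list(list_path)
print(records)
```

Keeping the annotation in a flat text file means the same list can describe either extracted frame folders or raw video files, with only the path column changing.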

Below is the table of new models included in this release.

| Name | Pretrained | Segments | Clip Length | Top-1 | Hashtag |
|---|---|---|---|---|---|
| inceptionv1_kinetics400 | ImageNet | 7 | 1 | 69.1 | 6dcdafb1 |
| inceptionv3_kinetics400 | ImageNet | 7 | 1 | 72.5 | 8a4a6946 |
| resnet18_v1b_kinetics400 | ImageNet | 7 | 1 | 65.5 | 46d5a985 |
| resnet34_v1b_kinetics400 | ImageNet | 7 | 1 | 69.1 | 8a8d0d8d |
| resnet50_v1b_kinetics400 | ImageNet | 7 | 1 | 69.9 | cc757e5c |
| resnet101_v1b_kinetics400 | ImageNet | 7 | 1 | 71.3 | 5bb6098e |
| resnet152_v1b_kinetics400 | ImageNet | 7 | 1 | 71.5 | 9bc70c66 |
| i3d_inceptionv1_kinetics400 | ImageNet | 1 | 32 (64/2) | 71.8 | 81e0be10 |
| i3d_inceptionv3_kinetics400 | ImageNet | 1 | 32 (64/2) | 73.6 | f14f8a99 |
| i3d_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 74.0 | 568a722e |
| i3d_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.1 | 6b69f655 |
| i3d_nl5_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.2 | 3c0e47ea |
| i3d_nl10_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.3 | bfb58c41 |
| i3d_nl5_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 76.0 | fbfc1d30 |
| i3d_nl10_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 76.1 | 59186c31 |
| slowfast_4x16_resnet50_kinetics400 | ImageNet | 1 | 36 (64/1) | 75.3 | 9d650f51 |
| slowfast_8x8_resnet50_kinetics400 | ImageNet | 1 | 40 (64/1) | 76.6 | d6b25339 |
| slowfast_8x8_resnet101_kinetics400 | ImageNet | 1 | 40 (64/1) | 77.2 | fbde1a7c |
| resnet50_v1b_ucf101 | ImageNet | 3 | 1 | 83.7 | d728ecc7 |
| i3d_resnet50_v1_ucf101 | ImageNet | 1 | 32 (64/2) | 83.9 | 7afc7286 |
| i3d_resnet50_v1_ucf101 | Kinetics400 | 1 | 32 (64/2) | 95.4 | 760d0981 |
| resnet50_v1b_hmdb51 | ImageNet | 3 | 1 | 55.2 | 682591e2 |
| i3d_resnet50_v1_hmdb51 | ImageNet | 1 | 32 (64/2) | 48.5 | 0d0ad559 |
| i3d_resnet50_v1_hmdb51 | Kinetics400 | 1 | 32 (64/2) | 70.9 | 2ec6bf01 |
| resnet50_v1b_sthsthv2 | ImageNet | 8 | 1 | 35.5 | 80ee0c6b |
| i3d_resnet50_v1_sthsthv2 | ImageNet | 1 | 16 (32/2) | 50.6 | 01961e4c |
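To make the Segments and Clip Length columns concrete, here is a minimal sketch of the two sampling schemes as we read them. It assumes that "7 segments, clip length 1" means one frame taken from each of 7 equal parts of the video, and that "32 (64/2)" means 32 frames obtained by keeping every 2nd frame of a 64-frame window. The SlowFast reading (a fast pathway at stride 2 plus a slow pathway at stride 16 or 8 over the same window) is also an assumption that happens to reproduce the 36 and 40 in the table. The helper names are hypothetical and not part of GluonCV.

```python
def segment_indices(num_frames, num_segments):
    """Center frame index of each of num_segments equal-length segments."""
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


def clip_indices(start, window, stride):
    """Frame indices of a dense clip: every stride-th frame of a window."""
    return list(range(start, start + window, stride))


# Sparse sampling: 7 segments, 1 frame each, from a 70-frame video.
sparse = segment_indices(70, 7)

# Dense sampling for the I3D rows: 32 frames from a 64-frame window.
i3d_clip = clip_indices(0, 64, 2)

# Assumed SlowFast layout: fast pathway plus slow pathway over one window.
fast = clip_indices(0, 64, 2)
slow_4x16 = clip_indices(0, 64, 16)
slow_8x8 = clip_indices(0, 64, 8)

print(sparse)
print(len(i3d_clip), len(fast) + len(slow_4x16), len(fast) + len(slow_8x8))
```

Training pipelines usually pick the window start or the within-segment offset at random; fixed positions are used here only to keep the example deterministic.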

We include tutorials for how to fine-tune a pre-trained model on users' own dataset.
https://gluon-cv.mxnet.io/build/examples_action_recognition/finetune_custom.html

We include tutorials that introduce a new efficient video reader, Decord.
https://gluon-cv.mxnet.io/build/examples_action_recognition/decord_loader.html

We include tutorials for how to extract features from a pre-trained model.
https://gluon-cv.mxnet.io/build/examples_action_recognition/feat_custom.html

We include tutorials for how to make predictions from a pre-trained model.
https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_custom.html

We include tutorials for how to perform distributed training on deep video models.
https://gluon-cv.mxnet.io/build/examples_distributed/distributed_slowfast.html

We include tutorials for how to prepare the HMDB51 and Something-something-v2 datasets.
https://gluon-cv.mxnet.io/build/examples_datasets/hmdb51.html
https://gluon-cv.mxnet.io/build/examples_datasets/somethingsomethingv2.html

We will provide Kinetics600 and Kinetics700 pre-trained models in the next release. Please stay tuned.

Mobile pose estimation models

https://gluon-cv.mxnet.io/model_zoo/pose.html#mobile-pose-models

| Model | OKS AP | OKS AP (with flip) | Hashtag |
|---|---|---|---|
| mobile_pose_resnet18_v1b | 66.2/89.2/74.3 | 67.9/90.3/75.7 | dd6644eb |
| mobile_pose_resnet50_v1b | 71.1/91.3/78.7 | 72.4/92.3/79.8 | ec8809df |
| mobile_pose_mobilenet1.0 | 64.1/88.1/71.2 | 65.7/89.2/73.4 | b399bac7 |
| mobile_pose_mobilenetv2_1.0 | 63.7/88.1/71.0 | 65.0/89.2/72.3 | 4acdc130 |
| mobile_pose_mobilenetv3_large | 63.7/88.9/70.8 | 64.5/89.0/72.0 | 1ca004dc |
| mobile_pose_mobilenetv3_small | 54.3/83.7/59.4 | 55.6/84.7/61.7 | b1b148a9 |

By replacing the backbone network and using a pixel shuffle layer instead of deconvolution, we obtain models that are very fast. These models are suitable for edge device applications; tutorials on deployment will come soon.
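For readers unfamiliar with pixel shuffle, the following NumPy sketch shows the generic depth-to-space rearrangement that the name usually refers to: a `(C*r*r, H, W)` feature map becomes `(C, H*r, W*r)` with no learned weights in the upsampling step itself. This is an illustration of the general operation, not the layer implementation used in the mobile pose models; the function name and the toy input are chosen for this example.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r)."""
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    # Split the channel axis into (c_out, r, r): output channel, row offset, column offset.
    x = x.reshape(c_out, r, r, h, w)
    # Interleave the offsets with the spatial axes: (c_out, h, r, w, r).
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c_out, h * r, w * r)


# Toy feature map: 4 channels of 2x2, upsampled by a factor of 2.
features = np.arange(16).reshape(4, 2, 2)
upsampled = pixel_shuffle(features, 2)
print(upsampled.shape)  # (1, 4, 4)
```

Because the rearrangement is a pure reshape and transpose, the upsampling cost is dominated by the ordinary convolution that produces the `C*r*r` channels, which is one common argument for preferring it over deconvolution on constrained devices.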

More Int8 quantized models

https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html
The CPU performance below is benchmarked on an AWS EC2 C5.12xlarge instance with 24 physical cores.
Note that you will need a nightly build of MXNet to properly use these new features.

| Model | Dataset | Batch Size | Speedup (INT8/FP32) | FP32 Accuracy | INT8 Accuracy |
|---|---|---|---|---|---|
| simple_pose_resnet18_v1b | COCO Keypoint | 128 | 2.55 | 66.3 | 65.9 |
| simple_pose_resnet50_v1b | COCO Keypoint | 128 | 3.50 | 71.0 | 70.6 |
| simple_pose_resnet50_v1d | COCO Keypoint | 128 | 5.89 | 71.6 | 71.4 |
| simple_pose_resnet101_v1b | COCO Keypoint | 128 | 4.07 | 72.4 | 72.2 |
| simple_pose_resnet101_v1d | COCO Keypoint | 128 | 5.97 | 73.0 | 72.7 |
| vgg16_ucf101 | UCF101 | 64 | 4.46 | 81.86 | 81.41 |
| inceptionv3_ucf101 | UCF101 | 64 | 5.16 | 86.92 | 86.55 |
| resnet18_v1b_kinetics400 | Kinetics400 | 64 | 5.24 | 63.29 | 63.14 |
| resnet50_v1b_kinetics400 | Kinetics400 | 64 | 6.78 | 68.08 | 68.15 |
| inceptionv3_kinetics400 | Kinetics400 | 64 | 5.29 | 67.93 | 67.92 |

For pose-estimation models, the accuracy metric is OKS AP without flip. Quantized 2D video action recognition models are calibrated with num-segments=3 (7 for ResNet-based models).
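To give a feel for what calibration does in INT8 quantization, here is a minimal NumPy sketch of symmetric per-tensor quantization. It assumes the simplest possible calibration rule (scale chosen from the largest absolute value seen in the calibration samples); MXNet's actual calibration modes and operators may differ, and all function names here are hypothetical.

```python
import numpy as np


def calibrate_scale(samples):
    """Choose a scale so the largest |value| in the samples maps to 127."""
    max_abs = max(float(np.abs(s).max()) for s in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(x, scale):
    """Map float values to int8 with rounding and clipping."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)


def dequantize(q, scale):
    """Map int8 values back to approximate float values."""
    return q.astype(np.float32) * scale


# Calibration data stands in for activations collected on a few input batches.
rng = np.random.default_rng(0)
calibration = [rng.uniform(-1.0, 1.0, size=100).astype(np.float32) for _ in range(3)]
scale = calibrate_scale(calibration)

x = calibration[0]
q = quantize(x, scale)
max_error = float(np.abs(dequantize(q, scale) - x).max())
print(q.dtype, max_error <= scale / 2 + 1e-6)
```

The round-trip error is bounded by half a quantization step for values inside the calibrated range, which is consistent with the small FP32 to INT8 accuracy gaps in the table above.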

Bug fixes and Improvements

  • Performance of PSPNet with a ResNet101 backbone on Cityscapes (semantic segmentation) is improved from 77.1% to 79.9% mIoU, higher than the number reported in the original paper.
  • We will deprecate Python2 support in the next release.