Performance results pivot table:
Model Name | Complexity (GFLOPs) | Size (M params) | AVG mAP (%) | AVG Top-1 (%) | AVG Top-5 (%) | Links |
---|---|---|---|---|---|---|
EfficientNet b0 | 0.76 | 4.14 | 92.75 | 89.14 | 97.79 | imagenet snapshot, model template |
MobilenetV3 large x1.0 | 0.44 | 4.33 | 91.98 | 88.30 | 97.35 | imagenet snapshot, model template |
MobilenetV3 large x0.75 | 0.308 | 2.84 | 91.14 | 87.60 | 96.97 | imagenet snapshot, model template |
MobilenetV3 small x1.0 | 0.112 | 1.56 | 87.81 | 84.99 | 96.15 | imagenet snapshot, model template |
All of the above metrics were obtained on eleven different datasets that are widely used in classification research. To provide a generalized performance measure, we averaged the metrics across all datasets. For per-dataset results compared with the baseline, refer to this spreadsheet.
The following datasets were used in experiments:
- Describable Textures (DTD)¹
- Caltech 101¹
- Oxford 102 Flowers
- Oxford-IIIT Pets
- CIFAR100
- SVHN (w/o additional data)
- Fashion MNIST
- FOOD101¹
- SUN397¹
- Birdsnap¹
- Cars Dataset
¹ These datasets use custom splits (random stratified splits, illustrated in the sketch below) and therefore cannot be compared directly with other published results.
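The footnote above refers to random stratified splits. Below is a minimal sketch of such a split, assuming scikit-learn's `train_test_split`; the split ratio and random seed are illustrative assumptions, not the values used in the experiments.

```python
# A hedged sketch of a random stratified split; the ratio and seed below are
# illustrative assumptions, not the exact values used in the experiments above.
from sklearn.model_selection import train_test_split

def stratified_split(image_paths, labels, val_fraction=0.2, seed=0):
    """Split samples into train/val parts while preserving per-class proportions."""
    train_paths, val_paths, train_labels, val_labels = train_test_split(
        image_paths,
        labels,
        test_size=val_fraction,
        random_state=seed,
        stratify=labels,  # keeps class ratios identical in both parts
    )
    return (train_paths, train_labels), (val_paths, val_labels)
```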
Training recipes:
We initialized all models with ImageNet-pretrained weights and fine-tuned them on the specific tasks without freezing any layers.
The following parameters and techniques were used for training:
Baselines:
- softmax loss
- a single learning rate chosen to work best on average across all datasets (0.013 for MobileNetV3 and 0.003 for EfficientNet)
- cosine scheduler
- basic augmentations
- SGD with momentum optimizer (the baseline setup is sketched below)
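As a reference, the baseline optimizer and scheduler above correspond roughly to the following PyTorch setup. This is a hedged sketch: only the learning rates come from the list above, while the momentum value, schedule length, and placeholder model are assumptions.

```python
import torch

# Placeholder classifier head; in practice this is the MobileNetV3 / EfficientNet model.
model = torch.nn.Linear(1280, 100)

base_lr = 0.013      # 0.013 for MobileNetV3, 0.003 for EfficientNet (see the list above)
max_epochs = 100     # assumption: the actual schedule length is defined in the model template

# SGD with momentum, as listed in the baseline recipe (momentum value is an assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)

# Cosine learning-rate schedule over the whole training run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epochs)

for epoch in range(max_epochs):
    # ... one epoch of training with softmax (cross-entropy) loss and basic augmentations ...
    scheduler.step()
```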
MobilenetV3:
- Mutual learning approach
- Softmax loss for the main model, Additive Margin softmax for the auxiliary model
- Learning rate found by LR Finder
- Reduce-on-plateau scheduler, which removes the need to search for the number of epochs
- Augmix pipeline for augmentations
- Sharpness aware minimization optimizer
- No bias decay method
- Exponential Moving Average (EMA) of the weights
EfficientNet_b0:
- [Additive Margin Softmax](https://www.semanticscholar.org/paper/Additive-Margin-Softmax-for-Face-Verification-Wang-Cheng/9fc17fa5708584fa848164461f82a69e97f6ed69) loss
- Learning rate found by LR Finder
- Reduce-on-plateau scheduler, which removes the need to search for the number of epochs
- Augmix pipeline for augmentations
- Sharpness aware minimization optimizer
- No bias decay method
- Exponential Moving Average (EMA) of the weights (the no-bias-decay, reduce-on-plateau, and EMA techniques are sketched below)
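The no-bias-decay, reduce-on-plateau, and EMA techniques listed in both recipes are standard PyTorch building blocks. The sketch below shows one possible way to wire them together; the decay rates, weight-decay value, and parameter grouping are assumptions and may differ from the actual implementation in this repository.

```python
import copy
import torch

# Placeholder model; in practice this is the classification backbone.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.BatchNorm2d(16))

# "No bias decay": apply weight decay to weights only, not to biases or BatchNorm parameters.
decay, no_decay = [], []
for name, param in model.named_parameters():
    (no_decay if param.ndim <= 1 or name.endswith(".bias") else decay).append(param)
optimizer = torch.optim.SGD(
    [{"params": decay, "weight_decay": 5e-4},   # weight-decay value is an assumption
     {"params": no_decay, "weight_decay": 0.0}],
    lr=0.01, momentum=0.9,
)

# Reduce-on-plateau scheduler: lowers the LR when the validation metric stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", patience=5)

# Exponential Moving Average (EMA) of the weights, kept in a shadow copy of the model.
ema_model = copy.deepcopy(model)
ema_decay = 0.999                               # decay rate is an assumption

def update_ema():
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)

# Call update_ema() after every optimizer step and scheduler.step(val_metric) after validation.
```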
All of the models were initially trained on ImageNet, but they can be trained from scratch or fine-tuned to classify arbitrary images.
Information about LR Finder:
Three algorithms are available for estimating an optimal learning rate: grid search and TPE from Optuna, and the fast.ai approach imported from torch-lr-finder with some modifications.
Recommended parameters for the automatic mode in case of fine-tuning:
Mobilenet_v3 backbones:
- min_lr = 0.005
- max_lr = 0.03
- warmup = 1
efficientnet_b0:
- min_lr = 0.001
- max_lr = 0.01
- warmup = 1
In automatic mode, the learning rate is chosen at the point where the loss curve decreases most steeply.
Alternatively, you can stop right after the learning rate search (`stop_after=True`), save a plot of the loss curve (`path_to_savefig: 'some/path/to/figure'`), and choose the learning rate yourself.
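To illustrate the steepest-gradient rule, the sketch below uses the upstream `torch-lr-finder` package directly. It is only an approximation of the modified finder used here: the placeholder model, data, and LR range are assumptions, and in this repository the finder is driven through the training scripts rather than standalone code.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch_lr_finder import LRFinder

# A placeholder model and dataset just to make the sketch runnable;
# in practice the finder runs on the actual classifier and training data.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))),
    batch_size=32,
)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)       # start from a tiny LR
criterion = torch.nn.CrossEntropyLoss()

lr_finder = LRFinder(model, optimizer, criterion, device="cpu")
lr_finder.range_test(train_loader, end_lr=0.03, num_iter=100)  # exponential LR sweep

# Choose the LR at the steepest descent of the loss curve
lrs = np.array(lr_finder.history["lr"])
losses = np.array(lr_finder.history["loss"])
print("Suggested learning rate:", lrs[np.argmin(np.gradient(losses))])

lr_finder.reset()  # restore the initial model and optimizer state
```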
```bash
cd models/image_classification
```

If you have not created a virtual environment yet:

```bash
./init_venv.sh
```

Activate the virtual environment:

```bash
source venv/bin/activate
```

Then instantiate the model template in a working directory:

```bash
export MODEL_TEMPLATE=`realpath ./model_templates/custom-classification/mobilenet_v3_large_1/template.yaml`
export WORK_DIR=/tmp/my_model
python ../../tools/instantiate_template.py ${MODEL_TEMPLATE} ${WORK_DIR} --do-not-load-snapshot
```
The training script assumes that the data for classification is arranged into folders so that each class has its own folder. The script automatically counts the subfolders in the training dataset directory and treats each subfolder as a separate class. Class indices are assigned according to the alphabetically sorted list of folder names (see the sketch after the directory example below).
An example of the directory structure:

```
DATA_DIR
├── train
│   ├── Class1
│   │   └── <train images that belong to class 1>
│   ├── Class2
│   │   └── <train images that belong to class 2>
│   ├── ...
│   └── ClassN
│       └── <train images that belong to class N>
└── val
    ├── Class1
    │   └── <val images that belong to class 1>
    ├── Class2
    │   └── <val images that belong to class 2>
    ├── ...
    └── ClassN
        └── <val images that belong to class N>
```
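Given this layout, the alphabetical class-to-index mapping described above can be reproduced with a few lines of Python. This is only an illustrative sketch; the training script performs the mapping internally.

```python
import os

# Each subfolder of train/ is one class, indexed in alphabetical order.
train_dir = os.path.join("DATA_DIR", "train")   # adjust to your actual DATA_DIR
class_names = sorted(
    d for d in os.listdir(train_dir)
    if os.path.isdir(os.path.join(train_dir, d))
)
class_to_index = {name: idx for idx, name in enumerate(class_names)}
print(class_to_index)   # e.g. {'Class1': 0, 'Class2': 1, ..., 'ClassN': N-1}
```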
After the data is arranged, export the variables required for launching the training and evaluation scripts and go to the work directory:

```bash
export TRAIN_DATA_ROOT=${DATA_DIR}/train
export VAL_DATA_ROOT=${DATA_DIR}/val
export TEST_DATA_ROOT=${DATA_DIR}/val
cd ${WORK_DIR}
```
Try both of the following variants and select the best one:
- Training from scratch or from pre-trained weights. Use this variant only if you have a lot of data, say tens of thousands of images or more. It assumes a long training process that starts from large learning-rate values and gradually decreases them according to a training schedule.
- Fine-tuning from pre-trained weights. If the dataset is not big enough, the model tends to overfit quickly, forgetting the data used for pre-training and losing generalization ability. Hence, a small starting learning rate and a short training schedule are recommended.
```bash
python train.py \
  --train-ann-files '' \
  --train-data-roots ${TRAIN_DATA_ROOT} \
  --val-ann-files '' \
  --val-data-roots ${VAL_DATA_ROOT} \
  --save-checkpoints-to ${WORK_DIR}/outputs
```
NOTE: During fine-tuning it is recommended to decrease the `--base-learning-rate` parameter compared with the default value (see `${MODEL_TEMPLATE}`) to prevent forgetting during the first iterations.
You can also pass parameters such as `--epochs`, `--batch-size`, `--gpu-num`, and `--base-learning-rate`; otherwise, the default values will be loaded from `${MODEL_TEMPLATE}`.
The evaluation procedure provides quality metrics as well as complexity figures such as the number of parameters and FLOPs.
To compute the mean accuracy metric, run:
```bash
python eval.py \
  --load-weights ${WORK_DIR}/outputs/model/model.pth.tar-150 \
  --test-ann-files ${TEST_ANN_FILE} \
  --test-data-roots ${TEST_DATA_ROOT} \
  --save-metrics-to ${WORK_DIR}/metrics.yaml
```
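The resulting `metrics.yaml` is a plain YAML file, so it can be inspected with any YAML reader. The snippet below simply loads and prints it; the exact set of stored keys is not specified here, so nothing beyond loading is assumed.

```python
import yaml

# Print the metrics stored by eval.py; the exact set of keys depends on the model/task,
# so this sketch simply dumps whatever the file contains.
with open("/tmp/my_model/metrics.yaml") as f:   # i.e. ${WORK_DIR}/metrics.yaml
    metrics = yaml.safe_load(f)
print(yaml.safe_dump(metrics, sort_keys=False))
```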
To convert a PyTorch* model to the OpenVINO™ IR format, run the `export.py` script:

```bash
python export.py \
  --load-weights ${WORK_DIR}/outputs/latest.pth \
  --save-model-to ${WORK_DIR}/export
```

This produces the model `model.xml` and weights `model.bin` in single-precision floating-point format (FP32). The obtained model expects a normalized image in planar BGR format.
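To sanity-check the exported IR, it can be loaded with the OpenVINO™ Python API. The sketch below uses the `openvino.runtime` API (2022+), which may be newer than the release this repository targets; the image path is a placeholder, and the normalization step is intentionally left as a comment because the exact mean/scale values come from the model template.

```python
import cv2
import numpy as np
from openvino.runtime import Core   # OpenVINO 2022+ Python API

core = Core()
model = core.read_model("/tmp/my_model/export/model.xml")   # i.e. ${WORK_DIR}/export/model.xml
compiled = core.compile_model(model, "CPU")

image = cv2.imread("example.jpg")                  # OpenCV already loads images as BGR
n, c, h, w = compiled.input(0).shape               # assumes a static NCHW input
resized = cv2.resize(image, (w, h)).astype(np.float32)
# NOTE: apply the same normalization as during training here (mean/scale values come
# from the model template and are not reproduced in this sketch).
blob = resized.transpose(2, 0, 1)[None]            # HWC -> planar CHW, add batch dimension
probs = compiled([blob])[compiled.output(0)]
print("Predicted class index:", int(np.argmax(probs)))
```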
The models can be optimized (compressed) with the NNCF framework.
To compress an image classification model with NNCF, go to the root folder of this git repository and install the compression requirements into your virtual environment:

```bash
pip install -r external/deep-object-reid/compression_requirements.txt
```
At the moment, only one compression method is supported for image classification models: INT8 quantization.
To compress a model, use the `compress.py` script.
Please note that the NNCF framework requires a dataset for compression, since it performs several fine-tuning steps after compression to restore the quality of the model. Therefore, the command-line parameters of `compress.py` are close to those of the training script in the fine-tuning scenario from section 5 above:
```bash
python compress.py \
  --load-weights ${SNAPSHOT} \
  --train-ann-files '' \
  --train-data-roots ${TRAIN_DATA_ROOT} \
  --val-ann-files '' \
  --val-data-roots ${VAL_DATA_ROOT} \
  --save-checkpoints-to outputs \
  --nncf-quantization
```
Note that the number of epochs required for NNCF compression should not be set by a command-line parameter, since it is calculated by the `compress.py` script itself.
The compressed model can be evaluated and exported to the OpenVINO™ format using the same commands as a non-compressed model; see sections 6 and 7 above.