The goals of this repo are:
- to help reproduce research paper results (transfer learning setups, for instance),
- to access pretrained ConvNets with a unique interface/API inspired by torchvision.
News:
- 16/11/2017: nasnet-a-large pretrained model ported by T. Durand and R. Cadene
- 22/07/2017: torchvision pretrained models added
- 22/07/2017: BatchNorm momentum of inceptionv4 and inceptionresnetv2 set to 0.1
- 17/07/2017: model.input_range attribute added
- 17/07/2017: BNInception pretrained on ImageNet added
Contents:
- Installation
- Toy example
- Evaluation on ImageNet
- Documentation
  - Available models
    - NasNetLarge
    - BNInception
    - InceptionV3
    - InceptionV4
    - InceptionResNetV2
    - ResNeXt101_64x4d
    - ResNeXt101_32x4d
    - ResNet18
    - ResNet34
    - ResNet50
    - ResNet101
    - ResNet152
    - FBResNet152
    - DenseNet121
    - DenseNet161
    - DenseNet169
    - DenseNet201
    - SqueezeNet1_0
    - SqueezeNet1_1
    - AlexNet
    - VGG11
    - VGG13
    - VGG16
    - VGG19
    - VGG11_BN
    - VGG13_BN
    - VGG16_BN
    - VGG19_BN
  - Model API
- Reproducing porting
Installation:

- python3 with Anaconda
- pytorch with or without CUDA

```bash
git clone https://github.com/Cadene/pretrained-models.pytorch.git
```
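A quick sanity check of the install — a minimal sketch reusing the import pattern from the toy example below; `yourdir` is a placeholder for wherever you cloned the repo:

```python
import sys
sys.path.append('yourdir/pretrained-models.pytorch')  # adjust to your clone path
import pretrainedmodels

# Instantiating a model downloads its pretrained weights on first use.
model = pretrainedmodels.__dict__['fbresnet152'](num_classes=1000, pretrained='imagenet')
print(type(model))
```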
Toy example:

See test/toy-example.py to compute the class logits of a pretrained model on an ImageNet image:

```bash
python test/toy-example.py -a fbresnet152
```
```python
from PIL import Image
import torch
import torchvision.transforms as transforms
import sys
sys.path.append('yourdir/pretrained-models.pytorch')  # if needed
import pretrainedmodels

# Load Model
model_name = 'inceptionresnetv2'  # or 'fbresnet152'
model = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
model.eval()

# Load One Input Image
path_img = 'data/cat.jpg'
with open(path_img, 'rb') as f:
    with Image.open(f) as img:
        input_data = img.convert(model.input_space)

tf = transforms.Compose([
    transforms.Scale(round(max(model.input_size) * 1.143)),
    transforms.CenterCrop(max(model.input_size)),
    transforms.ToTensor(),
    transforms.Normalize(mean=model.mean, std=model.std)
])

input_data = tf(input_data)           # 3x400x225 -> 3x299x299
input_data = input_data.unsqueeze(0)  # 3x299x299 -> 1x3x299x299
input = torch.autograd.Variable(input_data)

# Load ImageNet Synsets
with open('data/imagenet_synsets.txt', 'r') as f:
    synsets = f.readlines()

# len(synsets) == 1001
# synsets[0] == background
synsets = [x.strip() for x in synsets]
splits = [line.split(' ') for line in synsets]
key_to_classname = {spl[0]: ' '.join(spl[1:]) for spl in splits}

with open('data/imagenet_classes.txt', 'r') as f:
    class_id_to_key = f.readlines()
class_id_to_key = [x.strip() for x in class_id_to_key]

# Make predictions
output = model(input)  # size (1, 1000)
max_logit, argmax = output.data.squeeze().max(0)
class_id = argmax[0]
class_key = class_id_to_key[class_id]
classname = key_to_classname[class_key]
print(path_img, 'is a', classname)
```
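The example keeps only the argmax. A small hedged extension, reusing `output`, `class_id_to_key`, and `key_to_classname` from above, prints the five highest-scoring classes instead:

```python
# Sketch: top-5 predictions instead of the single argmax.
values, indices = output.data.squeeze().topk(5)
for score, class_id in zip(values.tolist(), indices.tolist()):
    class_key = class_id_to_key[int(class_id)]
    print('{:.3f}  {}'.format(score, key_to_classname[class_key]))
```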
Evaluation on ImageNet:

See also test/imagenet.py to evaluate pretrained models on ImageNet. Accuracy on the validation set:

Model | Version | Acc@1 | Acc@5 |
---|---|---|---|
NASNet-A-Large | Tensorflow | 82.693 | 96.163 |
NASNet-A-Large | Our porting | 82.566 | 96.086 |
InceptionResNetV2 | Tensorflow | 80.4 | 95.3 |
InceptionV4 | Tensorflow | 80.2 | 95.3 |
InceptionResNetV2 | Our porting | 80.170 | 95.234 |
InceptionV4 | Our porting | 80.062 | 94.926 |
ResNeXt101_64x4d | Torch7 | 79.6 | 94.7 |
ResNeXt101_64x4d | Our porting | 78.956 | 94.252 |
ResNeXt101_32x4d | Torch7 | 78.8 | 94.4 |
ResNet152 | Pytorch | 78.428 | 94.110 |
ResNeXt101_32x4d | Our porting | 78.188 | 93.886 |
FBResNet152 | Torch7 | 77.84 | 93.84 |
DenseNet161 | Pytorch | 77.560 | 93.798 |
ResNet101 | Pytorch | 77.438 | 93.672 |
FBResNet152 | Our porting | 77.386 | 93.594 |
InceptionV3 | Pytorch | 77.294 | 93.454 |
DenseNet201 | Pytorch | 77.152 | 93.548 |
DenseNet169 | Pytorch | 76.026 | 92.992 |
ResNet50 | Pytorch | 76.002 | 92.980 |
DenseNet121 | Pytorch | 74.646 | 92.136 |
VGG19_BN | Pytorch | 74.266 | 92.066 |
ResNet34 | Pytorch | 73.554 | 91.456 |
BNInception | Caffe | 73.522 | 91.560 |
VGG16_BN | Pytorch | 73.518 | 91.608 |
VGG19 | Pytorch | 72.080 | 90.822 |
VGG16 | Pytorch | 71.636 | 90.354 |
VGG13_BN | Pytorch | 71.508 | 90.494 |
VGG11_BN | Pytorch | 70.452 | 89.818 |
ResNet18 | Pytorch | 70.142 | 89.274 |
VGG13 | Pytorch | 69.662 | 89.264 |
VGG11 | Pytorch | 68.970 | 88.746 |
SqueezeNet1_1 | Pytorch | 58.250 | 80.800 |
SqueezeNet1_0 | Pytorch | 58.108 | 80.428 |
AlexNet | Pytorch | 56.432 | 79.194 |
Note: the Pytorch version of ResNet152 is not a port of the Torch7 model; it has been retrained by Facebook.

Beware: the accuracy reported here is not always representative of the transferable capacity of the network on other tasks and datasets. You must try them all! :P
To reproduce these results, download the ImageNet dataset, move the validation images into labeled subfolders, and run:

```bash
python test/imagenet.py /local/data/imagenet_2012/images --arch resnext101_32x4d -e
```
Documentation:

Available models:

NasNetLarge

Source: TensorFlow Slim repo

```python
nasnetlarge(num_classes=1000, pretrained='imagenet')
nasnetlarge(num_classes=1001, pretrained='imagenet+background')
```
FBResNet

Source: Torch7 repo of Facebook

They are a bit different from the ResNet* models of torchvision. ResNet152 is currently the only one available.

```python
fbresnet152(num_classes=1000, pretrained='imagenet')
```
Inception*

Source: TensorFlow Slim repo, and the Pytorch/Vision repo for inceptionv3

```python
inceptionresnetv2(num_classes=1000, pretrained='imagenet')
inceptionresnetv2(num_classes=1001, pretrained='imagenet+background')
inceptionv4(num_classes=1000, pretrained='imagenet')
inceptionv4(num_classes=1001, pretrained='imagenet+background')
inceptionv3(num_classes=1000, pretrained='imagenet')
```
BNInception

Source: trained with Caffe by Xiong Yuanjun

```python
bninception(num_classes=1000, pretrained='imagenet')
```
ResNeXt*

Source: ResNeXt repo of Facebook

```python
resnext101_32x4d(num_classes=1000, pretrained='imagenet')
resnext101_64x4d(num_classes=1000, pretrained='imagenet')
```
TorchVision models

Source: Pytorch/Vision repo

(inceptionv3 is documented with the Inception* models above)

```python
resnet18(num_classes=1000, pretrained='imagenet')
resnet34(num_classes=1000, pretrained='imagenet')
resnet50(num_classes=1000, pretrained='imagenet')
resnet101(num_classes=1000, pretrained='imagenet')
resnet152(num_classes=1000, pretrained='imagenet')
densenet121(num_classes=1000, pretrained='imagenet')
densenet161(num_classes=1000, pretrained='imagenet')
densenet169(num_classes=1000, pretrained='imagenet')
densenet201(num_classes=1000, pretrained='imagenet')
squeezenet1_0(num_classes=1000, pretrained='imagenet')
squeezenet1_1(num_classes=1000, pretrained='imagenet')
alexnet(num_classes=1000, pretrained='imagenet')
vgg11(num_classes=1000, pretrained='imagenet')
vgg13(num_classes=1000, pretrained='imagenet')
vgg16(num_classes=1000, pretrained='imagenet')
vgg19(num_classes=1000, pretrained='imagenet')
vgg11_bn(num_classes=1000, pretrained='imagenet')
vgg13_bn(num_classes=1000, pretrained='imagenet')
vgg16_bn(num_classes=1000, pretrained='imagenet')
vgg19_bn(num_classes=1000, pretrained='imagenet')
```
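All of the constructors above share the same signature, so a model can be selected by name at runtime, exactly as in the toy example:

```python
import pretrainedmodels

# Any constructor listed above can be picked by name.
model_name = 'resnext101_32x4d'
model = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
model.eval()
```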
Model API:

Once a pretrained model has been loaded, you can use it as follows.

Important note: all images must be loaded with PIL and converted to tensors with transforms.ToTensor(), which scales the pixel values to the range [0, 1] (see model.input_range below).
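A one-line check of that scaling, reusing data/cat.jpg from the toy example:

```python
from PIL import Image
import torchvision.transforms as transforms

img = Image.open('data/cat.jpg').convert('RGB')
tensor = transforms.ToTensor()(img)
print(tensor.min(), tensor.max())  # both values lie in [0, 1]
```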
model.input_size

Attribute of type list composed of 3 numbers:
- number of color channels,
- height of the input image,
- width of the input image.

Example: [3, 299, 299] for inception* networks, [3, 224, 224] for resnet* networks.
model.input_space

Attribute of type str representing the color space of the image. Can be RGB or BGR.
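torchvision loads images as RGB, so the channel order must be flipped for BGR models. A minimal sketch, assuming `input_data` is a CxHxW tensor as in the toy example:

```python
import torch

# Sketch: reorder RGB channels to BGR when the model expects BGR input.
if model.input_space == 'BGR':
    input_data = input_data.index_select(0, torch.LongTensor([2, 1, 0]))
```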
model.input_range

Attribute of type list composed of 2 numbers:
- min pixel value,
- max pixel value.

Example: [0, 1] for resnet* and inception* networks, [0, 255] for the bninception network.
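Since ToTensor produces values in [0, 1], the tensor must be rescaled for models whose input_range differs. A hedged sketch:

```python
# Sketch: rescale a [0, 1] tensor to the model's expected input range
# (for bninception this amounts to multiplying by 255).
low, high = model.input_range
input_data = input_data * (high - low) + low
```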
model.mean

Attribute of type list composed of 3 numbers which are used to normalize the input image (subtracted channel-wise).

Example: [0.5, 0.5, 0.5] for inception* networks, [0.485, 0.456, 0.406] for resnet* networks.
model.std

Attribute of type list composed of 3 numbers which are used to normalize the input image (divided channel-wise).

Example: [0.5, 0.5, 0.5] for inception* networks, [0.229, 0.224, 0.225] for resnet* networks.
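Putting these attributes together: the toy example builds its preprocessing pipeline entirely from the model's own attributes, which lets the same code work for any model in the zoo. A sketch, assuming `model` has been loaded as above:

```python
import torchvision.transforms as transforms

# Build the preprocessing pipeline from the model's attributes.
tf = transforms.Compose([
    transforms.Scale(round(max(model.input_size) * 1.143)),
    transforms.CenterCrop(max(model.input_size)),
    transforms.ToTensor(),  # scales pixel values to [0, 1]
    transforms.Normalize(mean=model.mean, std=model.std),
])
```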
model.features

/!\ work in progress (may not be available)

Method used to extract features from an image.

Example when the model is loaded using fbresnet152:

```python
print(input_224.size())             # (1, 3, 224, 224)
output = model.features(input_224)
print(output.size())                # (1, 2048, 1, 1)

# The same method also accepts larger inputs:
# print(input_448.size())           # (1, 3, 448, 448)
output = model.features(input_448)
# print(output.size())              # (1, 2048, 7, 7)
```
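For larger inputs the feature map is no longer 1x1, so it must be pooled before the classifier. A minimal sketch, assuming `output` is the (1, 2048, 7, 7) map from above:

```python
import torch.nn.functional as F

# Sketch: average-pool the 7x7 feature map down to a single 2048-d vector.
pooled = F.avg_pool2d(output, kernel_size=output.size(2))  # (1, 2048, 1, 1)
vector = pooled.view(1, -1)                                # (1, 2048)
```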
model.classif

/!\ work in progress (may not be available)

Method used to classify the features of the image.

Example when the model is loaded using fbresnet152:

```python
output = model.features(input_224)
output = output.view(1, -1)
print(output.size())  # (1, 2048)
output = model.classif(output)
print(output.size())  # (1, 1000)
```
model.forward

Method used to call model.features and model.classif. It can be overwritten as desired.

Important note: a good practice is to use model.__call__ as your function of choice to forward an input to your model, since it also runs any registered hooks. See the example below.

```python
# Without model.__call__
output = model.forward(input_224)
print(output.size())  # (1, 1000)

# With model.__call__
output = model(input_224)
print(output.size())  # (1, 1000)
```
Reproducing porting:

To port FBResNet152, dump the Torch7 weights and load them in PyTorch:

```bash
th pretrainedmodels/fbresnet/resnet152_dump.lua
python pretrainedmodels/fbresnet/resnet152_load.py
```

The other portings rely on these conversion tools:
- https://github.com/clcarwin/convert_torch_to_pytorch (Torch7 to PyTorch)
- https://github.com/Cadene/tensorflow-model-zoo.torch (TensorFlow to PyTorch)
Thanks to the deep learning community and especially to the contributors of the PyTorch ecosystem.