
BesanHalwa/Modified-YOLO


Modified-YOLO

We take inspiration from the convolution operation and present a neighbourhood block subtraction technique. Convolution is a weighted summation over a neighbourhood, used to extract specific features from an image (depending on the convolutional kernel). Our neighbourhood block subtraction enhances the corners and borders in an image while suppressing background noise.
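To make the contrast above concrete, here is a small illustration (not the authors' code). It implements a plain neighbourhood weighted sum and applies a hypothetical 3×3 "centre minus neighbourhood mean" kernel. Because that kernel sums to zero, flat regions map to 0 while edges and, most strongly, corners give a non-zero response. The kernel size and the test image are assumptions chosen only for illustration.

```python
# Illustrative only: a neighbourhood weighted sum with a zero-sum kernel.
# The 3x3 "centre minus mean" kernel is an assumption, not the authors' operator.


def convolve2d(image, kernel):
    """'Valid' 2D correlation: each output is a weighted sum of a k x k neighbourhood.

    The kernel used below is symmetric, so correlation and convolution coincide.
    """
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += kernel[di][dj] * image[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


# Centre pixel minus the mean of its 3x3 neighbourhood (weights sum to zero).
MEAN_SUBTRACT_3X3 = [[-1 / 9] * 3 for _ in range(3)]
MEAN_SUBTRACT_3X3[1][1] += 1.0

# Flat background (10) with a bright square (50) in the bottom-right corner.
image = [[50 if (r >= 3 and c >= 3) else 10 for c in range(6)] for r in range(6)]
response = convolve2d(image, MEAN_SUBTRACT_3X3)

# response[i][j] is centred on image pixel (i + 1, j + 1).
print(f"corner {response[2][2]:.2f}, edge {response[3][2]:.2f}, "
      f"interior {abs(response[3][3]):.2f}")  # corner 22.22, edge 13.33, interior 0.00
```

The corner of the square responds more strongly than its edge, and both the flat background and the square's interior are suppressed to zero.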

We train YOLOv3 with neighbourhood block subtraction (Modified YOLOv3) on the PASCAL VOC 2007 train and val data and the 2012 train data, and test it on the PASCAL VOC 2007 test data. Our model achieves 64.02% mAP at an IoU threshold of 0.5, with a precision of 73% at a detection confidence threshold of 0.25. Compared with the original YOLOv3 implementation trained on MS COCO, our modified YOLOv3 achieves 3.4% higher mAP with a negligible difference in fps. It outperforms almost all entries of the PASCAL VOC 2007 challenge, and in 12 of the 20 classes it exceeds the best entry of the 2007 challenge.

We believe that neighbourhood block subtraction has the potential to be used as a feature detector in classification-related tasks.

Modified YOLO Architecture

We add neighbourhood block subtraction with a block size of 4×4 and use it to pre-process the training and test data.
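The README does not define the 4×4 operation precisely. The sketch below assumes one plausible reading: split each colour channel into non-overlapping 4×4 blocks and subtract the block's mean from every pixel in it. The `offset` of 128 and the clamping to [0, 255] are further assumptions, added only so the result fits back into an 8-bit image; `block_subtract` is a hypothetical name.

```python
# Hypothetical reading of "neighbourhood block subtraction" with a 4x4 block.
# Assumptions: non-overlapping blocks, per-channel block mean, +128 offset, clamp.
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def block_subtract(image: Image, block: int = 4, offset: int = 128) -> Image:
    """Subtract each block's per-channel mean from every pixel in that block."""
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("image height and width must be multiples of the block size")
    out: Image = [[(0, 0, 0)] * w for _ in range(h)]
    for top in range(0, h, block):
        for left in range(0, w, block):
            coords = [(r, c) for r in range(top, top + block)
                      for c in range(left, left + block)]
            # Mean of each channel over this block only.
            means = [sum(image[r][c][ch] for r, c in coords) / len(coords)
                     for ch in range(3)]
            for r, c in coords:
                out[r][c] = tuple(
                    max(0, min(255, round(image[r][c][ch] - means[ch] + offset)))
                    for ch in range(3)
                )
    return out


# 8x8 grey image with one bright pixel in the top-left 4x4 block.
img = [[(100, 100, 100)] * 8 for _ in range(8)]
img[1][1] = (200, 200, 200)
res = block_subtract(img)
print(res[1][1], res[0][0], res[5][5])  # (222, 222, 222) (122, 122, 122) (128, 128, 128)
```

Under this reading, a 416×416 training image splits evenly into 104×104 blocks, so no padding is needed.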

Training Log

We start training our YOLO model from the darknet53conv74 weights.
The hyperparameter settings are as follows:
batch = 64,
subdivisions = 16,
height = 416, width = 416, channels = 3,
momentum = 0.9,
decay = 0.0005,
saturation = 1.5,
exposure = 1.5, hue = 0.1,
learning rate = 0.001,
maxbatches = 50200.
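The hyperparameters above can be written as a darknet-style `[net]` section. The sketch below renders them using the usual darknet cfg key names (`learning_rate`, `max_batches`); keys the README does not mention, such as `burn_in`, `policy` or `steps`, are deliberately left out rather than guessed. It also derives two quantities that follow directly from the listed values: darknet processes `batch / subdivisions` images per forward pass, and the run sees `batch × max_batches` images in total.

```python
# Sketch: the README's hyperparameters as a darknet-style [net] section.
# Only values stated in the README are included; other cfg keys are omitted.

NET_PARAMS = {
    "batch": 64,
    "subdivisions": 16,
    "width": 416,
    "height": 416,
    "channels": 3,
    "momentum": 0.9,
    "decay": 0.0005,
    "saturation": 1.5,
    "exposure": 1.5,
    "hue": 0.1,
    "learning_rate": 0.001,
    "max_batches": 50200,
}


def render_net_section(params: dict) -> str:
    """Format the parameters as key=value lines under a [net] header."""
    lines = ["[net]"] + [f"{key}={value}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


# Each batch of 64 is split into 16 forward passes of 4 images.
images_per_pass = NET_PARAMS["batch"] // NET_PARAMS["subdivisions"]
# Total images processed over the 50200 iterations.
images_seen = NET_PARAMS["batch"] * NET_PARAMS["max_batches"]

print(render_net_section(NET_PARAMS))
print(images_per_pass, images_seen)  # 4 3212800
```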
We trained for 50200 iterations on the Pascal VOC 2007 train and val data and the 2012 train data; this took approximately 65 hours on an Nvidia GTX 1080 Ti.
We validated our model on the Pascal VOC 2007 test set.
We built darknet with GPU = 1, CUDNN = 1, OPENCV = 0, OPENMP = 0, DEBUG = 0.
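The build flags above are normally set by editing the top of darknet's Makefile before running `make`. The helper below is a hypothetical convenience, not part of darknet: it assumes each flag appears on its own line in the form `NAME=value`, as in the original darknet Makefile, and raises an error if a flag line is missing rather than silently doing nothing.

```python
# Hypothetical helper: set darknet Makefile flags to the values used in the README.
# Assumes lines of the form NAME=value at the start of a line.
import re

BUILD_FLAGS = {"GPU": 1, "CUDNN": 1, "OPENCV": 0, "OPENMP": 0, "DEBUG": 0}


def set_make_flags(makefile_text: str, settings: dict) -> str:
    """Rewrite each NAME=value line; fail loudly if a flag line is not found."""
    for name, value in settings.items():
        pattern = rf"^{re.escape(name)}=\S*$"
        makefile_text, count = re.subn(
            pattern, f"{name}={value}", makefile_text, flags=re.MULTILINE
        )
        if count == 0:
            raise KeyError(f"no '{name}=' line found in Makefile")
    return makefile_text


# In practice: read the real Makefile, write the result back, then run `make`.
sample = "GPU=0\nCUDNN=0\nOPENCV=0\nOPENMP=0\nDEBUG=0\n\nVPATH=./src/\n"
updated = set_make_flags(sample, BUILD_FLAGS)
print(updated.splitlines()[:5])  # ['GPU=1', 'CUDNN=1', 'OPENCV=0', 'OPENMP=0', 'DEBUG=0']
```

Anchoring the pattern to the whole line keeps similarly named variables (for example a `CUDNN_HALF=` line in some darknet forks) untouched.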

The training log is available on Google Drive: Training log

Results

Result 0

On the Pascal VOC 2007 test set we achieve 63.93% mAP at an IoU threshold of 0.5, and a precision of 74% at a detection confidence threshold of 0.25.

Class-wise average precision (AP) is as follows:

class_id = 0, name = aeroplane, ap = 72.05% (TP = 188, FP = 34)
class_id = 1, name = bicycle, ap = 75.63% (TP = 240, FP = 37)
class_id = 2, name = bird, ap = 54.69% (TP = 225, FP = 69)
class_id = 3, name = boat, ap = 49.49% (TP = 125, FP = 76)
class_id = 4, name = bottle, ap = 35.34% (TP = 151, FP = 124)
class_id = 5, name = bus, ap = 75.93% (TP = 147, FP = 53)
class_id = 6, name = car, ap = 74.70% (TP = 833, FP = 187)
class_id = 7, name = cat, ap = 76.18% (TP = 259, FP = 86)
class_id = 8, name = chair, ap = 47.52% (TP = 341, FP = 307)
class_id = 9, name = cow, ap = 68.70% (TP = 167, FP = 86)
class_id = 10, name = diningtable, ap = 60.45% (TP = 117, FP = 75)
class_id = 11, name = dog, ap = 72.58% (TP = 346, FP = 180)
class_id = 12, name = horse, ap = 81.63% (TP = 268, FP = 97)
class_id = 13, name = motorbike, ap = 76.99% (TP = 225, FP = 66)
class_id = 14, name = person, ap = 71.15% (TP = 3012, FP = 739)
class_id = 15, name = pottedplant, ap = 31.79% (TP = 142, FP = 91)
class_id = 16, name = sheep, ap = 56.86% (TP = 146, FP = 119)
class_id = 17, name = sofa, ap = 66.59% (TP = 152, FP = 84)
class_id = 18, name = train, ap = 71.41% (TP = 195, FP = 58)
class_id = 19, name = tvmonitor, ap = 58.94% (TP = 175, FP = 55)

for thresh = 0.25, precision = 0.74, recall = 0.62, F1-score = 0.67
for thresh = 0.25, TP = 7454, FP = 2623, FN = 4578, average IoU = 57.08 %
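The summary precision, recall and F1-score in the log follow directly from the TP, FP and FN counts. Here `thresh = 0.25` is the detection confidence threshold, which is separate from the 50% IoU threshold used for the mAP figure. The snippet below recomputes both runs' summaries from the logged counts; note that TP + FN = 12032 in both runs, the number of ground-truth boxes.

```python
# Recompute the logged summary metrics from the raw detection counts.


def detection_metrics(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) for the given counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Result 0 and Result 1 counts at confidence threshold 0.25 (from the logs).
p0, r0, f0 = detection_metrics(tp=7454, fp=2623, fn=4578)
p1, r1, f1_1 = detection_metrics(tp=7477, fp=2696, fn=4555)

print(f"Result 0: precision={p0:.2f} recall={r0:.2f} F1={f0:.2f}")    # precision=0.74 recall=0.62 F1=0.67
print(f"Result 1: precision={p1:.2f} recall={r1:.2f} F1={f1_1:.2f}")  # precision=0.73 recall=0.62 F1=0.67
```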

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision ([email protected]) = 0.639302, or 63.93 %
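"Area-Under-Curve for each unique Recall" refers to computing each class's AP as the area under its precision-recall curve, summed wherever recall changes, after making precision non-increasing (the all-point method used by the PASCAL VOC devkit from 2010 on). The sketch below follows that method; darknet's own implementation may differ in minor details. It also checks that the reported mAP is the unweighted mean of the 20 per-class APs listed above.

```python
# Sketch of all-point AP (area under the PR curve at each unique recall) and mAP.


def average_precision(recalls, precisions):
    """Area under the precision-recall curve, summed at each change in recall."""
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    ap = 0.0
    for i in range(len(mrec) - 1):
        if mrec[i + 1] != mrec[i]:  # only where recall actually changes
            ap += (mrec[i + 1] - mrec[i]) * mpre[i + 1]
    return ap


# Toy PR curve (made-up points, for illustration only).
toy_ap = average_precision([0.25, 0.5, 0.5, 0.75], [1.0, 1.0, 0.5, 0.6])
print(f"toy AP = {toy_ap:.2f}")  # toy AP = 0.65

# Per-class AP (%) for Result 0, copied from the log above.
RESULT0_AP = [72.05, 75.63, 54.69, 49.49, 35.34, 75.93, 74.70, 76.18, 47.52, 68.70,
              60.45, 72.58, 81.63, 76.99, 71.15, 31.79, 56.86, 66.59, 71.41, 58.94]
mean_ap = sum(RESULT0_AP) / len(RESULT0_AP)
print(f"mAP = {mean_ap:.2f} %")  # mAP = 63.93 %
```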

Loss plots (images not reproduced): iterations 1 to 50200, 150 to 50200, 10000 to 50200, and 20000 to 50200.

Result 1

On the Pascal VOC 2007 test set we achieve 64.02% mAP at an IoU threshold of 0.5, and a precision of 73% at a detection confidence threshold of 0.25.

Class-wise average precision (AP) is as follows:

class_id = 0, name = aeroplane, ap = 75.32% (TP = 199, FP = 35)
class_id = 1, name = bicycle, ap = 75.19% (TP = 231, FP = 42)
class_id = 2, name = bird, ap = 55.15% (TP = 225, FP = 76)
class_id = 3, name = boat, ap = 50.42% (TP = 138, FP = 92)
class_id = 4, name = bottle, ap = 36.54% (TP = 154, FP = 88)
class_id = 5, name = bus, ap = 73.14% (TP = 154, FP = 70)
class_id = 6, name = car, ap = 74.03% (TP = 828, FP = 156)
class_id = 7, name = cat, ap = 77.94% (TP = 262, FP = 79)
class_id = 8, name = chair, ap = 45.78% (TP = 333, FP = 360)
class_id = 9, name = cow, ap = 66.06% (TP = 159, FP = 109)
class_id = 10, name = diningtable, ap = 59.19% (TP = 116, FP = 71)
class_id = 11, name = dog, ap = 72.63% (TP = 345, FP = 171)
class_id = 12, name = horse, ap = 81.56% (TP = 269, FP = 102)
class_id = 13, name = motorbike, ap = 76.66% (TP = 227, FP = 77)
class_id = 14, name = person, ap = 70.90% (TP = 3031, FP = 752)
class_id = 15, name = pottedplant, ap = 32.35% (TP = 153, FP = 104)
class_id = 16, name = sheep, ap = 56.96% (TP = 141, FP = 110)
class_id = 17, name = sofa, ap = 68.81% (TP = 153, FP = 88)
class_id = 18, name = train, ap = 73.14% (TP = 195, FP = 49)
class_id = 19, name = tvmonitor, ap = 58.58% (TP = 164, FP = 65)

for thresh = 0.25, precision = 0.73, recall = 0.62, F1-score = 0.67
for thresh = 0.25, TP = 7477, FP = 2696, FN = 4555, average IoU = 56.73 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision ([email protected]) = 0.640169, or 64.02 %

Weight files

Weight file for Result 0 Weight

Weight file for Result 1 Weight

Project Report

Project Report on Modified-YOLO

Result Comparison

Comparison of our modified model with the entries of the VOC 2007 challenge. The highlighted entries are our results; bold and underlined entries represent the best result in each class. Of the 20 classes in Pascal VOC, our model produces the highest AP in 12.


Possible Improvements

1. Neighbourhood Block Subtraction:

Neighbourhood block subtraction suggests a promising research direction. We would like to try blocks of varying sizes and compare the results, which would give us an idea of the ideal block size. We would also like to try applying neighbourhood block subtraction consecutively, layer by layer. At present we perform the block subtraction sequentially; it could be made considerably faster by executing it in parallel with CUDA. We would also like to build the block subtraction mechanism into the model architecture itself, and to test this approach with other detection architectures such as SSD and Fast R-CNN. Further experimentation should help us develop a better understanding of why the process works.

2. Spatial Transformations

3. Spatial Transformer Networks: Paper

4. Inverse Compositional Spatial Transformer Networks: Paper
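On the parallel execution idea under item 1: under the block-mean reading assumed here (the README does not specify the operation or how the current sequential code is structured), every block depends only on its own pixels, so the work splits into independent pieces. The sketch below splits one channel into horizontal bands of block rows and processes them with a thread pool, then checks that the result matches a sequential pass. In CPython, threads will not speed up this pure-Python loop because of the GIL; the point is the decomposition, which maps naturally onto CUDA thread blocks or a process pool. All function names are hypothetical.

```python
# Sketch: block-mean subtraction split into independent bands (hypothetical names).
# Assumes height and width are multiples of the block size.
from concurrent.futures import ThreadPoolExecutor
from typing import List

Channel = List[List[float]]


def subtract_band(channel: Channel, top: int, block: int) -> Channel:
    """Block-mean subtraction for the `block` rows starting at `top`.

    Each block reads only its own pixels, so bands never depend on each other.
    """
    width = len(channel[0])
    band = [list(row) for row in channel[top:top + block]]
    for left in range(0, width, block):
        cols = range(left, left + block)
        values = [channel[r][c] for r in range(top, top + block) for c in cols]
        mean = sum(values) / len(values)
        for r in range(block):
            for c in cols:
                band[r][c] = channel[top + r][c] - mean
    return band


def block_subtract_sequential(channel: Channel, block: int = 4) -> Channel:
    tops = range(0, len(channel), block)
    return [row for top in tops for row in subtract_band(channel, top, block)]


def block_subtract_parallel(channel: Channel, block: int = 4, workers: int = 4) -> Channel:
    tops = range(0, len(channel), block)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so bands come back top to bottom.
        bands = list(pool.map(lambda top: subtract_band(channel, top, block), tops))
    return [row for band in bands for row in band]


channel = [[float(8 * r + c) for c in range(8)] for r in range(8)]
result = block_subtract_parallel(channel)
print(result == block_subtract_sequential(channel))  # True
```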