This text detector performs text localization. It uses the RetinaNet architecture and applies the techniques introduced in TextBoxes++.
SynthText
[raw data & tfrecord](https://drive.google.com/drive/folders/1Nj07w3DEL95R3qaIJl8qv6Z9pRb2H405?usp=sharing)
```
cd text_detector/sample/SynthText
python3 train.py --train_dataset="/path/to/tfrecord/"
```
balloon from Mask_RCNN
[raw data & tfrecord](https://drive.google.com/drive/folders/1lUrDCWLtj2oL78SRIgwgwtIl1iA6CuHT?usp=sharing)
```
cd text_detector/sample/balloon
python3 train.py --train_dataset="/path/to/tfrecord/"
```
- An SSD structure is used, and vertical offsets are added when generating bbox proposals.
- The structure is the same as TextBoxes, but offsets for the QuadBox have been added.
- 4-d anchor box offset (xywh) -> (4+8)-d anchor box offset (xywh + x0y0x1y1x2y2x3y3)
- last conv: 3x5 kernel, to give a receptive field better suited to the quad box
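The (4+8)-d offset layout above can be sketched as follows. This is a minimal illustration that assumes the TextBoxes++-style parameterization, where each quad vertex is regressed relative to the matching corner of the anchor box; the function name `encode_quad_targets`, the centre-based `(cx, cy, w, h)` layout, and the clockwise-from-top-left vertex order are assumptions made for this example, not taken from the repository code.

```python
import numpy as np

def encode_quad_targets(anchor_xywh, gt_xywh, gt_quad):
    """Build a 12-d regression target: 4 box offsets + 8 quad-vertex offsets.

    anchor_xywh, gt_xywh: (cx, cy, w, h).
    gt_quad: x0, y0, ..., x3, y3, assumed clockwise from the top-left vertex.
    """
    ax, ay, aw, ah = anchor_xywh
    gx, gy, gw, gh = gt_xywh

    # Usual 4-d box offsets: centre shift scaled by anchor size, log size ratio.
    box = np.array([(gx - ax) / aw, (gy - ay) / ah,
                    np.log(gw / aw), np.log(gh / ah)])

    # Anchor corners, in the same order as the quad vertices.
    corners = np.array([
        [ax - aw / 2, ay - ah / 2],  # top-left
        [ax + aw / 2, ay - ah / 2],  # top-right
        [ax + aw / 2, ay + ah / 2],  # bottom-right
        [ax - aw / 2, ay + ah / 2],  # bottom-left
    ])
    quad = np.asarray(gt_quad, dtype=np.float64).reshape(4, 2)

    # Each vertex offset is measured from its anchor corner, scaled by (w, h).
    quad_off = (quad - corners) / np.array([aw, ah])
    return np.concatenate([box, quad_off.ravel()])
```

When the ground-truth quad coincides with the anchor corners, all 12 targets are zero, so the network only has to learn deviations from the anchor shape.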
- A simple one-stage object detector with good performance
- FPN (Feature Pyramid Network) allows various levels of features to be used.
- output : 1-d score + 4-d anchor box offset
- cls loss = focal loss, loc loss = smooth L1 loss
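The two losses named above can be sketched in NumPy as below. This is an illustrative version only: the defaults `alpha=0.25` and `gamma=2.0` are the values commonly quoted for RetinaNet and are assumed here, not read from this repository, and the function names are hypothetical.

```python
import numpy as np

def binary_focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    """Per-element binary focal loss for labels in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-logits))                 # sigmoid probability
    p_t = np.where(labels == 1, p, 1.0 - p)           # probability of the true class
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-7, 1.0))

def smooth_l1(pred, target, delta=1.0):
    """Per-element smooth L1: quadratic near zero, linear for large errors."""
    diff = np.abs(pred - target)
    return np.where(diff < delta, 0.5 * diff ** 2 / delta, diff - 0.5 * delta)
```

In a typical RetinaNet-style setup the classification loss is summed over all non-ignored anchors and the localization loss over positive anchors only, with both normalized by the number of positive anchors; whether this repository does exactly that is not stated here.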
- Define anchor boxes for each grid.
- Obtain the IoU between the GT box and the anchor box.
- Each anchor box is assigned to the GT box with which it has the largest IoU.
- Labels are then set by threshold: IoU > 0.5: text (label = 1); 0.4 < IoU < 0.5: ignore (label = -1); IoU < 0.4: non-text (label = 0).
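The assignment steps above can be sketched as follows. This is a minimal NumPy version under a few assumptions: boxes are axis-aligned and given as `(x1, y1, x2, y2)` corners rather than xywh, IoU values exactly equal to a threshold fall into the ignore band, and the names `box_iou` and `assign_labels` are hypothetical rather than the repository's own.

```python
import numpy as np

def box_iou(anchors, gts):
    """Pairwise IoU between (N, 4) anchors and (M, 4) GT boxes, (x1, y1, x2, y2)."""
    lt = np.maximum(anchors[:, None, :2], gts[None, :, :2])  # intersection top-left
    rb = np.minimum(anchors[:, None, 2:], gts[None, :, 2:])  # intersection bottom-right
    wh = np.clip(rb - lt, 0.0, None)                         # zero if no overlap
    inter = wh[..., 0] * wh[..., 1]
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def assign_labels(anchors, gts, pos_thr=0.5, neg_thr=0.4):
    """Return (labels, matched_gt): 1 = text, 0 = non-text, -1 = ignore."""
    iou = box_iou(anchors, gts)
    matched_gt = iou.argmax(axis=1)   # GT box with the largest IoU per anchor
    max_iou = iou.max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int64)  # start as "ignore"
    labels[max_iou < neg_thr] = 0
    labels[max_iou > pos_thr] = 1
    return labels, matched_gt
```

Anchors labelled -1 would be excluded from the classification loss, and only anchors labelled 1 would contribute regression targets for their matched GT box.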
- Training
- Training Code
- Model Save
- Step Decay Learning Rate
- Multiple GPU
- Make Data
- Make SynthText tfrecord
- Make ICDAR13 tfrecord
- Make ICDAR15 tfrecord
- Make toy dataset(balloon) from Mask_RCNN
- Network
- ResNet50, ResNet101
- Feature Pyramid Network
- Task Specific Network
- Trainable BatchNorm (?)
- Freeze BatchNorm (?)
- GroupNorm
- (binary) focal loss
- Slim Backbone pretrained weight
- Utils
- Add vertical offset
- Validation inference image visualization using TensorBoard
- Add augmentation
- Add evaluation code (mAP) ==> Unstable
- QUAD version NMS (numpy version)
- Combine the two NMS methods as the paper describes
- Visualization
- OS : Ubuntu 16.04.4 LTS
- GPU : Nvidia GTX 1080 Ti (11GB)
- Python : 3.6.6
- TensorFlow : 1.4.0
- Polygon