Text Detector for OCR

This text detector performs text localization. It is built on the RetinaNet architecture and applies techniques from TextBoxes++.

Train

SynthText

[raw data & tfrecord](https://drive.google.com/drive/folders/1Nj07w3DEL95R3qaIJl8qv6Z9pRb2H405?usp=sharing)

```
cd text_detector/sample/SynthText
python3 train.py --train_dataset="/path/to/tfrecord/"
```

balloon from Mask_RCNN

[raw data & tfrecord](https://drive.google.com/drive/folders/1lUrDCWLtj2oL78SRIgwgwtIl1iA6CuHT?usp=sharing)

```
cd text_detector/sample/balloon
python3 train.py --train_dataset="/path/to/tfrecord/"
```

TextBoxes++

- Uses the SSD architecture; default boxes are additionally shifted by a vertical offset when generating bbox proposals.
- Same structure as TextBoxes, but with additional offsets for the quadrilateral box (QuadBox).
- Regression target: 4-d anchor box offset (xywh) -> (4+8)-d anchor box offset (xywh + x0y0x1y1x2y2x3y3)
- Last conv layer: 3x5 kernel, giving a receptive field better suited to quad boxes.
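To make the (4+8)-d target concrete, here is a minimal NumPy sketch of encoding one GT box against one anchor and decoding it back. It assumes SSD-style normalization for the xywh part (center offsets divided by anchor width/height, log-scale width/height) and, following our reading of the TextBoxes++ paper, vertex offsets measured from the matching anchor corner and divided by anchor width or height. Variance scaling and the vertex order used in this repository are not stated in this README, and `encode_quad`, `decode_quad`, and `anchor_corners` are hypothetical names.

```python
import numpy as np


def anchor_corners(anchor):
    """Corners of an (cx, cy, w, h) anchor, clockwise from top-left (assumed order)."""
    cx, cy, w, h = anchor
    return np.array([
        [cx - w / 2, cy - h / 2],  # top-left
        [cx + w / 2, cy - h / 2],  # top-right
        [cx + w / 2, cy + h / 2],  # bottom-right
        [cx - w / 2, cy + h / 2],  # bottom-left
    ])


def encode_quad(gt_rect, gt_quad, anchor):
    """Encode a GT box as a 12-d offset: 4 (xywh) + 8 (quad vertices).

    gt_rect: (cx, cy, w, h) of the GT's axis-aligned bounding box.
    gt_quad: (4, 2) array of GT vertices, same order as anchor_corners.
    anchor:  (cx, cy, w, h) of the anchor box.
    """
    acx, acy, aw, ah = anchor
    gcx, gcy, gw, gh = gt_rect
    rect_offset = np.array([
        (gcx - acx) / aw,  # center x offset, normalized by anchor width
        (gcy - acy) / ah,  # center y offset, normalized by anchor height
        np.log(gw / aw),   # log-scale width
        np.log(gh / ah),   # log-scale height
    ])
    # Each vertex offset is relative to the matching anchor corner,
    # normalized by anchor width (x) or anchor height (y).
    quad = np.asarray(gt_quad, dtype=float)
    quad_offset = (quad - anchor_corners(anchor)) / np.array([aw, ah])
    return np.concatenate([rect_offset, quad_offset.ravel()])  # x0 y0 x1 y1 ...


def decode_quad(offset, anchor):
    """Invert encode_quad: recover (cx, cy, w, h) and the (4, 2) quad."""
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = offset[:4]
    rect = np.array([acx + dx * aw, acy + dy * ah, aw * np.exp(dw), ah * np.exp(dh)])
    quad = anchor_corners(anchor) + offset[4:].reshape(4, 2) * np.array([aw, ah])
    return rect, quad


anchor = (50.0, 50.0, 40.0, 20.0)
quad = np.array([[32, 41], [71, 44], [70, 60], [31, 57]], dtype=float)
rect = (51.0, 50.5, 40.0, 19.0)  # axis-aligned box enclosing `quad`
offset = encode_quad(rect, quad, anchor)  # shape (12,)
```

Because decoding is the exact inverse of encoding, `decode_quad(encode_quad(rect, quad, anchor), anchor)` returns the original box and quad, which is a cheap sanity check for any encoder.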

RetinaNet

- Simple one-stage object detector with good performance
- FPN (Feature Pyramid Network) lets the detector use features from several pyramid levels.
- Output: 1-d score + 4-d anchor box offset
- Classification loss: focal loss; localization loss: smooth L1 loss
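The two losses can be sketched in NumPy for the 1-d text/non-text score. This assumes the defaults from the RetinaNet paper (alpha = 0.25, gamma = 2), smooth L1 with beta = 1, the label convention from the Encode section (1 = text, 0 = non-text, -1 = ignore), and normalization by the number of positive anchors. The function names are hypothetical, and the repository's TensorFlow implementation may differ in detail.

```python
import numpy as np


def binary_focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    """Sum of binary focal loss over anchors labeled 0 or 1 (-1 is ignored)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    valid = labels >= 0
    p = 1.0 / (1.0 + np.exp(-logits[valid]))        # sigmoid text probability
    y = labels[valid]
    p_t = np.where(y == 1, p, 1.0 - p)               # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)   # class-balancing weight
    eps = 1e-7                                       # avoid log(0)
    return np.sum(-alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, eps, 1.0)))


def smooth_l1_loss(pred, target, labels, beta=1.0):
    """Smooth L1 summed over positive anchors only (label == 1)."""
    pos = np.asarray(labels) == 1
    diff = np.abs(np.asarray(pred, dtype=float)[pos] - np.asarray(target, dtype=float)[pos])
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return np.sum(loss)


def retinanet_loss(logits, labels, loc_pred, loc_target):
    """cls + loc loss, both normalized by the number of positive anchors (at least 1)."""
    num_pos = max(1, int(np.sum(np.asarray(labels) == 1)))
    cls = binary_focal_loss(logits, labels) / num_pos
    loc = smooth_l1_loss(loc_pred, loc_target, labels) / num_pos
    return cls + loc
```

Ignored anchors contribute to neither term, and the (1 - p_t)^gamma factor shrinks the loss of easy, well-classified non-text anchors so they do not dominate training on a dense anchor grid.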

Encode

- Define anchor boxes for each grid cell.
- Compute the IoU between each GT box and each anchor box.
- Assign each anchor box to the GT box with which it has the highest IoU.
- Label each anchor by that IoU: IoU > 0.5: text (label = 1); 0.4 < IoU < 0.5: ignore (label = -1); IoU < 0.4: non-text (label = 0).
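The encoding steps can be sketched in NumPy as follows. For brevity it assumes one square anchor per grid cell in (x1, y1, x2, y2) image coordinates (the network itself uses several scales and aspect ratios per FPN level), and it resolves the exact boundaries that the thresholds above leave open by using the RetinaNet convention: IoU >= 0.5 is text, 0.4 <= IoU < 0.5 is ignored, IoU < 0.4 is non-text. `grid_anchors`, `iou_matrix`, and `assign_labels` are hypothetical names.

```python
import numpy as np


def grid_anchors(feat_h, feat_w, stride, size):
    """One square anchor of side `size` centered in each cell of a feat_h x feat_w grid.

    Returns (feat_h * feat_w, 4) boxes in (x1, y1, x2, y2) image coordinates.
    """
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    cx = (xs.ravel() + 0.5) * stride
    cy = (ys.ravel() + 0.5) * stride
    half = size / 2.0
    return np.stack([cx - half, cy - half, cx + half, cy + half], axis=1)


def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU, shape (num_anchors, num_gt)."""
    a = anchors[:, None, :]
    g = gt_boxes[None, :, :]
    iw = np.clip(np.minimum(a[..., 2], g[..., 2]) - np.maximum(a[..., 0], g[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], g[..., 3]) - np.maximum(a[..., 1], g[..., 1]), 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    return inter / (area_a + area_g - inter)


def assign_labels(anchors, gt_boxes, pos_thresh=0.5, neg_thresh=0.4):
    """Match each anchor to its highest-IoU GT box and label it 1 / -1 / 0."""
    if len(gt_boxes) == 0:
        return np.zeros(len(anchors), dtype=np.int32), np.full(len(anchors), -1)
    ious = iou_matrix(anchors, gt_boxes)
    matched_gt = ious.argmax(axis=1)                     # best GT index per anchor
    best_iou = ious.max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int32)   # default: ignore
    labels[best_iou >= pos_thresh] = 1                   # text
    labels[best_iou < neg_thresh] = 0                    # non-text
    return labels, matched_gt
```

The matched GT index is what the regression target (the 4-d or (4+8)-d offset) is computed against for each positive anchor.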

Todo list:

Training
- Training Code
- Model Save
- Step Decay Learning Rate
- Multiple GPU
Make Data
- Make SynthText tfrecord
- Make ICDAR13 tfrecord
- Make ICDAR15 tfrecord
- Make toy dataset (balloon) from Mask_RCNN
Network
- ResNet50, ResNet101
- Feature Pyramid Network
- Task Specific Network
- Trainable BatchNorm (?)
- Freeze BatchNorm (?)
- GroupNorm
- (binary) focal loss
- Slim Backbone pretrained weight
Utils
- Add vertical offset
- Validation inference image visualization using TensorBoard
- Add augmentation
- Add evaluation code (mAP): unstable
- QUAD version NMS (NumPy version)
- Combine the two NMS methods as described in the paper
- Visualization
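For the two NMS items, here is a pure-NumPy sketch of cascaded NMS as we understand it from the TextBoxes++ paper: a fast NMS on the axis-aligned bounding rectangles of the quads with a relatively high threshold, then a slower NMS on the quads themselves with a lower threshold. The thresholds 0.5 and 0.2 are assumed defaults. Quad IoU here uses Sutherland-Hodgman clipping, which is exact only when the quads are convex; it stands in for the `Polygon` package listed under Environment, and every function name is hypothetical.

```python
import numpy as np


def signed_area(pts):
    """Shoelace formula; positive when the (N, 2) points are counter-clockwise (y-up)."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * (np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def polygon_area(pts):
    return abs(signed_area(pts)) if len(pts) >= 3 else 0.0


def _side(a, b, p):
    # > 0 when p lies left of the directed edge a->b.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(p1, p2, a, b):
    # Point where segment p1->p2 crosses the infinite line through a and b.
    d1, d2 = _side(a, b, p1), _side(a, b, p2)
    t = d1 / (d1 - d2)
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))


def clip_polygon(subject, clip):
    """Sutherland-Hodgman: intersection of `subject` with the convex polygon `clip`."""
    if signed_area(clip) < 0:
        clip = clip[::-1]  # orient so "inside" means left of every edge
    output = [tuple(p) for p in subject]
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, output = output, []
        for j in range(len(inp)):
            cur, prev = inp[j], inp[j - 1]
            cur_in, prev_in = _side(a, b, cur) >= 0, _side(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif prev_in:
                output.append(_intersect(prev, cur, a, b))
    return np.array(output, dtype=float).reshape(-1, 2)


def quad_iou(q1, q2):
    inter = polygon_area(clip_polygon(q1, q2))
    union = polygon_area(q1) + polygon_area(q2) - inter
    return inter / union if union > 0 else 0.0


def mbr_iou(q1, q2):
    """IoU of the axis-aligned bounding rectangles of two quads."""
    lo = np.maximum(q1.min(axis=0), q2.min(axis=0))
    hi = np.minimum(q1.max(axis=0), q2.max(axis=0))
    inter = np.prod(np.clip(hi - lo, 0, None))
    area1 = np.prod(q1.max(axis=0) - q1.min(axis=0))
    area2 = np.prod(q2.max(axis=0) - q2.min(axis=0))
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_fn, thresh):
    """Keep the highest-scoring box, drop boxes overlapping it by more than thresh, repeat."""
    order = list(np.argsort(-np.asarray(scores)))
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou_fn(boxes[i], boxes[j]) <= thresh]
    return keep


def cascaded_nms(quads, scores, rect_thresh=0.5, quad_thresh=0.2):
    """Rectangle NMS first (cheap), then quad NMS on the survivors (exact for convex quads)."""
    quads = np.asarray(quads, dtype=float)
    scores = np.asarray(scores, dtype=float)
    keep1 = greedy_nms(quads, scores, mbr_iou, rect_thresh)
    keep2 = greedy_nms(quads[keep1], scores[keep1], quad_iou, quad_thresh)
    return [keep1[k] for k in keep2]  # indices into the original `quads`
```

Running the cheap rectangle pass first means the polygon clipping only has to run on the boxes that survive it.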

Environment

OS: Ubuntu 16.04.4 LTS
GPU: Nvidia GTX 1080 Ti (11GB)
Python: 3.6.6
TensorFlow: 1.4.0
Polygon

Repository: world4jason/text_detector
